Data source: CROSBI

The MARCELL Legislative Corpus (CROSBI ID 690826)

Conference paper in proceedings | other | international peer review

Váradi, Tamás ; Koeva, Svetla ; Yamalov, Martin ; Tadić, Marko ; Sass, Bálint ; Nitoń, Bartłomiej ; Ogrodniczuk, Maciej ; Pęzik, Piotr ; Barbu Mititelu, Verginica ; Ion, Radu et al. The MARCELL Legislative Corpus // Proceedings of The 12th Language Resources and Evaluation Conference / Calzolari, Nicoletta ; Béchet, Frédéric ; Blache, Philippe et al. (eds.). Marseille: European Language Resources Association (ELRA), 2020. pp. 3761-3768

Responsibility details

Váradi, Tamás ; Koeva, Svetla ; Yamalov, Martin ; Tadić, Marko ; Sass, Bálint ; Nitoń, Bartłomiej ; Ogrodniczuk, Maciej ; Pęzik, Piotr ; Barbu Mititelu, Verginica ; Ion, Radu ; Irimia, Elena ; Mitrofan, Maria ; Păiș, Vasile ; Tufiș, Dan ; Garabík, Radovan ; Krek, Simon ; Repar, Andraz ; Rihtar, Matjaž ; Brank, Janez

English

The MARCELL Legislative Corpus

This article presents the current outcomes of the MARCELL CEF Telecom project, which aims to collect and deeply annotate a large comparable corpus of legal documents. The MARCELL corpus includes 7 monolingual sub-corpora (Bulgarian, Croatian, Hungarian, Polish, Romanian, Slovak and Slovenian) containing the total body of the respective national legislative documents. These sub-corpora are automatically sentence-split, tokenized, lemmatized, and morphologically and syntactically annotated. The monolingual sub-corpora are complemented by a thematically related parallel corpus (Croatian-English). The metadata and the annotations are provided uniformly for each language-specific sub-corpus. Besides the standard morphosyntactic analysis plus named entity and dependency and/or noun phrase annotation, the corpus is enriched with IATE and EuroVoc labels. The file format is CoNLL-U Plus, containing the ten columns specific to the CoNLL-U format and four extra columns specific to our corpora. The MARCELL corpora represent a rich and valuable source for further studies and developments in machine learning, cross-lingual terminological data extraction and classification.
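The file layout described in the abstract (ten CoNLL-U columns followed by four corpus-specific ones) can be read with a few lines of Python. The following is a minimal sketch, not the project's own tooling. It relies on the `# global.columns` header line defined by the CoNLL-U Plus format. The four extra column names (`EX:NE`, `EX:NP`, `EX:IATE`, `EX:EUROVOC`) and the three sample tokens are hypothetical placeholders invented for illustration; the actual column names and data should be taken from the corpus files themselves.

```python
from typing import Dict, Iterable, Iterator, List

# The ten standard CoNLL-U columns, used when no header line is present.
STANDARD_COLUMNS = ["ID", "FORM", "LEMMA", "UPOS", "XPOS",
                    "FEATS", "HEAD", "DEPREL", "DEPS", "MISC"]


def read_conllu_plus(lines: Iterable[str]) -> Iterator[List[Dict[str, str]]]:
    """Yield sentences as lists of {column name: value} dicts."""
    columns = list(STANDARD_COLUMNS)
    sentence: List[Dict[str, str]] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("# global.columns ="):
            # CoNLL-U Plus declares its column layout in this header line.
            columns = line.split("=", 1)[1].split()
        elif line.startswith("#"):
            continue  # other comment lines (sentence metadata) are skipped here
        elif not line.strip():
            if sentence:  # a blank line closes the current sentence
                yield sentence
                sentence = []
        else:
            values = line.split("\t")
            if len(values) != len(columns):
                raise ValueError(
                    f"expected {len(columns)} columns, got {len(values)}")
            sentence.append(dict(zip(columns, values)))
    if sentence:  # file did not end with a blank line
        yield sentence


# Invented sample; the EX:* column names are assumptions, not the real ones.
SAMPLE = "\n".join([
    "# global.columns = ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC "
    "EX:NE EX:NP EX:IATE EX:EUROVOC",
    "# text = Zakon vrijedi.",
    "\t".join(["1", "Zakon", "zakon", "NOUN", "_", "_", "2", "nsubj",
               "_", "_", "O", "B-NP", "_", "_"]),
    "\t".join(["2", "vrijedi", "vrijediti", "VERB", "_", "_", "0", "root",
               "_", "_", "O", "O", "_", "_"]),
    "\t".join(["3", ".", ".", "PUNCT", "_", "_", "2", "punct",
               "_", "_", "O", "O", "_", "_"]),
    "",
])

sentences = list(read_conllu_plus(SAMPLE.splitlines()))
```

Each token becomes a dictionary keyed by column name, so the extra annotation layers can be accessed in the same way as the standard ones, for example `sentences[0][0]["EX:NP"]`.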

law corpus ; comparable corpus ; under-resourced languages

Due to the coronavirus pandemic, the conference was not held, but the proceedings were published on 2020-05-15.

Contribution details

3761-3768.

2020.

published

Host publication details

Calzolari, Nicoletta ; Béchet, Frédéric ; Blache, Philippe ; Choukri, Khalid ; Cieri, Christopher ; Declerck, Thierry ; Goggi, Sara ; Isahara, Hitoshi ; Maegaard, Bente ; Mariani, Joseph ; Mazo, Hélène ; Moreno, Asuncion ; Odijk, Jan ; Piperidis, Stelios

Marseille: European Language Resources Association (ELRA)

Conference details

The 12th Language Resources and Evaluation Conference (LREC2020)

lecture

11.05.2020-16.05.2020

Marseille, France

Related research fields

Philology, Information and communication sciences
