The MARCELL Legislative Corpus (CROSBI ID 690826)
Conference contribution in proceedings | other | international peer review
Responsibility details
Váradi, Tamás ; Koeva, Svetla ; Yamalov, Martin ; Tadić, Marko ; Sass, Bálint ; Nitoń, Bartłomiej ; Ogrodniczuk, Maciej ; Pęzik, Piotr ; Barbu Mititelu, Verginica ; Ion, Radu ; Irimia, Elena ; Mitrofan, Maria ; Păiș, Vasile ; Tufiș, Dan ; Garabík, Radovan ; Krek, Simon ; Repar, Andraz ; Rihtar, Matjaž ; Brank, Janez
English
The MARCELL Legislative Corpus
This article presents the current outcomes of the MARCELL CEF Telecom project, which aims to collect and deeply annotate a large comparable corpus of legal documents. The MARCELL corpus includes 7 monolingual sub-corpora (Bulgarian, Croatian, Hungarian, Polish, Romanian, Slovak and Slovenian) containing the total body of the respective national legislative documents. These sub-corpora are automatically sentence-split, tokenized, lemmatized, and morphologically and syntactically annotated. The monolingual sub-corpora are complemented by a thematically related parallel corpus (Croatian-English). The metadata and the annotations are provided uniformly for each language-specific sub-corpus. Besides the standard morphosyntactic analysis plus named entity and dependency and/or noun phrase annotation, the corpus is enriched with IATE and EuroVoc labels. The file format is CoNLL-U Plus, containing the ten columns specific to the CoNLL-U format and four extra columns specific to our corpora. The MARCELL corpora represent a rich and valuable source for further studies and developments in machine learning, cross-lingual terminological data extraction and classification.
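The column layout described in the abstract (ten CoNLL-U columns plus four corpus-specific ones) can be read with a few lines of Python. The following is a minimal sketch, not the project's own tooling: it assumes the file declares its columns in a `# global.columns = ...` header line, as the CoNLL-U Plus convention specifies, and the four extra column names used here (`EX:NE`, `EX:NP`, `EX:IATE`, `EX:EUROVOC`) are hypothetical placeholders rather than the names used in the MARCELL files. The sample sentence is likewise invented for illustration.

```python
from typing import Dict, Iterable, Iterator, List

# The ten standard CoNLL-U columns (used if no header line is present).
CONLLU_COLUMNS = ["ID", "FORM", "LEMMA", "UPOS", "XPOS",
                  "FEATS", "HEAD", "DEPREL", "DEPS", "MISC"]


def read_conllu_plus(lines: Iterable[str]) -> Iterator[List[Dict[str, str]]]:
    """Yield sentences as lists of {column name: value} dicts, one per token."""
    columns = list(CONLLU_COLUMNS)
    sentence: List[Dict[str, str]] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("# global.columns ="):
            # The header names every column, standard and extra alike.
            columns = line.split("=", 1)[1].split()
            continue
        if line.startswith("#"):
            continue  # other comment lines (sentence id, text, ...) are skipped
        if not line.strip():
            # A blank line closes the current sentence.
            if sentence:
                yield sentence
                sentence = []
            continue
        fields = line.split("\t")
        if len(fields) != len(columns):
            raise ValueError(
                f"expected {len(columns)} columns, got {len(fields)}: {line!r}")
        sentence.append(dict(zip(columns, fields)))
    if sentence:
        yield sentence


# Invented sample; the four EX:* column names are hypothetical placeholders.
SAMPLE = (
    "# global.columns = ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC "
    "EX:NE EX:NP EX:IATE EX:EUROVOC\n"
    "1\tThe\tthe\tDET\t_\t_\t2\tdet\t_\t_\tO\tB-NP\t_\t_\n"
    "2\tlaw\tlaw\tNOUN\t_\t_\t3\tnsubj\t_\t_\tO\tI-NP\t_\t_\n"
    "3\tapplies\tapply\tVERB\t_\t_\t0\troot\t_\t_\tO\tO\t_\t_\n"
    "\n"
)

sentences = list(read_conllu_plus(SAMPLE.splitlines()))
for token in sentences[0]:
    print(token["FORM"], token["UPOS"], token["EX:NP"])
```

Reading the column names from the header, instead of hard-coding fourteen positions, keeps the reader usable for any CoNLL-U Plus file regardless of which extra columns it carries.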
law corpus ; comparable corpus ; under-resourced languages
Owing to the coronavirus pandemic, the conference was not held, but the proceedings were published on 2020-05-15.
not recorded
not recorded
not recorded
not recorded
not recorded
Contribution details
3761-3768.
2020.
published
Host publication details
Calzolari, Nicoletta ; Béchet, Frédéric ; Blache, Philippe ; Choukri, Khalid ; Cieri, Christopher ; Declerck, Thierry ; Goggi, Sara ; Isahara, Hitoshi ; Maegaard, Bente ; Mariani, Joseph ; Mazo, Hélène ; Moreno, Asuncion ; Odijk, Jan ; Piperidis, Stelios
Marseille: European Language Resources Association (ELRA)
Conference details
The 12th Language Resources and Evaluation Conference (LREC2020)
lecture
11.05.2020-16.05.2020
Marseille, France
Related fields
Philology, Information and communication sciences