The MARCELL Legislative Corpus

Váradi, Tamás; Koeva, Svetla; Yamalov, Martin; Tadić, Marko; Sass, Bálint; Nitoń, Bartłomiej; Ogrodniczuk, Maciej; Pęzik, Piotr; Barbu Mititelu, Verginica; Ion, Radu; Irimia, Elena; Mitrofan, Maria; Păiș, Vasile; Tufiș, Dan; Garabík, Radovan; Krek, Simon; Repar, Andraz; Rihtar, Matjaž; Brank, Janez

Bibliographic record number: 1062869

The MARCELL Legislative Corpus // Proceedings of The 12th Language Resources and Evaluation Conference / Calzolari, Nicoletta ; Béchet, Frédéric ; Blache, Philippe ; Choukri, Khalid ; Cieri, Christopher ; Declerck, Thierry ; Goggi, Sara ; Isahara, Hitoshi ; Maegaard, Bente ; Mariani, Joseph ; Mazo, Hélène ; Moreno, Asuncion ; Odijk, Jan ; Piperidis, Stelios (eds.).
Marseille: European Language Resources Association (ELRA), 2020. pp. 3761-3768 (lecture, international peer review, full paper (in extenso), other)

CROSBI ID: 1062869

Title
The MARCELL Legislative Corpus

Authors
Váradi, Tamás ; Koeva, Svetla ; Yamalov, Martin ; Tadić, Marko ; Sass, Bálint ; Nitoń, Bartłomiej ; Ogrodniczuk, Maciej ; Pęzik, Piotr ; Barbu Mititelu, Verginica ; Ion, Radu ; Irimia, Elena ; Mitrofan, Maria ; Păiș, Vasile ; Tufiș, Dan ; Garabík, Radovan ; Krek, Simon ; Repar, Andraz ; Rihtar, Matjaž ; Brank, Janez

Type, subtype and category of work
Papers in conference proceedings, full paper (in extenso), other

Source
Proceedings of The 12th Language Resources and Evaluation Conference / Calzolari, Nicoletta ; Béchet, Frédéric ; Blache, Philippe ; Choukri, Khalid ; Cieri, Christopher ; Declerck, Thierry ; Goggi, Sara ; Isahara, Hitoshi ; Maegaard, Bente ; Mariani, Joseph ; Mazo, Hélène ; Moreno, Asuncion ; Odijk, Jan ; Piperidis, Stelios - Marseille : European Language Resources Association (ELRA), 2020, 3761-3768

Conference
The 12th Language Resources and Evaluation Conference (LREC2020)

Place and date
Marseille, France, 11-16 May 2020

Type of participation
Lecture

Type of review
International peer review

Keywords
law corpus ; comparable corpus ; under-resourced languages

Abstract
This article presents the current outcomes of the MARCELL CEF Telecom project aiming to collect and deeply annotate a large comparable corpus of legal documents. The MARCELL corpus includes 7 monolingual sub-corpora (Bulgarian, Croatian, Hungarian, Polish, Romanian, Slovak and Slovenian) containing the total body of respective national legislative documents. These sub-corpora are automatically sentence split, tokenized, lemmatized and morphologically and syntactically annotated. The monolingual sub-corpora are complemented by a thematically related parallel corpus (Croatian-English). The metadata and the annotations are uniformly provided for each language specific sub-corpus. Besides the standard morphosyntactic analysis plus named entity and dependency and/or noun phrase annotation, the corpus is enriched with the IATE and EuroVoc labels. The file format is CoNLL-U Plus Format, containing the ten columns specific to the CoNLL-U format and four extra columns specific to our corpora. The MARCELL corpora represent a rich and valuable source for further studies and developments in machine learning, cross-lingual terminological data extraction and classification.
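
The fourteen-column layout mentioned in the abstract can be illustrated with a minimal reader sketch. It assumes the standard CoNLL-U Plus convention of a `# global.columns = ...` header line naming the columns. The four extra column names used here (`EXAMPLE:NE`, `EXAMPLE:NP`, `EXAMPLE:IATE`, `EXAMPLE:EUROVOC`) are hypothetical placeholders, since this record does not name the corpus's actual columns, and the two-token sample sentence is invented for illustration rather than taken from the corpus.

```python
from typing import Dict, Iterable, Iterator, List

# The ten standard CoNLL-U columns.
STANDARD = ["ID", "FORM", "LEMMA", "UPOS", "XPOS",
            "FEATS", "HEAD", "DEPREL", "DEPS", "MISC"]
# Hypothetical names for the four extra columns (assumption, not from the source).
EXTRA = ["EXAMPLE:NE", "EXAMPLE:NP", "EXAMPLE:IATE", "EXAMPLE:EUROVOC"]


def read_conllu_plus(lines: Iterable[str]) -> Iterator[List[Dict[str, str]]]:
    """Yield sentences as lists of {column name: value} dicts."""
    columns: List[str] = []
    sentence: List[Dict[str, str]] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("# global.columns ="):
            # The header line declares the column names, separated by spaces.
            columns = line.split("=", 1)[1].split()
        elif line.startswith("#"):
            continue  # other comment lines (sentence metadata) are skipped
        elif not line.strip():
            if sentence:  # a blank line ends the current sentence
                yield sentence
                sentence = []
        else:
            sentence.append(dict(zip(columns, line.split("\t"))))
    if sentence:
        yield sentence


# Invented toy input: every extra column is left empty ("_").
rows = [
    ["1", "Laws", "law", "NOUN", "_", "Number=Plur", "2", "nsubj", "_", "_"] + ["_"] * 4,
    ["2", "apply", "apply", "VERB", "_", "_", "0", "root", "_", "_"] + ["_"] * 4,
]
SAMPLE = ["# global.columns = " + " ".join(STANDARD + EXTRA)]
SAMPLE += ["\t".join(row) for row in rows]

sentences = list(read_conllu_plus(SAMPLE))
first_token = sentences[0][0]
print(len(first_token), first_token["FORM"], first_token["DEPREL"])
```

Reading the column names from the header instead of hard-coding them keeps the sketch independent of whichever extra columns a given file declares.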

Original language
English

Scientific fields
Information and communication sciences, Philology

Note
Due to the coronavirus pandemic, the conference was not held, but the proceedings were published on 2020-05-15.

RELATED ENTITIES

Institutions:
Filozofski fakultet, Zagreb

Profiles:

Marko Tadić (author)

Links to the full text of the paper:

www.lrec-conf.org

Indexed in:

Web of Science Core Collection (WoSCC)

Conference Proceedings Citation Index - Science (CPCI-S)
Conference Proceedings Citation Index - Social Sciences & Humanities (CPCI-SSH)