Bibliographic record no. 1062869

The MARCELL Legislative Corpus


Váradi, Tamás; Koeva, Svetla; Yamalov, Martin; Tadić, Marko; Sass, Bálint; Nitoń, Bartłomiej; Ogrodniczuk, Maciej; Pęzik, Piotr; Barbu Mititelu, Verginica; Ion, Radu et al.
The MARCELL Legislative Corpus // Proceedings of The 12th Language Resources and Evaluation Conference / Calzolari, Nicoletta ; Béchet, Frédéric ; Blache, Philippe ; Choukri, Khalid ; Cieri, Christopher ; Declerck, Thierry ; Goggi, Sara ; Isahara, Hitoshi ; Maegaard, Bente ; Mariani, Joseph ; Mazo, Hélène ; Moreno, Asuncion ; Odijk, Jan ; Piperidis, Stelios (eds.).
Marseille: European Language Resources Association (ELRA), 2020. pp. 3761-3768 (lecture, international peer review, full paper (in extenso), other)


CROSBI ID: 1062869

Title
The MARCELL Legislative Corpus

Authors
Váradi, Tamás ; Koeva, Svetla ; Yamalov, Martin ; Tadić, Marko ; Sass, Bálint ; Nitoń, Bartłomiej ; Ogrodniczuk, Maciej ; Pęzik, Piotr ; Barbu Mititelu, Verginica ; Ion, Radu ; Irimia, Elena ; Mitrofan, Maria ; Păiș, Vasile ; Tufiș, Dan ; Garabík, Radovan ; Krek, Simon ; Repar, Andraz ; Rihtar, Matjaž ; Brank, Janez

Type, subtype and category of work
Papers in conference proceedings, full paper (in extenso), other

Source
Proceedings of The 12th Language Resources and Evaluation Conference / Calzolari, Nicoletta ; Béchet, Frédéric ; Blache, Philippe ; Choukri, Khalid ; Cieri, Christopher ; Declerck, Thierry ; Goggi, Sara ; Isahara, Hitoshi ; Maegaard, Bente ; Mariani, Joseph ; Mazo, Hélène ; Moreno, Asuncion ; Odijk, Jan ; Piperidis, Stelios - Marseille : European Language Resources Association (ELRA), 2020, 3761-3768

Conference
The 12th Language Resources and Evaluation Conference (LREC2020)

Place and date
Marseille, France, 11.05.2020 - 16.05.2020

Type of participation
Lecture

Type of review
International peer review

Keywords
law corpus ; comparable corpus ; under-resourced languages

Abstract
This article presents the current outcomes of the MARCELL CEF Telecom project, which aims to collect and deeply annotate a large comparable corpus of legal documents. The MARCELL corpus includes 7 monolingual sub-corpora (Bulgarian, Croatian, Hungarian, Polish, Romanian, Slovak and Slovenian) containing the total body of the respective national legislative documents. These sub-corpora are automatically sentence-split, tokenized, lemmatized, and morphologically and syntactically annotated. The monolingual sub-corpora are complemented by a thematically related parallel corpus (Croatian-English). The metadata and the annotations are provided uniformly for each language-specific sub-corpus. Besides the standard morphosyntactic analysis plus named entity and dependency and/or noun phrase annotation, the corpus is enriched with IATE and EuroVoc labels. The file format is CoNLL-U Plus, containing the ten columns specific to the CoNLL-U format and four extra columns specific to our corpora. The MARCELL corpora represent a rich and valuable source for further studies and developments in machine learning, cross-lingual terminological data extraction and classification.
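As an illustration of the file format mentioned in the abstract, the following is a minimal Python sketch of reading a CoNLL-U Plus file. It relies only on the general conventions of that format: a `# global.columns = ...` header naming the columns, tab-separated fields, and blank lines between sentences. The function name `read_conllup`, the sample sentence, and the four extra column names (`X:NE`, `X:NP`, `X:IATE`, `X:EUROVOC`) are hypothetical placeholders chosen for this example; the record does not state the actual column names used in the corpus.

```python
from typing import Dict, Iterable, Iterator, List

# The ten standard CoNLL-U column names, in order.
CONLLU_COLUMNS = ["ID", "FORM", "LEMMA", "UPOS", "XPOS",
                  "FEATS", "HEAD", "DEPREL", "DEPS", "MISC"]


def read_conllup(lines: Iterable[str]) -> Iterator[List[Dict[str, str]]]:
    """Yield each sentence as a list of {column name: value} dicts."""
    columns = list(CONLLU_COLUMNS)  # fallback if no header line is present
    sentence: List[Dict[str, str]] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("# global.columns ="):
            # The header declares the column layout for the whole file.
            columns = line.split("=", 1)[1].split()
        elif line.startswith("#"):
            continue  # other comment lines (e.g. sentence ids) are skipped
        elif not line.strip():
            # A blank line closes the current sentence.
            if sentence:
                yield sentence
                sentence = []
        else:
            values = line.split("\t")
            if len(values) != len(columns):
                raise ValueError(
                    f"expected {len(columns)} fields, got {len(values)}"
                )
            sentence.append(dict(zip(columns, values)))
    if sentence:
        yield sentence


# Hypothetical sample: ten CoNLL-U columns plus four placeholder extras.
SAMPLE = (
    "# global.columns = ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC "
    "X:NE X:NP X:IATE X:EUROVOC\n"
    "# sent_id = 1\n"
    "1\tThe\tthe\tDET\t_\t_\t2\tdet\t_\t_\tO\tB-NP\t_\t_\n"
    "2\tlaw\tlaw\tNOUN\t_\t_\t0\troot\t_\t_\tO\tI-NP\t_\t_\n"
    "\n"
)

sentences = list(read_conllup(SAMPLE.splitlines()))
for token in sentences[0]:
    print(token["FORM"], token["UPOS"], token["X:NP"])
```

Reading the column names from the header, rather than hard-coding fourteen positions, keeps the sketch usable with whatever extra columns a given file declares.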

Original language
English

Scientific fields
Information and communication sciences, Philology

Note
Due to the coronavirus pandemic, the conference was not held, but the proceedings were published on 2020-05-15.



WORK AFFILIATIONS


Institutions:
Filozofski fakultet (Faculty of Humanities and Social Sciences), Zagreb

Profiles:

Marko Tadić (author)

Links to the full text of the paper:

www.lrec-conf.org

Cite this publication:

Váradi, T., Koeva, S., Yamalov, M., Tadić, M., Sass, B., Nitoń, B., Ogrodniczuk, M., Pęzik, P., Barbu Mititelu, V. & Ion, R. (2020) The MARCELL Legislative Corpus. In: Calzolari, N., Béchet, F., Blache, P., Choukri, K., Cieri, C., Declerck, T., Goggi, S., Isahara, H., Maegaard, B., Mariani, J., Mazo, H., Moreno, A., Odijk, J. & Piperidis, S. (eds.) Proceedings of The 12th Language Resources and Evaluation Conference.
@article{article,
  author = {V\'{a}radi, Tam\'{a}s and Koeva, Svetla and Yamalov, Martin and Tadi\'{c}, Marko and Sass, B\'{a}lint and Nito\'{n}, Bart{\l}omiej and Ogrodniczuk, Maciej and P\k{e}zik, Piotr and Barbu Mititelu, Verginica and Ion, Radu and Irimia, Elena and Mitrofan, Maria and P\u{a}iș, Vasile and Tufiș, Dan and Garab\'{\i}k, Radovan and Krek, Simon and Repar, Andraz and Rihtar, Matja\v{z} and Brank, Janez},
  year = {2020},
  pages = {3761-3768},
  keywords = {law corpus, comparable corpus, under-resourced languages},
  title = {The MARCELL Legislative Corpus},
  publisher = {European Language Resources Association (ELRA)},
  publisherplace = {Marseille, France}
}

Indexed in:


  • Web of Science Core Collection (WoSCC)
    • Conference Proceedings Citation Index - Science (CPCI-S)
    • Conference Proceedings Citation Index - Social Sciences & Humanities (CPCI-SSH)




