Building the Spanish-Croatian Parallel Corpus

Mikelenić, Bojana; Tadić, Marko

Bibliographic record number: 1062871

Building the Spanish-Croatian Parallel Corpus // Proceedings of The 12th Language Resources and Evaluation Conference / Calzolari, Nicoletta ; Béchet, Frédéric ; Blache, Philippe ; Choukri, Khalid ; Cieri, Christopher ; Declerck, Thierry ; Goggi, Sara ; Isahara, Hitoshi ; Maegaard, Bente ; Mariani, Joseph ; Mazo, Hélène ; Moreno, Asuncion ; Odijk, Jan ; Piperidis, Stelios (eds.).
Marseille: European Language Resources Association (ELRA), 2020. pp. 3932-3936 (poster, international peer review, full paper (in extenso), other)

CROSBI ID: 1062871

Title
Building the Spanish-Croatian Parallel Corpus

Authors
Mikelenić, Bojana ; Tadić, Marko

Type, subtype and category of work
Papers in conference proceedings, full paper (in extenso), other

Source
Proceedings of The 12th Language Resources and Evaluation Conference / Calzolari, Nicoletta ; Béchet, Frédéric ; Blache, Philippe ; Choukri, Khalid ; Cieri, Christopher ; Declerck, Thierry ; Goggi, Sara ; Isahara, Hitoshi ; Maegaard, Bente ; Mariani, Joseph ; Mazo, Hélène ; Moreno, Asuncion ; Odijk, Jan ; Piperidis, Stelios - Marseille : European Language Resources Association (ELRA), 2020, 3932-3936

ISBN
979-10-95546-34-4

Conference
The 12th Language Resources and Evaluation Conference (LREC2020)

Place and date
Marseille, France, 11-16 May 2020

Type of participation
Poster

Type of review
International peer review

Keywords
written corpus ; parallel corpus ; Spanish ; Croatian

Abstract
This paper describes the building of the first Spanish-Croatian unidirectional parallel corpus, which has been constructed at the Faculty of Humanities and Social Sciences of the University of Zagreb. The corpus comprises eleven Spanish novels and their translations into Croatian, done by six different professional translators. All the texts were published between 1999 and 2012. The corpus has more than 2 Mw, with approximately 1 Mw for each language. It was automatically sentence-segmented and aligned, as well as manually post-corrected, and contains 71,778 translation units. In order to protect copyright and to make the corpus available under a permissive CC-BY licence, the aligned translation units are shuffled. This limits the usability of the corpus to research on language units at the sentence level and below. There are two versions of the corpus in TMX format that will be available for download through the META-SHARE and CLARIN ERIC infrastructures. The former contains plain TMX, while the latter is lemmatised and POS-tagged and stored in the aTMX format.

Original language
English

Scientific fields
Information and communication sciences, Philology

Note
Due to the coronavirus pandemic, the conference was not held, but the proceedings were published on 2020-05-15.

WORK AFFILIATIONS

Institutions:
Faculty of Humanities and Social Sciences, Zagreb

Profiles:

Bojana Mikelenić (author)

Marko Tadić (author)

Links to the full text of the work:

www.lrec-conf.org

Indexed in:

Web of Science Core Collection (WoSCC)

Conference Proceedings Citation Index - Science (CPCI-S)
Conference Proceedings Citation Index - Social Sciences & Humanities (CPCI-SSH)
