Bibliographic record no. 1062871
Building the Spanish-Croatian Parallel Corpus
Building the Spanish-Croatian Parallel Corpus // Proceedings of The 12th Language Resources and Evaluation Conference / Calzolari, Nicoletta ; Béchet, Frédéric ; Blache, Philippe ; Choukri, Khalid ; Cieri, Christopher ; Declerck, Thierry ; Goggi, Sara ; Isahara, Hitoshi ; Maegaard, Bente ; Mariani, Joseph ; Mazo, Hélène ; Moreno, Asuncion ; Odijk, Jan ; Piperidis, Stelios (eds.).
Marseille: European Language Resources Association (ELRA), 2020. pp. 3932-3936 (poster, international peer review, full paper (in extenso), other)
CROSBI ID: 1062871
Title
Building the Spanish-Croatian Parallel Corpus
Authors
Mikelenić, Bojana ; Tadić, Marko
Type, subtype and category of work
Papers in conference proceedings, full paper (in extenso), other
Source
Proceedings of The 12th Language Resources and Evaluation Conference
/ Calzolari, Nicoletta ; Béchet, Frédéric ; Blache, Philippe ; Choukri, Khalid ; Cieri, Christopher ; Declerck, Thierry ; Goggi, Sara ; Isahara, Hitoshi ; Maegaard, Bente ; Mariani, Joseph ; Mazo, Hélène ; Moreno, Asuncion ; Odijk, Jan ; Piperidis, Stelios - Marseille : European Language Resources Association (ELRA), 2020, 3932-3936
ISBN
979-10-95546-34-4
Conference
The 12th Language Resources and Evaluation Conference (LREC2020)
Place and date
Marseille, France, 11-16 May 2020
Type of participation
Poster
Type of review
International peer review
Keywords
written corpus ; parallel corpus ; Spanish ; Croatian
Abstract
This paper describes the building of the first Spanish-Croatian unidirectional parallel corpus, constructed at the Faculty of Humanities and Social Sciences of the University of Zagreb. The corpus comprises eleven Spanish novels and their translations into Croatian by six different professional translators. All the texts were published between 1999 and 2012. The corpus contains more than 2 million words (Mw), with approximately 1 Mw for each language. It was automatically sentence-segmented and aligned, then manually post-corrected, and contains 71,778 translation units. To protect copyright and make the corpus available under a permissive CC-BY licence, the aligned translation units are shuffled. This limits the corpus to research on language units at the sentence level and below. Two versions of the corpus in TMX format will be available for download through the META-SHARE and CLARIN ERIC infrastructures. The former contains plain TMX, while the latter is lemmatised and POS-tagged and stored in the aTMX format.
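The shuffling step mentioned in the abstract can be illustrated with a minimal sketch. It assumes TMX 1.4-style input with `<tu>`, `<tuv>` and `<seg>` elements; the toy sentences and the function names `shuffle_translation_units` and `unit_pairs` are hypothetical and do not reproduce the authors' actual tooling.

```python
import random
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under the XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Toy three-unit TMX document (invented sample data, not from the corpus).
SAMPLE_TMX = """<tmx version="1.4">
  <header creationtool="sketch" creationtoolversion="0" segtype="sentence"
          o-tmf="none" adminlang="en" srclang="es" datatype="plaintext"/>
  <body>
    <tu><tuv xml:lang="es"><seg>Uno.</seg></tuv><tuv xml:lang="hr"><seg>Jedan.</seg></tuv></tu>
    <tu><tuv xml:lang="es"><seg>Dos.</seg></tuv><tuv xml:lang="hr"><seg>Dva.</seg></tuv></tu>
    <tu><tuv xml:lang="es"><seg>Tres.</seg></tuv><tuv xml:lang="hr"><seg>Tri.</seg></tuv></tu>
  </body>
</tmx>"""


def shuffle_translation_units(tmx_text, seed=None):
    """Return the parsed TMX tree with its <tu> elements in random order.

    Each <tu> is moved as a whole, so the Spanish and Croatian segments
    inside it stay aligned; only the document order is destroyed.
    """
    root = ET.fromstring(tmx_text)
    body = root.find("body")
    units = body.findall("tu")
    for tu in units:
        body.remove(tu)
    random.Random(seed).shuffle(units)
    body.extend(units)
    return root


def unit_pairs(root):
    """Collect (Spanish, Croatian) segment pairs from a parsed TMX tree."""
    pairs = []
    for tu in root.find("body").findall("tu"):
        segs = {tuv.get(XML_LANG): tuv.findtext("seg") for tuv in tu.findall("tuv")}
        pairs.append((segs["es"], segs["hr"]))
    return pairs


if __name__ == "__main__":
    for es, hr in unit_pairs(shuffle_translation_units(SAMPLE_TMX, seed=0)):
        print(es, "|", hr)
```

Because whole translation units are reordered, sentence-level alignment survives, while context above the sentence level can no longer be recovered from the file.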
Original language
English
Scientific fields
Information and Communication Sciences, Philology
Note
Due to the coronavirus pandemic, the conference was not held, but the proceedings were published on 2020-05-15.
PAPER AFFILIATIONS
Institutions:
Faculty of Humanities and Social Sciences, Zagreb
Indexed in:
- Web of Science Core Collection (WoSCC)
- Conference Proceedings Citation Index - Science (CPCI-S)
- Conference Proceedings Citation Index - Social Sciences & Humanities (CPCI-SSH)