Building the Macedonian-Croatian Parallel Corpus

Cebović, Ines; Tadić, Marko

Bibliographic record no. 907490

Building the Macedonian-Croatian Parallel Corpus // Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016) / Calzolari, Nicoletta; Choukri, Khalid; Declerck, Thierry; Goggi, Sara; Grobelnik, Marko; Maegaard, Bente; Mariani, Joseph; Mazo, Helene; Moreno, Asuncion; Odijk, Jan; Piperidis, Stelios (eds.).
Portorož; Paris: European Language Resources Association (ELRA), 2016. pp. 4241-4244 (poster, international peer review, full paper (in extenso), scientific)

CROSBI ID: 907490

Title
Building the Macedonian-Croatian Parallel Corpus

Authors
Cebović, Ines; Tadić, Marko

Type, subtype and category of work
Papers in conference proceedings, full paper (in extenso), scientific

Source
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016) / Calzolari, Nicoletta; Choukri, Khalid; Declerck, Thierry; Goggi, Sara; Grobelnik, Marko; Maegaard, Bente; Mariani, Joseph; Mazo, Helene; Moreno, Asuncion; Odijk, Jan; Piperidis, Stelios (eds.). Portorož; Paris: European Language Resources Association (ELRA), 2016, pp. 4241-4244

ISBN
978-2-9517408-9-1

Conference
Tenth International Conference on Language Resources and Evaluation (LREC 2016)

Place and date
Portorož, Slovenia, 23.05.2016 - 28.05.2016

Type of participation
Poster

Type of review
International peer review

Keywords
written corpus; parallel corpus; Macedonian; Croatian

Abstract
In this paper we present a newly created parallel corpus of two under-resourced languages, the Macedonian-Croatian Parallel Corpus (mk-hr_pcorp), which was collected during 2015 at the Faculty of Humanities and Social Sciences, University of Zagreb. The mk-hr_pcorp is a unidirectional (mk -> hr) parallel corpus composed of synchronic fictional prose texts that were received already in digital form, with over 500 Kw (thousand words) in each language. The corpus was sentence-segmented and provides 39,735 aligned sentences. The alignment was done automatically and then manually post-corrected. The order of the alignments was shuffled, which made it possible to release the corpus under a CC-BY license through META-SHARE. However, shuffling prevents research on language units above the sentence level.

Original language
English

Scientific fields
Information and communication sciences, Philology, Interdisciplinary humanities

RELATIONS OF THE WORK

Projects:
HR4EU

Institutions:
Faculty of Humanities and Social Sciences, Zagreb

Profiles:

Marko Tadić (author)

Links to the full text:

www.lrec-conf.org