Building the Macedonian-Croatian Parallel Corpus

Cebović, Ines; Tadić, Marko

Data source: CROSBI

Building the Macedonian-Croatian Parallel Corpus (CROSBI ID 655324)

Conference paper in proceedings | original scientific paper | international peer review

Cebović, Ines ; Tadić, Marko. Building the Macedonian-Croatian Parallel Corpus // Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016) / Calzolari, Nicoletta ; Choukri, Khalid ; Declerck, Thierry et al. (eds.). Portorož ; Paris: European Language Resources Association (ELRA), 2016, pp. 4241-4244

Responsibility information

Authors

Cebović, Ines ; Tadić, Marko

Basic information in the original language

Language

English

Title

Building the Macedonian-Croatian Parallel Corpus

Abstract

In this paper we present a newly created parallel corpus of two under-resourced languages, namely the Macedonian-Croatian Parallel Corpus (mk-hr_pcorp), which was collected during 2015 at the Faculty of Humanities and Social Sciences, University of Zagreb. The mk-hr_pcorp is a unidirectional (mk -> hr) parallel corpus composed of synchronic fictional prose texts that were received already in digital form, with over 500 Kw (thousand words) in each language. The corpus was sentence-segmented and provides 39,735 aligned sentences. The alignment was done automatically and then post-corrected manually. The order of the alignments was shuffled, which made it possible to release the corpus under a CC-BY license through META-SHARE. However, this prevents research on language units above the sentence level.

Keywords

written corpus ; parallel corpus ; Macedonian ; Croatian

Note

not recorded

Basic information in other languages

Language, title, abstract, keywords, note: not recorded

Contribution details

Pages

4241-4244.

Year of publication

2016.

Publication status

published

Host publication details

Title

Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016)

Editors

Calzolari, Nicoletta ; Choukri, Khalid ; Declerck, Thierry ; Goggi, Sara ; Grobelnik, Marko ; Maegaard, Bente ; Mariani, Joseph ; Mazo, Helene ; Moreno, Asuncion ; Odijk, Jan ; Piperidis, Stelios

Publisher

Portorož ; Paris: European Language Resources Association (ELRA)

ISBN

978-2-9517408-9-1

Conference details

Conference

Tenth International Conference on Language Resources and Evaluation (LREC 2016)

Type of participation

poster

Conference dates

23.05.2016-28.05.2016

Conference location

Portorož, Slovenia

Related entities

Related persons

Marko Tadić (author)

Related institutions

Faculty of Humanities and Social Sciences, University of Zagreb (130) (author's institution)

Field

Philology, Information and Communication Sciences, Interdisciplinary Humanities

Links

lrec-conf.org