Bibliographic record no. 174994

Making Monolingual Corpora Comparable: a Case Study of Bulgarian and Croatian


Bekavac, Božo; Osenova, Petya; Simov, Kiril; Tadić, Marko
Making Monolingual Corpora Comparable: a Case Study of Bulgarian and Croatian // Fourth International Conference on Language Resources and Evaluation LREC2004 / Lino, Maria Teresa ; Xavier, Maria Francesca ; Ferreira, Fátima ; Costa, Rute ; Silva, Raquel (eds.).
Paris-Lisbon: ELRA, 2004. pp. 1187-1190


Title
Making Monolingual Corpora Comparable: a Case Study of Bulgarian and Croatian

Authors
Bekavac, Božo ; Osenova, Petya ; Simov, Kiril ; Tadić, Marko

Type, subtype and category of work
Book chapters, scientific

Book
Fourth International Conference on Language Resources and Evaluation LREC2004

Editor(s)
Lino, Maria Teresa ; Xavier, Maria Francesca ; Ferreira, Fátima ; Costa, Rute ; Silva, Raquel

Publisher
ELRA

City
Paris-Lisbon

Year
2004

Page range
1187-1190

ISBN
2-9517408-1-6

Keywords
Corpus linguistics, comparable corpora, Croatian, Bulgarian

Abstract
This paper describes the first steps towards the creation of a Bulgarian-Croatian comparable corpus. It is based on two newspaper subcorpora drawn from larger reference corpora of Bulgarian and Croatian. Initially we rely on more extralinguistically oriented but methodologically cleaner parameters of similarity, such as specific topics, a pre-defined time span, and data size. The idea of ‘light’ and ‘hard’ comparable corpora is introduced. At this stage we aim at producing a ‘light’ bilingual comparable corpus. The algorithm for identifying lexical similarity and aligning linguistic units is presented, and the initial experiments are outlined.

Original language
English

Scientific fields
Information and communication sciences, Philology, Ethnology and anthropology



WORK AFFILIATION


Project / topic
0130418

Institutions
Filozofski fakultet, Zagreb