Bibliographic record number: 174994
Making Monolingual Corpora Comparable: a Case Study of Bulgarian and Croatian
Making Monolingual Corpora Comparable: a Case Study of Bulgarian and Croatian // Fourth International Conference on Language Resources and Evaluation LREC2004 / Lino, Maria Teresa ; Xavier, Maria Francesca ; Ferreira, Fátima ; Costa, Rute ; Silva, Raquel (eds.).
Paris : Lisbon: European Language Resources Association (ELRA), 2004. pp. 1187-1190
CROSBI ID: 174994
Title
Making Monolingual Corpora Comparable: a Case Study of Bulgarian and Croatian
Authors
Bekavac, Božo ; Osenova, Petya ; Simov, Kiril ; Tadić, Marko
Type, subtype and category of work
Book chapters, scientific
Book
Fourth International Conference on Language Resources and Evaluation LREC2004
Editor(s)
Lino, Maria Teresa ; Xavier, Maria Francesca ; Ferreira, Fátima ; Costa, Rute ; Silva, Raquel
Publisher
European Language Resources Association (ELRA)
City
Paris : Lisbon
Year
2004
Page range
1187-1190
ISBN
2-9517408-1-6
Keywords
corpus linguistics, comparable corpora, Croatian, Bulgarian
Abstract
This paper describes the first steps towards the creation of a Bulgarian-Croatian comparable corpus. It is based on two newspaper subcorpora drawn from larger reference corpora of Bulgarian and Croatian. Initially we rely on more extralinguistically oriented but methodologically cleaner parameters of similarity, such as specific topics, a pre-defined time span and data size. The idea of ‘light’ and ‘hard’ comparable corpora is introduced. At this stage we aim at producing a ‘light’ bilingual comparable corpus. The algorithm for identifying lexical similarity and aligning linguistic units is presented, and the initial experiments are outlined.
Original language
English
Scientific fields
Information and communication sciences, Philology, Ethnology and anthropology