Bibliographic record number: 125424
Building the Croatian National Corpus
Building the Croatian National Corpus // Third International Conference on Language Resources and Evaluation LREC2002 / González Rodriguez, M. ; Suarez Araujo, C. P. (eds.).
Paris : Las Palmas de Gran Canaria: European Language Resources Association (ELRA), 2002. pp. 441-446
CROSBI ID: 125424
Title
Building the Croatian National Corpus
Authors
Tadić, Marko
Type, subtype and category of work
Book chapters, scientific
Book
Third International Conference on Language Resources and Evaluation LREC2002
Editor(s)
González Rodriguez, M. ; Suarez Araujo, C. P.
Publisher
European Language Resources Association (ELRA)
City
Paris : Las Palmas de Gran Canaria
Year
2002
Page range
441-446
ISBN
2-9517408-0-8
Keywords
Croatian language, Corpus building, Croatian national corpus, POS tagging
Abstract
The paper presents the work done so far on building the Croatian National Corpus (HNK), which has been collected since 1998 at the Institute of Linguistics, Faculty of Philosophy, University of Zagreb. The corpus's size, time span, composition, and text selection criteria are presented. The HNK consists of two parts: 1) a 30-million corpus of contemporary Croatian, 2) the Croatian Electronic Textual Archive. The procedures for corpus mark-up and processing are discussed. One of the most interesting features of this corpus, since its launch in 1998, is its availability for querying through the WWW. Future directions are discussed at the end: enlargement of the 30m corpus to 100m in the next few years, enhanced corpus management and querying, and further annotation and processing.
Original language
English
Scientific fields
Philology
WORK AFFILIATIONS