Bibliographic record no. 125424

Building the Croatian National Corpus


Tadić, Marko
Building the Croatian National Corpus // Third International Conference on Language Resources and Evaluation LREC2002 / González Rodriguez, M. ; Suarez Araujo, C. P. (eds.).
Paris-Las Palmas: ELRA, 2002. pp. 441-446


Title
Building the Croatian National Corpus

Authors
Tadić, Marko

Type, subtype and category of work
Book chapters, scientific

Book
Third International Conference on Language Resources and Evaluation LREC2002

Editor(s)
González Rodriguez, M. ; Suarez Araujo, C. P.

Publisher
ELRA

City
Paris-Las Palmas

Year
2002

Page range
441-446

ISBN
2-9517408-0-8

Keywords
Croatian language, Corpus building, Croatian National Corpus, POS tagging

Abstract
The paper presents the work done so far on building the Croatian National Corpus (HNK). The corpus has been collected since 1998 at the Institute of Linguistics, Faculty of Philosophy, University of Zagreb. Its size, time span, composition, and text selection criteria are presented. The HNK consists of two parts: 1) a 30-million corpus of the contemporary Croatian language, 2) the Croatian Electronic Textual Archive. The procedures for corpus mark-up and processing are discussed. One of the most interesting features of this corpus, since its launch in 1998, is its availability for querying through the WWW. Finally, future directions are discussed: enlargement of the 30m corpus to 100m in the next few years, enhanced corpus management and querying, and further annotation and processing.

Original language
English

Scientific fields
Philology



WORK AFFILIATIONS


Project / topic
0130418

Institutions
Filozofski fakultet, Zagreb

Author with registration number:
Marko Tadić (157043)