Bibliographic record number: 643539
Domain-aware Evaluation of Named Entity Recognition Systems for Croatian
Domain-aware Evaluation of Named Entity Recognition Systems for Croatian // CIT. Journal of computing and information technology, 21 (2013), 3; 195-209 doi:10.2498/cit.1002190 (international peer review, article, scientific)
CROSBI ID: 643539
Title
Domain-aware Evaluation of Named Entity Recognition Systems for Croatian
Authors
Agić, Željko ; Bekavac, Božo
Source
CIT. Journal of computing and information technology (1330-1136) 21 (2013), 3; 195-209
Type, subtype and category of work
Journal papers, article, scientific
Keywords
named entity recognition; Croatian language; text domain; domain dependence; evaluation
Abstract
We provide an evaluation of the currently available named entity recognition systems for Croatian. The evaluation puts special emphasis on domain dependence. To this end, we manually annotated a dataset of approximately 1 million tokens of Croatian text from various domains within the newspaper text genre. The dataset was annotated using a three-class named entity tagset denoting personal names, locations and organizations. We give insight into feature selection, domain sensitivity and the effects of increasing training set size for statistical named entity recognition using the state-of-the-art Stanford NER system. We also sketch a comparison of publicly available named entity recognition systems for Croatian considering domain dependence, regardless of their underlying paradigms. Our top-performing system achieved an F1-score of 0.884 in a mixed-domain testing scenario, scoring 0.925 and 0.843 in the two domains separated for the experiment. The system shows consistency in state-of-the-art scores for detecting names of persons, locations and organizations.
Original language
English
Scientific fields
Information and communication sciences
WORK AFFILIATIONS
Projects:
130-1300646-1776 - Računalna sintaksa hrvatskoga jezika (Dovedan Han, Zdravko, MZOS) (CroRIS)
Institutions:
Filozofski fakultet, Zagreb
The journal is indexed in:
- Scopus
Inclusion in other bibliographic databases:
- Compendex (EI Village)
- INSPEC
- LISA: Library and Information Science Abstracts
- Scopus
- DOAJ
- EI Compendex
- EBSCO
- DBLP