
Domain-aware Evaluation of Named Entity Recognition Systems for Croatian (CROSBI ID 195694)

Journal article | original scientific paper | international peer review

Agić, Željko ; Bekavac, Božo. Domain-aware Evaluation of Named Entity Recognition Systems for Croatian // CIT. Journal of computing and information technology, 21 (2013), 3; 195-209. doi: 10.2498/cit.1002190

Responsibility details

Agić, Željko ; Bekavac, Božo

English

Domain-aware Evaluation of Named Entity Recognition Systems for Croatian

We provide an evaluation of the currently available named entity recognition systems for Croatian. The evaluation puts special emphasis on domain dependence. To this end, we manually annotated a dataset of approximately 1 million tokens of Croatian text from various domains within the newspaper text genre. The dataset was annotated using a three-class named entity tagset denoting personal names, locations and organizations. We give insight into feature selection, domain sensitivity and the effects of increasing training set size for statistical named entity recognition using the state-of-the-art Stanford NER system. We also sketch a comparison of publicly available named entity recognition systems for Croatian with respect to domain dependence, regardless of their underlying paradigms. Our top-performing system achieved an F1-score of 0.884 in a mixed-domain testing scenario, scoring 0.925 and 0.843 in the two domains separated for the experiment. The system shows consistent state-of-the-art scores for detecting names of persons, locations and organizations.

named entity recognition; Croatian language; text domain; domain dependence; evaluation

Publication details

Volume (issue): 21 (3)

Year: 2013

Pages: 195-209

Status: published

ISSN: 1330-1136

DOI: 10.2498/cit.1002190

Associated field

Information and communication sciences
