Bibliographic record no. 643539

Domain-aware Evaluation of Named Entity Recognition Systems for Croatian


Agić, Željko; Bekavac, Božo
Domain-aware Evaluation of Named Entity Recognition Systems for Croatian // CIT. Journal of computing and information technology, 21 (2013), 3; 195-209 doi:10.2498/cit.1002190 (international peer review, article, scientific)


CROSBI ID: 643539

Title
Domain-aware Evaluation of Named Entity Recognition Systems for Croatian

Authors
Agić, Željko ; Bekavac, Božo

Source
CIT. Journal of computing and information technology (1330-1136) 21 (2013), 3; 195-209

Type, subtype and category of work
Journal papers, article, scientific

Keywords
named entity recognition; Croatian language; text domain; domain dependence; evaluation

Abstract
We provide an evaluation of the currently available named entity recognition systems for Croatian. The evaluation puts special emphasis on domain dependence. To this end, we manually annotated a dataset of approximately 1 million tokens of Croatian text from various domains within the newspaper text genre. The dataset was annotated using a three-class named entity tagset denoting personal names, locations and organizations. We give insight into feature selection, domain sensitivity and the effects of increasing training set size for statistical named entity recognition using the state-of-the-art Stanford NER system. We also sketch a comparison of publicly available named entity recognition systems for Croatian considering domain dependence, regardless of their underlying paradigms. Our top-performing system achieved an F1-score of 0.884 in a mixed-domain testing scenario, scoring 0.925 and 0.843 in the two domains separated for the experiment. The system shows consistency in state-of-the-art scores for detecting names of persons, locations and organizations.

Original language
English

Scientific fields
Information and communication sciences



RELATED TO THIS WORK


Projects:
130-1300646-1776 - Računalna sintaksa hrvatskoga jezika (Dovedan Han, Zdravko, MZOS)

Institutions:
Faculty of Humanities and Social Sciences (Filozofski fakultet), Zagreb

Profiles:

Božo Bekavac (author)

Željko Agić (author)

Links to the full text:

doi cit.srce.unizg.hr

Cite this publication:

Agić, Željko; Bekavac, Božo
Domain-aware Evaluation of Named Entity Recognition Systems for Croatian // CIT. Journal of computing and information technology, 21 (2013), 3; 195-209 doi:10.2498/cit.1002190 (international peer review, article, scientific)
Agić, Ž. & Bekavac, B. (2013) Domain-aware Evaluation of Named Entity Recognition Systems for Croatian. CIT. Journal of computing and information technology, 21 (3), 195-209 doi:10.2498/cit.1002190.
@article{article,
  author = {Agi\'{c}, \v{Z}eljko and Bekavac, Bo\v{z}o},
  title = {Domain-aware Evaluation of Named Entity Recognition Systems for Croatian},
  journal = {CIT. Journal of computing and information technology},
  year = {2013},
  volume = {21},
  number = {3},
  pages = {195-209},
  issn = {1330-1136},
  doi = {10.2498/cit.1002190},
  keywords = {named entity recognition, Croatian language, text domain, domain dependence, evaluation}
}

The journal is indexed by:


  • Scopus


Inclusion in other bibliographic databases:


  • Compendex (EI Village)
  • INSPEC
  • LISA: Library and Information Science Abstracts
  • Scopus
  • DOAJ
  • EI Compendex
  • EBSCO
  • DBLP

