Building Named Entity Recognition Models For Croatian And Slovene (CROSBI ID 594346)
Conference paper in proceedings | original scientific paper | international peer review
Authorship information
Ljubešić, Nikola ; Stupar, Marija ; Jurić, Tereza
English
The paper presents efforts in developing freely available models for named entity recognition and classification for Croatian and Slovene. Our experiments focus on identifying the most informative set of linguistic features, taking into account the availability of language tools for the languages in question. Besides the classic linguistic features, we also exploit distributional similarity features calculated from large unannotated monolingual corpora. Using distributional information improves the results by 7-8 F1 points, while adding morphological information improves them by an additional 3-4 points in both languages. The best-performing models, along with test sets for comparison with future and existing systems and a HunPos part-of-speech model for Croatian, are available for download for academic use.
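The abstract does not state how the distributional similarity features are computed. As a minimal sketch under that uncertainty, the code below assumes a simple count-based approach: each word type is represented by the words co-occurring within a small window in an unannotated corpus, its nearest neighbours by cosine similarity become extra token features, and optional lemma and morphosyntactic tag inputs stand in for the "morphological information". All function names, the window size, and the feature keys are hypothetical illustrations, not the authors' method.

```python
import math
from collections import Counter, defaultdict


def context_vectors(sentences, window=2):
    """Count words co-occurring within +/- `window` positions of each word type."""
    vectors = defaultdict(Counter)
    for sent in sentences:
        for i, word in enumerate(sent):
            lo, hi = max(0, i - window), min(len(sent), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][sent[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def most_similar(word, vectors, k=3):
    """Return the k word types whose context vectors are closest to `word`'s."""
    if word not in vectors:
        return []
    scored = [(other, cosine(vectors[word], vec))
              for other, vec in vectors.items() if other != word]
    # Sort by descending similarity, breaking ties alphabetically for stability.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return [w for w, _ in scored[:k]]


def token_features(sent, i, vectors, lemmas=None, tags=None):
    """Combine classic surface features, distributional neighbours, and
    (hypothetical) morphological inputs into one feature dict for token i."""
    w = sent[i]
    feats = {
        "word": w.lower(),
        "is_capitalized": w[:1].isupper(),
        "suffix3": w[-3:].lower(),
        "prev_word": sent[i - 1].lower() if i > 0 else "<S>",
        "next_word": sent[i + 1].lower() if i + 1 < len(sent) else "</S>",
    }
    # Distributional features: nearest neighbours in the unannotated corpus.
    for rank, neighbour in enumerate(most_similar(w, vectors)):
        feats[f"sim{rank}"] = neighbour
    # Morphological features, if a lemmatiser / tagger output is available.
    if lemmas is not None:
        feats["lemma"] = lemmas[i]
    if tags is not None:
        feats["msd"] = tags[i]
    return feats


corpus = [
    ["Ivan", "živi", "u", "Zagrebu"],
    ["Marko", "živi", "u", "Splitu"],
    ["Ivan", "radi", "u", "Zagrebu"],
]
vecs = context_vectors(corpus)
print(token_features(corpus[1], 3, vecs))
```

In practice such feature dicts would be fed to a sequence labeller (for example a CRF); with a realistically large corpus, clustering the vectors first is a common way to keep the feature space small.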
named entity recognition; distributional similarity; Croatian language; Slovene language
Paper details
Pages 129-134.
2012.
published
Parent publication details
Proceedings of the Eighth Language Technologies Conference
Erjavec, Tomaž ; Žganec Gros, Jerneja
Ljubljana:
Conference details
Eighth Language Technologies Conference
lecture
08.10.2012-09.10.2012
Ljubljana, Slovenia