Named Entity Recognition in Croatian Tweets (CROSBI ID 619157)
Conference paper in proceedings | original scientific paper | international peer review
Authorship details
Baksa, Krešimir ; Dolović, Dino ; Glavaš, Goran ; Šnajder, Jan
English
Named Entity Recognition in Croatian Tweets
Existing named entity extraction tools, typically designed for formal texts written in standard language (e.g., news stories, essays, or legal texts), do not perform well on user-generated content (e.g., tweets). In this paper we present a supervised approach for named entity recognition and classification for Croatian tweets. Comparison of three different sequence labeling models (HMM, CRF, and SVM) revealed that CRF is the best model for the task, achieving a micro-averaged F1-score of over 87%. We also demonstrate that the state-of-the-art NER model designed for Croatian standard language texts performs much worse than our Twitter-specific NER models.
Named entity recognition; information extraction; Twitter data; Croatian language
Contribution details
85-89.
2014.
published
Host publication details
Proceedings of the Ninth Language Technologies Conference, Information Society (IS-JT 2014)
Ljubljana:
Conference details
Ninth Language Technologies Conference, Information Society (IS-JT 2014)
oral presentation
09.10.2014-10.10.2014
Ljubljana, Slovenia