Bibliographic record number: 737162
Named Entity Recognition in Croatian Tweets
Named Entity Recognition in Croatian Tweets // Proceedings of the Ninth Language Technologies Conference, Information Society (IS-JT 2014)
Ljubljana, 2014, pp. 85-89 (lecture, international peer review, full paper (in extenso), scientific)
CROSBI ID: 737162
Title
Named Entity Recognition in Croatian Tweets
Authors
Baksa, Krešimir ; Dolović, Dino ; Glavaš, Goran ; Šnajder, Jan
Type, subtype, and category of work
Papers in conference proceedings, full paper (in extenso), scientific
Source
Proceedings of the Ninth Language Technologies Conference, Information Society (IS-JT 2014)
Ljubljana, 2014, pp. 85-89
Conference
Ninth Language Technologies Conference, Information Society (IS-JT 2014)
Place and date
Ljubljana, Slovenia, 9-10 October 2014
Type of participation
Lecture
Type of review
International peer review
Keywords
Named entity recognition; information extraction; twitter data; Croatian language
Abstract
Existing named entity extraction tools, typically designed for formal texts written in standard language (e.g., news stories, essays, or legal texts), do not perform well on user-generated content (e.g., tweets). In this paper we present a supervised approach for named entity recognition and classification for Croatian tweets. Comparison of three different sequence labeling models (HMM, CRF, and SVM) revealed that CRF is the best model for the task, achieving a micro-averaged F1-score of over 87%. We also demonstrate that the state-of-the-art NER model designed for Croatian standard language texts performs much worse than our Twitter-specific NER models.
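The micro-averaged F1-score mentioned in the abstract pools true positives, false positives, and false negatives over all entity classes before computing precision and recall once. The sketch below illustrates that calculation only. The class labels (PER, ORG, LOC), the counts, and the names `Counts` and `micro_f1` are assumptions made up for this example; they are not taken from the paper and do not reproduce its results.

```python
from collections import namedtuple

# Per-class error counts: true positives, false positives, false negatives.
Counts = namedtuple("Counts", ["tp", "fp", "fn"])


def micro_f1(per_class):
    """Pool TP/FP/FN over all classes, then compute F1 once on the totals."""
    tp = sum(c.tp for c in per_class.values())
    fp = sum(c.fp for c in per_class.values())
    fn = sum(c.fn for c in per_class.values())
    if tp == 0:
        # No correct predictions at all: define F1 as 0 to avoid dividing by zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Invented counts for three hypothetical entity classes (illustration only).
counts = {
    "PER": Counts(tp=80, fp=10, fn=10),
    "ORG": Counts(tp=40, fp=10, fn=20),
    "LOC": Counts(tp=60, fp=0, fn=10),
}

score = micro_f1(counts)
print(f"micro-averaged F1: {score:.4f}")
```

Because the counts are pooled, frequent classes weigh more heavily in the micro-average than rare ones, unlike a macro-average, which would average the per-class F1 values.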
Original language
English
Scientific fields
Computer science
AFFILIATIONS
Institutions:
Faculty of Electrical Engineering and Computing (Fakultet elektrotehnike i računarstva), Zagreb