Data source: CROSBI

Lemmatization and Morphosyntactic Tagging of Croatian and Serbian (CROSBI ID 599073)

Conference paper in proceedings | original scientific paper | international peer review

Agić, Željko ; Ljubešić, Nikola ; Merkler, Danijela. Lemmatization and Morphosyntactic Tagging of Croatian and Serbian // Proceedings of the 4th Biennial International Workshop on Balto-Slavic Natural Language Processing. Sofia: Association for Computational Linguistics (ACL), 2013. pp. 48-57

Authorship information

Agić, Željko ; Ljubešić, Nikola ; Merkler, Danijela

English

Lemmatization and Morphosyntactic Tagging of Croatian and Serbian

We investigate state-of-the-art statistical models for lemmatization and morphosyntactic tagging of Croatian and Serbian. The models are trained on SETIMES.HR, a new manually annotated corpus of Croatian based on the SETimes parallel corpus. We train models on Croatian text and evaluate them on samples of Croatian and Serbian from the SETimes corpus and from the Croatian and Serbian Wikipedias. Lemmatization accuracy for the two languages reaches 97.87% and 96.30%, while full morphosyntactic tagging accuracy using a 600-tag tagset peaks at 87.72% and 85.56%, respectively. Part-of-speech tagging accuracies reach 97.13% and 96.46%. The results indicate that more complex methods of Croatian-to-Serbian annotation projection are not required for these particular tasks at such dataset sizes. The SETIMES.HR corpus, the resulting models and the test sets are all made freely available.
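The abstract reports three token-level figures: lemmatization accuracy, full morphosyntactic (MSD) tagging accuracy and part-of-speech accuracy. The sketch below shows one common way such figures can be computed from aligned gold and predicted annotations; it is not the authors' evaluation script. It assumes each token is a hypothetical `(lemma, tag)` pair and that tags follow a MULTEXT-East-style convention in which the first character encodes the part of speech, so PoS accuracy is obtained by comparing only that character. The example tokens and tags are illustrative only.

```python
from typing import Dict, List, Tuple

# Hypothetical token layout for illustration: (lemma, morphosyntactic tag).
Token = Tuple[str, str]


def accuracies(gold: List[Token], pred: List[Token]) -> Dict[str, float]:
    """Return token-level lemma, full-MSD and PoS accuracy in percent."""
    if len(gold) != len(pred):
        raise ValueError("gold and predicted sequences must be aligned")
    if not gold:
        raise ValueError("cannot compute accuracy on an empty sample")
    n = len(gold)
    # A lemma is correct only if it matches the gold lemma exactly.
    lemma_ok = sum(g[0] == p[0] for g, p in zip(gold, pred))
    # Full MSD accuracy: the whole tag string must match.
    msd_ok = sum(g[1] == p[1] for g, p in zip(gold, pred))
    # Assumption: MULTEXT-East-style tags, whose first character is the PoS.
    pos_ok = sum(g[1][:1] == p[1][:1] for g, p in zip(gold, pred))
    return {
        "lemma": 100.0 * lemma_ok / n,
        "msd": 100.0 * msd_ok / n,
        "pos": 100.0 * pos_ok / n,
    }


# Illustrative example: the last token has the right lemma and PoS
# but an incomplete MSD tag, so only full MSD accuracy drops.
gold = [("pas", "Ncmsn"), ("biti", "Var3s"), ("velik", "Agpmsny")]
pred = [("pas", "Ncmsn"), ("biti", "Var3s"), ("velik", "Agpmsn")]
result = accuracies(gold, pred)
print(result)
```

Because PoS accuracy only checks the first tag character, it is always at least as high as full MSD accuracy, which matches the gap between the PoS and full-tagset figures reported above.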

lemmatization; tagging; Croatian; Serbian

Paper details

48-57.

2013.

published

Host publication details

Proceedings of the 4th Biennial International Workshop on Balto-Slavic Natural Language Processing

Sofia: Association for Computational Linguistics (ACL)

Conference details

4th Biennial International Workshop on Balto-Slavic Natural Language Processing (BSNLP 2013)

oral presentation

08.08.2013-09.08.2013

Sofia, Bulgaria

Research field

Information and Communication Sciences
