Bibliographic record number: 815830
New inflectional lexicons and training corpora for improved morphosyntactic annotation of Croatian and Serbian
New inflectional lexicons and training corpora for improved morphosyntactic annotation of Croatian and Serbian // Proceedings of the Tenth International conference on language resources and evaluation (LREC 2016)
Portorož: European Language Resources Association (ELRA), 2016, pp. 4264-4270 (poster, international peer review, full paper (in extenso), scientific)
CROSBI ID: 815830
Title
New inflectional lexicons and training corpora for improved morphosyntactic annotation of Croatian and Serbian
Authors
Ljubešić, Nikola ; Klubička, Filip ; Agić, Željko ; Jazbec, Ivo-Pavao
Type, subtype and category of work
Papers in conference proceedings, full paper (in extenso), scientific
Source
Proceedings of the Tenth International conference on language resources and evaluation (LREC 2016)
Portorož: European Language Resources Association (ELRA), 2016, pp. 4264-4270
ISBN
978-2-9517408-9-1
Conference
Tenth International conference on language resources and evaluation - LREC 2016
Place and date
Portorož, Slovenia, 23 May 2016 - 28 May 2016
Type of participation
Poster
Type of review
International peer review
Keywords
inflectional lexicon ; morphosyntactic annotation ; Croatian ; Serbian
Abstract
In this paper we present newly developed inflectional lexicons and manually annotated corpora of Croatian and Serbian. We introduce hrLex and srLex, two freely available inflectional lexicons of Croatian and Serbian, and describe the process of building these lexicons, supported by supervised machine learning techniques for lemma and paradigm prediction. Furthermore, we introduce hr500k, a manually annotated corpus of Croatian, 500 thousand tokens in size. We showcase the three newly developed resources on the task of morphosyntactic annotation of both languages by using a recently developed CRF tagger. We achieve the best results yet reported on the task for both languages, beating the HunPos baseline trained on the same datasets by a wide margin.
Original language
English
Scientific fields
Computer science, Information and communication sciences
AFFILIATIONS
Institutions:
Filozofski fakultet (Faculty of Humanities and Social Sciences), Zagreb