New inflectional lexicons and training corpora for improved morphosyntactic annotation of Croatian and Serbian

Ljubešić, Nikola; Klubička, Filip; Agić, Željko; Jazbec, Ivo-Pavao

Bibliographic record number: 815830

New inflectional lexicons and training corpora for improved morphosyntactic annotation of Croatian and Serbian // Proceedings of the Tenth International conference on language resources and evaluation (LREC 2016)
Portorož: European Language Resources Association (ELRA), 2016, pp. 4264-4270 (poster, international peer review, full paper (in extenso), scientific)

CROSBI ID: 815830

Title
New inflectional lexicons and training corpora for improved morphosyntactic annotation of Croatian and Serbian

Authors
Ljubešić, Nikola ; Klubička, Filip ; Agić, Željko ; Jazbec, Ivo-Pavao

Type, subtype, and category of work
Papers in conference proceedings, full paper (in extenso), scientific

Source
Proceedings of the Tenth International conference on language resources and evaluation (LREC 2016) / - Portorož : European Language Resources Association (ELRA), 2016, 4264-4270

ISBN
978-2-9517408-9-1

Conference
Tenth International conference on language resources and evaluation - LREC 2016

Place and date
Portorož, Slovenia, 23.05.2016 - 28.05.2016

Type of participation
Poster

Type of review
International peer review

Keywords
inflectional lexicon ; morphosyntactic annotation ; Croatian ; Serbian

Abstract
In this paper we present newly developed inflectional lexicons and manually annotated corpora of Croatian and Serbian. We introduce hrLex and srLex, two freely available inflectional lexicons of Croatian and Serbian, and describe the process of building these lexicons, supported by supervised machine learning techniques for lemma and paradigm prediction. Furthermore, we introduce hr500k, a manually annotated corpus of Croatian, 500 thousand tokens in size. We showcase the three newly developed resources on the task of morphosyntactic annotation of both languages by using a recently developed CRF tagger. We achieve the best results yet reported on the task for both languages, beating the HunPos baseline trained on the same datasets by a wide margin.

Original language
English

Scientific fields
Computing, Information and communication sciences

AFFILIATIONS OF THE WORK

Institutions:
Faculty of Humanities and Social Sciences, Zagreb

Profiles:

Željko Agić (author)

Nikola Ljubešić (author)

Links to the full text of the paper:

Full-text access: www.lrec-conf.org
