Corpus vs. lexicon supervision in morphosyntactic tagging: the case of Slovene

Ljubešić, Nikola; Erjavec, Tomaž


Corpus vs. lexicon supervision in morphosyntactic tagging: the case of Slovene (CROSBI ID 643371)

Conference paper in proceedings | original scientific paper | international peer review

Ljubešić, Nikola ; Erjavec, Tomaž Corpus vs. lexicon supervision in morphosyntactic tagging: the case of Slovene // Proceedings of the Tenth International conference on language resources and evaluation (LREC 2016) / Calzolari, N. (ed.). Portorož: European Language Resources Association (ELRA), 2016. pp. 1527-1531

Responsibility data

Authors

Ljubešić, Nikola ; Erjavec, Tomaž

Basic data in the original language
Basic data in other languages

Language

English

Title

Corpus vs. lexicon supervision in morphosyntactic tagging: the case of Slovene

Abstract

In this paper we present a tagger developed for inflectionally rich languages for which both a training corpus and a lexicon are available. We do not constrain the tagger by the lexicon entries, which allows for both lexicon incompleteness and noisiness. By using the lexicon indirectly through features, we allow known and unknown words to be tagged in the same manner. We test our tagger on Slovene data, obtaining a 25% error reduction over the best previous results on both known and unknown words. Given that Slovene is, in comparison to some other Slavic languages, a well-resourced language, we perform experiments on the impact of token (corpus) vs. type (lexicon) supervision, obtaining useful insights into how to balance the effort of extending resources to yield better tagging results.
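The idea of using the lexicon indirectly through features, rather than as a hard constraint on the output tags, can be illustrated with a minimal Python sketch. It is not the authors' feature set or implementation: the function name `token_features`, the toy `LEXICON`, the feature names, and the tag strings in it are all assumptions made for illustration only. The point it shows is that a word found in the lexicon and a word missing from it pass through the same feature function, and the classifier remains free to predict a tag the lexicon does not list.

```python
from typing import Dict, List, Set

# Hypothetical toy lexicon: lowercased word form -> set of tag strings.
# The entries and tags are illustrative only, not taken from the paper.
LEXICON: Dict[str, Set[str]] = {
    "hiša": {"Ncfsn"},
    "je": {"Va-r3s-n", "Pp3fsg--y"},
}


def token_features(
    tokens: List[str], i: int, lexicon: Dict[str, Set[str]]
) -> Dict[str, float]:
    """Build a sparse feature dict for the token at position i.

    Lexicon tags are added as ordinary features ("soft" evidence);
    they never restrict the set of tags a classifier may output.
    """
    lower = tokens[i].lower()
    feats: Dict[str, float] = {"bias": 1.0, "word=" + lower: 1.0}

    # Suffix features carry morphological evidence for every word,
    # including words that are absent from the lexicon.
    for n in range(1, 5):
        if len(lower) >= n:
            feats[f"suffix{n}=" + lower[-n:]] = 1.0

    # Lexicon evidence: one feature per tag the lexicon proposes,
    # or a single marker feature when the word form is not listed.
    tags = lexicon.get(lower, set())
    if tags:
        for tag in sorted(tags):
            feats["lex=" + tag] = 1.0
    else:
        feats["lex=NONE"] = 1.0
    return feats


sentence = ["Hiša", "je", "velika"]
known = token_features(sentence, 0, LEXICON)    # word form is in the lexicon
unknown = token_features(sentence, 2, LEXICON)  # word form is not in the lexicon
```

Feature dicts of this shape could be passed to any classifier that accepts sparse features. Because a noisy or missing lexicon entry only changes the feature values, the trained model can learn how much to trust the lexicon instead of being bound by it.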

Keywords

Part-of-Speech tagging ; evaluation ; Slavic languages

Note

not recorded

Language

not recorded

Title

not recorded

Abstract

not recorded

Keywords

not recorded

Note

not recorded

Contribution data

Pages

1527-1531.

Year of publication

2016.

Publication status

published

Host publication data

Title

Proceedings of the Tenth International conference on language resources and evaluation (LREC 2016)

Editors

Calzolari, N.

Publisher

Portorož: European Language Resources Association (ELRA)

ISBN

978-2-9517408-9-1

Conference data

Conference

Tenth International Conference on Language Resources and Evaluation (LREC 2016)

Type of participation

poster

Conference dates

23.05.2016-28.05.2016

Conference venue

Portorož, Slovenia

Related records

Related persons

Nikola Ljubešić (author)

Related institutions

Filozofski fakultet u Zagrebu (Faculty of Humanities and Social Sciences, University of Zagreb) (130) (author's institution)

Field

Information and communication sciences

Links

aclweb.org