Bibliographic record number: 852068

Corpus vs. lexicon supervision in morphosyntactic tagging: the case of Slovene


Ljubešić, Nikola; Erjavec, Tomaž
Corpus vs. lexicon supervision in morphosyntactic tagging: the case of Slovene // Proceedings of the Tenth International conference on language resources and evaluation (LREC 2016) / Calzolari, N. (ed.).
Portorož: European Language Resources Association (ELRA), 2016. pp. 1527-1531 (poster, international peer review, full paper (in extenso), scientific)


CROSBI ID: 852068

Title
Corpus vs. lexicon supervision in morphosyntactic tagging: the case of Slovene

Authors
Ljubešić, Nikola ; Erjavec, Tomaž

Type, subtype and category of work
Papers in conference proceedings, full paper (in extenso), scientific

Source
Proceedings of the Tenth International conference on language resources and evaluation (LREC 2016) / Calzolari, N. - Portorož : European Language Resources Association (ELRA), 2016, 1527-1531

ISBN
978-2-9517408-9-1

Conference
Tenth International conference on language resources and evaluation - LREC 2016

Place and date
Portorož, Slovenia, 23.05.2016 - 28.05.2016

Type of participation
Poster

Type of review
International peer review

Keywords
Part-of-Speech tagging ; evaluation ; Slavic languages

Abstract
In this paper we present a tagger developed for inflectionally rich languages for which both a training corpus and a lexicon are available. We do not constrain the tagger by the lexicon entries, which allows for both lexicon incompleteness and noisiness. By using the lexicon indirectly through features, we allow known and unknown words to be tagged in the same manner. We test our tagger on Slovene data, obtaining a 25% error reduction over the best previous results on both known and unknown words. Given that Slovene is, in comparison to some other Slavic languages, a well-resourced language, we perform experiments on the impact of token (corpus) vs. type (lexicon) supervision, obtaining useful insights into how to balance the effort of extending resources to yield better tagging results.
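The idea of using the lexicon "indirectly through features" can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `token_features`, the feature names, and the toy lexicon entries and tag strings are all assumptions made for illustration. The point it shows is that lexicon tags enter only as extra evidence in the feature set, so a downstream classifier is free to predict a tag the lexicon does not list, and words missing from the lexicon pass through the same code path.

```python
from typing import Dict, List, Set

# Hypothetical toy lexicon: word form -> set of morphosyntactic tags.
# Entries and tag strings are illustrative only, not taken from the paper.
TOY_LEXICON: Dict[str, Set[str]] = {
    "hiša": {"Ncfsn"},
    "je": {"Va-r3s-n", "Pp3fsg--y"},
}


def token_features(
    tokens: List[str], i: int, lexicon: Dict[str, Set[str]]
) -> Dict[str, float]:
    """Build a sparse feature dict for tokens[i].

    The lexicon only contributes features; it never restricts the
    set of tags a classifier may output (assumed design, see lead-in).
    """
    word = tokens[i]
    lower = word.lower()
    feats: Dict[str, float] = {"bias": 1.0, "lower=" + lower: 1.0}

    if word[:1].isupper():
        feats["is_capitalized"] = 1.0

    # Suffix features help with inflectional endings of unseen words.
    for n in range(1, 5):
        if len(lower) >= n:
            feats[f"suffix{n}=" + lower[-n:]] = 1.0

    # Lexicon as evidence: one feature per tag hypothesis it lists.
    for tag in sorted(lexicon.get(lower, ())):
        feats["lex_tag=" + tag] = 1.0
    if lower not in lexicon:
        feats["not_in_lexicon"] = 1.0

    # Immediate context, with sentence-boundary placeholders.
    prev_word = tokens[i - 1].lower() if i > 0 else "<s>"
    next_word = tokens[i + 1].lower() if i + 1 < len(tokens) else "</s>"
    feats["prev=" + prev_word] = 1.0
    feats["next=" + next_word] = 1.0
    return feats
```

A feature dict of this shape could be passed to any sequence or per-token classifier; because a noisy or missing lexicon entry only changes a few feature values, the classifier can learn how much to trust it from the training corpus.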

Original language
English

Scientific fields
Information and communication sciences



RELATED TO THIS WORK


Institutions:
Filozofski fakultet, Zagreb

Profiles:

Nikola Ljubešić (author)

Links to the full text of the paper:

www.aclweb.org

Cite this publication:

Ljubešić, N. & Erjavec, T. (2016) Corpus vs. lexicon supervision in morphosyntactic tagging: the case of Slovene. In: Calzolari, N. (ed.) Proceedings of the Tenth International conference on language resources and evaluation (LREC 2016).
@inproceedings{article,
  author = {Ljube\v{s}i\'{c}, Nikola and Erjavec, Toma\v{z}},
  editor = {Calzolari, N.},
  title = {Corpus vs. lexicon supervision in morphosyntactic tagging: the case of Slovene},
  booktitle = {Proceedings of the Tenth International conference on language resources and evaluation (LREC 2016)},
  year = {2016},
  pages = {1527--1531},
  keywords = {Part-of-Speech tagging, evaluation, Slavic languages},
  isbn = {978-2-9517408-9-1},
  publisher = {European Language Resources Association (ELRA)},
  address = {Portoro\v{z}, Slovenia}
}



