Bibliographic record number: 426782

Evaluating Full Lemmatization of Croatian Texts


Agić, Željko; Tadić, Marko; Dovedan, Zdravko
Evaluating Full Lemmatization of Croatian Texts // Technologies for the Processing and Retrieval of Semi-Structured Documents: Experience from the CADIAL Project / Tadić, Marko ; Dalbelo Bašić, Bojana ; Moens, Marie-Francine (eds.).
Zagreb: Hrvatsko društvo za jezične tehnologije, 2009. pp. 133-144


CROSBI ID: 426782

Title
Evaluating Full Lemmatization of Croatian Texts

Authors
Agić, Željko ; Tadić, Marko ; Dovedan, Zdravko

Type, subtype and category of work
Book chapter, scientific

Book
Technologies for the Processing and Retrieval of Semi-Structured Documents: Experience from the CADIAL Project

Editor(s)
Tadić, Marko ; Dalbelo Bašić, Bojana ; Moens, Marie-Francine

Publisher
Hrvatsko društvo za jezične tehnologije

City
Zagreb

Year
2009

Page range
133-144

ISBN
978-953-55375-1-9

Keywords
full lemmatization, morphosyntactic tagging, Croatian language

Abstract
The chapter presents the implementation and evaluation of a module for full lemmatization of Croatian texts. The module implements several lemmatization procedures, all of them based on merging the outputs of the previously developed stochastic morphosyntactic tagger CroTag with the inflectional lexicon of Croatian. Evaluation of the lemmatization module on two test cases, simulating realistic and ideal operating conditions, yielded full lemmatization accuracy scores of 96.96 and 98.15 percent on a newspaper corpus, respectively. It is also shown that a majority of errors in this framework, 57.14 percent in the realistic testing scenario, occur on word forms with external homography. Moreover, approximately 80 percent of all lemmatization errors occur on nouns, adjectives, verbs and adverbs, in that order. Language resources, the testing environment and procedure descriptions are provided in the paper, along with a discussion of the obtained results and possible future research directions.
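
As a rough illustration of the merging idea described in the abstract, the Python sketch below combines a tagger-assigned tag with a lookup in a toy inflectional lexicon. It is not CroTag or the authors' implementation: the lexicon entries, the coarse tags `NOUN` and `VERB`, the fallback rules and the function name `lemmatize` are all assumptions made for this example. The form "mora" shows external homography, since it can belong to either the noun "more" or the verb "morati".

```python
from typing import Dict, List, Tuple

# Toy inflectional lexicon: word form -> candidate (lemma, tag) readings.
# Entries and coarse tags are illustrative assumptions, not data from the chapter.
LEXICON: Dict[str, List[Tuple[str, str]]] = {
    "mora": [("more", "NOUN"), ("morati", "VERB")],  # external homograph
    "knjige": [("knjiga", "NOUN")],
}


def lemmatize(form: str, tag: str) -> str:
    """Pick a lemma for `form`, preferring the lexicon reading that matches `tag`."""
    candidates = LEXICON.get(form.lower(), [])
    if not candidates:
        # Unknown form: fall back to the lowercased form itself.
        return form.lower()
    for lemma, candidate_tag in candidates:
        if candidate_tag == tag:
            # Lexicon reading that agrees with the tagger output.
            return lemma
    # No reading agrees with the tagger: fall back to the first lexicon reading.
    return candidates[0][0]


# Hypothetical tagger output as (word form, tag) pairs.
tagged = [("Mora", "VERB"), ("knjige", "NOUN"), ("mora", "NOUN")]
lemmas = [lemmatize(form, tag) for form, tag in tagged]
```

In this sketch the tagger's tag only disambiguates between readings that the lexicon already lists, so a tagging error on a homographic form leads directly to a lemmatization error.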

Original language
English

Scientific fields
Computing, Information and communication sciences, Philology

Note
This is a corrected version of a paper published in Klopotek, M. ; Przepiorkowski, A. ; Wierzchon, S. ; Trojanowski, K. (eds.) (2009) Recent Advances in Intelligent Information Systems, Academic Publishing House EXIT, Warsaw, 175-184.



RELATED TO THIS WORK


Projects:
036-1300646-1986 - Knowledge Discovery in Textual Data (Dalbelo-Bašić, Bojana, MZO) (CroRIS)
130-1300646-0645 - Croatian Language Resources and Their Annotation (Tadić, Marko, MZOS) (CroRIS)
130-1300646-1776 - Computational Syntax of the Croatian Language (Dovedan Han, Zdravko, MZOS) (CroRIS)

Institutions:
Faculty of Electrical Engineering and Computing, Zagreb,
Faculty of Humanities and Social Sciences, Zagreb

Profiles:

Zdravko Dovedan Han (author)

Marko Tadić (author)

Željko Agić (author)

Cite this publication:

Agić, Ž., Tadić, M. & Dovedan, Z. (2009) Evaluating Full Lemmatization of Croatian Texts. In: Tadić, M., Dalbelo Bašić, B. & Moens, M. (eds.) Technologies for the Processing and Retrieval of Semi-Structured Documents: Experience from the CADIAL Project. Zagreb, Hrvatsko društvo za jezične tehnologije, pp. 133-144.
@inbook{inbook,
  author = {Agi\'{c}, \v{Z}eljko and Tadi\'{c}, Marko and Dovedan, Zdravko},
  title = {Evaluating Full Lemmatization of Croatian Texts},
  year = {2009},
  pages = {133--144},
  keywords = {full lemmatization, morphosyntactic tagging, Croatian language},
  isbn = {978-953-55375-1-9},
  publisher = {Hrvatsko dru\v{s}tvo za jezi\v{c}ne tehnologije},
  address = {Zagreb}
}