Bibliographic record number: 332419

The Croatian Lemmatization Server


Tadić, Marko
The Croatian Lemmatization Server // Southern Journal of Linguistics, 29 (2005), 1/2; 206-217 (peer-review information not available, article, scientific)


CROSBI ID: 332419

Title
The Croatian Lemmatization Server

Authors
Tadić, Marko

Source
Southern Journal of Linguistics (0730-6245) 29 (2005), 1/2; 206-217

Type, subtype and category of work
Journal papers, article, scientific

Keywords
lemmatization; morphological processing; computational linguistics; Croatian; web service

Abstract
The need for lemmatization in inflectionally rich languages is indisputable: it applies to a whole range of procedures, from text search up to parsing. Of the two predominant approaches to lemmatization, the algorithmic one (generally rule-based and realized with finite-state automata) and the relational one (generally data-driven and realized with databases), this paper opts for the latter. One reason is that formal-grammar approaches to Croatian morphology are rare and cover only part of the morphological system. The other is that a generator for Croatian had already been developed (Tadić 1994), as had the Croatian Morphological Lexicon (CML) (Tadić and Fulgosi 2003). The idea was to offer an on-line lemmatization and POS/MSD tagging service using CML v4.5 as the back-end. The Croatian Lemmatization Server (CLS) is available at http://hml.hnk.ffzg.hr, and it currently offers lemmatization and POS/MSD tagging at the unigram level. For each token in the submitted text, the server delivers all lemmas of which that token may be a word-form. For homographic tokens, each lemma is accompanied by all possible POS/MSD tags, which comply with the MULTEXT-East v3 specifications for Croatian. The CLS can also be used for generation: when a lemma is entered and marked as such, all its possible word-forms are retrieved and delivered.
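
The relational approach described in the abstract can be illustrated with a minimal sketch. It is not the CLS implementation or its interface: the table layout, the function names `build_lexicon`, `lemmatize` and `generate`, and the handful of lexicon entries are all assumptions made for illustration, and the MSD tags are written in the MULTEXT-East style without having been checked against CML. The sketch stores (word-form, lemma, tag) triples in an in-memory SQLite table, looks up every lemma and tag for a token, and runs the reverse lookup for generation.

```python
import sqlite3
from collections import defaultdict

# Toy (word-form, lemma, MSD tag) triples. Illustrative only: these are
# NOT taken from CML, and the tags merely imitate the MULTEXT-East style.
ENTRIES = [
    ("kuća", "kuća", "Ncfsn"),
    ("kuće", "kuća", "Ncfsg"),
    ("kuće", "kuća", "Ncfpn"),
    ("more", "more", "Ncnsn"),
    ("mora", "more", "Ncnsg"),
    ("mora", "morati", "Vmip3s"),
]


def build_lexicon(entries):
    """Load the triples into an in-memory relational table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE wordform (form TEXT, lemma TEXT, msd TEXT)")
    conn.executemany("INSERT INTO wordform VALUES (?, ?, ?)", entries)
    conn.execute("CREATE INDEX idx_form ON wordform (form)")
    conn.execute("CREATE INDEX idx_lemma ON wordform (lemma)")
    return conn


def lemmatize(conn, token):
    """Return every candidate lemma for a token, each with all of its tags.

    No disambiguation is attempted: this is a unigram-level lookup.
    """
    rows = conn.execute(
        "SELECT lemma, msd FROM wordform WHERE form = ? ORDER BY lemma, msd",
        (token.lower(),),
    )
    result = defaultdict(list)
    for lemma, msd in rows:
        result[lemma].append(msd)
    return dict(result)


def generate(conn, lemma):
    """Reverse lookup: return all distinct word-forms stored for a lemma."""
    rows = conn.execute(
        "SELECT DISTINCT form FROM wordform WHERE lemma = ? ORDER BY form",
        (lemma,),
    )
    return [form for (form,) in rows]


conn = build_lexicon(ENTRIES)
for token in "Kuće mora".split():
    print(token, lemmatize(conn, token))
print(generate(conn, "more"))
```

In this sketch the homographic token "mora" comes back with two candidate lemmas, "more" and "morati", each carrying its own tags. A token that is missing from the table yields an empty result, which is the main practical limit of a purely relational lexicon.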

Original language
English

Scientific fields
Information and communication sciences, Philology



LINKS TO THIS WORK


Projects:
0130418

Institutions:
Filozofski fakultet, Zagreb

Profiles:

Marko Tadić (author)


Cite this publication:

Tadić, Marko
The Croatian Lemmatization Server // Southern Journal of Linguistics, 29 (2005), 1/2; 206-217 (peer-review information not available, article, scientific)
Tadić, M. (2005) The Croatian Lemmatization Server. Southern Journal of Linguistics, 29 (1/2), 206-217.
@article{article,
  author   = {Tadi\'{c}, Marko},
  title    = {The Croatian Lemmatization Server},
  journal  = {Southern Journal of Linguistics},
  year     = {2005},
  volume   = {29},
  number   = {1/2},
  pages    = {206--217},
  issn     = {0730-6245},
  keywords = {lemmatization, morphological processing, computational linguistics, Croatian, web service}
}

Indexed in other bibliographic databases:


  • Linguistics Abstracts
  • LLBA: Linguistics and Language Behavior Abstracts
  • MLA - Modern Language Abstracts




