Bibliographic record no. 280673
Croatian Lemmatization Server
Croatian Lemmatization Server // Formal Approaches to South Slavic and Balkan Languages / Vulchanova, Mila Dimitrova ; Koeva, Svetla ; Krapova, Iliyana ; Vulchanov, Valentin (eds.).
Sofia: Bulgarian Academy of Sciences, 2006, pp. 140-146 (lecture, international peer review, full paper (in extenso), scientific)
CROSBI ID: 280673
Title
Croatian Lemmatization Server
Authors
Tadić, Marko
Type, subtype and category of work
Papers in conference proceedings, full paper (in extenso), scientific
Source
Formal Approaches to South Slavic and Balkan Languages
/ Vulchanova, Mila Dimitrova ; Koeva, Svetla ; Krapova, Iliyana ; Vulchanov, Valentin - Sofia : Bulgarian Academy of Sciences, 2006, 140-146
Conference
Fifth International Conference Formal Approaches to South Slavic and Balkan languages (FASSBL)
Place and date
Sofia, Bulgaria, 18-20 October 2006
Type of participation
Lecture
Type of review
International peer review
Keywords
lemmatization; POS tagging; MSD tagging; Croatian; web-service
Abstract
The need for lemmatization in inflectionally rich languages is indisputable: it applies to the whole range of procedures, from text search to parsing. Of the two predominant approaches to lemmatization, 1) algorithmic (generally rule-based and realized with FSA) and 2) relational (generally data-driven and realized with databases), this paper opts for the latter. One reason is that formal-grammar approaches to Croatian morphology are rare and cover only part of the morphological system. The other is that a generator for Croatian has already been developed (Tadić 1994), as has the Croatian Morphological Lexicon (CML) (Tadić & Fulgosi 2003). The idea was to offer an online lemmatization and POS/MSD tagging service using CML v4.5 as the back end. The Croatian Lemmatization Server (CLS) is available at http://hml.hnk.ffzg.hr and currently offers lemmatization and POS/MSD tagging at the unigram level. For each token in the submitted text, the server returns all lemmas of which that token may be a word-form. For homographic tokens, each lemma is accompanied by all possible POS/MSD tags, which comply with the MulTextEast v3 specifications for Croatian. The CLS can also be used for generation: when a lemma is entered and marked as such, all its possible word-forms are retrieved and returned.
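The relational approach described in the abstract can be illustrated with a minimal sketch in Python. It is not the CLS implementation: the in-memory table, the function names (`build_indexes`, `lemmatize`, `generate`), the regex tokenizer and the handful of (word-form, lemma, MSD) triples are all assumptions made for illustration, and the tags are hand-written placeholders in the style of the MSD notation rather than entries taken from CML.

```python
import re
from collections import defaultdict

# Illustrative (word_form, lemma, msd) triples. These are hand-written
# placeholders, NOT entries copied from the Croatian Morphological Lexicon.
TRIPLES = [
    ("grad", "grad", "Ncmsn"),
    ("grad", "grad", "Ncmsa"),
    ("grada", "grad", "Ncmsg"),
    ("sam", "biti", "Vcip1s"),
    ("sam", "sam", "Afpmsn"),
]


def build_indexes(triples):
    """Index the triples both ways: by word-form and by lemma."""
    by_form = defaultdict(lambda: defaultdict(list))
    by_lemma = defaultdict(list)
    for form, lemma, msd in triples:
        by_form[form][lemma].append(msd)
        by_lemma[lemma].append((form, msd))
    return by_form, by_lemma


BY_FORM, BY_LEMMA = build_indexes(TRIPLES)


def lemmatize(text):
    """Unigram lookup: for each token, return every lemma and its tags.

    Tokenization (lowercasing plus a word-character regex) is an assumption
    of this sketch. Unknown tokens get an empty analysis.
    """
    results = []
    for token in re.findall(r"\w+", text.lower()):
        analyses = {
            lemma: list(tags) for lemma, tags in BY_FORM.get(token, {}).items()
        }
        results.append((token, analyses))
    return results


def generate(lemma):
    """Reverse lookup: return all (word_form, msd) pairs known for a lemma."""
    return sorted(BY_LEMMA.get(lemma, []))


if __name__ == "__main__":
    for token, analyses in lemmatize("Grad sam"):
        print(token, analyses)
    print(generate("grad"))
```

In this sketch the homographic token "sam" comes back with two candidate lemmas, each with its own tags, and no attempt is made to choose between them, which mirrors the unigram-level behaviour the abstract describes. Generation is the same table read in the opposite direction.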
Original language
English
Research areas
Philology
ASSOCIATIONS OF THE WORK