Bibliographic record number: 126493
Preparation of POS tagging of Croatian using CLaRK System
Preparation of POS tagging of Croatian using CLaRK System // Proceeding of RANLP2003 Conference
Sofia: BAS, 2003, pp. 455-459 (poster, international peer review, full paper (in extenso), scientific)
CROSBI ID: 126493
Title
Preparation of POS tagging of Croatian using CLaRK System
Authors
Tadić, Marko ; Bekavac, Božo
Type, subtype and category of work
Papers in conference proceedings, full paper (in extenso), scientific
Source
Proceeding of RANLP2003 Conference
Sofia: BAS, 2003, pp. 455-459
Conference
Recent Advances in Natural Language Processing 2003
Place and date
Borovets, Bulgaria, 10-12 September 2003
Type of participation
Poster
Type of review
International peer review
Keywords
Croatian Language; Croatian Morphological Lexicon; POS tagging; homography; CLaRK system
Abstract
This paper presents the first results of POS tagging of Croatian texts using a word-form list generated from the Croatian Morphological Lexicon. A corpus of 500,000 tokens was processed using the CLaRK System, developed in SFS Tübingen and LML Sofia. The first part of the paper describes the process of mapping word-forms and their accompanying MSDs to tokens in the corpus. The phenomena of “internal” homography (several word-forms of the same lemma sharing the same form) and “external” homography (word-forms potentially belonging to different lemmas sharing the same form) are discussed. Statistics measuring the MSD and lemma ambiguity of Croatian nouns, verbs and adjectives are also presented. The final part of the paper describes the extraction and quantification of several POS patterns from the same corpus, which are expected to represent characteristic patterns of multiword terminological units in Croatian.
Original language
English
Scientific fields
Philology