
Bibliographic record number: 427399

String Distance-Based Stemming of the Highly Inflected Croatian Language


Šnajder, Jan; Dalbelo Bašić, Bojana
String Distance-Based Stemming of the Highly Inflected Croatian Language // Proceedings of Recent Advances in Natural Language Processing (RANLP-2009) / Angelova, Galia ; Bontcheva, Kalina ; Mitkov, Ruslan ; Nicolov, Nicolas ; Nikolov, Nikolai (eds.).
Shoumen: Incoma, 2009. pp. 411-415 (poster, international peer review, full paper (in extenso), scientific)


Title
String Distance-Based Stemming of the Highly Inflected Croatian Language

Authors
Šnajder, Jan ; Dalbelo Bašić, Bojana

Type, subtype and category of work
Papers in conference proceedings, full paper (in extenso), scientific

Source
Proceedings of Recent Advances in Natural Language Processing (RANLP-2009) / Angelova, Galia ; Bontcheva, Kalina ; Mitkov, Ruslan ; Nicolov, Nicolas ; Nikolov, Nikolai - Shoumen : Incoma, 2009, 411-415

Conference
International Conference Recent Advances in Natural Language Processing'2009 (RANLP-2009)

Place and date
Borovets, Bulgaria, 14-16 September 2009

Type of participation
Poster

Type of review
International peer review

Keywords
Stemming; morphology; string distance; Croatian language

Abstract
Stemming refers to the grouping of morphologically related words into so-called stem classes for the purpose of improving information retrieval performance. Traditional approaches to stemming are language-specific and require a substantial amount of linguistic knowledge. A viable alternative is string distance-based stemming, in which stem classes are obtained by clustering word-forms from a corpus. In this paper, we apply string distance-based stemming to the highly inflected Croatian language using a number of string distance measures proposed in the literature. We focus on evaluating the stemming performance at both inflectional and derivational level, and investigate how this performance relates to the choice of the distance threshold value. Although our focus is on the Croatian language, we believe our results transfer well to languages of similar morphological complexity.
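The idea of obtaining stem classes by clustering word-forms under a distance threshold can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration and is not taken from the paper: the distance is a length-normalized Levenshtein distance standing in for the measures the authors evaluate, the clustering is simple single-link merging, and the function names (`normalized_distance`, `stem_classes`), the word list, and the threshold values are hypothetical.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Edit distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def normalized_distance(a: str, b: str) -> float:
    """Assumed stand-in measure: edit distance divided by the longer length."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


def stem_classes(words, threshold):
    """Single-link clustering (an assumption): two word-forms end up in the
    same class if a chain of pairs with distance <= threshold connects them."""
    words = sorted(set(words))
    parent = {w: w for w in words}

    def find(w):
        # Union-find lookup with path halving.
        while parent[w] != w:
            parent[w] = parent[parent[w]]
            w = parent[w]
        return w

    for a, b in combinations(words, 2):
        if normalized_distance(a, b) <= threshold:
            parent[find(a)] = find(b)

    classes = {}
    for w in words:
        classes.setdefault(find(w), []).append(w)
    return sorted(classes.values())


# Hypothetical sample: inflected forms of "knjiga" (book) and "stol" (table).
forms = ["knjiga", "knjige", "knjizi", "knjigu", "stol", "stola", "stolu"]
strict = stem_classes(forms, threshold=0.25)
loose = stem_classes(forms, threshold=0.35)
```

On this sample the stricter threshold leaves "knjizi" in a class of its own, because the g/z alternation puts it two edits away from the other forms, while the looser threshold merges it with them. This is the kind of sensitivity to the threshold value that the abstract refers to.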

Original language
English

Scientific fields
Computing



RELATED TO THIS WORK


Project / topic
036-1300646-1986 - Otkrivanje znanja u tekstnim podacima [Knowledge Discovery in Textual Data] (Bojana Dalbelo-Bašić)

Institutions
Faculty of Electrical Engineering and Computing, Zagreb