Overview of bibliographic record number: 427400

Lexicon-Based Morphological Normalisation and its Application to Croatian Language


Šnajder, Jan; Dalbelo Bašić, Bojana; Tadić, Marko
Lexicon-Based Morphological Normalisation and its Application to Croatian Language // Technologies for the Processing and Retrieval of Semi-Structured Documents: Experience from the CADIAL Project / Tadić, Marko ; Dalbelo Bašić, Bojana ; Moens, Marie-Francine (eds.).
Zagreb: Hrvatsko društvo za jezične tehnologije, 2009. pp. 23-80


CROSBI ID: 427400

Title
Lexicon-Based Morphological Normalisation and its Application to Croatian Language

Authors
Šnajder, Jan ; Dalbelo Bašić, Bojana ; Tadić, Marko

Type, subtype and category of work
Book chapters, scientific

Book
Technologies for the Processing and Retrieval of Semi-Structured Documents: Experience from the CADIAL Project

Editor(s)
Tadić, Marko ; Dalbelo Bašić, Bojana ; Moens, Marie-Francine

Publisher
Hrvatsko društvo za jezične tehnologije

City
Zagreb

Year
2009

Page range
23-80

ISBN
978-953-55375-1-9

Keywords
Morphological normalisation, morphological lexicon, inflection, derivation, lexicon acquisition, Croatian language

Abstract
Due to language morphology, words appear in text in various inflectional and derivational forms. This morphological variation has been shown to negatively affect the performance of most information retrieval and text mining systems. Morphological variation may be reduced by performing morphological normalisation, i.e., the conflation of morphological variants of a word into a single representative form. A lexicon-based approach to normalisation allows for high normalisation precision, which may otherwise be difficult to achieve for morphologically complex languages. In this paper we describe a two-stage lexicon-based approach to morphological normalisation that addresses both inflectional and derivational variation. To eliminate the immense effort required to compile a lexicon by hand, we devise a procedure for automatically acquiring an inflectional morphological lexicon from raw corpora. We also propose a convenient and highly expressive morphology representation formalism on which the acquisition procedure is based. We apply our approach to the morphologically complex Croatian language, but it should be equally applicable to other languages of similar morphological complexity. A detailed task-independent evaluation reveals that our approach yields good normalisation performance at both the inflectional and the derivational level.
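
As a rough illustration of the two-stage idea described in the abstract, the sketch below first maps a word-form to its lemma (inflectional normalisation) and then maps the lemma to a representative of its derivational class. This is only a minimal sketch under stated assumptions: the lexicon entries are a tiny hand-written toy, and the names `INFLECTIONAL_LEXICON`, `DERIVATIONAL_CLASSES`, `normalise_inflection` and `normalise` are hypothetical. It does not reproduce the chapter's representation formalism or its acquisition procedure.

```python
# Toy, hand-written lexicon for illustration only (not taken from the chapter).
# Stage 1 data: inflected word-form -> lemma.
INFLECTIONAL_LEXICON = {
    "kuća": "kuća", "kuće": "kuća", "kući": "kuća",
    "kuću": "kuća", "kućom": "kuća",
    "kućni": "kućni", "kućnog": "kućni", "kućnom": "kućni",
}

# Stage 2 data: lemma -> representative of its derivational class.
DERIVATIONAL_CLASSES = {
    "kuća": "kuća",
    "kućni": "kuća",  # adjective derived from the noun "kuća"
}


def normalise_inflection(word):
    """Stage 1: conflate inflectional variants to a lemma.

    Words missing from the lexicon are returned lowercased and unchanged.
    """
    key = word.lower()
    return INFLECTIONAL_LEXICON.get(key, key)


def normalise(word):
    """Stage 2: additionally conflate derivationally related lemmas."""
    lemma = normalise_inflection(word)
    return DERIVATIONAL_CLASSES.get(lemma, lemma)


if __name__ == "__main__":
    for form in ["kućom", "Kućnog", "nepoznato"]:
        print(form, normalise_inflection(form), normalise(form))
```

A plain dictionary lookup keeps precision high for covered words, since a form is only conflated when the lexicon explicitly lists it. The cost is coverage: unknown words pass through unchanged.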

Original language
English

Scientific fields
Computer science



RELATED RECORDS


Projects:
036-1300646-1986 - Otkrivanje znanja u tekstnim podacima (Knowledge Discovery in Textual Data) (Dalbelo-Bašić, Bojana, MZO) (CroRIS)

Institutions:
Fakultet elektrotehnike i računarstva, Zagreb

Profiles:

Jan Šnajder (author)

Bojana Dalbelo Bašić (author)

Marko Tadić (author)


Cite this publication:

Šnajder, Jan; Dalbelo Bašić, Bojana; Tadić, Marko
Lexicon-Based Morphological Normalisation and its Application to Croatian Language // Technologies for the Processing and Retrieval of Semi-Structured Documents: Experience from the CADIAL Project / Tadić, Marko ; Dalbelo Bašić, Bojana ; Moens, Marie-Francine (eds.).
Zagreb: Hrvatsko društvo za jezične tehnologije, 2009. pp. 23-80
Šnajder, J., Dalbelo Bašić, B. & Tadić, M. (2009) Lexicon-Based Morphological Normalisation and its Application to Croatian Language. In: Tadić, M., Dalbelo Bašić, B. & Moens, M. (eds.) Technologies for the Processing and Retrieval of Semi-Structured Documents: Experience from the CADIAL Project. Zagreb, Hrvatsko društvo za jezične tehnologije, pp. 23-80.
@inbook{inbook,
  author = {\v{S}najder, Jan and Dalbelo Ba\v{s}i\'{c}, Bojana and Tadi\'{c}, Marko},
  title = {Lexicon-Based Morphological Normalisation and its Application to Croatian Language},
  year = {2009},
  pages = {23--80},
  isbn = {978-953-55375-1-9},
  keywords = {Morphological normalisation, morphological lexicon, inflection, derivation, lexicon acquisition, Croatian language},
  publisher = {Hrvatsko dru\v{s}tvo za jezi\v{c}ne tehnologije},
  address = {Zagreb}
}