Bibliographic record no. 727739

Automatic Enrichment of Croatian Morphological Lexicon Using Large Corpora and Web Search


Merkler, Danijela; Agić, Željko; Tadić, Marko
Automatic Enrichment of Croatian Morphological Lexicon Using Large Corpora and Web Search // Proceedings of the 6th International Conference on Corpus Linguistics
Las Palmas: AELINCO, 2014. p. 42 (lecture, international peer review, abstract, scientific)


CROSBI ID: 727739

Title
Automatic Enrichment of Croatian Morphological Lexicon Using Large Corpora and Web Search

Authors
Merkler, Danijela ; Agić, Željko ; Tadić, Marko

Type, subtype and category of work
Conference abstracts, abstract, scientific

Source
Proceedings of the 6th International Conference on Corpus Linguistics. Las Palmas: AELINCO, 2014, p. 42

Conference
6th International Conference on Corpus Linguistics (CILC 2014)

Place and date
Las Palmas de Gran Canaria, Spain, 22-24 May 2014

Type of participation
Lecture

Type of review
International peer review

Keywords
morphological lexicon; automatic enlargement; Croatian language

Abstract
The first version of the Croatian Morphological Lexicon (HML) was developed as early as 1994 and has been used in various experiments and systems dealing with Croatian. Since the HML is frequently used both as a stand-alone application and as a module in many other systems for processing Croatian, the lexicon is constantly being updated to newer versions by manually inserting unknown wordforms (i.e. the corresponding 3-tuples of lemmas, wordforms and morphosyntactic tags) in batches. The current version of HML consists of 110,000 lemmas and more than 4,000,000 lexicon entries. Due to the limited availability of expert human annotators and various other constraints, the manual inspection, lemma assignment and inflectional pattern selection for unknown wordforms is a rather slow procedure. Accordingly, in this paper, we propose a generic approach to (semi-)automatic generation of new candidate lemmas for HML, their verification, assignment of inflectional patterns and finally creation and insertion of new lexicon entries into HML in a single processing pipeline.
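
The pipeline stages named in the abstract (candidate lemma generation, verification, inflectional pattern assignment, entry creation) can be illustrated with a minimal sketch. This is not the authors' implementation: the two toy inflectional patterns, the MULTEXT-East-style tags, the function names, and the `min_support` threshold are all assumptions made for illustration, and a plain set of attested wordforms stands in for the large corpora and web search mentioned in the title.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    """One lexicon entry: the 3-tuple of lemma, wordform and morphosyntactic tag."""
    lemma: str
    wordform: str
    msd: str


# Hypothetical toy patterns: name -> list of (suffix, tag).
# By convention here, the first suffix in each list builds the lemma.
PATTERNS = {
    "noun_fem_a": [("a", "Ncfsn"), ("e", "Ncfsg"), ("i", "Ncfsd"), ("u", "Ncfsa")],
    "noun_masc_0": [("", "Ncmsn"), ("a", "Ncmsg"), ("u", "Ncmsd")],
}


def candidates(wordform):
    """Yield (stem, pattern_name) pairs for every pattern suffix the wordform ends with."""
    for name, endings in PATTERNS.items():
        for suffix, _tag in endings:
            if wordform.endswith(suffix) and len(wordform) > len(suffix):
                yield wordform[: len(wordform) - len(suffix)], name


def support(stem, name, attested):
    """Count how many forms generated by the pattern are attested."""
    return sum((stem + suffix) in attested for suffix, _tag in PATTERNS[name])


def enrich(unknown, attested, min_support=3):
    """Return new entries for unknown wordforms whose best pattern is well attested."""
    entries = set()
    for wordform in unknown:
        scored = [(support(s, n, attested), s, n) for s, n in set(candidates(wordform))]
        if not scored:
            continue
        best, stem, name = max(scored)
        if best < min_support:
            continue  # verification failed: too little evidence for this paradigm
        lemma = stem + PATTERNS[name][0][0]
        for suffix, tag in PATTERNS[name]:
            entries.add(Entry(lemma, stem + suffix, tag))
    return sorted(entries)


if __name__ == "__main__":
    attested_forms = {"kuća", "kuće", "kući", "kuću"}
    for entry in enrich(["kuću"], attested_forms):
        print(entry)
```

In this sketch the verification step is simply a count of generated forms found among the attested ones; a real system would likely need frequency information and manual review of borderline candidates.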

Original language
English

Scientific fields
Information and communication sciences



LINKED RECORDS


Projects:
130-1300646-1776 - Računalna sintaksa hrvatskoga jezika [Computational Syntax of the Croatian Language] (Dovedan Han, Zdravko, MZOS) (CroRIS)

Institutions:
Faculty of Humanities and Social Sciences (Filozofski fakultet), Zagreb

Profiles:

Željko Agić (author)

Cite this publication:

Merkler, D., Agić, Ž. & Tadić, M. (2014) Automatic Enrichment of Croatian Morphological Lexicon Using Large Corpora and Web Search. In: Proceedings of the 6th International Conference on Corpus Linguistics.
@article{article, author = {Merkler, Danijela and Agi\'{c}, \v{Z}eljko and Tadi\'{c}, Marko}, year = {2014}, pages = {42}, keywords = {morphological lexicon, automatic enlargement, Croatian language}, title = {Automatic Enrichment of Croatian Morphological Lexicon Using Large Corpora and Web Search}, publisher = {AELINCO}, publisherplace = {Las Palmas de Gran Canaria, Spain} }



