Bibliographic record number: 603927
Automatic Enrichment of Croatian Morphological Lexicon Using Large Corpora and Web Search
Automatic Enrichment of Croatian Morphological Lexicon Using Large Corpora and Web Search // Proceedings of FASSBL 2012
Dubrovnik, Croatia, 2012. (lecture, international peer review, pp presentation, scientific)
CROSBI ID: 603927
Title
Automatic Enrichment of Croatian Morphological Lexicon Using Large Corpora and Web Search
Authors
Merkler, Danijela ; Agić, Željko ; Tadić, Marko
Type, subtype and category of work
Conference abstracts, pp presentation, scientific
Source
Proceedings of FASSBL 2012
2012
Conference
The 8th International Conference on Formal Approaches to South Slavic and Balkan Languages (FASSBL 2012)
Place and date
Dubrovnik, Croatia, 19.09.2012 - 21.09.2012
Type of participation
Lecture
Type of review
International peer review
Keywords
automatic enrichment; morphological lexicon; large corpora
Abstract
Inflectional (or morphological) lexica are important, frequently used language resources for highly inflectional languages such as Croatian. They support many language processing tasks, from basic problems such as lemmatization and morphosyntactic tagging of written text to applications in machine learning, information extraction, information retrieval and machine translation. Because the Croatian Morphological Lexicon (HML) is frequently used both as a stand-alone application and as a module in many other systems for processing Croatian, unknown wordforms (those left unmatched when unseen text is checked against the current version of the HML database) are constantly logged, and the lexicon is updated to newer versions by inserting these new wordforms in batches. Accordingly, in this paper we propose a generic approach to (semi-)automatic generation of new candidate lemmas for HML, their verification, the assignment of inflectional patterns, and finally the creation and insertion of new lexicon entries into HML, all in a single processing pipeline.
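The pipeline named in the abstract (candidate lemma generation, verification, pattern assignment, entry creation) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not describe their algorithms, so everything here is an assumption made for illustration. The `PATTERNS` table is a hypothetical, heavily simplified pair of paradigms, the function names are invented, and plain wordform counts stand in for whatever corpus and web search evidence the actual system uses.

```python
# Hypothetical toy paradigms (NOT the HML inventory):
# pattern name -> (lemma ending, list of inflectional endings).
PATTERNS = {
    "noun_f_a": ("a", ["a", "e", "i", "u", "om", "ama"]),
    "noun_m_0": ("", ["", "a", "u", "om", "i", "e", "ima"]),
}

def candidates(wordform):
    """Yield (lemma, pattern) pairs that could have produced the wordform."""
    for name, (lemma_end, endings) in PATTERNS.items():
        for end in endings:
            if wordform.endswith(end) and len(wordform) > len(end):
                stem = wordform[: len(wordform) - len(end)]
                yield stem + lemma_end, name

def generate_forms(lemma, pattern):
    """Expand a lemma into the full set of wordforms under a pattern."""
    lemma_end, endings = PATTERNS[pattern]
    stem = lemma[: len(lemma) - len(lemma_end)]
    return {stem + end for end in endings}

def enrich(unknown, corpus_counts, min_attested=3):
    """Verify candidates by counting how many generated forms are attested.

    corpus_counts is an assumed mapping wordform -> frequency; it stands in
    for corpus (or web search hit) evidence. Returns lemma -> pattern.
    """
    entries = {}
    for wordform in unknown:
        best = None
        for lemma, pattern in candidates(wordform):
            attested = sum(
                1 for form in generate_forms(lemma, pattern)
                if corpus_counts.get(form, 0) > 0
            )
            if attested >= min_attested and (best is None or attested > best[0]):
                best = (attested, lemma, pattern)
        if best is not None:
            entries[best[1]] = best[2]
    return entries

counts = {"bloger": 5, "blogera": 3, "blogeru": 2, "blogerom": 1}
new_entries = enrich(["blogeru"], counts)
```

With these toy counts, the unknown wordform `blogeru` has three competing candidates, and the one whose generated paradigm is best attested is kept. The `min_attested` threshold is an arbitrary illustrative choice; a semi-automatic workflow would likely route borderline candidates to a human for review.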
Original language
English
Scientific fields
Information and communication sciences, Philology
RELATED TO THIS WORK
Projects:
130-1300646-0645 - Croatian Language Resources and Their Annotation (Tadić, Marko, MZOS)
130-1300646-1776 - Computational Syntax of Croatian (Dovedan Han, Zdravko, MZOS)
Institutions:
Filozofski fakultet (Faculty of Humanities and Social Sciences), Zagreb
Profiles:
Željko Agić
(author)