DerivBase.hr: A High-Coverage Derivational Morphology Resource for Croatian

Šnajder, Jan

Bibliographic record number: 711761

DerivBase.hr: A High-Coverage Derivational Morphology Resource for Croatian // Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14), Reykjavik. / Reykjavik (ed.).
Reykjavík, Iceland: European Language Resources Association (ELRA), 2014. pp. 3371-3377 (poster, international peer review, full paper (in extenso), scientific)

CROSBI ID: 711761. For corrections, contact CROSBI support via the web form.

Title
DerivBase.hr: A High-Coverage Derivational Morphology Resource for Croatian

Authors
Šnajder, Jan

Type, subtype, and category of work
Papers in conference proceedings, full paper (in extenso), scientific

Source
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14), Reykjavik. / Reykjavik - : European Language Resources Association (ELRA), 2014, 3371-3377

Conference
The Ninth International Conference on Language Resources and Evaluation (LREC'14)

Place and date
Reykjavík, Iceland, 26 May 2014 - 31 May 2014

Type of participation
Poster

Type of review
International peer review

Keywords
derivational morphology; lexical resource; Croatian language

Abstract
Knowledge about derivational morphology has proven useful for a number of natural language processing (NLP) tasks. We describe the construction and evaluation of DerivBase.hr, a large-coverage morphological resource for Croatian. DerivBase.hr groups 100k lemmas from the web corpus hrWaC into 56k clusters of derivationally related lemmas, so-called derivational families. We focus on suffixal derivation between and within nouns, verbs, and adjectives. We propose two approaches: an unsupervised approach and a knowledge-based approach that relies on a hand-crafted morphology model but uses no additional lexico-semantic resources. The resource acquisition procedure consists of three steps: corpus preprocessing, acquisition of an inflectional lexicon, and induction of derivational families. We describe an evaluation methodology based on manually constructed derivational families, from which we sample and annotate pairs of lemmas. We evaluate DerivBase.hr on the resulting sample and show that the knowledge-based version attains good clustering quality, with 81.2% precision, 76.5% recall, and a 78.8% F1-score. As with similar resources for other languages, we expect DerivBase.hr to be useful for a number of NLP tasks.
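To make the notions of derivational families and pair-based evaluation in the abstract more concrete, the following is a minimal Python sketch, not the procedure used in the paper. It assumes that some derivation model has already proposed pairs of related lemmas, groups them into families as connected components, and then scores the families against a small set of annotated lemma pairs. The function names (`build_families`, `pairwise_scores`, `f1_score`), the toy lemmas, and the annotations are all hypothetical and chosen only for illustration; the record does not specify how pairs are scored in detail.

```python
def build_families(lemmas, related_pairs):
    """Group lemmas into families: connected components over related pairs."""
    parent = {lemma: lemma for lemma in lemmas}

    def find(x):
        # Follow parent links to the representative, shortening the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in related_pairs:
        parent[find(a)] = find(b)  # merge the two families

    families = {}
    for lemma in lemmas:
        families.setdefault(find(lemma), set()).add(lemma)
    return list(families.values())


def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def pairwise_scores(families, annotated_pairs):
    """Score families against annotated pairs {(a, b): is_related}."""
    family_of = {lemma: i for i, fam in enumerate(families) for lemma in fam}
    tp = fp = fn = 0
    for (a, b), related in annotated_pairs.items():
        same = a in family_of and b in family_of and family_of[a] == family_of[b]
        if same and related:
            tp += 1          # correctly placed in the same family
        elif same:
            fp += 1          # wrongly merged
        elif related:
            fn += 1          # related pair split across families
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f1_score(precision, recall)


# Toy data (hypothetical): Croatian lemmas with English glosses in comments.
lemmas = ["pisati", "pisac", "pismo",        # to write, writer, letter
          "čitati", "čitatelj", "čitanje",   # to read, reader, reading
          "voda", "vodič"]                   # water, guide
# Pairs proposed by an assumed derivation model; voda-vodič is a spurious link.
related_pairs = [("pisati", "pisac"), ("čitati", "čitatelj"),
                 ("čitati", "čitanje"), ("voda", "vodič")]
# Hypothetical manual annotations for a sample of lemma pairs.
annotated_pairs = {("pisati", "pisac"): True, ("pisati", "pismo"): True,
                   ("čitatelj", "čitanje"): True, ("pisac", "čitatelj"): False,
                   ("voda", "vodič"): False}

families = build_families(lemmas, related_pairs)
precision, recall, f1 = pairwise_scores(families, annotated_pairs)
print(families)
print(precision, recall, f1)
```

In this toy setup, the spurious voda-vodič link lowers precision and the missed pisati-pismo pair lowers recall. The same `f1_score` helper also shows that the reported figures are mutually consistent: the harmonic mean of 81.2% precision and 76.5% recall rounds to 78.8%.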

Original language
English

Scientific fields
Computing

WORK AFFILIATIONS

Institutions:
Faculty of Electrical Engineering and Computing, Zagreb

Profiles:

Jan Šnajder (author)

www.lrec-conf.org

CROSBI Croatian Scientific Bibliography
