Bibliographic record number: 780159
Enlarging the Croatian Wordnet with WN-Toolkit and CroDeriV
Enlarging the Croatian Wordnet with WN-Toolkit and CroDeriV // Proceedings of the International Conference Recent Advances in Natural Language Processing / Angelova, Galia ; Bontcheva, Kalina ; Mitkov, Ruslan (eds.).
Hisarya: BAS, 2015, pp. 480-487 (lecture, international peer review, full paper (in extenso), scientific)
CROSBI ID: 780159
Title
Enlarging the Croatian Wordnet with WN-Toolkit and CroDeriV
Authors
Oliver, Antoni ; Šojat, Krešimir ; Srebačić, Matea
Type, subtype and category of work
Papers in conference proceedings, full paper (in extenso), scientific
Source
Proceedings of the International Conference Recent Advances in Natural Language Processing
/ Angelova, Galia ; Bontcheva, Kalina ; Mitkov, Ruslan - Hisarya : BAS, 2015, 480-487
Conference
RANLP 2015
Place and date
Hisar, Bulgaria, 07.09.2015 - 09.09.2015
Type of participation
Lecture
Type of review
International peer review
Keywords
Croatian Wordnet ; WN-Toolkit ; CroDeriV
Abstract
Wordnet is a standard semantic resource for several NLP tasks and is available for numerous languages. The Croatian Wordnet (CroWN) was a relatively small resource with 10,026 synsets and 31,367 synset-variant pairs, covering only 45.91% of the so-called Core WordNet. Comparing these figures with the size of the Princeton WordNet 3.0, which contains 117,659 synsets and 206,975 synset-variant pairs, it is clear that CroWN should be expanded. The first experiments for its expansion were performed using the WN-Toolkit, a set of Python programs for wordnet creation and expansion using dictionary-based, BabelNet-based and parallel-corpus-based strategies. The WN-Toolkit had previously been applied successfully to other languages such as Spanish, Catalan and Galician. After this first step, CroWN reached 70.63% of the Core WordNet. In the second step we used CroDeriV, a derivational database for Croatian. In the final step we manually created 1,457 synset-variant pairs and reached 100% of the Core WordNet. After the whole procedure, CroWN contains 23,137 synsets and 47,931 synset-lemma pairs.
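The size figures quoted in the abstract can be put side by side with a few lines of Python. This is only an illustrative sketch: the constant and function names are hypothetical and not part of the WN-Toolkit, and the only inputs are the numbers stated in the abstract. It assumes that the final "synset-lemma pairs" count is comparable to the initial "synset-variant pairs" count.

```python
# Figures taken directly from the abstract; all names here are illustrative.
PWN_SYNSETS = 117_659          # Princeton WordNet 3.0 synsets
CROWN_SYNSETS_BEFORE = 10_026  # CroWN synsets before expansion
CROWN_SYNSETS_AFTER = 23_137   # CroWN synsets after the whole procedure
CROWN_PAIRS_BEFORE = 31_367    # synset-variant pairs before expansion
CROWN_PAIRS_AFTER = 47_931     # synset-lemma pairs after the whole procedure


def growth_factor(after: int, before: int) -> float:
    """Return how many times larger `after` is than `before`."""
    return after / before


def share_percent(part: int, whole: int) -> float:
    """Return `part` as a percentage of `whole`."""
    return 100.0 * part / whole


synset_growth = growth_factor(CROWN_SYNSETS_AFTER, CROWN_SYNSETS_BEFORE)
pair_growth = growth_factor(CROWN_PAIRS_AFTER, CROWN_PAIRS_BEFORE)
share_before = share_percent(CROWN_SYNSETS_BEFORE, PWN_SYNSETS)
share_after = share_percent(CROWN_SYNSETS_AFTER, PWN_SYNSETS)

print(f"Synset count grew by a factor of {synset_growth:.2f}")
print(f"Pair count grew by a factor of {pair_growth:.2f}")
print(f"CroWN synsets relative to PWN 3.0: {share_before:.2f}% -> {share_after:.2f}%")
```

Note that these ratios describe overall size only; the Core WordNet coverage percentages in the abstract are measured against a different reference set and cannot be derived from the totals above.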
Original language
English
Scientific fields
Philology
AFFILIATIONS
Institutions:
Filozofski fakultet, Zagreb
Indexed in:
- Scopus