Bibliographic record no.: 598891
Distributional Semantics Approach to Detecting Synonyms in Croatian Language
Distributional Semantics Approach to Detecting Synonyms in Croatian Language // Proceedings of the Eighth Language Technologies Conference / Erjavec, Tomaž ; Žganec Gros, Jerneja (eds.).
Ljubljana, 2012. pp. 111-116 (lecture, international peer review, full paper (in extenso), scientific)
CROSBI ID: 598891
Title
Distributional Semantics Approach to Detecting Synonyms in Croatian Language
Authors
Karan, Mladen ; Šnajder, Jan ; Dalbelo Bašić, Bojana
Type, subtype and category of work
Papers in conference proceedings, full paper (in extenso), scientific
Source
Proceedings of the Eighth Language Technologies Conference
/ Erjavec, Tomaž ; Žganec Gros, Jerneja - Ljubljana, 2012, 111-116
Conference
Information Society 2012 - Eighth Language Technologies Conference
Place and date
Ljubljana, Slovenia, 08.10.2012 - 09.10.2012
Type of participation
Lecture
Type of review
International peer review
Keywords
Named Entities ; Extraction ; Classification
Abstract
Identifying synonyms is important for many natural language processing and information retrieval applications. In this paper we address the task of automatically identifying synonyms in the Croatian language using distributional semantic models (DSMs). We build several DSMs using latent semantic analysis (LSA) and random indexing (RI) on the large hrWaC corpus. We evaluate the models on a dictionary-based similarity test – a set of synonymy questions generated automatically from a machine-readable dictionary. Results indicate that LSA models outperform RI models on this task, with accuracies of 68.7%, 68.2%, and 61.6% on nouns, adjectives, and verbs, respectively. We analyze how word frequency and polysemy level affect performance and discuss common causes of synonym misidentification.
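The evaluation described in the abstract (pick, for a target word, the candidate whose vector is most similar to the target's) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes raw sentence-level co-occurrence counts as a stand-in for the LSA and RI models, cosine similarity as the similarity measure, and an invented English toy corpus in place of hrWaC. All function names, sentences, and questions are hypothetical.

```python
from collections import Counter, defaultdict
from math import sqrt

def build_cooccurrence_vectors(sentences):
    """Map each word to counts of the other words sharing a sentence with it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.split()
        for i, word in enumerate(tokens):
            for j, context in enumerate(tokens):
                if i != j:
                    vectors[word][context] += 1
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def answer_question(vectors, target, candidates):
    """Return the candidate whose vector is closest to the target's."""
    return max(candidates, key=lambda c: cosine(vectors[target], vectors[c]))

def accuracy(vectors, questions):
    """Fraction of (target, candidates, gold) questions answered correctly."""
    correct = sum(
        answer_question(vectors, target, candidates) == gold
        for target, candidates, gold in questions
    )
    return correct / len(questions)

# Invented toy data (assumption): stands in for the corpus and the
# dictionary-derived synonymy questions.
toy_corpus = [
    "the big dog barked loudly",
    "the large dog barked loudly",
    "the small cat slept quietly",
    "the tiny cat slept quietly",
]
questions = [
    ("big", ["large", "small", "tiny", "cat"], "large"),
    ("small", ["tiny", "big", "dog", "loudly"], "tiny"),
]

vectors = build_cooccurrence_vectors(toy_corpus)
print(accuracy(vectors, questions))
```

On this toy data both questions are answered correctly because each synonym pair occurs in identical contexts. A real DSM would replace `build_cooccurrence_vectors` with a reduced-dimensionality model such as LSA or RI.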
Original language
English
Scientific fields
Computing
RELATED PROJECTS AND INSTITUTIONS
Projects:
036-1300646-1986 - Otkrivanje znanja u tekstnim podacima [Knowledge discovery in textual data] (Dalbelo-Bašić, Bojana, MZO) (CroRIS)
Institutions:
Fakultet elektrotehnike i računarstva, Zagreb