
Bibliographic record no. 616769

Addressing polysemy in bilingual lexicon extraction from comparable corpora


Fišer, Darja; Ljubešić, Nikola; Kubelka, Ozren
Addressing polysemy in bilingual lexicon extraction from comparable corpora // Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12) / Nicoletta Calzolari et al. (eds.).
Istanbul, 2012 (oral presentation, international peer review, full paper (in extenso), scientific)


Title
Addressing polysemy in bilingual lexicon extraction from comparable corpora

Authors
Fišer, Darja; Ljubešić, Nikola; Kubelka, Ozren

Type, subtype and category of work
Papers in conference proceedings, full paper (in extenso), scientific

Source
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12) / Nicoletta Calzolari et al. (eds.) - Istanbul, 2012

Conference
Eighth International Conference on Language Resources and Evaluation (LREC'12)

Place and date
Istanbul, Turkey, 21-27 May 2012

Type of participation
Oral presentation

Type of review
International peer review

Keywords
Bilingual lexicon extraction; comparable corpora; polysemy

Abstract
This paper presents an approach to extracting translation equivalents of polysemous nouns from comparable corpora. Standard approaches build a single context vector for all occurrences of a given headword. We instead first disambiguate the headword with third-party sense taggers and then build a separate context vector for each sense of the headword. State-of-the-art word sense disambiguation tools are still far from perfect. We therefore also tried to improve the results by combining the sense assignments of two different sense taggers. The evaluation shows that our approach outperforms the baseline (0.473) in every setting we tested, even with only one sense tagger. The best result (0.720) is obtained by using the intersection of the two sense taggers' assignments.
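The core idea of the abstract can be sketched in Python. Each sense of a headword gets its own context vector instead of one vector per headword, and two sense taggers are combined by intersection. This is a minimal illustration, not the authors' implementation. Several details are assumptions:

- the function names;
- the plain bag-of-words counting;
- the WordNet-style sense labels;
- reading "intersection" as keeping only the occurrences to which both taggers assign the same sense.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Sequence


def build_sense_vectors(
    contexts: Sequence[List[str]],
    tags_a: Sequence[str],
    tags_b: Sequence[str],
) -> Dict[str, Counter]:
    """Build one bag-of-words context vector per sense of a headword.

    contexts[i] holds the context words of the i-th occurrence of the headword;
    tags_a[i] and tags_b[i] are the senses two (hypothetical) sense taggers
    assigned to that occurrence. Assumption: "intersection" means an occurrence
    is kept only when both taggers agree on its sense.
    """
    vectors: Dict[str, Counter] = defaultdict(Counter)
    for words, sense_a, sense_b in zip(contexts, tags_a, tags_b):
        if sense_a != sense_b:
            continue  # taggers disagree: discard this occurrence
        vectors[sense_a].update(words)  # count context words under the agreed sense
    return dict(vectors)


def build_single_vector(contexts: Sequence[List[str]]) -> Counter:
    """Baseline: a single context vector for all occurrences, ignoring senses."""
    vector: Counter = Counter()
    for words in contexts:
        vector.update(words)
    return vector


# Illustrative usage with made-up occurrences of "bank".
contexts = [["river", "water"], ["money", "loan"], ["money", "account"]]
tags_a = ["bank.n.01", "bank.n.02", "bank.n.02"]
tags_b = ["bank.n.01", "bank.n.02", "bank.n.01"]  # disagrees on the third occurrence

vectors = build_sense_vectors(contexts, tags_a, tags_b)
baseline = build_single_vector(contexts)
print(vectors)
print(baseline)
```

In the baseline vector, the "river" and "money" contexts are mixed together. In the per-sense vectors they are kept apart, and the occurrence on which the taggers disagree is dropped. Comparing these vectors across languages to rank translation candidates is outside the scope of this sketch.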

Original language
English

Scientific fields
Information and communication sciences



PAPER ASSOCIATIONS


Project / topic
130-1301679-1380 - Hrvatska rječnička baština i hrvatski europski identitet (Croatian Lexicographic Heritage and Croatian European Identity) (Damir Boras)
FP7-248347

Institutions
Filozofski fakultet (Faculty of Humanities and Social Sciences), Zagreb