Addressing polysemy in bilingual lexicon extraction from comparable corpora

Fišer, Darja; Ljubešić, Nikola; Kubelka, Ozren

Bibliographic record number: 616769

Addressing polysemy in bilingual lexicon extraction from comparable corpora // Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12) / Nicoletta Calzolari et al. (eds.).
Istanbul, 2012. (lecture, international peer review, full paper (in extenso), scientific)

CROSBI ID: 616769

Title
Addressing polysemy in bilingual lexicon extraction from comparable corpora

Authors
Fišer, Darja ; Ljubešić, Nikola ; Kubelka, Ozren

Type, subtype and category of work
Papers in conference proceedings, full paper (in extenso), scientific

Source
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12) / Nicoletta Calzolari et al. - Istanbul, 2012

Conference
Eighth International Conference on Language Resources and Evaluation (LREC'12)

Place and date
Istanbul, Turkey, 21.05.2012 - 27.05.2012

Type of participation
Lecture

Type of review
International peer review

Keywords
bilingual lexicon extraction; comparable corpora; polysemy

Abstract
This paper presents an approach to extracting translation equivalents from comparable corpora for polysemous nouns. As opposed to the standard approaches that build a single context vector for all occurrences of a given headword, we first disambiguate the headword with third-party sense taggers and then build a separate context vector for each sense of the headword. Since state-of-the-art word sense disambiguation tools are still far from perfect, we also tried to improve the results by combining the sense assignments provided by two different sense taggers. Evaluation of the results shows that we outperform the baseline (0.473) in all the settings we experimented with, even when using only one sense tagger, and that the best-performing results are indeed obtained by taking into account the intersection of both sense taggers (0.720).
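The idea of one context vector per sense, optionally restricted to occurrences on which two taggers agree, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `sense_vectors`, the tuple layout of `occurrences`, the sense labels, the fixed window of two tokens and the use of raw co-occurrence counts are all assumptions made for illustration, since the abstract does not specify them.

```python
from collections import Counter, defaultdict


def sense_vectors(occurrences, window=2, require_agreement=False):
    """Build one bag-of-words context vector per sense of a headword.

    occurrences: iterable of (tokens, position, sense_a, sense_b), where
    sense_a and sense_b are the labels given by two hypothetical taggers.
    With require_agreement=True, only occurrences on which both taggers
    agree are used (the "intersection" setting).
    """
    vectors = defaultdict(Counter)
    for tokens, pos, sense_a, sense_b in occurrences:
        if require_agreement and sense_a != sense_b:
            continue  # taggers disagree: skip this occurrence
        left = tokens[max(0, pos - window):pos]
        right = tokens[pos + 1:pos + 1 + window]
        # Without agreement filtering, the first tagger's label is used.
        vectors[sense_a].update(left + right)
    return dict(vectors)


# Toy data (invented for illustration): three occurrences of "bank".
occurrences = [
    (["deposit", "money", "in", "bank", "account"], 3, "bank%finance", "bank%finance"),
    (["sit", "on", "river", "bank", "today"], 3, "bank%river", "bank%river"),
    (["walk", "to", "bank", "near", "river"], 2, "bank%finance", "bank%river"),
]

single = sense_vectors(occurrences)
both = sense_vectors(occurrences, require_agreement=True)
```

In this toy data the third occurrence is tagged differently by the two taggers, so its context words enter the `bank%finance` vector in the single-tagger setting but are dropped in the intersection setting. The trade-off is fewer but cleaner occurrences per sense.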

Original language
English

Scientific fields
Information and communication sciences

AFFILIATIONS OF THE WORK

Projects:
FP7-248347
130-1301679-1380 - Hrvatska rječnička baština i hrvatski europski identitet (Boras, Damir, MZOS)

Institutions:
Filozofski fakultet, Zagreb

Profiles:

Ozren Kubelka (author)

Nikola Ljubešić (author)

www.lrec-conf.org

CROSBI Croatian Scientific Bibliography
