Bibliographic record no. 616769
Addressing polysemy in bilingual lexicon extraction from comparable corpora
Addressing polysemy in bilingual lexicon extraction from comparable corpora // Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12) / Nicoletta Calzolari et al. (eds.).
Istanbul, 2012. (lecture, international peer review, full paper (in extenso), scientific)
CROSBI ID: 616769
Title
Addressing polysemy in bilingual lexicon extraction from comparable corpora
Authors
Fišer, Darja ; Ljubešić, Nikola ; Kubelka, Ozren
Type, subtype and category of work
Papers in conference proceedings, full paper (in extenso), scientific
Source
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
/ Nicoletta Calzolari et al. - Istanbul, 2012
Conference
Eighth International Conference on Language Resources and Evaluation (LREC'12)
Place and date
Istanbul, Turkey, 21.05.2012 - 27.05.2012
Type of participation
Lecture
Type of peer review
International peer review
Keywords
bilingual lexicon extraction; comparable corpora; polysemy
Abstract
This paper presents an approach to extracting translation equivalents for polysemous nouns from comparable corpora. In contrast to standard approaches, which build a single context vector for all occurrences of a given headword, we first disambiguate the headword with third-party sense taggers and then build a separate context vector for each sense of the headword. Since state-of-the-art word sense disambiguation tools are still far from perfect, we also tried to improve the results by combining the sense assignments provided by two different sense taggers. Evaluation shows that we outperform the baseline (0.473) in all the settings we experimented with, even when using only one sense tagger, and that the best results are indeed obtained by taking the intersection of both sense taggers (0.720).
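The core idea of the abstract, one context vector per sense built only from occurrences on which two sense taggers agree, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name, the input layout, the window size of 2, the plain co-occurrence counts and the sense labels are all assumptions made for illustration, since the record does not describe the taggers, features or weighting used in the paper.

```python
from collections import Counter, defaultdict


def sense_context_vectors(occurrences, window=2):
    """Build one bag-of-words context vector per sense of a headword.

    `occurrences` is a hypothetical input format: an iterable of
    (tokens, head_index, sense_a, sense_b) tuples, where sense_a and
    sense_b are the labels assigned by two different sense taggers.
    Only occurrences on which both taggers agree are used, which is
    one simple way to realise the "intersection" setting.
    """
    vectors = defaultdict(Counter)
    for tokens, head_index, sense_a, sense_b in occurrences:
        if sense_a != sense_b:
            continue  # taggers disagree: skip this occurrence
        # Collect up to `window` tokens on each side of the headword.
        left = tokens[max(0, head_index - window):head_index]
        right = tokens[head_index + 1:head_index + 1 + window]
        vectors[sense_a].update(left + right)
    return dict(vectors)


# Toy data with made-up sense labels for the headword "bank".
occurrences = [
    (["the", "bank", "approved", "the", "loan"], 1, "bank.finance", "bank.finance"),
    (["we", "sat", "on", "the", "river", "bank"], 5, "bank.river", "bank.river"),
    (["the", "bank", "was", "closed"], 1, "bank.finance", "bank.river"),
]

vectors = sense_context_vectors(occurrences)
for sense, vector in sorted(vectors.items()):
    print(sense, dict(vector))
```

In this sketch the third occurrence is discarded because the two labels differ, so it contributes to neither vector. A single-vector baseline would instead merge all three contexts under the one headword.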
Original language
English
Scientific fields
Information and communication sciences
ASSOCIATED WITH
Projects:
FP7-248347
130-1301679-1380 - Hrvatska rječnička baština i hrvatski europski identitet (Boras, Damir, MZOS) (CroRIS)
Institutions:
Filozofski fakultet, Zagreb