Data source: CROSBI

Disambiguating vectors for bilingual lexicon extraction from comparable corpora (CROSBI ID 594345)

Conference paper in proceedings | original scientific paper | international peer review

Apidianaki, Marianna ; Ljubešić, Nikola ; Fišer, Darja. Disambiguating vectors for bilingual lexicon extraction from comparable corpora // Proceedings of the Eighth LANGUAGE TECHNOLOGIES Conference / Erjavec, Tomaž ; Žganec Gros, Jerneja (eds.). Ljubljana, 2012. pp. 10-15

Authorship details

Apidianaki, Marianna ; Ljubešić, Nikola ; Fišer, Darja

English

Disambiguating vectors for bilingual lexicon extraction from comparable corpora

This paper presents an approach to enhance the extraction of translation equivalents from comparable corpora by plugging in bilingual lexico-semantic knowledge harvested from a parallel corpus. First, the bilingual lexicon obtained from word-aligning the parallel corpus replaces an external seed dictionary, making the approach knowledge-light and portable. Next, instead of using simple 1:1 mappings between the source and the target language, translation equivalents are clustered into sets of synonyms based on contextual similarities, enabling us to expand the translation of vector features with several translation variants. And last but not least, the vector features are disambiguated and translated only with the translation variants from the most appropriate cluster, thus producing less noisy vectors that allow for a more successful cross-lingual comparison of the vectors compared to simpler methods.

bilingual lexicon extraction; cross-lingual sense clustering; feature disambiguation

Contribution details

10-15.

2012.

published

Host publication details

Proceedings of the Eighth LANGUAGE TECHNOLOGIES Conference

Erjavec, Tomaž ; Žganec Gros, Jerneja

Ljubljana

Conference details

Eighth Language Technologies Conference

oral presentation

08.10.2012-09.10.2012

Ljubljana, Slovenia

Research area

Information and communication sciences