Vector Disambiguation for Translation Extraction from Comparable Corpora

Apidianaki, Marianna; Ljubešić, Nikola; Fišer, Darja

Data source: CROSBI

Vector Disambiguation for Translation Extraction from Comparable Corpora (CROSBI ID 223708)

Journal article | original scientific paper | international peer review

Apidianaki, Marianna; Ljubešić, Nikola; Fišer, Darja. Vector Disambiguation for Translation Extraction from Comparable Corpora // Informatica (Ljubljana), 37 (2013), 2; 193-201

Authorship information

Authors

Apidianaki, Marianna; Ljubešić, Nikola; Fišer, Darja

Basic data in the original language
Basic data in other languages

Language

English

Title

Vector Disambiguation for Translation Extraction from Comparable Corpora

Abstract

We present a new data-driven approach to enhancing the extraction of translation equivalents from comparable corpora. The approach exploits bilingual lexico-semantic knowledge harvested from a parallel corpus. First, the bilingual lexicon obtained by word-aligning the parallel corpus replaces an external seed dictionary, making the approach knowledge-light and portable. Next, instead of using simple one-to-one mappings between the source and the target language, translation equivalents are clustered into sets of synonyms by a cross-lingual Word Sense Induction method. The resulting sense clusters enable us to expand the translation of vector features with several translation variants using a cross-lingual Word Sense Disambiguation method. Consequently, the vector features are disambiguated and translated with the translation variants included in the semantically most appropriate cluster, producing richer and less noisy vectors that allow for more successful cross-lingual vector comparison than previous methods.
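The vector translation step summarized in the abstract can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the lexicon, its sense clusters and the "sense words" attached to each cluster are hypothetical toy data (in the described approach, clusters come from Word Sense Induction over a word-aligned parallel corpus); cluster selection uses a simple overlap heuristic that stands in for the paper's cross-lingual Word Sense Disambiguation method; and copying a feature's weight to every translation variant in the chosen cluster is an assumed weighting scheme.

```python
import math
from collections import defaultdict

# Hypothetical toy lexicon (assumption): each source feature maps to sense
# clusters, i.e. groups of target-language translations sharing one sense.
# "sense_words" are source-language cues used to pick a cluster; they stand
# in for the paper's cross-lingual WSD and are NOT its actual method.
SENSE_CLUSTERS = {
    "bank": [
        {"translations": ["banque"], "sense_words": {"money", "loan", "account"}},
        {"translations": ["rive", "berge"], "sense_words": {"river", "water", "shore"}},
    ],
    "money": [{"translations": ["argent"], "sense_words": set()}],
    "loan": [{"translations": ["prêt", "emprunt"], "sense_words": set()}],
}


def choose_cluster(feature, context):
    """Pick the cluster whose sense words overlap most with the context."""
    clusters = SENSE_CLUSTERS.get(feature, [])
    if not clusters:
        return None
    # max() returns the first cluster on ties
    return max(clusters, key=lambda c: len(c["sense_words"] & context))


def translate_vector(source_vector):
    """Translate a source context vector feature by feature.

    Each feature is disambiguated against the other features of the same
    vector, and its weight is copied to every translation variant in the
    chosen cluster (assumed scheme). Features absent from the lexicon are
    dropped.
    """
    translated = defaultdict(float)
    for feature, weight in source_vector.items():
        context = set(source_vector) - {feature}
        cluster = choose_cluster(feature, context)
        if cluster is None:
            continue
        for variant in cluster["translations"]:
            translated[variant] += weight
    return dict(translated)


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_candidates(source_vector, target_vectors):
    """Rank target words by similarity to the translated source vector."""
    translated = translate_vector(source_vector)
    scores = {word: cosine(translated, vec) for word, vec in target_vectors.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Toy context vector of a hypothetical source headword
    source = {"bank": 2.0, "money": 1.0, "loan": 1.0}
    # Toy context vectors of target-language candidates
    targets = {
        "intérêt": {"banque": 1.0, "prêt": 1.0, "argent": 1.0},
        "rivière": {"rive": 1.0, "berge": 1.0, "eau": 1.0},
    }
    print(rank_candidates(source, targets))
```

In this toy run, "bank" is resolved to the financial cluster because its co-occurring features "money" and "loan" match that cluster's cues, so the riverbank variants never enter the translated vector, while "loan" contributes both of its synonymous variants. That combination of filtering out wrong senses and adding same-sense variants is the intuition behind the "less noisy and richer vectors" claim.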

Keywords

word sense disambiguation; sense clustering; comparable corpora

Note

Not recorded

Language

Not recorded

Title

Not recorded

Abstract

Not recorded

Keywords

Not recorded

Note

Not recorded

Publication details

Journal

Informatica (Ljubljana)

Volume (issue)

37 (2)

Year

2013

Pages

193-201

Publication status

published

ISSN

0350-5596

Related entities

Related persons

Nikola Ljubešić (author)

Related institutions

Filozofski fakultet u Zagrebu (130) (author's institution)

Field

Information and communication sciences

Links

informatica.si

Indexing

Scopus

Web of Science Core Collection, Emerging Sources Citation Index (WoSCC-ESCI)