Bootstrapping Bilingual Lexicons from Comparable Corpora for Closely Related Languages

Ljubešić, Nikola; Fišer, Darja

Bibliographic record no. 552910

Bootstrapping Bilingual Lexicons from Comparable Corpora for Closely Related Languages // Text, Speech and Dialogue / Habernal, Ivan ; Matoušek, Václav (eds.).
Berlin : Heidelberg: Springer, 2011. pp. 91-98

CROSBI ID: 552910

Title
Bootstrapping Bilingual Lexicons from Comparable Corpora for Closely Related Languages

Authors
Ljubešić, Nikola ; Fišer, Darja

Type, subtype and category of work
Book chapter, scientific

Book
Text, Speech and Dialogue

Editor(s)
Habernal, Ivan ; Matoušek, Václav

Publisher
Springer

City
Berlin : Heidelberg

Year
2011

Page range
91-98

ISBN
978-3-642-23537-5

Keywords
comparable corpora, bilingual lexicon extraction, bootstrapping

Abstract
In this paper we present an approach to bootstrapping a Croatian-Slovene bilingual lexicon from comparable news corpora from scratch, without relying on any external bilingual knowledge resource. Instead of using a dictionary to translate context vectors, we build a seed lexicon from identical words in both languages and extend it with context-based cognates and translation candidates of the most frequent words. By enlarging the seed dictionary by only 7%, we were able to improve the baseline precision, measured as the mean reciprocal rank over the ten top-ranking translation candidates, from 0.597 to 0.731, with a 50.4% recall on the gold standard of 500 entries.
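
The two ideas named in the abstract, a seed lexicon built from words that are identical in both languages and evaluation by mean reciprocal rank over the top ten candidates, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `min_freq` threshold, the toy sentences and the choice to average over all gold-standard entries are assumptions made only for illustration.

```python
from collections import Counter


def seed_lexicon(src_tokens, tgt_tokens, min_freq=1):
    """Pair each word with itself when it occurs in both corpora.

    min_freq is a hypothetical frequency threshold, not a value from the paper.
    """
    src_counts = Counter(src_tokens)
    tgt_counts = Counter(tgt_tokens)
    return {
        word: word
        for word, count in src_counts.items()
        if count >= min_freq and tgt_counts.get(word, 0) >= min_freq
    }


def mean_reciprocal_rank(ranked_candidates, gold, top_n=10):
    """Average 1/rank of the first correct candidate within the top_n.

    Entries with no correct candidate in the top_n contribute 0.
    Averaging over every gold entry is an assumption of this sketch.
    """
    if not gold:
        return 0.0
    total = 0.0
    for source_word, correct in gold.items():
        candidates = ranked_candidates.get(source_word, [])[:top_n]
        for rank, candidate in enumerate(candidates, start=1):
            if candidate in correct:
                total += 1.0 / rank
                break
    return total / len(gold)


# Toy data (invented for this example, not taken from the paper's corpora).
croatian = "vlada je danas objavila novi zakon".split()
slovene = "vlada je danes objavila nov zakon".split()
seed = seed_lexicon(croatian, slovene)

# Hypothetical ranked translation candidates and gold standard.
ranked = {"danas": ["danes", "dan"], "novi": ["novo", "nov"]}
gold = {"danas": {"danes"}, "novi": {"nov"}}
mrr = mean_reciprocal_rank(ranked, gold)
```

On the toy data, the seed lexicon contains the four words shared by both sentences, and the score averages a rank-1 hit and a rank-2 hit. In the approach the abstract describes, such a seed lexicon would then stand in for a dictionary when translating context vectors; that step is not shown here.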

Original language
English

Scientific fields
Information and communication sciences

RELATED ENTITIES

Projects:
130-1301679-1380 - Hrvatska rječnička baština i hrvatski europski identitet (Boras, Damir, MZOS) (CroRIS)

Institutions:
Filozofski fakultet, Zagreb

Profiles:

Nikola Ljubešić (author)

www.springerlink.com
