Bibliographic record no. 552910

Bootstrapping Bilingual Lexicons from Comparable Corpora for Closely Related Languages


Ljubešić, Nikola; Fišer, Darja
Bootstrapping Bilingual Lexicons from Comparable Corpora for Closely Related Languages // Text, Speech and Dialogue / Habernal, Ivan ; Matoušek, Václav (eds.).
Berlin / Heidelberg: Springer, 2011. pp. 91-98


Title
Bootstrapping Bilingual Lexicons from Comparable Corpora for Closely Related Languages

Authors
Ljubešić, Nikola ; Fišer, Darja

Type, subtype and category of work
Book chapters, scientific

Book
Text, Speech and Dialogue

Editor(s)
Habernal, Ivan ; Matoušek, Václav

Publisher
Springer

City
Berlin / Heidelberg

Year
2011

Page range
91-98

ISBN
978-3-642-23537-5

Keywords
Comparable corpora, bilingual lexicon extraction, bootstrapping

Abstract
In this paper we present an approach to bootstrapping a Croatian-Slovene bilingual lexicon from comparable news corpora from scratch, without relying on any external bilingual knowledge resource. Instead of using a dictionary to translate context vectors, we build a seed lexicon from words that are identical in both languages and extend it with context-based cognates and translation candidates of the most frequent words. By enlarging the seed dictionary by only 7%, we were able to improve the baseline precision from 0.597 to 0.731, measured as mean reciprocal rank over the ten top-ranking translation candidates, with a 50.4% recall on a gold standard of 500 entries.
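
The two ideas the abstract names, a seed lexicon made of words identical in both languages and evaluation by mean reciprocal rank over the top ten candidates, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `min_freq` threshold, and the choice to score entries with no correct candidate as 0 are assumptions made here for illustration only.

```python
from collections import Counter


def seed_lexicon(src_tokens, tgt_tokens, min_freq=2):
    """Map each word to itself if it is frequent enough in both corpora.

    min_freq is a hypothetical threshold, not a value taken from the paper.
    """
    src_freq = Counter(src_tokens)
    tgt_freq = Counter(tgt_tokens)
    return {
        word: word
        for word, count in src_freq.items()
        if count >= min_freq and tgt_freq[word] >= min_freq
    }


def mean_reciprocal_rank(ranked_candidates, gold, top_n=10):
    """Average 1/rank of the first correct translation in the top_n candidates.

    Assumption: entries with no correct candidate in the top_n contribute 0.
    """
    if not ranked_candidates:
        return 0.0
    total = 0.0
    for source_word, candidates in ranked_candidates.items():
        correct = gold.get(source_word, set())
        for rank, candidate in enumerate(candidates[:top_n], start=1):
            if candidate in correct:
                total += 1.0 / rank
                break
    return total / len(ranked_candidates)
```

In the approach the abstract describes, such a seed lexicon would take the place of an external dictionary when translating context vectors; that step, and the extension with cognates and frequent-word translations, is not shown here.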

Original language
English

Scientific fields
Information and communication sciences



RELATED TO THIS WORK


Project / topic
130-1301679-1380 - Hrvatska rječnička baština i hrvatski europski identitet (Damir Boras)

Institutions
Filozofski fakultet, Zagreb

Author with registry number:
Nikola Ljubešić (272820)