Bibliographic record no. 552910
Bootstrapping Bilingual Lexicons from Comparable Corpora for Closely Related Languages
Bootstrapping Bilingual Lexicons from Comparable Corpora for Closely Related Languages // Text, Speech and Dialogue / Habernal, Ivan ; Matoušek, Václav (eds.).
Berlin; Heidelberg: Springer, 2011. pp. 91-98
CROSBI ID: 552910
Title
Bootstrapping Bilingual Lexicons from Comparable Corpora for Closely Related Languages
Authors
Ljubešić, Nikola ; Fišer, Darja
Type, subtype and category of work
Book chapter, scientific
Book
Text, Speech and Dialogue
Editor(s)
Habernal, Ivan ; Matoušek, Václav
Publisher
Springer
City
Berlin; Heidelberg
Year
2011
Page range
91-98
ISBN
978-3-642-23537-5
Keywords
comparable corpora, bilingual lexicon extraction, bootstrapping
Abstract
In this paper we present an approach to bootstrapping a Croatian-Slovene bilingual lexicon from comparable news corpora from scratch, without relying on any external bilingual knowledge resource. Instead of using a dictionary to translate context vectors, we build a seed lexicon from words that are identical in both languages and extend it with context-based cognates and with translation candidates of the most frequent words. By enlarging the seed dictionary by only 7%, we improved precision, measured as the mean reciprocal rank over the ten top-ranking translation candidates, from a baseline of 0.597 to 0.731, at 50.4% recall on a gold standard of 500 entries.
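The abstract describes the pipeline only at a high level. The following is a minimal Python sketch of its general shape, assuming whitespace-tokenised sentences, raw co-occurrence counts in a +/-2 token window, and cosine similarity. All function names, the greedy extension step, and the toy Croatian/Slovene data are hypothetical illustrations, not the authors' implementation. The paper's weighting scheme, window size, thresholds, and cognate detection are not reproduced here.

```python
"""Toy sketch of seed-lexicon bootstrapping for closely related languages.

Assumptions (not taken from the paper): whitespace-tokenised sentences,
raw co-occurrence counts in a +/-2 token window, cosine similarity and a
greedy extension step. All function names are hypothetical.
"""
import math
from collections import Counter, defaultdict


def build_seed_lexicon(src_vocab, tgt_vocab):
    # Words spelled identically in both languages are taken as translations of themselves.
    return {w: w for w in src_vocab & tgt_vocab}


def context_vectors(sentences, window=2):
    # Count how often each word co-occurs with neighbours within +/-window tokens.
    vectors = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def translate_vector(vec, seed):
    # Map source context words into the target language; words outside the seed are dropped.
    out = Counter()
    for ctx, count in vec.items():
        if ctx in seed:
            out[seed[ctx]] += count
    return out


def cosine(a, b):
    dot = sum(count * b[k] for k, count in a.items() if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_candidates(word, src_vecs, tgt_vecs, seed, top_n=10):
    # Rank every target word by similarity to the translated source context vector.
    query = translate_vector(src_vecs.get(word, Counter()), seed)
    scored = sorted(((cosine(query, vec), cand) for cand, vec in tgt_vecs.items()),
                    key=lambda pair: (-pair[0], pair[1]))  # ties broken alphabetically
    return [cand for _, cand in scored[:top_n]]


def extend_seed(seed, src_vecs, tgt_vecs, src_freq, n_most_frequent=100):
    # Simplified bootstrapping: add the top-ranked candidate of the most frequent
    # source words not yet in the seed (no cognate filtering or thresholds here).
    extended = dict(seed)
    frequent = [w for w, _ in src_freq.most_common() if w not in seed][:n_most_frequent]
    for word in frequent:
        best = rank_candidates(word, src_vecs, tgt_vecs, seed, top_n=1)
        if best:
            extended[word] = best[0]
    return extended


def mean_reciprocal_rank(ranked, gold):
    # 1/rank of the correct translation in each candidate list (0 if absent), averaged.
    total = 0.0
    for word, correct in gold.items():
        candidates = ranked.get(word, [])
        if correct in candidates:
            total += 1.0 / (candidates.index(correct) + 1)
    return total / len(gold)


if __name__ == "__main__":
    hr = [["vlada", "kuća", "zakon"]]  # toy Croatian corpus
    sl = [["vlada", "hiša", "zakon"], ["sonce", "morje"]]  # toy Slovene corpus
    seed = build_seed_lexicon({t for s in hr for t in s}, {t for s in sl for t in s})
    hr_vecs, sl_vecs = context_vectors(hr), context_vectors(sl)
    ranked = {"kuća": rank_candidates("kuća", hr_vecs, sl_vecs, seed)}
    print(ranked["kuća"])
    print(mean_reciprocal_rank(ranked, {"kuća": "hiša"}))
```

Run as a script, the toy demo prints `['hiša', 'vlada', 'zakon', 'morje', 'sonce']` and `1.0`. The seed lexicon contains only the identical words `vlada` and `zakon`, and `hiša` is the only Slovene word whose context matches theirs exactly. Because translated context vectors keep only words covered by the seed, every entry added to the seed makes more of the context usable. This is why even a small extension of the seed can noticeably improve ranking.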
Original language
English
Scientific fields
Information and communication sciences
RELATED INFORMATION
Projects:
130-1301679-1380 - Croatian Dictionary Heritage and Croatian European Identity (Hrvatska rječnička baština i hrvatski europski identitet) (Boras, Damir, MZOS) (CroRIS)
Institutions:
Faculty of Humanities and Social Sciences, Zagreb
Profiles:
Nikola Ljubešić
(author)