Bibliographic record no. 552852

Bilingual Lexicon Extraction from Comparable Corpora for Closely Related Languages


Fišer, Darja; Ljubešić, Nikola
Bilingual Lexicon Extraction from Comparable Corpora for Closely Related Languages // Proceedings of the International Conference Recent Advances in Natural Language Processing 2011
Hisarya: RANLP 2011 Organising Committee, 2011. pp. 125-131 (lecture, international peer review, full paper (in extenso), scientific)


CROSBI ID: 552852

Title
Bilingual Lexicon Extraction from Comparable Corpora for Closely Related Languages

Authors
Fišer, Darja ; Ljubešić, Nikola

Type, subtype and category of work
Papers in conference proceedings, full paper (in extenso), scientific

Source
Proceedings of the International Conference Recent Advances in Natural Language Processing 2011. Hisarya: RANLP 2011 Organising Committee, 2011, pp. 125-131

Conference
Recent Advances in Natural Language Processing 2011

Place and date
Hisar, Bulgaria, 12-14 September 2011

Type of participation
Lecture

Type of review
International peer review

Keywords
comparable corpora; lexicon extraction; closely related languages

Abstract
In this paper we present a knowledge-light approach to extract a bilingual lexicon for closely related languages from comparable corpora. While in most related work an existing dictionary is used to translate context vectors, we take advantage of the similarities between languages instead and build a seed lexicon from words that are identical in both languages and then further extend it with context-based cognates and translations of the most frequent words. We also use cognates for reranking translation candidates obtained via context similarity and extract translation equivalents for all content words, not just nouns as in most related work. The results are very encouraging, suggesting that other similar languages could benefit from the same approach. By enlarging the seed lexicon with cognates and translations of the most frequent words and by cognate-based reranking of translation candidates we were able to improve the average baseline precision from 0.592 to 0.797 on the mean reciprocal rank for the ten top-ranking translation candidates for nouns, verbs and adjectives with a 46% recall on the gold standard of 1000 random entries from a traditional dictionary.
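
The abstract names three steps that a short sketch may make more concrete: seeding the lexicon with words identical in both languages, reranking context-based candidates by cognate similarity, and scoring with mean reciprocal rank over the top ten candidates. The Python below is only an illustration under stated assumptions, not the authors' implementation. The function names, the `weight` mixing parameter, the use of `difflib.SequenceMatcher` as a cognate measure, and the toy word lists are all hypothetical and do not come from the paper.

```python
from difflib import SequenceMatcher


def seed_lexicon(source_vocab, target_vocab):
    """Pair each word with itself when it occurs in both vocabularies."""
    return {w: w for w in set(source_vocab) & set(target_vocab)}


def cognate_score(a, b):
    """String similarity in [0, 1]; a stand-in for a real cognate measure."""
    return SequenceMatcher(None, a, b).ratio()


def rerank(source_word, candidates, weight=0.5):
    """Re-sort (candidate, context_similarity) pairs using cognate similarity.

    `weight` is a hypothetical mixing parameter, not taken from the paper.
    """
    scored = [
        (cand, (1 - weight) * sim + weight * cognate_score(source_word, cand))
        for cand, sim in candidates
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [cand for cand, _ in scored]


def mean_reciprocal_rank(ranked, gold, k=10):
    """Average 1/rank of the first correct translation within the top k."""
    total = 0.0
    for word, cands in ranked.items():
        for rank, cand in enumerate(cands[:k], start=1):
            if cand in gold.get(word, set()):
                total += 1.0 / rank
                break
    return total / len(ranked) if ranked else 0.0


# Toy data, invented for illustration only.
seed = seed_lexicon(["voda", "mleko"], ["voda", "mlijeko"])
reranked = rerank("mleko", [("kuća", 0.40), ("mlijeko", 0.35)])
mrr = mean_reciprocal_rank(
    {"mleko": ["mlijeko", "kuća"], "hiša": ["voda", "kuća"]},
    {"mleko": {"mlijeko"}, "hiša": {"kuća"}},
)
```

In this toy run the context similarity alone would rank the wrong candidate first for "mleko"; the string-similarity term moves the cognate to the top, which is the effect the abstract attributes to cognate-based reranking.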

Original language
English

Scientific fields
Information and communication sciences



RELATED RECORDS


Projects:
130-1301679-1380 - Hrvatska rječnička baština i hrvatski europski identitet [Croatian Dictionary Heritage and Croatian European Identity] (Boras, Damir, MZOS) (CroRIS)

Institutions:
Filozofski fakultet, Zagreb (Faculty of Humanities and Social Sciences, Zagreb)

Profiles:

Nikola Ljubešić (author)

Cite this publication:

Fišer, D. & Ljubešić, N. (2011) Bilingual Lexicon Extraction from Comparable Corpora for Closely Related Languages. In: Proceedings of the International Conference Recent Advances in Natural Language Processing 2011.
@inproceedings{article,
  author    = {Fi\v{s}er, Darja and Ljube\v{s}i\'{c}, Nikola},
  title     = {Bilingual Lexicon Extraction from Comparable Corpora for Closely Related Languages},
  booktitle = {Proceedings of the International Conference Recent Advances in Natural Language Processing 2011},
  year      = {2011},
  pages     = {125--131},
  keywords  = {comparable corpora, lexicon extraction, closely related languages},
  publisher = {RANLP 2011 Organising Committee},
  address   = {Hisar, Bulgaria}
}



