Bibliographic record no.: 552786

Bilingual lexicon extraction from comparable corpora: A comparative study


Ljubešić, Nikola; Fišer, Darja; Vintar, Špela; Pollak, Senja
Bilingual lexicon extraction from comparable corpora: A comparative study // First International Workshop on Lexical Resources, An ESSLLI 2011 Workshop, Ljubljana, Slovenia - August 1-5, 2011
Ljubljana, Slovenia, 2011. (lecture, international peer review, full paper (in extenso), scientific)


CROSBI ID: 552786

Title
Bilingual lexicon extraction from comparable corpora: A comparative study

Authors
Ljubešić, Nikola ; Fišer, Darja ; Vintar, Špela ; Pollak, Senja

Type, subtype and category of work
Papers in conference proceedings, full paper (in extenso), scientific

Source
First International Workshop on Lexical Resources, An ESSLLI 2011 Workshop, Ljubljana, Slovenia - August 1-5, 2011

Conference
First International Workshop on Lexical Resources

Place and date
Ljubljana, Slovenia, August 1-5, 2011

Type of participation
Lecture

Type of peer review
International peer review

Keywords
comparable corpora; bilingual lexicon extraction

Abstract
This paper presents a comparative study of the impact of the key parameters for bilingual lexicon extraction for nouns from comparable corpora. The parameters we analyzed are: corpus size and comparability, dictionary size and type, feature selection for context vectors and window size, and association and similarity measures. Evaluation against the gold standard shows that a window size of 7 with encoded position yields the best results. The consistently best-performing combination of association and similarity measures is log-likelihood with Jensen-Shannon divergence. We show that very good results can be achieved with small but purpose-built seed lexicons, and that problems arising from dissimilarities between the source and the target corpus can be compensated for by sufficient corpus size.

Original language
English

Scientific fields
Information and communication sciences



RELATED TO THIS WORK


Projects:
130-1301679-1380 - Hrvatska rječnička baština i hrvatski europski identitet (Boras, Damir, MZOS)

Institutions:
Filozofski fakultet, Zagreb

Profiles:

Nikola Ljubešić (author)

Cite this publication:

Ljubešić, N., Fišer, D., Vintar, Š. & Pollak, S. (2011) Bilingual lexicon extraction from comparable corpora: A comparative study. In: First International Workshop on Lexical Resources, An ESSLLI 2011 Workshop, Ljubljana, Slovenia - August 1-5, 2011.
@article{article,
  author = {Ljube\v{s}i\'{c}, Nikola and Fi\v{s}er, Darja and Vintar, \v{S}pela and Pollak, Senja},
  year = {2011},
  keywords = {comparable corpora, bilingual lexicon extraction},
  title = {Bilingual lexicon extraction from comparable corpora: A comparative study},
  publisherplace = {Ljubljana, Slovenia}
}