
Bibliographic record no.: 552816

Building and using comparable corpora for domain-specific bilingual lexicon extraction


Fišer, Darja; Ljubešić, Nikola; Vintar, Špela; Pollak, Senja
Building and using comparable corpora for domain-specific bilingual lexicon extraction // 4th Workshop on Building and Using Comparable Corpora: Comparable Corpora and the Web
Portland (OR): Association for Computational Linguistics (ACL), 2011, pp. 19-26 (lecture, international peer review, full paper (in extenso), scientific)


CROSBI ID: 552816

Title
Building and using comparable corpora for domain-specific bilingual lexicon extraction

Authors
Fišer, Darja ; Ljubešić, Nikola ; Vintar, Špela ; Pollak, Senja

Type, subtype and category of work
Papers in conference proceedings, full paper (in extenso), scientific

Source
4th Workshop on Building and Using Comparable Corpora: Comparable Corpora and the Web. Portland (OR): Association for Computational Linguistics (ACL), 2011, pp. 19-26

Conference
4th Workshop on Building and Using Comparable Corpora: Comparable Corpora and the Web

Place and date
Portland (OR), United States, 24 July 2011

Type of participation
Lecture

Type of review
International peer review

Keywords
comparable corpora; bilingual lexicon extraction; domain lexicons

Abstract
This paper presents a series of experiments aimed at inducing and evaluating domain-specific bilingual lexica from comparable corpora. First, a small English-Slovene comparable corpus from health magazines was manually constructed and then used to compile a large comparable corpus on health-related topics from web corpora. Next, a bilingual lexicon for the domain was extracted from the corpus by comparing context vectors in the two languages. Evaluation of the results shows that a 2-way translation of context vectors significantly improves precision of the extracted translation equivalents. We also show that it is sufficient to increase the corpus for one language in order to obtain a higher recall, and that the increase of the number of new words is linear in the size of the corpus. Finally, we demonstrate that by lowering the frequency threshold for context vectors, the drop in precision is much slower than the increase of recall.
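
The idea of extracting translation candidates "by comparing context vectors in the two languages" can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the window size, feature weighting, or similarity measure, so raw co-occurrence counts, a two-token window, and cosine similarity are assumptions here, as are the toy sentences, the seed dictionary, and all function names. Only a one-way translation of the source vector is shown; the 2-way variant mentioned in the abstract is not reproduced.

```python
from collections import Counter, defaultdict
from math import sqrt


def context_vectors(sentences, window=2):
    """Count, for each word, the words co-occurring within `window` tokens."""
    vectors = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def translate_vector(vector, seed_dict):
    """Map context features into the target language.

    Features missing from the seed dictionary are dropped.
    """
    translated = Counter()
    for feature, count in vector.items():
        if feature in seed_dict:
            translated[seed_dict[feature]] += count
    return translated


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rank_candidates(word, src_vectors, tgt_vectors, seed_dict, top_n=3):
    """Rank target-language words by similarity to the translated source vector."""
    translated = translate_vector(src_vectors[word], seed_dict)
    scored = [(cosine(translated, vec), cand) for cand, vec in tgt_vectors.items()]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(cand, score) for score, cand in scored[:top_n]]


# Hypothetical toy data standing in for the English and Slovene corpora.
en_sentences = [["flu", "causes", "fever"], ["flu", "causes", "cough"]]
sl_sentences = [["gripa", "povzroča", "vročino"], ["gripa", "povzroča", "kašelj"]]
seed_dict = {"causes": "povzroča", "fever": "vročino", "cough": "kašelj"}

en_vectors = context_vectors(en_sentences)
sl_vectors = context_vectors(sl_sentences)
candidates = rank_candidates("flu", en_vectors, sl_vectors, seed_dict)
```

On this toy input the highest-ranked candidate for "flu" is "gripa", because its context vector matches the translated English vector exactly. A realistic setup would add a frequency threshold on the words for which vectors are built, which is the parameter the abstract reports varying.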

Source language
English

Scientific fields
Information and communication sciences



PAPER AFFILIATIONS


Projects:
130-1301679-1380 - Hrvatska rječnička baština i hrvatski europski identitet (Boras, Damir, MZOS) (CroRIS)

Institutions:
Filozofski fakultet, Zagreb

Profiles:

Nikola Ljubešić (author)

Cite this publication:

Fišer, Darja; Ljubešić, Nikola; Vintar, Špela; Pollak, Senja
Building and using comparable corpora for domain-specific bilingual lexicon extraction // 4th Workshop on Building and Using Comparable Corpora: Comparable Corpora and the Web
Portland (OR): Association for Computational Linguistics (ACL), 2011, pp. 19-26 (lecture, international peer review, full paper (in extenso), scientific)
Fišer, D., Ljubešić, N., Vintar, Š. & Pollak, S. (2011) Building and using comparable corpora for domain-specific bilingual lexicon extraction. In: 4th Workshop on Building and Using Comparable Corpora: Comparable Corpora and the Web.
@inproceedings{article,
  author = {Fi\v{s}er, Darja and Ljube\v{s}i\'{c}, Nikola and Vintar, \v{S}pela and Pollak, Senja},
  title = {Building and using comparable corpora for domain-specific bilingual lexicon extraction},
  booktitle = {4th Workshop on Building and Using Comparable Corpora: Comparable Corpora and the Web},
  year = {2011},
  pages = {19--26},
  keywords = {comparable corpora, bilingual lexicon extraction, domain lexicons},
  publisher = {Association for Computational Linguistics (ACL)},
  address = {Portland (OR), United States}
}



