Multi-word term extraction from comparable corpora by combining contextual and constituent clues

Ljubešić, Nikola; Vintar, Špela; Fišer, Darja

Bibliographic record number: 616813

Multi-word term extraction from comparable corpora by combining contextual and constituent clues // Proceedings of the Workshop on Building and Using Comparable Corpora (BUCC’12) / Rapp, Reinhard ; Tadić, Marko ; Sharoff, Serge ; Zweigenbaum, Pierre (eds.).
Istanbul, 2012. pp. 143-147 (lecture, international peer review, full paper (in extenso), scientific)

CROSBI ID: 616813

Title
Multi-word term extraction from comparable corpora by combining contextual and constituent clues

Authors
Ljubešić, Nikola ; Vintar, Špela ; Fišer, Darja

Type, subtype and category of work
Papers in conference proceedings, full paper (in extenso), scientific

Source
Proceedings of the Workshop on Building and Using Comparable Corpora (BUCC’12) / Rapp, Reinhard ; Tadić, Marko ; Sharoff, Serge ; Zweigenbaum, Pierre - Istanbul, 2012, 143-147

Conference
5th Workshop on Building and Using Comparable Corpora (BUCC 2012)

Place and date
Istanbul, Turkey, 26 May 2012

Type of participation
Lecture

Type of review
International peer review

Keywords
bilingual term extraction; comparable corpora; multi-word expressions; constituent clues

Abstract
In this paper we present an approach to automatically extract and align multi-word terms from an English-Slovene comparable health corpus. First, the terms are extracted from the corpus for each language separately using a list of user-adjustable morphosyntactic patterns and a term weighting measure. Then, the extracted terms are aligned in a bag-of-equivalents fashion with a seed bilingual lexicon. In the extension of the approach we also show that the small general seed lexicon can be enriched with domain-specific vocabulary by harvesting it directly from the comparable corpus, which significantly improves the results of multi-word term mapping. While most previous efforts in bilingual lexicon extraction from comparable corpora have focused on mapping of single words, the proposed technique successfully augments them in that it is able to deal with multi-word terms as well. Since the proposed approach requires minimal knowledge resources, it is easily adaptable for a new language pair or domain, which is one of its biggest advantages.
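The alignment step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the coverage-based score, the `min_score` threshold, and the tiny English-Slovene sample data are all assumptions made for illustration. It only shows the general idea of collecting the seed-lexicon translations of a term's constituent words into a bag of equivalents and matching that bag against candidate target terms.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical toy seed lexicon (English word -> possible Slovene equivalents).
SEED_LEXICON: Dict[str, Set[str]] = {
    "blood": {"kri", "krvni"},
    "pressure": {"tlak", "pritisk"},
    "heart": {"srce", "srčni"},
    "attack": {"napad"},
}

# Hypothetical term lists, standing in for the output of monolingual extraction.
SOURCE_TERMS: List[str] = ["blood pressure", "heart attack"]
TARGET_TERMS: List[str] = ["krvni tlak", "srčni napad", "sladkorna bolezen"]


def bag_of_equivalents(term: str, lexicon: Dict[str, Set[str]]) -> Set[str]:
    """Union of the lexicon translations of every word in a source term."""
    bag: Set[str] = set()
    for word in term.lower().split():
        bag |= lexicon.get(word, set())
    return bag


def align_terms(
    source_terms: List[str],
    target_terms: List[str],
    lexicon: Dict[str, Set[str]],
    min_score: float = 0.5,  # assumed threshold, not taken from the paper
) -> List[Tuple[str, str, float]]:
    """Pair each source term with the target term best covered by its bag."""
    alignments: List[Tuple[str, str, float]] = []
    for src in source_terms:
        bag = bag_of_equivalents(src, lexicon)
        best: Optional[str] = None
        best_score = 0.0
        for tgt in target_terms:
            words = tgt.lower().split()
            # Assumed score: share of target words found in the bag.
            score = sum(w in bag for w in words) / len(words) if words else 0.0
            if score > best_score:
                best, best_score = tgt, score
        if best is not None and best_score >= min_score:
            alignments.append((src, best, best_score))
    return alignments


if __name__ == "__main__":
    for src, tgt, score in align_terms(SOURCE_TERMS, TARGET_TERMS, SEED_LEXICON):
        print(f"{src} -> {tgt} ({score:.2f})")
```

Because word order is ignored, such a matcher tolerates different constituent orders across languages. Its coverage depends entirely on the lexicon, which is consistent with the abstract's point that enriching the seed lexicon with domain-specific vocabulary improves multi-word term mapping.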

Original language
English

Scientific fields
Information and communication sciences

AFFILIATIONS OF THE WORK

Projects:
FP7-248347
130-1301679-1380 - Hrvatska rječnička baština i hrvatski europski identitet [Croatian Dictionary Heritage and Croatian European Identity] (Boras, Damir, MZOS)

Institutions:
Faculty of Humanities and Social Sciences, Zagreb

Profiles:

Nikola Ljubešić (author)

CROSBI: Croatian Scientific Bibliography
