Corpus-based bilingual terminology extraction

Gašpar, Angelina

Bibliographic record number: 805395

Gašpar, Angelina

Corpus-based bilingual terminology extraction // Multidisciplinary Approaches to Multilingualism / Cergol Kovačević, Kristina ; Udier, Sanda Lucija (eds.).
Frankfurt: Peter Lang, 2015. pp. 303-318

CROSBI ID: 805395

Title
Corpus-based bilingual terminology extraction

Authors
Gašpar, Angelina

Type, subtype and category of work
Book chapters, scientific

Book
Multidisciplinary Approaches to Multilingualism

Editor(s)
Cergol Kovačević, Kristina ; Udier, Sanda Lucija

Publisher
Peter Lang

City
Frankfurt

Year
2015

Page range
303-318

ISBN
978-3-631-66377-6

Keywords
bilingual terminology extraction, Croatian-English parallel corpus of legislative texts, evaluation

Abstract
The paper describes a methodology for bilingual terminology extraction and termbase building based on terminological, lexical and pragmatic criteria, together with the translator's knowledge and experience. The research is conducted on a sentence-aligned, million-word Croatian-English parallel corpus of legislative texts, the first larger corpus designed for this language pair so far. To assess the hybrid, statistical and linguistic approach as well as the tools for automatic term extraction, the automatically obtained lists of term candidates are compared with a manually created reference list. Term extraction covers multi-word units and single-word units that correspond to multi-word ones. The tools used in this research are SDL Trados WinAlign (sentence alignment), SDL MultiTerm Extract and WordSmith (statistically based term extraction), and NooJ (linguistically based environment). The evaluation is reported using the statistical measures of precision, recall and F-measure. Language resources covering a specific domain speed up the translation process, reduce cost and time, and enable communication across different languages and cultures. Their application also greatly facilitates machine translation and computer-assisted translation, information retrieval, and the building of multilingual term bases, glossaries and other resources that are prerequisites for the development of a language with insufficient linguistic resources, such as Croatian.
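
The evaluation step named in the abstract (comparing automatically extracted term candidates with a manually created reference list and reporting precision, recall and F-measure) can be illustrated with a minimal sketch. This is not the author's implementation: the function name, the exact-match normalisation and the toy term lists below are assumptions made for illustration only, and the balanced F1 form of the F-measure is assumed.

```python
def evaluate_term_extraction(candidates, reference):
    """Score extracted term candidates against a manual reference list.

    Returns (precision, recall, f_measure). Matching is exact after
    lowercasing and whitespace normalisation, which is an assumption
    of this sketch, not something stated in the paper.
    """
    cand = {" ".join(t.lower().split()) for t in candidates}
    ref = {" ".join(t.lower().split()) for t in reference}

    true_positives = len(cand & ref)  # candidates confirmed by the reference list
    precision = true_positives / len(cand) if cand else 0.0
    recall = true_positives / len(ref) if ref else 0.0

    denom = precision + recall
    # Balanced F-measure (F1): harmonic mean of precision and recall.
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_measure


# Hypothetical toy data, not taken from the corpus described above.
candidates = ["legal act", "member state", "this article", "customs duty"]
reference = ["legal act", "member state", "customs duty",
             "internal market", "free movement of goods"]

p, r, f = evaluate_term_extraction(candidates, reference)
print(f"precision={p:.2f} recall={r:.2f} F={f:.2f}")
```

With these toy lists, three of the four candidates appear in the five-term reference list, so precision is 3/4 and recall is 3/5.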

Original language
English

Scientific fields
Information and communication sciences

AFFILIATIONS OF THE WORK

Institutions:
Filozofski fakultet, Zagreb

Profiles:

Angelina Gašpar (author)

CROSBI Croatian Scientific Bibliography
