Bibliographic record no.: 805395
Corpus-based bilingual terminology extraction
Corpus-based bilingual terminology extraction // Multidisciplinary Approaches to Multilingualism / Cergol Kovačević, Kristina ; Udier, Sanda Lucija (eds.).
Frankfurt: Peter Lang, 2015. pp. 303-318
CROSBI ID: 805395
Title
Corpus-based bilingual terminology extraction
Authors
Gašpar, Angelina
Type, subtype and category of work
Book chapter, scientific
Book
Multidisciplinary Approaches to Multilingualism
Editor(s)
Cergol Kovačević, Kristina ; Udier, Sanda Lucija
Publisher
Peter Lang
City
Frankfurt
Year
2015
Page range
303-318
ISBN
978-3-631-66377-6
Keywords
bilingual terminology extraction, Croatian-English parallel corpus of legislative texts, evaluation
Abstract
The paper describes a methodology for bilingual terminology extraction and termbase building based on terminological, lexical and pragmatic criteria, together with the translator's knowledge and experience. The research is conducted on a sentence-aligned, million-word Croatian-English parallel corpus of legislative texts, the first larger corpus designed for this language pair so far. To assess the hybrid statistical and linguistic approach, as well as the tools for automatic term extraction, the automatically obtained lists of term candidates are compared with a manually created reference list. The term extraction covers multi-word units and single-word units that correspond to multi-word ones. The tools used in this research are SDL Trados WinAlign (sentence alignment), SDL MultiTerm Extract and WordSmith (statistically based term extraction), and NooJ (a linguistically based environment). The evaluation is reported using the statistical measures of precision, recall and F-measure. Language resources covering a specific domain speed up the translation process, reduce cost and time, and enable communication across different languages and cultures. Their application also greatly facilitates machine translation, computer-assisted translation, information retrieval, and the building of multilingual term bases, glossaries and other resources that are a prerequisite for the development of a language with insufficient linguistic resources, such as Croatian.
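The comparison of automatically extracted term candidates with a manually created reference list, as described in the abstract, can be illustrated with a minimal sketch. It is not the author's implementation: the function name `evaluate_terms`, the case and whitespace normalisation, the exact-match criterion and the sample terms are all assumptions made for illustration, and the balanced F-measure (F1) is assumed because the abstract does not state the weighting.

```python
def evaluate_terms(candidates, reference):
    """Return (precision, recall, f_measure) for a list of term candidates.

    Assumption: a candidate counts as correct only if it matches a
    reference term exactly after lowercasing and collapsing whitespace.
    """
    def normalise(term):
        return " ".join(term.lower().split())

    cand = {normalise(t) for t in candidates}
    ref = {normalise(t) for t in reference}

    true_positives = len(cand & ref)
    precision = true_positives / len(cand) if cand else 0.0
    recall = true_positives / len(ref) if ref else 0.0
    # Balanced F-measure (harmonic mean of precision and recall).
    if precision + recall == 0:
        f_measure = 0.0
    else:
        f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hypothetical data, not taken from the corpus described in the record.
candidates = ["member state", "legal act", "customs duty", "the following"]
reference = ["Member State", "customs duty", "internal market"]

precision, recall, f_measure = evaluate_terms(candidates, reference)
print(f"precision={precision:.3f} recall={recall:.3f} F={f_measure:.3f}")
```

With these sample lists, two of the four candidates appear in the reference list, so precision is 2/4 and recall is 2/3. A real evaluation would likely need a looser matching rule, for example lemmatised forms, since Croatian is highly inflected.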
Original language
English
Scientific fields
Information and communication sciences