Bibliographic record no. 69119
Possibilities of Identification of Translation Equivalents in Croatian-English Parallel Corpus
Possibilities of Identification of Translation Equivalents in Croatian-English Parallel Corpus // Proceedings of the 5th TELRI seminar / Teubert, Wolfgang et al. (eds.).
Mannheim: TELRI Association, 2001. (lecture, international peer review, unpublished paper, scientific)
CROSBI ID: 69119
Title
Possibilities of Identification of Translation Equivalents in Croatian-English Parallel Corpus
Authors
Šojat, Krešimir ; Tadić, Marko
Type, subtype and category of work
Conference abstracts, unpublished paper, scientific
Source
Proceedings of the 5th TELRI seminar
/ Teubert, Wolfgang et al. - Mannheim : TELRI Association, 2001
Conference
5th TELRI seminar "Extracting Meaning from Corpora"
Place and date
Ljubljana, Slovenia, 20-23 September 2000
Type of participation
Lecture
Type of review
International peer review
Keywords
corpus; parallel corpora; Croatian; English; translation equivalents
Abstract
The paper discusses associations between translation equivalents in an aligned parallel corpus. The focus is on identifying multi-word units in a parallel corpus and verifying translation equivalents. The data were extracted from the Croatian-English parallel corpus, aligned at the sentence level, that has been compiled at the Institute of Linguistics, Faculty of Philosophy, University of Zagreb. Using statistical measures, primarily the mutual information value, significant co-occurrences of words were first identified in the source language (Croatian). To define significant multi-word units in the target language (English) as well, the same procedure was then carried out on the target side. The results were subsequently analysed and classified. To establish which translations are significant, the next step consists of applying the statistical procedures between translation equivalents in a test sample extracted from the parallel corpus. Applying such statistical measures between translation equivalents enables a systematization of terminology in areas where new Croatian terms are constantly lacking (e.g. market economy, computer science). Conversely, examining translation equivalents in the target language can serve as an instrument for information extraction in the source language.
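The two statistical steps named in the abstract can be illustrated with a minimal sketch. It assumes pointwise mutual information computed with base-2 logarithms, adjacent word pairs as candidate multi-word units, and sentence-level co-occurrence across aligned pairs as evidence for translation equivalence. The function names, the `min_count` threshold and the toy data are hypothetical; the record does not state how the authors actually computed their scores.

```python
import math
from collections import Counter


def bigram_pmi(sentences, min_count=2):
    """PMI of adjacent word pairs within one language (assumed setup).

    sentences: list of token lists. Pairs seen fewer than min_count
    times are skipped, since PMI is unreliable for rare events.
    """
    unigrams, bigrams = Counter(), Counter()
    for tokens in sentences:
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    scores = {}
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue
        p_xy = count / n_bi
        p_x = unigrams[w1] / n_uni
        p_y = unigrams[w2] / n_uni
        scores[(w1, w2)] = math.log2(p_xy / (p_x * p_y))
    return scores


def translation_pmi(aligned_pairs, min_count=1):
    """PMI between source and target words over aligned sentence pairs.

    aligned_pairs: list of (source_tokens, target_tokens). A word pair
    co-occurs when both words appear in the same aligned pair.
    """
    src_df, tgt_df, joint = Counter(), Counter(), Counter()
    n_pairs = 0
    for src_tokens, tgt_tokens in aligned_pairs:
        n_pairs += 1
        src_set, tgt_set = set(src_tokens), set(tgt_tokens)
        src_df.update(src_set)
        tgt_df.update(tgt_set)
        joint.update((s, t) for s in src_set for t in tgt_set)
    return {
        (s, t): math.log2(count * n_pairs / (src_df[s] * tgt_df[t]))
        for (s, t), count in joint.items()
        if count >= min_count
    }


if __name__ == "__main__":
    # Toy, invented data; not taken from the Croatian-English corpus.
    english = [["market", "economy", "grows"], ["market", "economy", "slows"]]
    for pair, score in sorted(bigram_pmi(english).items()):
        print(pair, round(score, 3))
```

In a sketch like this, high-scoring pairs from `bigram_pmi` would be candidate multi-word units in each language, and `translation_pmi` would then be applied to a test sample of aligned sentences to rank candidate equivalents.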
Original language
English
Scientific fields
Philology