Data source: CROSBI

The applicability of lemmatisation in translation equivalents detection (CROSBI ID 28289)

Book chapter | original scientific paper

Tadić, Marko ; Fulgosi, Sanja ; Šojat, Krešimir The applicability of lemmatisation in translation equivalents detection // Meaningful Texts: The Extraction of Semantic Information from Monolingual and Multilingual Corpora / Barnbrook, Geoff ; Danielsson, Pernilla ; Mahlberg, Michaela (eds.). London : New York (NY): Continuum International Publishing Group, 2004. pp. 195-206-x

Responsibility details

Tadić, Marko ; Fulgosi, Sanja ; Šojat, Krešimir

English

The applicability of lemmatisation in translation equivalents detection

The aim of the research is to help identify translation equivalents (TEs) in 1:1 aligned sentences at the level of single-word units. The research is based on the Croatian-English parallel corpus compiled at the University of Zagreb. The method is entirely statistical, with no linguistic filter applied before or after processing, and has three steps: 1) generation of all possible pairs of tokens from 1:1 aligned sentences (Cartesian product); 2) application of mutual information (MI) to the generated pairs in order to detect candidates for real TEs; 3) sorting the pairs by calculated MI and choosing real TEs for further use. The same method was applied to non-lemmatized and lemmatized material. The latter yielded 4.5% higher precision, confirming our hypothesis that for the Croatian-English pair (and possibly for other morphologically rich languages like Croatian) the lemmatized form of corpus data helps statistical methods of TE detection.
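
The three steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how the probabilities were estimated, so the sketch assumes pointwise mutual information computed from relative frequencies over sentence pairs. The function name `rank_translation_candidates`, the `min_count` parameter, and the toy sentences are hypothetical.

```python
import math
from collections import Counter
from itertools import product


def rank_translation_candidates(aligned_pairs, min_count=1):
    """Rank token pairs from 1:1 aligned sentences by mutual information.

    aligned_pairs: list of (source_tokens, target_tokens) tuples.
    Assumption: probabilities are relative frequencies over sentence pairs.
    """
    n = len(aligned_pairs)
    src_count, tgt_count, pair_count = Counter(), Counter(), Counter()

    for src_tokens, tgt_tokens in aligned_pairs:
        src_types, tgt_types = set(src_tokens), set(tgt_tokens)
        src_count.update(src_types)
        tgt_count.update(tgt_types)
        # Step 1: Cartesian product of the tokens of one aligned sentence pair.
        pair_count.update(product(src_types, tgt_types))

    scored = []
    for (s, t), c in pair_count.items():
        if c < min_count:
            continue
        # Step 2: MI(s, t) = log2( P(s, t) / (P(s) * P(t)) ).
        mi = math.log2(c * n / (src_count[s] * tgt_count[t]))
        scored.append((s, t, mi))

    # Step 3: sort candidates by descending MI (ties broken alphabetically).
    scored.sort(key=lambda item: (-item[2], item[0], item[1]))
    return scored


# Hypothetical toy corpus (diacritics omitted); not data from the study.
corpus = [
    (["kuca", "je", "velika"], ["the", "house", "is", "big"]),
    (["kuca", "je", "mala"], ["the", "house", "is", "small"]),
    (["pas", "je", "velik"], ["the", "dog", "is", "big"]),
]
ranking = rank_translation_candidates(corpus)
```

In this toy corpus the inflected forms "velika" and "velik" are counted as separate types, so their co-occurrence with "big" is split across two entries. Replacing each token with its lemma before counting would merge them, which is the kind of effect the lemmatized run in the abstract relies on.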

Croatian Language, English Language, Croatian-English Parallel Corpus, parallel corpus, lemmatization, translation equivalents, translation equivalents detection

Chapter details

195-206-x.

published

Book details

Meaningful Texts: The Extraction of Semantic Information from Monolingual and Multilingual Corpora

Barnbrook, Geoff ; Danielsson, Pernilla ; Mahlberg, Michaela

London : New York (NY): Continuum International Publishing Group

2004.

082647490X

Related research field

Philology

Links