Bibliographic record no.: 125583

The applicability of lemmatisation in translation equivalents detection


Tadić, Marko; Fulgosi, Sanja; Šojat, Krešimir
The applicability of lemmatisation in translation equivalents detection // Meaningful Texts: The Extraction of Semantic Information from Monolingual and Multilingual Corpora / Barnbrook, Geoff ; Danielsson, Pernilla ; Mahlberg, Michaela (eds.).
London : New York (NY): Continuum International Publishing Group, 2004. pp. 195-206


CROSBI ID: 125583

Title
The applicability of lemmatisation in translation equivalents detection

Authors
Tadić, Marko ; Fulgosi, Sanja ; Šojat, Krešimir

Type, subtype and category of work
Book chapters, scientific

Book
Meaningful Texts: The Extraction of Semantic Information from Monolingual and Multilingual Corpora

Editor(s)
Barnbrook, Geoff ; Danielsson, Pernilla ; Mahlberg, Michaela

Publisher
Continuum International Publishing Group

City
London : New York (NY)

Year
2004

Page range
195-206

ISBN
082647490X

Keywords
Croatian Language, English Language, Croatian-English Parallel Corpus, parallel corpus, lemmatization, translation equivalents, translation equivalents detection

Abstract
The aim of the research is to help identify translation equivalents (TEs) in 1:1 aligned sentences at the level of single-word units. The research is based on the Croatian-English parallel corpus compiled at the University of Zagreb. The method is entirely statistical, with no linguistic filter applied before or after processing, and has three steps: 1) generation of all possible pairs of tokens from 1:1 aligned sentences (Cartesian product); 2) application of mutual information (MI) to the generated pairs in order to detect candidates for real TEs; 3) sorting the pairs by calculated MI and choosing real TEs for further use. The same method was applied to non-lemmatized and lemmatized material. The latter showed 4.5% higher precision, confirming our hypothesis that for the Croatian-English pair (and possibly for other morphologically rich languages like Croatian) the lemmatized form of corpus data helps statistical methods of TE detection.
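The three steps in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `mi_scores` and `ranked_candidates`, the sentence-level co-occurrence counting, and the use of pointwise mutual information with base-2 logarithms are all assumptions made for illustration, since the record does not specify how MI was estimated.

```python
import math
from collections import Counter
from itertools import product


def mi_scores(aligned_pairs):
    """Score every source/target token pair by pointwise mutual information.

    aligned_pairs: list of (source_tokens, target_tokens), one entry per
    1:1 aligned sentence pair. Probabilities are estimated from the number
    of sentence pairs in which a token (or token pair) occurs; this
    estimation scheme is an assumption, not taken from the paper.
    """
    n = len(aligned_pairs)
    src_count, tgt_count, pair_count = Counter(), Counter(), Counter()
    for src, tgt in aligned_pairs:
        src_types, tgt_types = set(src), set(tgt)
        src_count.update(src_types)
        tgt_count.update(tgt_types)
        # Step 1: Cartesian product of the tokens of the two sentences.
        pair_count.update(product(src_types, tgt_types))
    # Step 2: MI(s, t) = log2( p(s, t) / (p(s) * p(t)) ).
    return {
        (s, t): math.log2(c * n / (src_count[s] * tgt_count[t]))
        for (s, t), c in pair_count.items()
    }


def ranked_candidates(aligned_pairs):
    """Step 3: sort candidate pairs by descending MI (ties broken by pair)."""
    scores = mi_scores(aligned_pairs)
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Toy, hand-made data (not from the Croatian-English parallel corpus).
toy_corpus = [
    (["pas", "laje"], ["dog", "barks"]),
    (["pas", "spava"], ["dog", "sleeps"]),
    (["mačka", "spava"], ["cat", "sleeps"]),
]
for pair, score in ranked_candidates(toy_corpus):
    print(pair, round(score, 3))
```

Under these assumptions, the lemmatized variant would call the same functions on lemma sequences instead of word forms, so that inflected forms of one Croatian lexeme pool their counts rather than being scored as separate tokens.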

Original language
English

Scientific fields
Philology



LINKED TO THIS WORK


Projects:
0130418

Institutions:
Filozofski fakultet, Zagreb

Profiles:

Marko Tadić (author)

Krešimir Šojat (author)

Sanja Fulgosi (author)

Cite this publication:

Tadić, Marko; Fulgosi, Sanja; Šojat, Krešimir
The applicability of lemmatisation in translation equivalents detection // Meaningful Texts: The Extraction of Semantic Information from Monolingual and Multilingual Corpora / Barnbrook, Geoff ; Danielsson, Pernilla ; Mahlberg, Michaela (eds.).
London : New York (NY): Continuum International Publishing Group, 2004. pp. 195-206
Tadić, M., Fulgosi, S. & Šojat, K. (2004) The applicability of lemmatisation in translation equivalents detection. In: Barnbrook, G., Danielsson, P. & Mahlberg, M. (eds.) Meaningful Texts: The Extraction of Semantic Information from Monolingual and Multilingual Corpora. London : New York (NY), Continuum International Publishing Group, pp. 195-206.
@inbook{inbook,
  author = {Tadi\'{c}, Marko and Fulgosi, Sanja and \v{S}ojat, Kre\v{s}imir},
  year = {2004},
  pages = {195-206},
  keywords = {Croatian Language, English Language, Croatian-English Parallel Corpus, parallel corpus, lemmatization, translation equivalents, translation equivalents detection},
  isbn = {082647490X},
  title = {The applicability of lemmatisation in translation equivalents detection},
  publisher = {Continuum International Publishing Group},
  publisherplace = {London : New York (NY)}
}



