
Bibliographic record number: 719183

Pseudo-lemmatization in Croatian-English SMT


Brkić, Marija; Matetić, Maja; Seljan, Sanja
Pseudo-lemmatization in Croatian-English SMT // Proceedings of the Central European Conference on Information and Intelligent Systems / Hunjak, T. ; Lovrenčić, S. ; Tomičić, I. (eds.).
Varaždin: Faculty of Organization and Informatics, University of Zagreb, 2014. pp. 242-249 (lecture, international peer review, full paper (in extenso), scientific)


Title
Pseudo-lemmatization in Croatian-English SMT

Authors
Brkić, Marija ; Matetić, Maja ; Seljan, Sanja

Type, subtype and category of work
Papers in conference proceedings, full paper (in extenso), scientific

Source
Proceedings of the Central European Conference on Information and Intelligent Systems / Hunjak, T. ; Lovrenčić, S. ; Tomičić, I. - Varaždin : Faculty of Organization and Informatics, University of Zagreb, 2014, 242-249

Conference
Central European Conference on Information and Intelligent Systems

Place and date
Varaždin, Croatia, 17-19 September 2014

Type of participation
Lecture

Type of review
International peer review

Keywords
Phrase-based statistical machine translation; pseudolemmatization; Croatian-English

Abstract
One of the first difficulties in conducting a thorough analysis of statistical machine translation involving Croatian, a morphologically rich and resource-poor language, is the lack of quality language resources. This paper presents the results of two standard fourteen-feature Croatian-English phrase-based statistical machine translation systems. Before the second system is built, the Croatian parts of the training and test sets are partially pseudo-lemmatized in an attempt to simplify the translation process. Besides automatic evaluation, a manual evaluation is conducted to gain insight into the nature of the translation differences between the two systems.

Original language
English

Scientific fields
Computer science, Information and communication sciences



WORK AFFILIATIONS


Project / topic
130-1300646-0909 - Informacijska tehnologija u prevođenju hrvatskoga i e-učenju jezika [Information technology in the translation of Croatian and in e-learning of languages] (Sanja Seljan)
13.13.1.3.03

Institutions
Faculty of Humanities and Social Sciences, Zagreb
University of Rijeka - Department of Informatics