Bibliographic record number: 507895

Statistical machine translation of Croatian weather forecast: How much data do we need?


Ljubešić, Nikola; Bago, Petra; Boras, Damir
Statistical machine translation of Croatian weather forecast: How much data do we need? // Proceedings of the ITI 2010 32nd International Conference on INFORMATION TECHNOLOGY INTERFACES / Luzar-Stiffler, V. (ed.).
Zagreb: University Computing Centre, University of Zagreb, 2010. (lecture, international peer review, full paper (in extenso), scientific)


Title
Statistical machine translation of Croatian weather forecast: How much data do we need?

Authors
Ljubešić, Nikola ; Bago, Petra ; Boras, Damir

Type, subtype and category of work
Papers in conference proceedings, full paper (in extenso), scientific

Source
Proceedings of the ITI 2010 32nd International Conference on INFORMATION TECHNOLOGY INTERFACES / Luzar-Stiffler, V. - Zagreb : University Computing Centre, University of Zagreb, 2010

ISBN
978-1-4244-5732-8

Conference
ITI 2010 32nd International Conference on Information Technology Interfaces

Place and date
Cavtat / Dubrovnik, Croatia, 21-24 June 2010

Type of participation
Lecture

Type of review
International peer review

Keywords
Statistical machine translation; weather forecast; automatic evaluation; human evaluation

Abstract
This research is a first step towards a system for translating Croatian weather forecasts into multiple languages. This step deals with the Croatian-English language pair. The parallel corpus is a one-year sample of weather forecasts for the Adriatic, consisting of 7,893 sentence pairs. Evaluation is performed with the best-known automatic evaluation measures, BLEU, NIST and METEOR, as well as by manually evaluating a sample of 200 translations. We show that with a small training set and the state-of-the-art Moses system, decoding can be done with 96% accuracy concerning adequacy and fluency. Additional improvement is to be expected from increasing the training set size.

Original language
English

Scientific fields
Information and communication sciences



AFFILIATIONS OF THE WORK


Project / topic
130-1301679-1380 - Hrvatska rječnička baština i hrvatski europski identitet (Damir Boras)

Institutions
Filozofski fakultet, Zagreb