Bibliographic record number: 507895
Statistical machine translation of Croatian weather forecast: How much data do we need?
Statistical machine translation of Croatian weather forecast: How much data do we need? // Proceedings of the ITI 2010 32nd International Conference on INFORMATION TECHNOLOGY INTERFACES / Luzar-Stiffler, V. (ed.).
Zagreb: Sveučilišni računski centar Sveučilišta u Zagrebu (Srce), 2010. (lecture, international peer review, full paper (in extenso), scientific)
CROSBI ID: 507895
Title
Statistical machine translation of Croatian weather forecast: How much data do we need?
Authors
Ljubešić, Nikola ; Bago, Petra ; Boras, Damir
Type, subtype and category of work
Papers in conference proceedings, full paper (in extenso), scientific
Source
Proceedings of the ITI 2010 32nd International Conference on INFORMATION TECHNOLOGY INTERFACES
/ Luzar-Stiffler, V. - Zagreb : Sveučilišni računski centar Sveučilišta u Zagrebu (Srce), 2010
ISBN
978-1-4244-5732-8
Conference
ITI 2010 32nd International Conference on Information Technology Interfaces
Place and date
Dubrovnik, Croatia; Cavtat, Croatia, 21.06.2010 - 24.06.2010
Type of participation
Lecture
Type of review
International peer review
Keywords
statistical machine translation; weather forecast; automatic evaluation; human evaluation
Abstract
This research is a first step towards a system for translating Croatian weather forecasts into multiple languages. This step deals with the Croatian-English language pair. The parallel corpus is a one-year sample of weather forecasts for the Adriatic, consisting of 7,893 sentence pairs. Evaluation is performed with the best-known automatic evaluation measures, BLEU, NIST and METEOR, as well as by manually evaluating a sample of 200 translations. In this research we have shown that with a small training set and the state-of-the-art Moses system, decoding can be done with 96% accuracy concerning adequacy and fluency. Further improvement is expected from increasing the training set size.
Original language
English
Scientific fields
Information and communication sciences
RELATED PROJECTS AND INSTITUTIONS
Projects:
130-1301679-1380 - Hrvatska rječnička baština i hrvatski europski identitet (Boras, Damir, MZOS) (CroRIS)
Institutions:
Filozofski fakultet, Zagreb