Bibliographic record no. 1040504
Domain Adaptation for Machine Translation Involving a Low-Resource Language: Google AutoML vs. from-scratch NMT Systems
Domain Adaptation for Machine Translation Involving a Low-Resource Language: Google AutoML vs. from-scratch NMT Systems // Translating and the Computer 41 / Esteves-Ferreira, João ; Macan, Juliet Margaret ; Mitkov, Ruslan ; Stefanov, Olaf-Michael (eds.).
Geneva: Editions Tradulex, 2019. pp. 113-124 (lecture, international peer review, full paper (in extenso), scientific)
CROSBI ID: 1040504
Title
Domain Adaptation for Machine Translation Involving a Low-Resource Language: Google AutoML vs. from-scratch NMT Systems
Authors
Šoštarić, Margita ; Pavlović, Nataša ; Boltužić, Filip
Type, subtype and category of work
Papers in conference proceedings, full paper (in extenso), scientific
Source
Translating and the Computer 41
/ Esteves-Ferreira, João ; Macan, Juliet Margaret ; Mitkov, Ruslan ; Stefanov, Olaf-Michael - Geneva : Editions Tradulex, 2019, 113-124
ISBN
978-2970-10957-0
Conference
Translating and the Computer (TC41 2019)
Place and date
London, United Kingdom, 21-22 November 2019
Type of participation
Lecture
Type of peer review
International peer review
Keywords
machine translation, domain adaptation, low-resource language, neural machine translation
Abstract
Despite the advances in machine translation (MT) made with neural models, adaptation of such systems for specialist domains is challenging. The problem is heightened for low-resource languages. Additionally, the computational resources and expertise needed to train neural models present barriers for smaller translation companies and freelancers, for whom paid but affordable customization services might present a viable solution. One such service, Google Cloud AutoML, is here compared to domain adaptation of neural MT systems trained from scratch using OpenNMT, an open-source MT toolkit. The from-scratch systems are trained on a larger out-of-domain dataset and a smaller in-domain dataset consisting of medical texts. The same in-domain data are used to customize Google Translate. System performance is compared using automatic and human evaluation. The resources, skills, costs and time necessary to set up the examined systems are discussed.
Original language
English
Scientific fields
Computing, Information and communication sciences, Philology
RELATED INSTITUTIONS AND PROFILES
Institutions:
Faculty of Electrical Engineering and Computing, Zagreb,
Faculty of Humanities and Social Sciences, Zagreb
Profiles:
Nataša Pavlović
(author)