
Bibliographic record no. 1099743

Neural Machine Translation for translating into Croatian and Serbian


Popovic, Maja; Poncelas, Alberto; Brkic, Marija; Way, Andy
Neural Machine Translation for translating into Croatian and Serbian // Proceedings of the 7th Workshop on NLP for Similar Languages, Varieties and Dialects / Zampieri, Marcos ; Nakov, Preslav ; Ljubešić, Nikola ; Tiedemann, Jörg ; Scherrer, Yves (eds.).
Barcelona: International Committee on Computational Linguistics (ICCL), 2020. pp. 102-113 (poster, international peer review, full paper (in extenso), scientific)


CROSBI ID: 1099743

Title
Neural Machine Translation for translating into Croatian and Serbian

Authors
Popovic, Maja ; Poncelas, Alberto ; Brkic, Marija ; Way, Andy

Type, subtype and category of work
Papers in conference proceedings, full paper (in extenso), scientific

Source
Proceedings of the 7th Workshop on NLP for Similar Languages, Varieties and Dialects / Zampieri, Marcos ; Nakov, Preslav ; Ljubešić, Nikola ; Tiedemann, Jörg ; Scherrer, Yves - Barcelona : International Committee on Computational Linguistics (ICCL), 2020, 102-113

Conference
VarDial 2020

Place and date
Barcelona, Spain, 13 December 2020

Type of participation
Poster

Type of review
International peer review

Keywords
neural machine translation ; South-Slavic languages ; domain ; genre ; synthetic in-domain corpus ; out-of-domain corpus

Abstract
In this work, we systematically investigate different set-ups for training of neural machine translation (NMT) systems for translation into Croatian and Serbian, two closely related South Slavic languages. We explore English and German as source languages, different sizes and types of training corpora, as well as bilingual and multilingual systems. We also explore translation of English IMDb user movie reviews, a domain/genre where only monolingual data are available. First, our results confirm that multilingual systems with joint target languages perform better. Furthermore, translation performance from English is much better than from German, partly because German is morphologically more complex and partly because the corpus consists mostly of parallel human translations instead of original text and its human translation. The translation from German should be further investigated systematically. For translating user reviews, creating synthetic in-domain parallel data through back- and forward-translation and adding them to a small out-of-domain parallel corpus can yield performance comparable with a system trained on a full out-of-domain corpus. However, it is still not clear what is the optimal size of synthetic in-domain data, especially for forward-translated data where the target language is machine translated. More detailed research including manual evaluation and analysis is needed in this direction.

Original language
English

Scientific fields
Information and communication sciences



AFFILIATIONS OF THE WORK


Institutions:
Faculty of Informatics and Digital Technologies, Rijeka

Profiles:

Marija Brkić Bakarić (author)

Links to the full text of the paper:

www.aclweb.org

Cite this publication:

Popovic, Maja; Poncelas, Alberto; Brkic, Marija; Way, Andy
Neural Machine Translation for translating into Croatian and Serbian // Proceedings of the 7th Workshop on NLP for Similar Languages, Varieties and Dialects / Zampieri, Marcos ; Nakov, Preslav ; Ljubešić, Nikola ; Tiedemann, Jörg ; Scherrer, Yves (eds.).
Barcelona: International Committee on Computational Linguistics (ICCL), 2020. pp. 102-113 (poster, international peer review, full paper (in extenso), scientific)
Popovic, M., Poncelas, A., Brkic, M. & Way, A. (2020) Neural Machine Translation for translating into Croatian and Serbian. In: Zampieri, M., Nakov, P., Ljubešić, N., Tiedemann, J. & Scherrer, Y. (eds.) Proceedings of the 7th Workshop on NLP for Similar Languages, Varieties and Dialects.
@inproceedings{article,
  author = {Popovic, Maja and Poncelas, Alberto and Brkic, Marija and Way, Andy},
  title = {Neural Machine Translation for translating into Croatian and Serbian},
  booktitle = {Proceedings of the 7th Workshop on NLP for Similar Languages, Varieties and Dialects},
  year = {2020},
  pages = {102--113},
  publisher = {International Committee on Computational Linguistics (ICCL)},
  address = {Barcelona, Spain},
  keywords = {neural machine translation, South-Slavic languages, domain, genre, synthetic in-domain corpus, out-of-domain corpus}
}



