Bibliographic record number: 701057
Treebank Translation for Cross-Lingual Parser Induction
Treebank Translation for Cross-Lingual Parser Induction // Proceedings of the Eighteenth Conference on Computational Natural Language Learning (CoNLL 2014)
Baltimore (MD): Association for Computational Linguistics (ACL), 2014. pp. 130-140 (lecture, international peer review, full paper (in extenso), scientific)
CROSBI ID: 701057
Title
Treebank Translation for Cross-Lingual Parser Induction
Authors
Tiedemann, Jörg ; Agić, Željko ; Nivre, Joakim
Type, subtype, and category of work
Papers in conference proceedings, full paper (in extenso), scientific
Source
Proceedings of the Eighteenth Conference on Computational Natural Language Learning (CoNLL 2014)
Baltimore (MD): Association for Computational Linguistics (ACL), 2014, pp. 130-140
ISBN
978-1-941643-02-0
Conference
Eighteenth Conference on Computational Natural Language Learning (CoNLL 2014)
Place and date
Baltimore (MD), United States of America, 26-27 June 2014
Type of participation
Lecture
Type of review
International peer review
Keywords
treebank translation; cross-lingual parsing; parser induction
Abstract
Cross-lingual learning has become a popular approach to facilitate the development of resources and tools for low-density languages. Its underlying idea is to make use of existing tools and annotations in resource-rich languages to create similar tools and resources for resource-poor languages. Typically, this is achieved either by projecting annotations across parallel corpora or by transferring models from one or more source languages to a target language. In this paper, we explore a third strategy: using machine translation to create synthetic training data from the original source-side annotations. Specifically, we apply this technique to dependency parsing, using a cross-lingually unified treebank for adequate evaluation. Our approach draws on annotation projection but avoids the noisy source-side annotation of an unrelated parallel corpus. Instead, it relies on manual treebank annotation in combination with statistical machine translation, which makes it possible to train fully lexicalized parsers. We show that this approach significantly outperforms delexicalized transfer parsing.
Original language
English
Scientific fields
Information and communication sciences
RELATED ENTITIES
Projects:
130-1300646-1776 - Računalna sintaksa hrvatskoga jezika (Dovedan Han, Zdravko, MZOS)
Institutions:
Filozofski fakultet, Zagreb
Profiles:
Željko Agić
(author)