Bibliographic record no. 597440
Slovene-Croatian Treebank Transfer Using Bilingual Lexicon Improves Croatian Dependency Parsing
Slovene-Croatian Treebank Transfer Using Bilingual Lexicon Improves Croatian Dependency Parsing // Proceedings of the 15th International Multiconference Information Society (IS 2012), Volume C, Proceedings of the 8th Language Technologies Conference / Erjavec, Tomaž ; Žganec Gros, Jerneja (eds.).
Ljubljana: Institut Jožef Stefan, 2012, pp. 5-9 (lecture, international peer review, full paper (in extenso), scientific)
CROSBI ID: 597440
Title
Slovene-Croatian Treebank Transfer Using Bilingual Lexicon Improves Croatian Dependency Parsing
Authors
Agić, Željko ; Merkler, Danijela ; Berović, Daša
Type, subtype, and category of work
Papers in conference proceedings, full paper (in extenso), scientific
Source
Proceedings of the 15th International Multiconference Information Society (IS 2012), Volume C, Proceedings of the 8th Language Technologies Conference
/ Erjavec, Tomaž ; Žganec Gros, Jerneja - Ljubljana : Institut Jožef Stefan, 2012, 5-9
ISBN
978-961-264-048-4
Conference
Eighth Language Technologies Conference
Place and date
Ljubljana, Slovenia, 8-9 October 2012
Type of participation
Lecture
Type of review
International peer review
Keywords
treebank transfer; bilingual lexicon; dependency parsing
Abstract
A method is presented for transferring dependency treebanks between similar languages by means of a bilingual lexicon, with the aim of improving dependency parsing accuracy on the target language. It is illustrated by transferring the Slovene Dependency Treebank to Croatian using a GIZA++ bilingual lexicon constructed from the Croatian-Slovene 1984 parallel corpus from the Multext East project. The transferred treebank is merged with the Croatian Dependency Treebank, and the merged treebank is used to train and test two graph-based dependency parsers, MSTParser and CroDep. The accuracy of both parsers shows a statistically significant increase on the 1984 fictional text and a similar decrease on the newspaper text of the Croatian Dependency Treebank.
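The abstract does not spell out the transfer procedure, so the following is only a minimal sketch of one plausible reading: each source-language word form is replaced by its lexicon translation while heads and dependency labels are kept unchanged. The `Token` type, the `transfer_sentence` function, the fallback for out-of-lexicon words, and the toy lexicon and sentence are all hypothetical illustrations, not the authors' implementation or data.

```python
from typing import Dict, List, NamedTuple


class Token(NamedTuple):
    idx: int      # 1-based position in the sentence
    form: str     # surface word form
    head: int     # index of the syntactic head, 0 for the root
    deprel: str   # dependency relation label


def transfer_sentence(sentence: List[Token], lexicon: Dict[str, str]) -> List[Token]:
    """Replace each source form with its lexicon translation, keeping the tree."""
    transferred = []
    for tok in sentence:
        # Assumed fallback: keep the source form when the lexicon has no entry.
        target_form = lexicon.get(tok.form.lower(), tok.form)
        transferred.append(tok._replace(form=target_form))
    return transferred


# Toy data, invented for illustration only (not taken from the paper).
toy_lexicon = {"pes": "pas", "spi": "spava"}
slovene = [Token(1, "Pes", 2, "Sb"), Token(2, "spi", 0, "Pred")]

croatian_like = transfer_sentence(slovene, toy_lexicon)

# The transferred sentences would then be merged with the target treebank.
merged_treebank = [croatian_like]
```

In this sketch the lexicon lookup is case-insensitive and the tree structure is copied verbatim, which only makes sense for closely related languages with near-identical word order; the actual method may handle alignment and unknown words differently.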
Original language
English
Scientific fields
Information and communication sciences, Philology
RELATED PROJECTS AND INSTITUTIONS
Projects:
130-1300646-0645 - Croatian Language Resources and Their Annotation (Tadić, Marko, MZOS)
130-1300646-1776 - Computational Syntax of Croatian (Dovedan Han, Zdravko, MZOS)
Institutions:
Filozofski fakultet, Zagreb