Bibliographic record no.: 1214805
Typological Approach to Improve Dependency Parsing for Croatian Language
Typological Approach to Improve Dependency Parsing for Croatian Language // Proceedings of the 20th International Workshop on Treebanks and Linguistic Theories (TLT, SyntaxFest 2021) / Dakota, Daniel ; Evang, Kilian ; Kübler, Sandra (eds.).
Sofia: Association for Computational Linguistics (ACL), 2021. pp. 1-11 (lecture, international peer review, full paper (in extenso), scientific)
CROSBI ID: 1214805
Title
Typological Approach to Improve Dependency Parsing for Croatian Language
Authors
Alves, Diego ; Bekavac, Božo ; Tadić, Marko
Type, subtype and category of work
Papers in conference proceedings, full paper (in extenso), scientific
Source
Proceedings of the 20th International Workshop on Treebanks and Linguistic Theories (TLT, SyntaxFest 2021)
/ Dakota, Daniel ; Evang, Kilian ; Kübler, Sandra - Sofia : Association for Computational Linguistics (ACL), 2021, 1-11
ISBN
978-1-955917-16-2
Conference
20th International Workshop on Treebanks and Linguistic Theories (TLT 2021)
Place and date
Online, 21.03.2022 - 24.03.2022
Type of participation
Lecture
Type of review
International peer review
Keywords
dependency parsing ; typology ; Croatian ; multilingualism
Abstract
This article presents the results of experiments with different typological approaches based on syntactic structures, with the aim of identifying similar languages that can be combined with Croatian to improve UAS and LAS metrics when using a deep learning tool. Of the eight selected languages, which come from different linguistic families and genera, we showed that Slovene and Irish are the best candidates, significantly improving dependency parsing results. Slovak is the only language presenting negative synergy when combined with Croatian. Neither typological approach presented in this study (one using quantitative data on context-free grammar rules extracted from corpora with the Marsagram tool, the other using syntactic features from lang2vec language vectors) allowed us to explain the synergy observed when the different languages were combined. The traditional genealogical classification likewise explains neither the improvement provided by Irish nor the negative impact of Slovak on both considered metrics.
Original language
English
Scientific fields
Information and communication sciences, Philology
WORK AFFILIATIONS
Projects:
EK-H2020-812997 - Cross-lingual Event-centric Open Analytics Research Academy (Cleopatra) (Tadić, Marko, EK - H2020-MSCA-ITN-2018)
Institutions:
Filozofski fakultet, Zagreb