Combining Linguistic Features for the Detection of Croatian Multiword Expressions (CROSBI ID 702548)
Conference paper in proceedings | original scientific paper | international peer review
Responsibility details
Buljan, Maja ; Šnajder, Jan
English
Because multiword expressions (MWEs) exhibit a range of idiosyncrasies, their automatic detection warrants the use of many different features. Tsvetkov and Wintner (2014) proposed a Bayesian network model that combines linguistically motivated features and also models their interactions. In this paper, we extend their model with new features and apply it to Croatian, a morphologically complex language with relatively free word order, achieving an F1-score of 0.823. Furthermore, by comparing against (semi-)naive Bayes models, we demonstrate that manually modeling feature interactions is indeed important. We make our annotated dataset of Croatian MWEs freely available.
multiword expressions ; multiword expression detection ; Bayes network ; Croatian language
Contribution details
194-199.
2017.
published
Parent publication details
Proceedings of the 13th Workshop on Multiword Expressions (MWE 2017)
Conference details
The 13th Workshop on Multiword Expressions (MWE 2017)
oral presentation
04.04.2017
Valencia, Spain