Bibliographic record no.: 1124544
Combining Linguistic Features for the Detection of Croatian Multiword Expressions
Combining Linguistic Features for the Detection of Croatian Multiword Expressions // Proceedings of the 13th Workshop on Multiword Expressions (MWE 2017)
Valencia, Spain, 2017, pp. 194-199 (lecture, international peer review, full paper (in extenso), scientific)
CROSBI ID: 1124544
Title
Combining Linguistic Features for the Detection of Croatian Multiword Expressions
Authors
Buljan, Maja ; Šnajder, Jan
Type, subtype, and category of work
Papers in conference proceedings, full paper (in extenso), scientific
Source
Proceedings of the 13th Workshop on Multiword Expressions (MWE 2017)
2017, pp. 194-199
Conference
The 13th Workshop on Multiword Expressions (MWE 2017)
Place and date
Valencia, Spain, 4 April 2017
Type of participation
Lecture
Type of review
International peer review
Keywords
multiword expressions ; multiword expression detection ; Bayes network ; Croatian language
Abstract
As multiword expressions (MWEs) exhibit a range of idiosyncrasies, their automatic detection warrants the use of many different features. Tsvetkov and Wintner (2014) proposed a Bayesian network model that combines linguistically motivated features and also models their interactions. In this paper, we extend their model with new features and apply it to Croatian, a morphologically complex language with relatively free word order, achieving a satisfactory F1-score of 0.823. Furthermore, by comparing against (semi-)naive Bayes models, we demonstrate that manually modeling feature interactions is indeed important. We make our annotated dataset of Croatian MWEs freely available.
Original language
English
Scientific fields
Computer science
AFFILIATIONS
Institutions:
Faculty of Electrical Engineering and Computing (Fakultet elektrotehnike i računarstva), Zagreb