Bibliographic record no.: 1186854
Representing word meaning with lexical substitutes
Representing word meaning with lexical substitutes, 2021, doctoral dissertation, Fakultet elektrotehnike i računarstva, Zagreb
CROSBI ID: 1186854
Title
Representing word meaning with lexical substitutes
Authors
Alagić, Domagoj
Type, subtype, and category of work
Theses, doctoral dissertation
Faculty
Fakultet elektrotehnike i računarstva
Place
Zagreb
Date
12.10
Year
2021
Pages
73
Mentor
Šnajder, Jan
Keywords
word meaning ; computational lexical semantics ; lexical substitution ; word sense induction ; natural language processing
Abstract
The thesis explores computational approaches to representing word meaning in context. While representing the meaning of individual words is crucial for most natural language processing (NLP) tasks, it remains a challenge because word meaning often depends on the context. This research investigates computational models for representing word meaning in context using lexical substitutes (LS), meaning-preserving replacements for a word in context. More specifically, it explores in depth to what extent the computational substitute-based representation corresponds to the more established sense-based representation. First, a proof-of-concept study is presented that aims to validate the initial hypothesis that lexical substitutes are suitable for representing word meaning. Since this hypothesis is best tested on a downstream benchmark NLP task, the study opts for word sense induction (WSI), a well-established semantic NLP task. The results obtained using simple methods based on lexical substitutes motivated a more detailed experiment on the correspondence between the sense-based and substitute-based representations. The thesis introduces a new lexical sample dataset annotated with both word senses and lexical substitutes, which served as a testbed for the study. Experiments using both manually and automatically produced lexical substitutes are also conducted, uncovering the performance gap between the two. Lastly, WSI is considered again, now with the computational approaches verified in the aforementioned experiments, which are compared against the state-of-the-art WSI model. Complementing the previous experiments, a focused end-to-end study on lexical substitution for the Croatian language was also performed, yielding the first Croatian lexical substitution dataset.
Original language
English
Scientific fields
Computing
RELATED ENTITIES
Projects:
HRZZ-UIP-2014-09-7312 - SenseHive: Dynamic models for the incremental construction of lexical-semantic resources supported by crowdsourcing (SenseHive) (Šnajder, Jan, HRZZ)
Institutions:
Fakultet elektrotehnike i računarstva, Zagreb