Short texts semantic similarity based on word embeddings (CROSBI ID 685328)
Conference contribution in a journal | original scientific paper | international peer review
Responsibility details
Babić, Karlo ; Martinčić-Ipšić, Sanda ; Meštrović, Ana ; Guerra, Francesco
English
Short texts semantic similarity based on word embeddings
Evaluating semantic similarity of texts is a task that assumes paramount importance in real-world applications. In this paper, we describe some experiments we carried out to evaluate the performance of different forms of word embeddings and their aggregations in the task of measuring the similarity of short texts. In particular, we explore the results obtained with two publicly available pre-trained word embeddings (one based on word2vec trained on a specific dataset and the second extending it with embeddings of word senses). We test five approaches for aggregating words into text. Two approaches are based on centroids and summarize a text as a word embedding. The other approaches are variations of the Okapi BM25 function and directly provide a measure of the similarity of two texts.
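As a rough illustration of the centroid idea mentioned in the abstract, the sketch below averages the word vectors of each text and compares the two averages with cosine similarity. It is only a minimal sketch under stated assumptions: the three-dimensional vectors in `TOY_EMBEDDINGS` are invented for the example and are not the pre-trained embeddings used in the paper, and the function names, the whitespace tokenisation, and the choice of cosine similarity are hypothetical, since the abstract does not spell out these details.

```python
import math

# Toy 3-dimensional vectors standing in for pre-trained word embeddings.
# These values are invented for illustration only (assumption).
TOY_EMBEDDINGS = {
    "cat": [0.9, 0.1, 0.0],
    "kitten": [0.8, 0.2, 0.0],
    "sleeps": [0.1, 0.9, 0.1],
    "naps": [0.2, 0.8, 0.1],
    "stock": [0.0, 0.1, 0.9],
    "market": [0.1, 0.0, 0.9],
}


def centroid(tokens, embeddings):
    """Average the vectors of all tokens found in the embedding table."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return None  # no known word, so no centroid can be formed
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def centroid_similarity(text_a, text_b, embeddings=TOY_EMBEDDINGS):
    """Similarity of two short texts via the cosine of their centroids."""
    # Whitespace tokenisation is a simplification (assumption).
    centroid_a = centroid(text_a.lower().split(), embeddings)
    centroid_b = centroid(text_b.lower().split(), embeddings)
    if centroid_a is None or centroid_b is None:
        return 0.0
    return cosine(centroid_a, centroid_b)


close = centroid_similarity("cat sleeps", "kitten naps")
far = centroid_similarity("cat sleeps", "stock market")
print(close > far)
```

With these toy vectors the two texts about a sleeping cat score higher than the unrelated pair, which is the behaviour a centroid-based measure is expected to show.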
semantic similarity ; short texts similarity ; word embeddings ; word2vec ; NLP
not recorded
not recorded
not recorded
not recorded
not recorded
not recorded
Contribution details
27-33.
2019.
not recorded
published
Host publication details
Central European conference on information and intelligent systems
Strahonja, Vjeran ; Kirinić, Valentina
Varaždin: Fakultet organizacije i informatike Sveučilišta u Zagrebu
1847-2001
1848-2295
Conference details
30th Central European Conference on Information and Intelligent Systems (CECIIS 2019)
lecture
02.10.2019-04.10.2019
Varaždin, Croatia
Related fields
Information and communication sciences, Computer science