Bibliographic record number: 1038005

Short texts semantic similarity based on word embeddings


Babić, Karlo; Martinčić-Ipšić, Sanda; Meštrović, Ana; Guerra, Francesco
Short texts semantic similarity based on word embeddings // 2019 30th International Scientific Conference on Information and Intelligent Systems (CECIIS) / Strahonja, Vjeran ; Kirinić, Valentina (eds.).
Varaždin: Fakultet organizacije i informatike Sveučilišta u Zagrebu, 2019. pp. 27-33 (lecture, international peer review, full paper (in extenso), scientific)


CROSBI ID: 1038005

Title
Short texts semantic similarity based on word embeddings

Authors
Babić, Karlo ; Martinčić-Ipšić, Sanda ; Meštrović, Ana ; Guerra, Francesco

Type, subtype, and category of work
Papers in conference proceedings, full paper (in extenso), scientific

Source
2019 30th International Scientific Conference on Information and Intelligent Systems (CECIIS) / Strahonja, Vjeran ; Kirinić, Valentina - Varaždin : Fakultet organizacije i informatike Sveučilišta u Zagrebu, 2019, 27-33

Conference
30th Central European Conference on Information and Intelligent Systems (CECIIS 2019)

Place and date
Varaždin, Croatia, 02.10.2019 - 04.10.2019

Type of participation
Lecture

Type of review
International peer review

Keywords
semantic similarity ; short texts similarity ; word embeddings ; word2vec ; NLP

Abstract
Evaluating the semantic similarity of texts is an important task in real-world applications. In this paper, we describe experiments that evaluate how different forms of word embeddings, and different ways of aggregating them, perform in measuring the similarity of short texts. In particular, we explore the results obtained with two publicly available pre-trained word embeddings (one based on word2vec trained on a specific dataset, and a second that extends it with embeddings of word senses). We test five approaches for aggregating words into a text. Two approaches are based on centroids and summarize a text as a single word embedding. The other approaches are variations of the Okapi BM25 function and directly provide a measure of the similarity of two texts.
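
The two families of aggregation named in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the toy three-dimensional vectors in `EMB` stand in for pre-trained embeddings, the function names are hypothetical, and the BM25-shaped score (term frequency replaced by the best cosine match in the other text) is one common embedding-based formulation, not necessarily any of the variants evaluated in the paper.

```python
import math

# Toy 3-dimensional vectors standing in for pre-trained embeddings.
# The values are made up for illustration only.
EMB = {
    "cat": [0.9, 0.1, 0.0],
    "kitten": [0.8, 0.2, 0.1],
    "car": [0.0, 0.1, 0.9],
    "truck": [0.1, 0.0, 0.8],
    "sleeps": [0.2, 0.9, 0.1],
    "rests": [0.3, 0.8, 0.1],
}


def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def centroid(tokens, emb, weights=None):
    """Weighted mean of the embeddings of known tokens (None if none known).

    With weights=None this is the plain centroid; passing per-word weights
    (for example IDF values) gives a weighted centroid.
    """
    known = [t for t in tokens if t in emb]
    if not known:
        return None
    w = [1.0 if weights is None else weights.get(t, 1.0) for t in known]
    total = sum(w)
    if total == 0:
        return None
    dim = len(emb[known[0]])
    return [sum(wi * emb[t][d] for wi, t in zip(w, known)) / total
            for d in range(dim)]


def centroid_similarity(s1, s2, emb, weights=None):
    """Summarize each text as one vector, then compare with cosine."""
    c1, c2 = centroid(s1, emb, weights), centroid(s2, emb, weights)
    if c1 is None or c2 is None:
        return 0.0
    return cosine(c1, c2)


def bm25_style_similarity(s1, s2, emb, idf, avg_len, k1=1.2, b=0.75):
    """BM25-shaped score in which term frequency is replaced by the best
    cosine match of each word of s1 among the words of s2."""
    s2_vecs = [emb[t] for t in s2 if t in emb]
    if not s2_vecs:
        return 0.0
    # Length normalization term, as in Okapi BM25.
    norm = k1 * (1 - b + b * len(s2) / avg_len)
    score = 0.0
    for t in s1:
        if t not in emb:
            continue
        # Clamp at zero so negative cosines cannot flip the denominator.
        sem = max(0.0, max(cosine(emb[t], v) for v in s2_vecs))
        score += idf.get(t, 1.0) * sem * (k1 + 1) / (sem + norm)
    return score
```

A centroid approach yields a vector per text, so texts can be indexed once and compared later; the BM25-style function returns a score for a pair directly and is not symmetric in its two arguments. For example, `centroid_similarity(["cat", "sleeps"], ["kitten", "rests"], EMB)` scores higher than the same call with `["car", "truck"]` as the second text.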

Original language
English

Scientific fields
Computer science, Information and communication sciences



CONNECTIONS OF THE WORK


Projects:
uniri-drustv-18-38

Institutions:
Fakultet informatike i digitalnih tehnologija, Rijeka

Links to the full text of the paper:

archive.ceciis.foi.hr

Cite this publication:

Babić, Karlo; Martinčić-Ipšić, Sanda; Meštrović, Ana; Guerra, Francesco
Short texts semantic similarity based on word embeddings // 2019 30th International Scientific Conference on Information and Intelligent Systems (CECIIS) / Strahonja, Vjeran ; Kirinić, Valentina (eds.).
Varaždin: Fakultet organizacije i informatike Sveučilišta u Zagrebu, 2019. pp. 27-33 (lecture, international peer review, full paper (in extenso), scientific)
Babić, K., Martinčić-Ipšić, S., Meštrović, A. & Guerra, F. (2019) Short texts semantic similarity based on word embeddings. In: Strahonja, V. & Kirinić, V. (eds.) 2019 30th International Scientific Conference on Information and Intelligent Systems (CECIIS).
@article{article,
  author = {Babi\'{c}, Karlo and Martin\v{c}i\'{c}-Ip\v{s}i\'{c}, Sanda and Me\v{s}trovi\'{c}, Ana and Guerra, Francesco},
  year = {2019},
  pages = {27-33},
  keywords = {semantic similarity, short texts similarity, word embeddings, word2vec, NLP},
  title = {Short texts semantic similarity based on word embeddings},
  publisher = {Fakultet organizacije i informatike Sveu\v{c}ili\v{s}ta u Zagrebu},
  publisherplace = {Vara\v{z}din, Hrvatska}
}



