Bibliographic record number: 1096691

A Comparison of Approaches for Measuring the Semantic Similarity of Short Texts Based on Word Embeddings


Babić, Karlo; Guerra, Francesco; Martinčić-Ipšić, Sanda; Meštrović, Ana
A Comparison of Approaches for Measuring the Semantic Similarity of Short Texts Based on Word Embeddings // Journal of information and organizational sciences, 44 (2020), 2; 231-246 doi:10.31341/jios.44.2.2 (international peer review, article, scientific)


CROSBI ID: 1096691

Title
A Comparison of Approaches for Measuring the Semantic Similarity of Short Texts Based on Word Embeddings

Authors
Babić, Karlo ; Guerra, Francesco ; Martinčić-Ipšić, Sanda ; Meštrović, Ana

Source
Journal of information and organizational sciences (1846-3312) 44 (2020), 2; 231-246

Type, subtype, and category of work
Journal papers, article, scientific

Keywords
semantic similarity, short texts similarity, word embedding, Word2Vec, FastText, TF-IDF

Abstract
Measuring the semantic similarity of texts plays a vital role in various natural language processing tasks. In this paper, we describe a set of experiments carried out to evaluate and compare the performance of different approaches for measuring the semantic similarity of short texts. We compare four models based on word embeddings: two variants of Word2Vec (one trained on a specific dataset and one extending it with embeddings of word senses), FastText, and TF-IDF. Since these models provide word vectors, we experiment with various methods that calculate the semantic similarity of short texts from word vectors. More precisely, for each of these models, we test five methods for aggregating word embeddings into a text embedding. We introduce three of these methods as variations of two commonly used similarity measures: one is an extension of the cosine similarity based on centroids, and the other two are variations of the Okapi BM25 function. We evaluate all approaches on two publicly available datasets, SICK and Lee, in terms of the Pearson and Spearman correlation. The results indicate that the extended methods perform better than the original ones in most cases.
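As a rough illustration of the centroid-based cosine similarity that the abstract names as a starting point, the sketch below averages the word vectors of each text and compares the two averages. It is not the authors' implementation and does not reproduce their extension. The vectors in `TOY_VECTORS` and the function names are hypothetical, invented here for illustration; they do not come from Word2Vec, FastText, or the paper.

```python
import math

# Hypothetical 3-dimensional word vectors, invented for illustration only.
TOY_VECTORS = {
    "cat": [0.9, 0.1, 0.0],
    "kitten": [0.8, 0.2, 0.1],
    "sits": [0.1, 0.9, 0.0],
    "sleeps": [0.2, 0.8, 0.1],
    "stock": [0.0, 0.1, 0.9],
    "falls": [0.1, 0.3, 0.7],
}


def centroid(tokens, vectors):
    """Average the vectors of the known tokens; return None if none is known."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine(u, v):
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def centroid_similarity(text_a, text_b, vectors=TOY_VECTORS):
    """Similarity of two short texts as the cosine of their word-vector centroids."""
    ca = centroid(text_a.lower().split(), vectors)
    cb = centroid(text_b.lower().split(), vectors)
    if ca is None or cb is None:
        return 0.0  # assumption: texts with no known words count as dissimilar
    return cosine(ca, cb)


if __name__ == "__main__":
    print(centroid_similarity("cat sits", "kitten sleeps"))
    print(centroid_similarity("cat sits", "stock falls"))
```

With these toy vectors, the first pair scores noticeably higher than the second. In practice the dictionary would be replaced by vectors from a trained embedding model.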

Original language
English

Scientific fields
Computing, Information and communication sciences



AFFILIATION OF THE WORK


Institutions:
Faculty of Informatics and Digital Technologies (Fakultet informatike i digitalnih tehnologija), Rijeka

Links to the full text of the work:

Full-text access: doi:10.31341/jios.44.2.2; jios.foi.hr

Cite this publication:

Babić, Karlo; Guerra, Francesco; Martinčić-Ipšić, Sanda; Meštrović, Ana
A Comparison of Approaches for Measuring the Semantic Similarity of Short Texts Based on Word Embeddings // Journal of information and organizational sciences, 44 (2020), 2; 231-246 doi:10.31341/jios.44.2.2 (international peer review, article, scientific)
Babić, K., Guerra, F., Martinčić-Ipšić, S. & Meštrović, A. (2020) A Comparison of Approaches for Measuring the Semantic Similarity of Short Texts Based on Word Embeddings. Journal of information and organizational sciences, 44 (2), 231-246 doi:10.31341/jios.44.2.2.
@article{article,
  author = {Babi\'{c}, Karlo and Guerra, Francesco and Martin\v{c}i\'{c}-Ip\v{s}i\'{c}, Sanda and Me\v{s}trovi\'{c}, Ana},
  title = {A Comparison of Approaches for Measuring the Semantic Similarity of Short Texts Based on Word Embeddings},
  journal = {Journal of information and organizational sciences},
  year = {2020},
  volume = {44},
  number = {2},
  pages = {231-246},
  issn = {1846-3312},
  doi = {10.31341/jios.44.2.2},
  keywords = {semantic similarity, short texts similarity, word embedding, Word2Vec, FastText, TF-IDF}
}

The journal is indexed in:


  • Web of Science Core Collection (WoSCC)
    • Emerging Sources Citation Index (ESCI)
  • Scopus

