
Bibliographic record no.: 1247233

Retweet Prediction Based on Heterogeneous Data Sources: The Combination of Text and Multilayer Network Features


Meštrović, Ana; Petrović, Milan; Beliga, Slobodan
Retweet Prediction Based on Heterogeneous Data Sources: The Combination of Text and Multilayer Network Features // Applied sciences (Basel), 12 (2022), 21; 11216-11237 doi:10.3390/app122111216 (international peer review, article, scientific)


CROSBI ID: 1247233

Title
Retweet Prediction Based on Heterogeneous Data Sources: The Combination of Text and Multilayer Network Features

Authors
Meštrović, Ana ; Petrović, Milan ; Beliga, Slobodan

Source
Applied sciences (Basel) (2076-3417) 12 (2022), 21; 11216-11237

Type, subtype, and category of work
Journal papers, article, scientific

Keywords
retweet prediction ; multilayer network ; natural language processing ; text features ; Twitter data

Abstract
Retweet prediction is an important task in the context of various problems, such as information-spreading analysis, automatic fake-news detection, and social media monitoring. In this study, we explore retweet prediction based on heterogeneous data sources. To classify a tweet according to its number of retweets, we combine features extracted from a multilayer network and from text. More specifically, we introduce a multilayer framework for representing Twitter as a multilayer network. This formalism captures different user actions and complex relationships, as well as other key properties of communication on Twitter. Next, we select a set of local network measures from each layer and construct a set of multilayer network features. We also adopt a BERT-based language model, namely Cro-CoV-cseBERT, to capture the high-level semantics and structure of tweets as a set of text features. We then train six machine learning (ML) algorithms for the retweet-prediction task: random forest, multilayer perceptron, light gradient boosting machine, category-embedding model, neural oblivious decision ensembles, and an attentive interpretable tabular learning model. We compare the performance of all six algorithms in three setups (text features only, multilayer network features only, and both feature sets) and evaluate every setup using standard evaluation measures. For this task, we first prepared an empirical dataset of 199,431 tweets in Croatian posted between 1 January 2020 and 31 May 2021. Our results indicate that the prediction model performs better when multilayer network features are integrated with text features than when only one set of features is used.
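The abstract describes taking local network measures from each layer, combining them with text features, and comparing three feature setups. The following is a minimal Python sketch of that feature-assembly step under stated assumptions, not the authors' implementation: the layer names (`retweet`, `mention`) are hypothetical, in-degree and out-degree of the tweet's author stand in for the paper's selection of local measures, a short placeholder list stands in for the Cro-CoV-cseBERT tweet embedding, and "integration" is assumed to be plain concatenation. The functions `local_degree_features` and `build_setups` are illustrative names only.

```python
from typing import Dict, List, Tuple

# An edge (source_user, target_user) in one layer, e.g. "a retweeted b".
Edge = Tuple[str, str]


def local_degree_features(layers: Dict[str, List[Edge]], user: str) -> List[float]:
    """Return [in-degree, out-degree] of `user` for every layer.

    Layers are visited in sorted name order so that every tweet gets its
    feature columns in the same order. Degree is only one example of a
    local measure (an assumption); the paper's actual selection may differ.
    """
    features: List[float] = []
    for name in sorted(layers):
        edges = layers[name]
        in_deg = sum(1 for _, dst in edges if dst == user)
        out_deg = sum(1 for src, _ in edges if src == user)
        features.extend([float(in_deg), float(out_deg)])
    return features


def build_setups(text_vec: List[float], net_vec: List[float]) -> Dict[str, List[float]]:
    """Assemble text-only, network-only, and combined feature vectors."""
    return {
        "text": list(text_vec),
        "network": list(net_vec),
        # Hypothetical integration step: simple concatenation of both sets.
        "combined": list(text_vec) + list(net_vec),
    }


if __name__ == "__main__":
    # Hypothetical toy multilayer network with two layers.
    layers = {
        "retweet": [("a", "b"), ("c", "b")],
        "mention": [("b", "a")],
    }
    net = local_degree_features(layers, "b")  # features for tweet author "b"
    text = [0.1, -0.2, 0.3]  # placeholder for a sentence embedding
    for setup, vec in build_setups(text, net).items():
        print(setup, vec)
```

In a setup like this, each of the three vectors would be passed to the same classifier, so that differences in performance can be attributed to the feature set rather than to the model.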

Original language
English

Scientific fields
Computer Science, Information and Communication Sciences



RELATED INFORMATION


Projects:
HRZZ-IP-CORONA-2020-04-2061 - A multilayer framework for characterizing information spreading via social media during the COVID-19 crisis (InfoCoV) (Meštrović, Ana, HRZZ - 2020-04) (CroRIS)
NadSve-Sveučilište u Rijeci-uniri-drustv-18-38 - Methods for measuring the semantic similarity of texts (SemText) (Meštrović, Ana, NadSve - Call for funding of scientific research at the University of Rijeka for 2018, projects of experienced scientists and artists) (CroRIS)

Institutions:
Fakultet informatike i digitalnih tehnologija (Faculty of Informatics and Digital Technologies), Rijeka

Profiles:

Slobodan Beliga (author)

Milan Petrović (author)

Ana Meštrović (author)

Links to the full text:

doi, www.mdpi.com

Links to research data:

github.com

Cite this publication:

Meštrović, A., Petrović, M. & Beliga, S. (2022) Retweet Prediction Based on Heterogeneous Data Sources: The Combination of Text and Multilayer Network Features. Applied sciences (Basel), 12 (21), 11216-11237 doi:10.3390/app122111216.
@article{article,
  author = {Me\v{s}trovi\'{c}, Ana and Petrovi\'{c}, Milan and Beliga, Slobodan},
  title = {Retweet Prediction Based on Heterogeneous Data Sources: The Combination of Text and Multilayer Network Features},
  journal = {Applied sciences (Basel)},
  year = {2022},
  volume = {12},
  number = {21},
  pages = {11216--11237},
  issn = {2076-3417},
  doi = {10.3390/app122111216},
  keywords = {retweet prediction, multilayer network, natural language processing, text features, Twitter data}
}

The journal is indexed in:


  • Current Contents Connect (CCC)
  • Web of Science Core Collection (WoSCC)
    • Science Citation Index Expanded (SCI-EXP)
    • Social Science Citation Index (SSCI)
    • SCI-EXP, SSCI and/or A&HCI
  • Scopus


Included in other bibliographic databases:


  • Computer and Information Systems Abstracts






