Characterisation of COVID-19-Related Tweets in the Croatian Language: Framework Based on the Cro-CoV-cseBERT Model

Babić, Karlo; Petrović, Milan; Beliga, Slobodan; Martinčić-Ipšić, Sanda; Matešić, Mihaela; Meštrović, Ana

Data source: CROSBI

CROSBI ID: 300396

Journal article | original scientific paper | international peer review

Babić, Karlo ; Petrović, Milan ; Beliga, Slobodan ; Martinčić-Ipšić, Sanda ; Matešić, Mihaela ; Meštrović, Ana Characterisation of COVID-19-Related Tweets in the Croatian Language: Framework Based on the Cro-CoV-cseBERT Model // Applied sciences (Basel), 11 (2021), 21; 10442, 22. doi: 10.3390/app112110442

Responsibility data

Authors

Babić, Karlo ; Petrović, Milan ; Beliga, Slobodan ; Martinčić-Ipšić, Sanda ; Matešić, Mihaela ; Meštrović, Ana

Basic data in the original language

Language

English

Title

Characterisation of COVID-19-Related Tweets in the Croatian Language: Framework Based on the Cro-CoV-cseBERT Model

Abstract

This study aims to provide insights into COVID-19-related communication on Twitter in the Republic of Croatia. For that purpose, we developed an NLP-based framework that enables automatic analysis of a large dataset of tweets in the Croatian language. We collected and analysed 206,196 tweets related to COVID-19 and constructed a dataset of 10,000 tweets, which we manually annotated with a sentiment label. We trained the Cro-CoV-cseBERT language model for the representation and clustering of tweets. Additionally, we compared the performance of four machine learning algorithms on the task of sentiment classification. After identifying the best-performing setup of NLP methods, we applied the proposed framework to the task of characterising COVID-19 tweets in Croatia. More precisely, we performed sentiment analysis and tracked the sentiment over time. Furthermore, we detected how tweets are grouped into clusters with similar themes across three pandemic waves. Additionally, we characterised the tweets by analysing the distribution of sentiment polarity (in each thematic cluster and over time) and the number of retweets (in each thematic cluster and sentiment class). These results could be useful for additional research and interpretation in the domains of sociology, psychology or other sciences, as well as for the authorities, who could use them to address crisis communication problems.

Keywords

sentiment analysis ; clustering ; BERT model ; natural language processing ; COVID-19 ; Twitter data ; social media

Note

not recorded

Basic data in other languages

not recorded

Publication data

Journal

Applied sciences (Basel)

Volume (issue)

11 (21)

Year

2021

Article number

10442

Number of pages

Publication status

published

e-ISSN

2076-3417

DOI

10.3390/app112110442

Related records

Related persons

Karlo Babić (author)

Slobodan Beliga (author)

Sanda Martinčić-Ipšić (author)

Mihaela Matešić (author)

Ana Meštrović (author)

Related institutions

Filozofski fakultet u Rijeci (009) (author's institution)

Sveučilište u Rijeci, Fakultet informatike i digitalnih tehnologija (318) (author's institution)

Related projects

Multilayer framework for the characterisation of information spreading via social media during the COVID-19 crisis (result of work on the project)

Methods for measuring the semantic similarity of texts (result of work on the project)

Field

Information and communication sciences, Interdisciplinary social sciences, Computing

Links

doi.org

mdpi.com

Indexing

Scopus

Current Contents Connect (CCC)

Web of Science Core Collection, Science Citation Index Expanded (WoSCC-SCI-Exp)

Web of Science Core Collection, SCI-Exp, SSCI & A&HCI (WoSCC-SCI-Exp, SSCI, A&HCI)