Detection of Hate Speech Spreaders with BERT

Dukić, David; Sović Kržić, Ana

Bibliographic record number: 1149447

Detection of Hate Speech Spreaders with BERT // CLEF 2021 Labs and Workshops, Notebook Papers / Faggioli, Guglielmo ; Ferro, Nicola ; Joly, Alexis ; Maistro, Maria ; Piroi, Florina (eds.).
Bucharest, 2021, pp. 1-10 (workshop, international peer review, full paper (in extenso), scientific)

CROSBI ID: 1149447

Title
Detection of Hate Speech Spreaders with BERT

Authors
Dukić, David ; Sović Kržić, Ana

Type, subtype and category of work
Papers in conference proceedings, full paper (in extenso), scientific

Source
CLEF 2021 Labs and Workshops, Notebook Papers / Faggioli, Guglielmo ; Ferro, Nicola ; Joly, Alexis ; Maistro, Maria ; Piroi, Florina - Bucharest, 2021, 1-10

Conference
CLEF 2021

Place and date
Bucharest, Romania, 21.09.2021 - 24.09.2021

Type of participation
Workshop

Type of review
International peer review

Keywords
BERT ; fine-tuning ; indicators ; logistic regression

Abstract
As social media grows, more and more users are disseminating hate speech through their posts. This is often a consequence of a false sense of security and anonymity in virtual environments. To stop hate speech spreaders, researchers have started developing machine learning systems that automatically detect them based on the content of their posts. This paper describes one such system, trained on a corpus of English Twitter posts with the goal of predicting whether the author of a given set of posts spreads hate speech. The features were built from fine-tuned BERT contextualized embeddings, summed over the last 12 hidden states corresponding to the classification token and concatenated with three binary variables called indicators. The indicators record whether a hashtag, a retweet, or a URL, respectively, was present in the author's tweets. The feature vectors were then fed into a Logistic Regression classifier. The described model achieved 75% accuracy on the test set.
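The feature construction described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`cls_sum_last_layers`, `indicators`, `build_features`), the toy hidden states, and the string heuristics used to detect hashtags, retweets, and URLs are all assumptions made for illustration. The abstract also does not say how an author's tweets are grouped into BERT inputs, so the sketch takes the hidden states of a single input sequence as given.

```python
import re
from typing import List, Sequence

# hidden_states[layer][token][dim]; the classification token is assumed
# to sit at token position 0, as is usual for BERT inputs.
HiddenStates = Sequence[Sequence[Sequence[float]]]


def cls_sum_last_layers(hidden_states: HiddenStates, n_layers: int = 12) -> List[float]:
    """Sum the classification-token vector over the last n_layers hidden states."""
    last = hidden_states[-n_layers:]
    dim = len(last[0][0])
    summed = [0.0] * dim
    for layer in last:
        cls_vec = layer[0]  # vector of the classification token in this layer
        for i in range(dim):
            summed[i] += cls_vec[i]
    return summed


def indicators(tweets: Sequence[str]) -> List[float]:
    """Three binary indicators: hashtag, retweet, URL present in any tweet.

    The detection rules below are illustrative heuristics (assumptions),
    not the rules used in the paper.
    """
    has_hashtag = any("#" in t for t in tweets)
    has_retweet = any(re.match(r"\s*RT\b", t) for t in tweets)
    has_url = any(re.search(r"https?://", t) for t in tweets)
    return [float(has_hashtag), float(has_retweet), float(has_url)]


def build_features(hidden_states: HiddenStates, tweets: Sequence[str]) -> List[float]:
    """Concatenate the summed classification-token vector with the indicators."""
    return cls_sum_last_layers(hidden_states) + indicators(tweets)


# Toy data (hypothetical): 13 layers (embedding output + 12 encoder layers),
# 2 tokens, hidden size 4. Every value in layer k equals k.
toy_states = [[[float(k)] * 4 for _token in range(2)] for k in range(13)]
toy_tweets = ["RT @user check this", "see https://example.com"]

features = build_features(toy_states, toy_tweets)
print(features)
```

In a real pipeline the hidden states would come from a fine-tuned BERT model rather than toy lists, and one such feature vector per author would be passed to a logistic regression classifier, for example scikit-learn's `LogisticRegression`.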

Original language
English

Scientific fields
Computing

AFFILIATIONS

Institutions:
Faculty of Electrical Engineering and Computing (Fakultet elektrotehnike i računarstva), Zagreb

Profiles:

Ana Sović (author)