Bibliographic record no.: 1132113
PANDORA Talks: Personality and Demographics on Reddit
PANDORA Talks: Personality and Demographics on Reddit // Proceedings of the Ninth International Workshop on Natural Language Processing for Social Media / Association for Computational Linguistics (ed.).
Online: Association for Computational Linguistics (ACL), 2021. pp. 138-152 doi:10.18653/v1/2021.socialnlp-1.12 (lecture, international peer review, full paper (in extenso), scientific)
CROSBI ID: 1132113
Title
PANDORA Talks: Personality and Demographics on Reddit
Authors
Gjurković, Matej ; Karan, Mladen ; Vukojević, Iva ; Bošnjak, Mihaela ; Šnajder, Jan
Type, subtype, and category of work
Papers in conference proceedings, full paper (in extenso), scientific
Source
Proceedings of the Ninth International Workshop on Natural Language Processing for Social Media / Association for Computational Linguistics - Online : Association for Computational Linguistics (ACL), 2021, 138-152
Conference
9th International Workshop on Natural Language Processing for Social Media
Place and date
Online, 6-11 June 2021
Type of participation
Lecture
Type of peer review
International peer review
Keywords
Personality ; Text Analysis ; Natural Language Processing ; Social Network Analysis ; Reddit ; Computational Social Science
Abstract
Personality and demographics are important variables in social sciences and computational sociolinguistics. However, datasets with both personality and demographic labels are scarce. To address this, we present PANDORA, the first dataset of Reddit comments of 10k users partially labeled with three personality models and demographics (age, gender, and location), including 1.6k users labeled with the well-established Big 5 personality model. We showcase the usefulness of this dataset on three experiments, where we leverage the more readily available data from other personality models to predict the Big 5 traits, analyze gender classification biases arising from psycho-demographic variables, and carry out a confirmatory and exploratory analysis based on psychological theories. Finally, we present benchmark prediction models for all personality and demographic variables.
Original language
English
Scientific fields
Computing, Interdisciplinary Technical Sciences, Psychology
AFFILIATIONS OF THE WORK
Projects:
HRZZ-IP-2020-02-8671 - Computational Models for Text-Based Personality Prediction and Analysis (psy.txt) (Šnajder, Jan, HRZZ - 2020-02) (CroRIS)
Institutions:
Faculty of Electrical Engineering and Computing, Zagreb