PANDORA Talks: Personality and Demographics on Reddit (CROSBI ID 704187)
Conference paper in proceedings | original scientific paper | international peer review
Authorship information
Gjurković, Matej ; Karan, Mladen ; Vukojević, Iva ; Bošnjak, Mihaela ; Šnajder, Jan
English
PANDORA Talks: Personality and Demographics on Reddit
Personality and demographics are important variables in social sciences and computational sociolinguistics. However, datasets with both personality and demographic labels are scarce. To address this, we present PANDORA, the first dataset of Reddit comments of 10k users partially labeled with three personality models and demographics (age, gender, and location), including 1.6k users labeled with the well-established Big 5 personality model. We showcase the usefulness of this dataset in three experiments, where we leverage the more readily available data from other personality models to predict the Big 5 traits, analyze gender classification biases arising from psycho-demographic variables, and carry out a confirmatory and exploratory analysis based on psychological theories. Finally, we present benchmark prediction models for all personality and demographic variables.
Personality ; Text Analysis ; Natural Language Processing ; Social Network Analysis ; Reddit ; Computational Social Science
Contribution details
138-152.
2021.
published
10.18653/v1/2021.socialnlp-1.12
Host publication details
Proceedings of the Ninth International Workshop on Natural Language Processing for Social Media
Association for Computational Linguistics
online: Association for Computational Linguistics (ACL)
Conference details
9th International Workshop on Natural Language Processing for Social Media
oral presentation
06.06.2021-11.06.2021
online
Research fields
Interdisciplinary Technical Sciences, Psychology, Computing