How good BERT based models are in sentiment analysis of Croatian tweets: comparison of four multilingual BERTs // Proceedings of 32nd International Scientific Conference Central European Conference on Information and Intelligent Systems / Vrček, Neven ; Pergler, Elisabeth ; Grđ, Petra (Eds.).
Varaždin: Fakultet organizacije i informatike Sveučilišta u Zagrebu, 2021, pp. 175-182 (lecture, international peer review, full paper (in extenso), scientific)
CROSBI ID: 1155535
Title
How good BERT based models are in sentiment analysis of Croatian tweets: comparison of four multilingual BERTs
Authors
Ptiček, Martina
Type, subtype and category of work
Papers in conference proceedings, full paper (in extenso), scientific
Source
Proceedings of 32nd International Scientific Conference Central European Conference on Information and Intelligent Systems
/ Vrček, Neven ; Pergler, Elisabeth ; Grđ, Petra - Varaždin : Fakultet organizacije i informatike Sveučilišta u Zagrebu, 2021, 175-182
Conference
32nd Central European Conference on Information and Intelligent Systems (CECIIS 2021)
Venue and date
Varaždin, Croatia, 13-15 October 2021
Type of participation
Lecture
Type of peer review
International peer review
Keywords
Sentiment analysis, contextual word embeddings, multilingual BERT, Croatian language
Abstract
Contextual word embeddings such as BERT or GPT achieve state-of-the-art results across a wide range of NLP tasks, especially on English datasets, since the models themselves were trained predominantly on large amounts of English-language data. However, their effectiveness has not yet been sufficiently researched for low-resource languages such as Croatian. This paper compares four BERT-based multilingual models (mBERT, DistilBERT, XLM-RoBERTa, CroSloEngual) applied to sentiment analysis of tweets in Croatian. The results show that BERT-based multilingual models perform well on sentiment analysis in Croatian, particularly those trained on larger amounts of Croatian data, such as XLM-RoBERTa and CroSloEngual.
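For illustration, a minimal sketch of how such a model comparison could be set up with the Hugging Face transformers library is given below. The checkpoint identifiers correspond to the four compared models, but the three-class label scheme, the example tweets, and the pipeline itself are assumptions for this sketch, not the paper's actual experimental setup.

    # Sketch: scoring Croatian tweets with one of the four compared multilingual
    # BERT models via Hugging Face transformers. The 3-class label scheme and the
    # example tweets are illustrative assumptions; the classification head is
    # randomly initialized, so fine-tuning is required before the predictions
    # reflect sentiment.
    import torch
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    # Public checkpoints corresponding to the four compared models.
    MODEL_IDS = {
        "mBERT": "bert-base-multilingual-cased",
        "DistilBERT": "distilbert-base-multilingual-cased",
        "XLM-RoBERTa": "xlm-roberta-base",
        "CroSloEngual": "EMBEDDIA/crosloengual-bert",
    }

    model_id = MODEL_IDS["CroSloEngual"]
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(model_id,
                                                               num_labels=3)

    tweets = ["Ovo je odličan dan!", "Usluga je bila jako loša."]  # example tweets
    batch = tokenizer(tweets, padding=True, truncation=True, max_length=128,
                      return_tensors="pt")

    with torch.no_grad():
        logits = model(**batch).logits   # shape: (num_tweets, num_labels)
    predictions = logits.argmax(dim=-1)  # predicted class index per tweet
    print(predictions.tolist())

Swapping model_id across the MODEL_IDS entries, with the same fine-tuning and evaluation data, would reproduce the kind of side-by-side comparison the abstract describes.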
Original language
English
Scientific areas
Information and communication sciences
WORK AFFILIATIONS
Institutions:
Fakultet organizacije i informatike, Varaždin
Profiles:
Martina Ptiček
(author)
Indexed in:
- Web of Science Core Collection (WoSCC)
- Social Science Citation Index (SSCI)
- Conference Proceedings Citation Index - Social Sciences & Humanities (CPCI-SSH)