Discriminating between Closely Related Languages on Twitter

Ljubešić, Nikola; Kranjčić, Denis

Data source: CROSBI

Discriminating between Closely Related Languages on Twitter (CROSBI ID 223706)

Journal contribution | original scientific paper | international peer review

Ljubešić, Nikola ; Kranjčić, Denis Discriminating between Closely Related Languages on Twitter // Informatica (Ljubljana), 39 (2015), 1; 1-8

Responsibility data

Authors

Ljubešić, Nikola ; Kranjčić, Denis

Basic data in the original language
Basic data in other languages

Language

English

Title

Discriminating between Closely Related Languages on Twitter

Abstract

In this paper we tackle the problem of discriminating Twitter users by the language they tweet in, taking into account very similar South-Slavic languages – Bosnian, Croatian, Montenegrin and Serbian. We apply the supervised machine learning approach by annotating a subset of 500 users from an existing Twitter collection by the language the users primarily tweet in. We show that by using a simple bag-of-words model, univariate feature selection, 320 strongest features and a standard classifier, we reach user classification accuracy of ∼98%. Annotating the whole Twitter collection of 63,160 users with the best performing classifier and visualizing it on a map via tweet geo-information, we produce a Twitter language map which clearly depicts the robustness of the classifier.
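The pipeline named in the abstract (bag-of-words, univariate feature selection, a fixed number of strongest features, a standard classifier) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not name the selection statistic or the classifier, so chi-squared scoring and multinomial Naive Bayes are stand-ins chosen here; the six "users" are invented toy texts covering only two of the four languages; and k = 6 replaces the 320 features used in the paper. All function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def bag_of_words(text):
    """Turn a user's concatenated tweets into lowercase token counts."""
    return Counter(text.lower().split())


def chi2_scores(docs, labels):
    """Score each token on its own: chi-squared of token presence vs. class."""
    n = len(docs)
    class_sizes = Counter(labels)
    present = defaultdict(Counter)  # token -> class -> users containing the token
    for doc, label in zip(docs, labels):
        for tok in doc:
            present[tok][label] += 1
    scores = {}
    for tok, per_class in present.items():
        n_tok = sum(per_class.values())
        score = 0.0
        for cls, size in class_sizes.items():
            # Two cells per class: users with the token, users without it.
            cells = ((per_class[cls], n_tok), (size - per_class[cls], n - n_tok))
            for observed, column_total in cells:
                expected = size * column_total / n
                if expected > 0:
                    score += (observed - expected) ** 2 / expected
        scores[tok] = score
    return scores


def select_k_best(scores, k):
    """Keep the k highest-scoring tokens (ties broken alphabetically)."""
    ranked = sorted(scores, key=lambda tok: (-scores[tok], tok))
    return set(ranked[:k])


def train_nb(docs, labels, vocab):
    """Multinomial Naive Bayes with add-one smoothing over the selected tokens."""
    counts = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        for tok, c in doc.items():
            if tok in vocab:
                counts[label][tok] += c
    priors = Counter(labels)
    model = {}
    for cls, n_cls in priors.items():
        total = sum(counts[cls].values()) + len(vocab)
        log_lik = {tok: math.log((counts[cls][tok] + 1) / total) for tok in vocab}
        model[cls] = (math.log(n_cls / len(labels)), log_lik)
    return model


def predict(model, doc):
    """Return the class with the highest posterior log-score."""
    def score(cls):
        log_prior, log_lik = model[cls]
        return log_prior + sum(c * log_lik[t] for t, c in doc.items() if t in log_lik)
    return max(sorted(model), key=score)


# Invented toy data (assumption): one string per user, two classes only.
texts = [
    "tko je kupio kruh ovaj tjedan", "što radiš tko zna", "kruh i mlijeko svaki tjedan",
    "ko je kupio hleb ove nedelje", "šta radiš ko zna", "hleb i mleko svake nedelje",
]
labels = ["hr", "hr", "hr", "sr", "sr", "sr"]

docs = [bag_of_words(t) for t in texts]
selected = select_k_best(chi2_scores(docs, labels), k=6)  # the paper keeps 320
model = train_nb(docs, labels, selected)
print(sorted(selected))
print(predict(model, bag_of_words("tko ima kruh")))
```

On this toy set the selection step keeps only tokens that occur in one class and not the other, so words shared by both classes never reach the classifier. With real data, each user's text would be the concatenation of their tweets, and accuracy would be estimated on held-out annotated users.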

Keywords

microblogging; language identification; closely related languages

Note

not recorded

Language

not recorded

Title

not recorded

Abstract

not recorded

Keywords

not recorded

Note

not recorded

Publication data

Journal

Informatica (Ljubljana)

Volume (issue)

39 (1)

Year

2015

Pages

1-8

Publication status

published

ISSN

0350-5596

Related entities

Related persons

Nikola Ljubešić (CroRIS ID: 4119; MBZ: 272820) (author)

Related institutions

Filozofski fakultet u Zagrebu (130) (author's institution)

Field

Information and communication sciences

Links

informatica.si

Indexing

Scopus

Web of Science Core Collection, Emerging Sources Citation Index (WoSCC-ESCI)