Bibliographic record number: 792822

Discriminating between Closely Related Languages on Twitter


Ljubešić, Nikola; Kranjčić, Denis
Discriminating between Closely Related Languages on Twitter // Informatica (Ljubljana), 39 (2015), 1; 1-8 (international peer review, article, scientific)


CROSBI ID: 792822

Title
Discriminating between Closely Related Languages on Twitter

Authors
Ljubešić, Nikola ; Kranjčić, Denis

Source
Informatica (Ljubljana) (0350-5596) 39 (2015), 1; 1-8

Type, subtype and category of work
Journal papers, article, scientific

Keywords
microblogging; language identification; closely related languages

Abstract
In this paper we tackle the problem of discriminating Twitter users by the language they tweet in, taking into account very similar South-Slavic languages – Bosnian, Croatian, Montenegrin and Serbian. We apply the supervised machine learning approach by annotating a subset of 500 users from an existing Twitter collection by the language the users primarily tweet in. We show that by using a simple bag-of-words model, univariate feature selection, 320 strongest features and a standard classifier, we reach user classification accuracy of ∼98%. Annotating the whole 63,160 users strong Twitter collection with the best performing classifier and visualizing it on a map via tweet geo-information, we produce a Twitter language map which clearly depicts the robustness of the classifier.
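
The pipeline named in the abstract (bag-of-words, univariate feature selection, a fixed number of strongest features, a standard classifier) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the chi-square scoring, the multinomial Naive Bayes classifier, the function names and the four toy training texts are all assumptions made for the example. The abstract reports 320 features and four languages; the toy data here uses two labels and k = 4. In the paper's setting each "document" would presumably be the text of one user's tweets.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Map a text to token counts (lowercased, whitespace-tokenized)."""
    return Counter(text.lower().split())


def chi2_scores(bows, labels):
    """Score each token on its own (univariate) with a chi-square test
    of token presence/absence against the class label."""
    n = len(bows)
    class_sizes = Counter(labels)
    presence = {}  # token -> Counter(label -> number of docs containing it)
    for bow, label in zip(bows, labels):
        for token in bow:
            presence.setdefault(token, Counter())[label] += 1
    scores = {}
    for token, per_class in presence.items():
        df = sum(per_class.values())  # docs containing the token
        score = 0.0
        for label, size in class_sizes.items():
            seen = per_class[label]
            # Two cells per class: token present, token absent.
            for observed, total in ((seen, df), (size - seen, n - df)):
                expected = total * size / n
                if expected > 0:
                    score += (observed - expected) ** 2 / expected
        scores[token] = score
    return scores


def select_features(scores, k):
    """Keep the k highest-scoring tokens (ties broken alphabetically)."""
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


def train_naive_bayes(bows, labels, vocab):
    """Collect per-class token counts restricted to the selected vocabulary."""
    vocab = set(vocab)
    counts = {label: Counter() for label in set(labels)}
    for bow, label in zip(bows, labels):
        for token, c in bow.items():
            if token in vocab:
                counts[label][token] += c
    priors = {l: math.log(m / len(labels)) for l, m in Counter(labels).items()}
    return vocab, counts, priors


def predict(model, text):
    """Return the label with the highest Laplace-smoothed log-probability."""
    vocab, counts, priors = model
    best_label, best_score = None, -math.inf
    for label in sorted(counts):
        denom = sum(counts[label].values()) + len(vocab)
        score = priors[label]
        for token, c in bag_of_words(text).items():
            if token in vocab:  # tokens outside the selection are ignored
                score += c * math.log((counts[label][token] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data: ijekavian vs. ekavian word forms, labels assumed.
TRAIN = [
    ("mlijeko je lijepo bijelo", "hr"),
    ("gdje je lijepo mlijeko", "hr"),
    ("mleko je lepo belo", "sr"),
    ("gde je lepo mleko", "sr"),
]
bows = [bag_of_words(text) for text, _ in TRAIN]
labels = [label for _, label in TRAIN]
selected = select_features(chi2_scores(bows, labels), k=4)
model = train_naive_bayes(bows, labels, selected)
```

Calling `predict(model, "lijepo mlijeko")` then classifies a new text using only the selected tokens. A token such as "je", which occurs in every training text, carries no class information and is not selected.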

Original language
English

Scientific fields
Information and communication sciences



WORK AFFILIATIONS


Institutions:
Filozofski fakultet, Zagreb

Profiles:

Nikola Ljubešić (author)

Cite this publication:

Ljubešić, Nikola; Kranjčić, Denis
Discriminating between Closely Related Languages on Twitter // Informatica (Ljubljana), 39 (2015), 1; 1-8 (international peer review, article, scientific)
Ljubešić, N. & Kranjčić, D. (2015) Discriminating between Closely Related Languages on Twitter. Informatica (Ljubljana), 39 (1), 1-8.
@article{article,
  author   = {Ljube\v{s}i\'{c}, Nikola and Kranj\v{c}i\'{c}, Denis},
  title    = {Discriminating between Closely Related Languages on Twitter},
  journal  = {Informatica (Ljubljana)},
  year     = {2015},
  volume   = {39},
  number   = {1},
  pages    = {1-8},
  issn     = {0350-5596},
  keywords = {microblogging, language identification, closely related languages}
}

Journal indexed in:


  • Web of Science Core Collection (WoSCC)
    • Emerging Sources Citation Index (ESCI)
  • Scopus




