Language identification: how to distinguish similar languages? (CROSBI ID 136992)

Journal contribution | original scientific paper

Ljubešić, Nikola ; Mikelić, Nives ; Boras, Damir Language identification: how to distinguish similar languages? // ITI ..., 1 (2007), 541-546

Responsibility details

Ljubešić, Nikola ; Mikelić, Nives ; Boras, Damir

English

Language identification: how to distinguish similar languages?

The goal of this paper is to discuss the problem of identifying Croatian, a language that even state-of-the-art language identification tools find hard to distinguish from similar languages such as Serbian, Slovenian, or Slovak. We developed a tool that uses a list of the most frequent Croatian words with a threshold that each document must satisfy, added a specific-characters elimination rule, and applied second-order Markov model classification and a forbidden-words rule. The resulting tool outperforms current tools in discriminating between these similar languages.
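
The four steps named in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the word lists, the character set, the threshold value, the training samples, the add-one smoothing, and every function name (`is_croatian`, `TrigramModel`, and so on) are assumptions made for the example. The second-order Markov model is taken here to be a character trigram model, which is one common reading of the term.

```python
import math
import re
from collections import Counter

# Toy resources, all hypothetical; a real tool would derive them from corpora.
FREQUENT_WORDS = {"i", "je", "u", "se", "na", "da", "za", "su", "od", "a"}
FORBIDDEN_WORDS = {"lepo", "mleko", "dete"}   # assumed non-Croatian word forms
FOREIGN_CHARS = set("ľťďôäŕĺ")                # assumed characters absent from Croatian
HR_SAMPLE = "lijepo mlijeko"                  # toy training text for Croatian
SR_SAMPLE = "lepo mleko"                      # toy training text for a similar language

TOKEN_RE = re.compile(r"[^\W\d_]+")           # runs of Unicode letters


def tokenize(text):
    return TOKEN_RE.findall(text.lower())


def frequent_word_ratio(tokens, frequent_words):
    """Share of tokens that appear in the frequent-word list."""
    if not tokens:
        return 0.0
    return sum(t in frequent_words for t in tokens) / len(tokens)


class TrigramModel:
    """Character-level second-order Markov model with add-one smoothing."""

    def __init__(self, training_text):
        padded = "  " + training_text.lower()
        n = len(padded) - 2
        self.trigrams = Counter(padded[i:i + 3] for i in range(n))
        self.contexts = Counter(padded[i:i + 2] for i in range(n))
        self.vocab = len(set(padded)) + 1     # +1 slot for unseen characters

    def log_prob(self, text):
        padded = "  " + text.lower()
        total = 0.0
        for i in range(len(padded) - 2):
            tri = padded[i:i + 3]
            num = self.trigrams[tri] + 1
            den = self.contexts[tri[:2]] + self.vocab
            total += math.log(num / den)
        return total


def is_croatian(text, models, threshold=0.2):
    """Apply the four rules in the order the abstract lists them."""
    tokens = tokenize(text)
    if frequent_word_ratio(tokens, FREQUENT_WORDS) < threshold:
        return False                          # too few frequent words
    if any(c in FOREIGN_CHARS for c in text.lower()):
        return False                          # contains an excluded character
    best = max(models, key=lambda name: models[name].log_prob(text))
    if best != "hr":
        return False                          # another model fits better
    return not any(t in FORBIDDEN_WORDS for t in tokens)


MODELS = {"hr": TrigramModel(HR_SAMPLE), "sr": TrigramModel(SR_SAMPLE)}
print(is_croatian("mlijeko je", MODELS))
print(is_croatian("lepo mleko je", MODELS))
```

With training samples this small the Markov step is only a demonstration of the mechanics; meaningful discrimination would need models trained on sizeable corpora for each language, and a shared smoothing vocabulary across models would make their scores more directly comparable.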

Written language identification; Croatian language; second-order Markov model; web-corpus; most frequent words method; forbidden words method

Publication details

Volume: 1

Year: 2007

Pages: 541-546

Status: published

ISSN: 1330-1012

Related field

Information and communication sciences