
Bibliographic record number: 324219

Language identification: how to distinguish similar languages?


Ljubešić, Nikola; Mikelić, Nives; Boras, Damir
Language identification: how to distinguish similar languages? // ITI ..., 1 (2007), 541-546 (peer-review information not available, article, scientific)


CROSBI ID: 324219

Title
Language identification: how to distinguish similar languages?

Authors
Ljubešić, Nikola ; Mikelić, Nives ; Boras, Damir

Source
ITI ... (1330-1012) 1 (2007); 541-546

Type, subtype and category of work
Journal papers, article, scientific

Keywords
Written language identification; Croatian language; second-order Markov model; web-corpus; most frequent words method; forbidden words method

Abstract
The goal of this paper is to discuss the language identification problem for Croatian, a language that even state-of-the-art language identification tools find hard to distinguish from similar languages such as Serbian, Slovenian or Slovak. We developed a tool that applies a list of the most frequent Croatian words with a threshold that each document must satisfy, added a specific-characters elimination rule, and applied second-order Markov model classification and a forbidden-words rule. The resulting tool outperforms current tools in discriminating between these similar languages.
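The record gives no implementation details, so the following is only a minimal Python sketch of how three of the rules named in the abstract could fit together: a most-frequent-words threshold, a forbidden-words check, and a second-order (character trigram) Markov classifier. All names (`frequent_word_ratio`, `has_forbidden_word`, `CharMarkov2`, `classify`), the add-one smoothing, the `alphabet_size` value, and the toy word lists and training strings are assumptions made for illustration; they are not taken from the paper. The specific-characters elimination rule is left out because the abstract does not describe it.

```python
import math
from collections import Counter


def tokens(text):
    """Lowercase whitespace tokenisation (an assumption; the paper may differ)."""
    return text.lower().split()


def frequent_word_ratio(text, frequent_words):
    """Share of tokens found in a (hypothetical) most-frequent-words list."""
    toks = tokens(text)
    if not toks:
        return 0.0
    return sum(t in frequent_words for t in toks) / len(toks)


def has_forbidden_word(text, forbidden_words):
    """True if any token appears in a (hypothetical) forbidden-words list."""
    return any(t in forbidden_words for t in tokens(text))


class CharMarkov2:
    """Second-order character Markov model: P(c_i | c_{i-2}, c_{i-1}).

    Add-one smoothing over an assumed alphabet size keeps unseen
    trigrams from getting zero probability.
    """

    def __init__(self, training_text, alphabet_size=64):
        self.alphabet_size = alphabet_size
        self.context_counts = Counter()
        self.trigram_counts = Counter()
        padded = "  " + training_text.lower()  # two-space start padding
        for i in range(2, len(padded)):
            context = padded[i - 2:i]
            self.context_counts[context] += 1
            self.trigram_counts[context + padded[i]] += 1

    def log_prob(self, text):
        """Smoothed log-probability of the text under this model."""
        padded = "  " + text.lower()
        total = 0.0
        for i in range(2, len(padded)):
            context = padded[i - 2:i]
            numerator = self.trigram_counts[context + padded[i]] + 1
            denominator = self.context_counts[context] + self.alphabet_size
            total += math.log(numerator / denominator)
        return total


def classify(text, models):
    """Return the label of the model that scores the text highest."""
    return max(models, key=lambda label: models[label].log_prob(text))


# Toy data, far too small for real use; for illustration only.
models = {
    "hr": CharMarkov2("lijepo mlijeko je bijelo i dijete pije mlijeko"),
    "sr": CharMarkov2("lepo mleko je belo i dete pije mleko"),
}
label = classify("bijelo dijete", models)
```

In a full pipeline one would presumably run the cheap rules first (reject documents below the frequent-word threshold or containing forbidden words) and use the Markov model only for documents that pass; that ordering is likewise an assumption.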

Original language
English

Scientific fields
Information and communication sciences



ASSOCIATED WITH THIS WORK


Projects:
130-1301679-1380 - Hrvatska rječnička baština i hrvatski europski identitet [Croatian Dictionary Heritage and Croatian European Identity] (Boras, Damir, MZOS) (CroRIS)

Institutions:
Filozofski fakultet, Zagreb (Faculty of Humanities and Social Sciences, Zagreb)


Cite this publication:

Ljubešić, Nikola; Mikelić, Nives; Boras, Damir
Language identification: how to distinguish similar languages? // ITI ..., 1 (2007), 541-546 (peer-review information not available, article, scientific)
Ljubešić, N., Mikelić, N. & Boras, D. (2007) Language identification: how to distinguish similar languages? ITI ..., 1, 541-546.
@article{article,
  author   = {Ljube\v{s}i\'{c}, Nikola and Mikeli\'{c}, Nives and Boras, Damir},
  title    = {Language identification: how to distinguish similar languages?},
  journal  = {ITI ...},
  year     = {2007},
  volume   = {1},
  pages    = {541-546},
  issn     = {1330-1012},
  keywords = {Written language identification, Croatian language, second-order Markov model, web-corpus, most frequent words method, forbidden words method}
}

Indexed in other bibliographic databases:


  • Web of Science
  • SCOPUS
  • INSPEC
  • IEEE




