
Bibliographic record number: 616773

Efficient Discrimination Between Closely Related Languages


Tiedemann, Jörg; Ljubešić, Nikola
Efficient Discrimination Between Closely Related Languages // Proceedings of COLING 2012
Mumbai, 2012, pp. 2619-2634 (lecture, international peer review, full paper (in extenso), scientific)


CROSBI ID: 616773

Title
Efficient Discrimination Between Closely Related Languages

Authors
Tiedemann, Jörg ; Ljubešić, Nikola

Type, subtype and category of work
Papers in conference proceedings, full paper (in extenso), scientific

Source
Proceedings of COLING 2012. Mumbai, 2012, pp. 2619-2634

Conference
COLING 2012

Place and date
Mumbai, India, 10-15 December 2012

Type of participation
Lecture

Type of review
International peer review

Keywords
language identification; language discrimination; closely related languages

Abstract
In this paper, we revisit the problem of language identification with the focus on proper discrimination between closely related languages. Strong similarities between certain languages make it very hard to classify them correctly using standard methods that have been proposed in the literature. Dedicated models that focus on specific discrimination tasks help to improve the accuracy of general-purpose language identification tools. We propose and compare methods based on simple document classification techniques trained on parallel corpora of closely related languages and methods that emphasize discriminating features in terms of blacklisted words. Our experiments demonstrate that these techniques are highly accurate for the difficult task of discriminating between Bosnian, Croatian and Serbian. The best setup yields an absolute improvement of over 9% in accuracy over the best performing baseline using a state-of-the-art language identification tool.

Original language
English

Scientific fields
Information and communication sciences



RELATED RECORDS


Projects:
FP7-288342

Institutions:
Filozofski fakultet, Zagreb

Profiles:

Nikola Ljubešić (author)


Cite this publication:

Tiedemann, J. & Ljubešić, N. (2012) Efficient Discrimination Between Closely Related Languages. In: Proceedings of COLING 2012.
@inproceedings{article,
  author    = {Tiedemann, J\"{o}rg and Ljube\v{s}i\'{c}, Nikola},
  title     = {Efficient Discrimination Between Closely Related Languages},
  booktitle = {Proceedings of COLING 2012},
  year      = {2012},
  pages     = {2619--2634},
  address   = {Mumbai, India},
  keywords  = {language identification, language discrimination, closely related languages}
}