Data source: CROSBI

Efficient Discrimination Between Closely Related Languages (CROSBI ID 594337)

Conference paper in proceedings | original scientific paper | international peer review

Tiedemann, Jörg ; Ljubešić, Nikola Efficient Discrimination Between Closely Related Languages // Proceedings of COLING 2012. Mumbai, 2012. pp. 2619-2634

Responsibility details

Tiedemann, Jörg ; Ljubešić, Nikola

English

Efficient Discrimination Between Closely Related Languages

In this paper, we revisit the problem of language identification with the focus on proper discrimination between closely related languages. Strong similarities between certain languages make it very hard to classify them correctly using standard methods that have been proposed in the literature. Dedicated models that focus on specific discrimination tasks help to improve the accuracy of general-purpose language identification tools. We propose and compare methods based on simple document classification techniques trained on parallel corpora of closely related languages and methods that emphasize discriminating features in terms of blacklisted words. Our experiments demonstrate that these techniques are highly accurate for the difficult task of discriminating between Bosnian, Croatian and Serbian. The best setup yields an absolute improvement of over 9% in accuracy over the best performing baseline using a state-of-the-art language identification tool.
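The blacklist idea mentioned in the abstract can be illustrated with a minimal Python sketch. The abstract does not specify how the blacklists are built or scored, so everything here is an assumption: the tiny hand-picked word lists, the language codes `hr` and `sr`, and the helper names `tokenize`, `blacklist_hits` and `classify` are hypothetical and are not taken from the paper. The sketch simply picks the language whose blacklist is violated least often.

```python
import re
from collections import Counter

# Toy, hand-picked blacklists (an assumption for illustration only,
# not the lists used in the paper). BLACKLISTS[lang] holds words that
# should NOT occur in a text written in lang.
BLACKLISTS = {
    "hr": {"ko", "hleb", "hiljada"},
    "sr": {"tko", "kruh", "tisuća"},
}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def blacklist_hits(text):
    """Count, per language, how many tokens appear on that language's blacklist."""
    counts = Counter(tokenize(text))
    return {
        lang: sum(counts[word] for word in words)
        for lang, words in BLACKLISTS.items()
    }


def classify(text):
    """Return the language with the fewest blacklist hits, or None on a tie."""
    hits = blacklist_hits(text)
    fewest = min(hits.values())
    winners = [lang for lang, n in hits.items() if n == fewest]
    return winners[0] if len(winners) == 1 else None
```

A text containing no blacklisted word for any language is left undecided (`None`) in this sketch; a practical system would presumably fall back to a general-purpose classifier in that case.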

language identification; language discrimination; closely related languages

Contribution details

2619-2634.

2012.

published

Host publication details

Proceedings of COLING 2012

Mumbai:

Conference details

COLING 2012

lecture

10.12.2012-15.12.2012

Mumbai, India

Related fields

Information and communication sciences