Near Language Identification using NooJ (CROSBI ID 54074)
Book chapter | original scientific paper | international peer review
Responsibility details
Bekavac, Božo ; Kocijan, Kristina ; Tadić, Marko
English
In this work we take a linguistic-knowledge-aware approach tailored to a specific pair of languages. We use NooJ as the core of a system designed for the automatic identification of near languages, Croatian and Serbian in particular. We draw on several levels of NooJ's processing capabilities. First, we apply specially designed lexical transducers to detect the typical morphological spots in each language. Then we apply syntactic grammars to detect verb + da + verb syntagmas, which are characteristic of Serbian. Finally, we measure the discrepancies between texts that this processing reveals. The output is generated according to a predefined voting principle, implemented in an AutoHotkey program. Our results show high F1 measures for the identification of Croatian and Serbian texts.
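The staged detection and voting described in the abstract can be illustrated with a minimal sketch. It is not the authors' system: Python stands in for both the NooJ grammars and the AutoHotkey voting program, the marker word lists and modal verb forms are hypothetical examples chosen for illustration, and the third stage (measuring discrepancies between texts) is left out because the abstract does not specify it.

```python
import re
from collections import Counter

# Hypothetical marker lists, for illustration only (not the authors' transducers).
CROATIAN_MARKERS = {"tko", "mlijeko", "lijepo", "tjedan"}
SERBIAN_MARKERS = {"ko", "mleko", "lepo", "nedelja"}

# Hypothetical modal forms used to approximate the "verb da verb" pattern.
MODALS = ("želim", "hoću", "mogu", "moram")
DA_PATTERN = re.compile(r"\b(?:%s)\s+da\s+\w+" % "|".join(MODALS))


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def lexical_vote(tokens):
    """Vote for the language whose marker words occur more often."""
    counts = Counter(tokens)
    hr = sum(counts[w] for w in CROATIAN_MARKERS)
    sr = sum(counts[w] for w in SERBIAN_MARKERS)
    if hr > sr:
        return "hr"
    if sr > hr:
        return "sr"
    return None  # abstain when there is no evidence or a tie


def syntactic_vote(text):
    """Vote for Serbian when a modal + da + verb sequence is found."""
    return "sr" if DA_PATTERN.search(text.lower()) else None


def identify(text):
    """Combine the detector votes by simple majority (assumed rule)."""
    votes = [lexical_vote(tokenize(text)), syntactic_vote(text)]
    tally = Counter(v for v in votes if v is not None)
    if not tally:
        return "undecided"
    ranked = tally.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return "undecided"
    return ranked[0][0]
```

For example, `identify("Želim da idem po mleko.")` collects one lexical and one syntactic vote for Serbian, while a text with no markers leaves every detector abstaining. The simple-majority rule is an assumption; the abstract says only that the voting principle is predefined.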
near language identification, Croatian language, Serbian language, local grammars, NooJ
Chapter details
152-166.
published
Book details
Formalising Natural Languages with NooJ 2014: Selected papers from the NooJ 2014 International Conference
Monti, Johanna ; Silberztein, Max ; Monteleone, Mario ; di Buono, Maria Pia
Newcastle upon Tyne: Cambridge Scholars Publishing
2015.
1-4438-7558-9