Bibliographic record no.: 761038
Near Language Identification using NooJ
Near Language Identification using NooJ // Formalising Natural Languages with NooJ 2014: Selected papers from the NooJ 2014 International Conference / Monti, Johanna ; Silberztein, Max ; Monteleone, Mario ; Pia di Buono, Maria (eds.).
Newcastle upon Tyne: Cambridge Scholars Publishing, 2015. pp. 152-166
CROSBI ID: 761038
Title
Near Language Identification using NooJ
Authors
Bekavac, Božo ; Kocijan, Kristina ; Tadić, Marko
Type, subtype and category of work
Book chapter, scientific
Book
Formalising Natural Languages with NooJ 2014: Selected papers from the NooJ 2014 International Conference
Editor(s)
Monti, Johanna ; Silberztein, Max ; Monteleone, Mario ; Pia di Buono, Maria
Publisher
Cambridge Scholars Publishing
City
Newcastle upon Tyne
Year
2015
Page range
152-166
ISBN
1-4438-7558-9
Keywords
near language identification, Croatian language, Serbian language, local grammars, NooJ
Abstract
In this work we take a linguistic-knowledge-aware approach tailored to a specific pair of languages. We use NooJ as the core of a system designed for the automatic identification of near languages, in particular Croatian and Serbian. We use several levels of NooJ's processing capabilities. First, we apply specially designed lexical transducers to detect typical morphological spots in the language. Then we apply syntactic grammars to detect verb "da" verb syntagmas, which are characteristic of Serbian. Finally, we measure discrepancies between texts provided by text processing. The output is generated according to a predefined voting principle using an AutoHotkey program. Our results show high F1 measures for language identification of Croatian and Serbian texts.
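The voting idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' system: it uses neither NooJ nor AutoHotkey, the marker lists and the regular expression are small hypothetical stand-ins for the lexical transducers and the syntactic grammar, and the simple majority rule is an assumption, since the abstract does not state the actual voting principle. All function and variable names are invented for illustration.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny marker lists (illustration only).
HR_MARKERS = {"tko", "tjedan", "kruh", "mlijeko", "lijepo"}
SR_MARKERS = {"ko", "nedelja", "hleb", "mleko", "lepo"}

# Crude stand-in for a verb-"da"-verb grammar: a few modal verb forms
# followed by "da" and one more word. A real grammar would use a lexicon.
DA_PATTERN = re.compile(
    r"\b(?:hoću|hoćeš|hoće|želim|želiš|želi|mogu|možeš|može|moram|moraš|mora)"
    r"\s+da\s+\w+",
    re.IGNORECASE,
)


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def lexical_vote(text):
    """Vote 'hr' or 'sr' by counting marker words; None if the counts are equal."""
    tokens = tokenize(text)
    hr = sum(t in HR_MARKERS for t in tokens)
    sr = sum(t in SR_MARKERS for t in tokens)
    if hr == sr:
        return None
    return "hr" if hr > sr else "sr"


def syntactic_vote(text):
    """Vote 'sr' if a verb-"da"-verb pattern occurs; otherwise abstain."""
    return "sr" if DA_PATTERN.search(text) else None


def identify(text):
    """Combine the votes by simple majority (assumed rule); ties are 'undecided'."""
    votes = [v for v in (lexical_vote(text), syntactic_vote(text)) if v is not None]
    if not votes:
        return "undecided"
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return "undecided"
    return ranked[0][0]
```

With these toy lists, `identify("Tko želi kruh i mlijeko?")` returns `"hr"`, `identify("Ko hoće da kupi hleb?")` returns `"sr"`, and a text with no markers returns `"undecided"`. Letting a detector abstain keeps a missing signal from being counted as evidence for either language.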
Original language
English
Scientific fields
Information and Communication Sciences, Philology
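WORK AFFILIATIONS
Projects:
130-1300646-0645 - Hrvatski jezični resursi i njihovo obilježavanje [Croatian language resources and their annotation] (Tadić, Marko, MZOS) (CroRIS)
130-1300646-1776 - Računalna sintaksa hrvatskoga jezika [Computational syntax of the Croatian language] (Dovedan Han, Zdravko, MZOS) (CroRIS)
Institutions:
Faculty of Humanities and Social Sciences (Filozofski fakultet), Zagreb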