Bibliographic record no. 761038

Near Language Identification using NooJ


Bekavac, Božo; Kocijan, Kristina; Tadić, Marko
Near Language Identification using NooJ // Formalising Natural Languages with NooJ 2014: Selected papers from the NooJ 2014 International Conference / Monti, Johanna ; Silberztein, Max ; Monteleone, Mario ; Pia di Buono, Maria (eds.).
Newcastle upon Tyne: Cambridge Scholars Publishing, 2015. pp. 152-166


Title
Near Language Identification using NooJ

Authors
Bekavac, Božo ; Kocijan, Kristina ; Tadić, Marko

Type, subtype and category of work
Book chapter, scientific

Book
Formalising Natural Languages with NooJ 2014: Selected papers from the NooJ 2014 International Conference

Editor(s)
Monti, Johanna ; Silberztein, Max ; Monteleone, Mario ; Pia di Buono, Maria

Publisher
Cambridge Scholars Publishing

City
Newcastle upon Tyne

Year
2015

Page range
152-166

ISBN
1-4438-7558-9

Keywords
Near language identification, Croatian language, Serbian language, local grammars, NooJ

Abstract
In this work we take a linguistic-knowledge-aware approach tailored to a specific pair of languages. We use NooJ as the core of a system designed for the automatic identification of near languages, Croatian and Serbian in particular. We use several levels of NooJ's processing capabilities. First, we apply specially designed lexical transducers to detect the typical morphological spots in the language. Then we apply syntactic grammars to detect verb + da + verb syntagms, which are characteristic of Serbian. Finally, we measure discrepancies between the texts provided by text processing. The output is generated according to a predefined voting principle using an AutoHotkey program. Our results show high F1 measures for language identification of Croatian and Serbian texts.
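
The voting idea described in the abstract can be illustrated with a small Python sketch. It is not the authors' system: the paper uses NooJ lexical transducers, NooJ syntactic grammars and an AutoHotkey program, none of which are reproduced here. The word lists, the regular expression, the function names and the tie-breaking rule below are all hypothetical stand-ins chosen for illustration, and the discrepancy measure mentioned in the abstract is left out because the record does not describe it.

```python
import re
from collections import Counter

# Hypothetical toy marker lists; stand-ins for the NooJ lexical transducers.
CROATIAN_MARKERS = {"mlijeko", "kruh", "tjedan", "tko"}
SERBIAN_MARKERS = {"mleko", "hleb", "nedelja", "ko"}

# Crude approximation of a "verb + da + verb" pattern, limited to a few
# hand-picked first-person verbs; a real grammar would use morphological tags.
DA_VERB = re.compile(r"\b(?:želim|mogu|moram|hoću)\s+da\s+\w+", re.IGNORECASE)


def lexical_vote(text):
    """Vote 'hr' or 'sr' by counting marker words; None if the counts are equal."""
    tokens = re.findall(r"\w+", text.lower())
    hr = sum(t in CROATIAN_MARKERS for t in tokens)
    sr = sum(t in SERBIAN_MARKERS for t in tokens)
    if hr == sr:
        return None
    return "hr" if hr > sr else "sr"


def syntactic_vote(text):
    """Vote 'sr' if the toy 'verb + da + verb' pattern occurs; otherwise abstain."""
    return "sr" if DA_VERB.search(text) else None


def combine(votes):
    """Majority vote over non-abstaining voters; a tie or no votes is 'undecided'."""
    counts = Counter(v for v in votes if v is not None)
    if not counts:
        return "undecided"
    ranked = counts.most_common(2)
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return "undecided"
    return ranked[0][0]


def identify(text):
    """Run both toy detectors and combine their votes."""
    return combine([lexical_vote(text), syntactic_vote(text)])
```

For example, `identify("Želim da kupim hleb i mleko.")` collects two votes for Serbian, while `identify("Želim kupiti kruh i mlijeko.")` gets a single lexical vote for Croatian and no syntactic vote. How the actual system weights its voters is not stated in this record.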

Original language
English

Scientific fields
Information and communication sciences, Philology



AFFILIATIONS OF THE WORK


Project / topic
130-1300646-0645 - Croatian language resources and their annotation (Marko Tadić)
130-1300646-1776 - Computational syntax of Croatian (Zdravko Dovedan Han)

Institutions
Faculty of Humanities and Social Sciences, Zagreb