Bibliographic record no. 300859
Implementation of Croatian NERC system
Implementation of Croatian NERC system // Proceedings of the Workshop on Balto-Slavonic Natural Language Processing 2007, Special Theme: Information Extraction and Enabling Technologies / Piskorski, Jakub ; Tanev, Hristo ; Pouliquen, Bruno ; Steinberger, Ralf (eds.).
Prague: Association for Computational Linguistics (ACL), 2007. pp. 11-18 (lecture, international peer review, full paper (in extenso), scientific)
CROSBI ID: 300859
Title
Implementation of Croatian NERC system
Authors
Bekavac, Božo ; Tadić, Marko
Type, subtype and category of work
Papers in conference proceedings, full paper (in extenso), scientific
Source
Proceedings of the Workshop on Balto-Slavonic Natural Language Processing 2007, Special Theme: Information Extraction and Enabling Technologies
/ Piskorski, Jakub ; Tanev, Hristo ; Pouliquen, Bruno ; Steinberger, Ralf - Prague : Association for Computational Linguistics (ACL), 2007, 11-18
ISBN
978-1-932432-88-6
Conference
45th Annual Meeting of the Association for Computational Linguistics (ACL 2007)
Place and date
Prague, Czech Republic, 23.06.2007 - 30.06.2007
Type of participation
Lecture
Type of review
International peer review
Keywords
named entity recognition and classification; Croatian; computational linguistics; information extraction
Abstract
In this paper a system for Named Entity Recognition and Classification in Croatian language is described. The system is composed of the module for sentence segmentation, inflectional lexicon of common words, inflectional lexicon of names and regular local grammars for automatic recognition of numerical and temporal expressions. After the first step (sentence segmentation), the system attaches to each token its full morphosyntactic description and appropriate lemma and additional tags for potential categories for names without disambiguation. The third step (the core of the system) is the application of a set of rules for recognition and classification of named entities in already annotated texts. Rules based on described strategies (like internal and external evidence) are applied in cascade of transducers in defined order. Although there are other classification systems for NEs, the results of our system are annotated NEs which are following MUC-7 specification. System is applied on informative and noninformative texts and results are compared. F-measure of the system applied on informative texts yields over 90%.
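The three steps named in the abstract (sentence segmentation, lexicon tagging without disambiguation, and an ordered cascade of rules using internal and external evidence) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the lexicon entries, the abbreviation lists, the rule functions and all names below are hypothetical, and plain Python functions stand in for the transducers the paper describes. Output uses MUC-7 style `ENAMEX` tags.

```python
import re

# Hypothetical toy lexicon of names: inflected form -> (lemma, potential categories).
NAME_LEXICON = {
    "Zagreb": ("Zagreb", {"LOCATION"}),
    "Zagrebu": ("Zagreb", {"LOCATION"}),
}
# Hypothetical evidence lists, for illustration only.
ORG_SUFFIXES = {"d.o.o.", "d.d."}   # internal evidence: company designators
PERSON_TITLES = {"dr.", "prof."}    # external evidence: titles preceding a name
ABBREVIATIONS = ORG_SUFFIXES | PERSON_TITLES


def segment_sentences(text):
    """Step 1: split after . ! ? before an uppercase letter, then re-join after titles."""
    parts = re.split(r"(?<=[.!?])\s+(?=[A-ZČĆĐŠŽ])", text.strip())
    sentences = []
    for part in parts:
        if sentences and sentences[-1].split()[-1].lower() in PERSON_TITLES:
            sentences[-1] += " " + part
        else:
            sentences.append(part)
    return sentences


def tokenize(sentence):
    """Whitespace tokens; detach final punctuation unless the token is an abbreviation."""
    tokens = sentence.split()
    if tokens and tokens[-1].lower() not in ABBREVIATIONS and tokens[-1][-1] in ".!?":
        last = tokens.pop()
        if last[:-1]:
            tokens.append(last[:-1])
        tokens.append(last[-1])
    return tokens


def tag(tokens):
    """Step 2: attach lemma and all potential name categories, without disambiguation."""
    tagged = []
    for form in tokens:
        lemma, candidates = NAME_LEXICON.get(form, (form, set()))
        tagged.append({"form": form, "lemma": lemma, "candidates": candidates})
    return tagged


def is_cap(form):
    return form[:1].isupper()


def add_span(spans, start, end, kind):
    """Record a span only if it does not overlap one found by an earlier rule."""
    if all(end <= s or start >= e for s, e, _ in spans):
        spans.append((start, end, kind))


def rule_org_suffix(tokens, spans):
    """Internal evidence: capitalized run ending in a company designator."""
    for i, tok in enumerate(tokens):
        if tok["form"].lower() in ORG_SUFFIXES:
            start = i
            while start > 0 and is_cap(tokens[start - 1]["form"]):
                start -= 1
            if start < i:
                add_span(spans, start, i + 1, "ORGANIZATION")


def rule_person_title(tokens, spans):
    """External evidence: capitalized run following a personal title."""
    for i, tok in enumerate(tokens):
        if tok["form"].lower() in PERSON_TITLES:
            end = i + 1
            while end < len(tokens) and is_cap(tokens[end]["form"]):
                end += 1
            if end > i + 1:
                add_span(spans, i + 1, end, "PERSON")


def rule_lexicon(tokens, spans):
    """Fallback: accept a lexicon entry that has exactly one candidate category."""
    for i, tok in enumerate(tokens):
        if len(tok["candidates"]) == 1:
            add_span(spans, i, i + 1, next(iter(tok["candidates"])))


# Step 3: rules are applied in a fixed order; earlier rules take precedence.
CASCADE = [rule_org_suffix, rule_person_title, rule_lexicon]


def annotate(sentence):
    tokens = tag(tokenize(sentence))
    spans = []
    for rule in CASCADE:
        rule(tokens, spans)
    starts = {s: (e, kind) for s, e, kind in spans}
    out, i = [], 0
    while i < len(tokens):
        if i in starts:
            end, kind = starts[i]
            text = " ".join(t["form"] for t in tokens[i:end])
            out.append(f'<ENAMEX TYPE="{kind}">{text}</ENAMEX>')
            i = end
        else:
            out.append(tokens[i]["form"])
            i += 1
    return " ".join(out)
```

Calling `annotate` on each result of `segment_sentences` yields one tagged string per sentence. The fixed order in `CASCADE` is the design point: context-based rules run first, and the lexicon-only rule fills in whatever they left unmarked.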
Original language
English
Scientific fields
Information and Communication Sciences, Philology
WORK AFFILIATIONS
Projects:
036-1300646-1986 - Knowledge discovery in textual data (Dalbelo-Bašić, Bojana, MZO)
130-1300646-0645 - Croatian language resources and their annotation (Tadić, Marko, MZOS)
130-1300646-1002 - Lexical semantics in the construction of the Croatian WordNet (Raffaelli, Ida, MZOS)
Institutions:
Faculty of Humanities and Social Sciences (Filozofski fakultet), Zagreb