Bibliographic record number: 364770

Document Representation Methods for News Event Detection in Croatian


Ljubešić, Nikola; Agić, Željko; Bakarić, Nikola
Document Representation Methods for News Event Detection in Croatian // Proceedings of the 6th International Conference on Formal Approaches to South Slavic and Balkan Languages / Tadić, Marko ; Dimitrova-Vulchanova, Mila ; Koeva, Svetla (eds.).
Zagreb: Croatian Language Technologies Society, 2008. pp. 79-84 (lecture, international peer review, full paper (in extenso), scientific)


Title
Document Representation Methods for News Event Detection in Croatian

Authors
Ljubešić, Nikola ; Agić, Željko ; Bakarić, Nikola

Type, subtype and category of work
Papers in conference proceedings, full paper (in extenso), scientific

Source
Proceedings of the 6th International Conference on Formal Approaches to South Slavic and Balkan Languages / Tadić, Marko ; Dimitrova-Vulchanova, Mila ; Koeva, Svetla - Zagreb : Croatian Language Technologies Society, 2008, 79-84

ISBN
978-953-55375-0-2

Conference
6th International Conference on Formal Approaches to South Slavic and Balkan Languages (FASSBL 2008)

Place and date
Dubrovnik, Croatia, 25-28.09.2008.

Type of participation
Lecture

Type of review
International peer review

Keywords
Document representation; document clustering; news event detection

Abstract
The constant increase in the amount of available data demands new organizational and representational ideas and approaches. Document clustering as a method for event detection uses, supplements and upgrades existing information retrieval methods in order to improve knowledge management and representation. This article describes research carried out to determine the impact of various methods of document representation on cluster analysis. Several statistical and linguistic NLP morphological normalization methods of document representation are tested in an event detection scenario. Event detection was conducted using online newspaper articles issued on a single day. A cluster analysis was done using the various document representation methods and a clustering algorithm. The results were then compared against a human-evaluated gold standard. The results show that both statistical and linguistic methods simplify the representational complexity and minimally improve the results, which leads to the conclusion that for this task statistical methods should be preferred.

Original language
English

Scientific fields
Computer science, Information and communication sciences, Philology



WORK ASSOCIATIONS


Project / topic
130-1300646-1776 - Računalna sintaksa hrvatskoga jezika (Zdravko Dovedan Han)
130-1301679-1380 - Hrvatska rječnička baština i hrvatski europski identitet (Damir Boras)

Institutions
Filozofski fakultet, Zagreb