Bibliographic record no.: 198152
Automated news item categorization
Automated news item categorization // Proceedings of JSAI 2005 Workshop on Conversational Informatics, in conjunction with the 19th Annual Conference of The Japanese Society for Artificial Intelligence JSAI 2005 / Sumi, Yasuyuki ; Nishida, Toyoaki (eds.).
Kitakyushu: Kyoto University, 2005. pp. 57-62 (lecture, international peer review, full paper (in extenso), scientific)
CROSBI ID: 198152
Title
Automated news item categorization
Authors
Bačan, Hrvoje ; Gulija, Darko ; Pandžić, Igor
Type, subtype, and category of work
Papers in conference proceedings, full paper (in extenso), scientific
Source
Proceedings of JSAI 2005 Workshop on Conversational Informatics, in conjunction with the 19th Annual Conference of The Japanese Society for Artificial Intelligence JSAI 2005
/ Sumi, Yasuyuki ; Nishida, Toyoaki - Kitakyushu : Kyoto University, 2005, 57-62
Conference
JSAI 2005 Workshop on Conversational Informatics, in conjunction with the 19th Annual Conference of The Japanese Society for Artificial Intelligence JSAI 2005
Place and date
Kitakyushu, Japan, 13.06.2005 - 14.06.2005
Type of participation
Lecture
Type of review
International peer review
Keywords
text categorization; machine learning; news categorization; IPTC
Abstract
We present a system for automatic categorization of news items into a standard set of categories. The system has been built specifically for news stories written in Croatian. It uses the standard set of news categories established by the International Press Telecommunications Council (IPTC). The categorization algorithm transforms each document into a vector of weights corresponding to an automatically chosen set of keywords. This process is performed on a large training set of news items, forming a multi-dimensional space populated by news items of known categories. An unknown news item is also transformed into a vector of keyword weights and then categorized using the k-NN method in this space. The system has been trained on a collection of approx. 2700 manually categorized news items provided by the Croatian News Agency and tested on a different set of approx. 500 randomly chosen news items from the same source. The automatic categorization gave a correct result for 85% of the tested news items.
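The pipeline outlined in the abstract (keyword-weight vectors, then k-NN in that space) can be illustrated with a minimal sketch. The abstract does not state the weighting scheme, the keyword selection rule, the similarity measure, or the value of k, so everything below is an assumption made for illustration: TF-IDF weights, keywords picked by document frequency, cosine similarity, k = 3, English toy sentences, and placeholder category labels that are not actual IPTC codes. All function names are hypothetical.

```python
import math
from collections import Counter

# Toy training set (invented for illustration; not data from the paper).
TRAIN = [
    ("the team won the football match", "sport"),
    ("the striker scored a goal in the match", "sport"),
    ("the coach praised the team after the game", "sport"),
    ("the central bank raised interest rates", "economy"),
    ("inflation and interest rates worry the market", "economy"),
    ("the market fell as the bank reported losses", "economy"),
]

def tokenize(text):
    """Lowercase, split on whitespace, keep alphabetic tokens only."""
    return [w for w in text.lower().split() if w.isalpha()]

def train(examples, max_keywords=50):
    """Choose keywords and turn each training item into a weight vector."""
    n_docs = len(examples)
    df = Counter()  # document frequency of each term
    for text, _ in examples:
        df.update(set(tokenize(text)))
    # Assumed selection rule: drop terms present in every document,
    # then keep the most widespread remaining terms.
    candidates = sorted(
        ((t, c) for t, c in df.items() if c < n_docs),
        key=lambda tc: (-tc[1], tc[0]),
    )
    keywords = [t for t, _ in candidates[:max_keywords]]
    model = {"keywords": keywords, "df": df, "n_docs": n_docs}
    model["vectors"] = [vectorize(text, model) for text, _ in examples]
    model["labels"] = [label for _, label in examples]
    return model

def vectorize(text, model):
    """Map a text to TF-IDF weights over the chosen keywords."""
    tf = Counter(tokenize(text))
    return [
        tf[k] * math.log(model["n_docs"] / model["df"][k])
        for k in model["keywords"]
    ]

def cosine(a, b):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def knn_classify(text, model, k=3):
    """Label a new item by majority vote among its k nearest neighbours."""
    query = vectorize(text, model)
    ranked = sorted(
        zip(model["vectors"], model["labels"]),
        key=lambda pair: cosine(query, pair[0]),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

if __name__ == "__main__":
    model = train(TRAIN)
    print(knn_classify("the bank cut interest rates", model))
    print(knn_classify("the team scored a goal", model))
```

A system for Croatian text would likely also need language-specific preprocessing such as stemming or lemmatization; the abstract does not say whether the authors applied any, so the sketch omits it.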
Original language
English
Scientific fields
Electrical engineering, Computing, Information and communication sciences
RELATED ENTITIES
Projects:
0036060
Institutions:
Faculty of Electrical Engineering and Computing, Zagreb
Profiles:
Igor Sunday Pandžić
(author)