Bibliographic record no.: 200075
Mining textual data in Croatian
Mining textual data in Croatian // Proceedings of the XXVIII International Conference MIPRO 2005, Business Intelligence Systems / Baranović, Mirta ; Sandri, Roberto ; Čišić, Dragan ; Hutinski, Željko (eds.).
Opatija: Hrvatska udruga za informacijsku i komunikacijsku tehnologiju, elektroniku i mikroelektroniku - MIPRO, 2005. pp. 61-66 (lecture, international peer review, full paper (in extenso), scientific)
CROSBI ID: 200075
Title
Mining textual data in Croatian
Authors
Dalbelo Bašić, Bojana ; Bereček, Boris ; Cvitaš, Ana
Type, subtype and category of work
Papers in conference proceedings, full paper (in extenso), scientific
Source
Proceedings of the XXVIII International Conference MIPRO 2005, Business Intelligence Systems
/ Baranović, Mirta ; Sandri, Roberto ; Čišić, Dragan ; Hutinski, Željko - Opatija : Hrvatska udruga za informacijsku i komunikacijsku tehnologiju, elektroniku i mikroelektroniku - MIPRO, 2005, 61-66
Conference
Business Intelligence Systems - MIPRO 2005
Place and date
Opatija, Croatia, 30 May 2005 - 3 June 2005
Type of participation
Lecture
Type of review
International peer review
Keywords
text mining; text classification; clustering; morphological normalisation
Abstract
Textual data is a very useful source of information for business intelligence systems. Text processing algorithms and systems are well developed for English and other major world languages, which is not the case for Croatian. This paper explores the applicability of existing systems to Croatian text and examines optimal parameters for the language. The quality of input data strongly influences clustering and classification results: experiments perform significantly better after noise is reduced. The impact of learning set size and dimensionality is also considered. Special preprocessing for Croatian consists of morphological normalisation, a useful step towards better results. Specialised text mining tools not developed for Croatian are also applicable.
Original language
English
Scientific fields
Computing
ASSOCIATIONS OF THE WORK
Institutions:
Fakultet elektrotehnike i računarstva, Zagreb
Profiles:
Bojana Dalbelo Bašić
(author)