Bibliographic record number: 717193
An experimental comparison of classification algorithm performances for highly imbalanced datasets
An experimental comparison of classification algorithm performances for highly imbalanced datasets // Proceedings of the 25th Central European Conference on Information and Intelligent Systems / Hunjak, Tihomir ; Lovrenčić, Sandra ; Tomičić, Igor (eds.).
Varaždin: Fakultet organizacije i informatike Sveučilišta u Zagrebu, 2014. pp. 4-11 (lecture, international peer review, full paper (in extenso), scientific)
CROSBI ID: 717193
Title
An experimental comparison of classification algorithm performances for highly imbalanced datasets
Authors
Oreški, Goran ; Oreški, Stjepan
Type, subtype and category of work
Papers in conference proceedings, full paper (in extenso), scientific
Source
Proceedings of the 25th Central European Conference on Information and Intelligent Systems
/ Hunjak, Tihomir ; Lovrenčić, Sandra ; Tomičić, Igor - Varaždin : Fakultet organizacije i informatike Sveučilišta u Zagrebu, 2014, 4-11
Conference
25th Central European Conference on Information and Intelligent Systems
Place and date
Varaždin, Croatia, 17-19 September 2014
Type of participation
Lecture
Type of review
International peer review
Keywords
imbalanced data ; classification learning algorithm ; re-sampling technique ; reduction of class imbalance
Abstract
Imbalanced learning data often emerge during knowledge discovery in data and present a significant challenge for data mining methods. In this paper we investigate the influence of class-imbalanced data on artificial intelligence methods, i.e. neural networks and support vector machines, and on classical classification methods, represented by RIPPER and the Naïve Bayes classifier. The research is conducted on classification problems, and classification quality is measured by accuracy and the area under the ROC curve (AUC). To reduce the negative influence of imbalanced data, the SMOTE oversampling technique is used. All experiments use 30 different data sets obtained from the KEEL (Knowledge Extraction based on Evolutionary Learning) repository; they are conducted on the original datasets and repeated on balanced datasets generated with SMOTE. The results indicate that imbalanced data have a significant negative influence on the AUC of the neural network and the support vector machine. The same methods show an improvement in AUC when applied to balanced data but, at the same time, a deterioration in classification accuracy. RIPPER results are similar, but the changes are of smaller magnitude, while the Naïve Bayes classifier shows an overall deterioration of results on balanced distributions.
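The oversampling step named in the abstract can be illustrated with a minimal sketch of the general SMOTE idea: each synthetic minority sample is placed at a random point on the segment between a minority sample and one of its nearest minority-class neighbours. This is not the authors' implementation. The function names `smote` and `balance`, the neighbour count `k=5`, and the choice to oversample until both classes have equal size are assumptions made for illustration only; the record does not state the settings used in the paper.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points interpolated between minority samples.

    Hypothetical helper: k=5 and the fixed seed are illustrative assumptions.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority-class neighbours of the base sample (itself excluded)
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


def balance(majority, minority, k=5, seed=0):
    """Oversample the minority class until it matches the majority class size."""
    n_new = max(0, len(majority) - len(minority))
    return list(minority) + smote(minority, n_new, k=k, seed=seed)
```

In an experiment of the kind the abstract describes, a classifier would be trained once on the original data and once on the output of a step like `balance`, with accuracy and AUC compared between the two runs.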
Original language
English
Scientific fields
Computer science, Information and communication sciences
Note
BEST PAPER AWARD