Bibliographic record no. 717193

An experimental comparison of classification algorithm performances for highly imbalanced datasets


Oreški, Goran; Oreški, Stjepan
An experimental comparison of classification algorithm performances for highly imbalanced datasets // Proceedings of the 25th Central European Conference on Information and Intelligent Systems / Hunjak, Tihomir ; Lovrenčić, Sandra ; Tomičić, Igor (eds.).
Varaždin: Fakultet organizacije i informatike Sveučilišta u Zagrebu, 2014, pp. 4-11 (lecture, international peer review, full paper (in extenso), scientific)


CROSBI ID: 717193

Title
An experimental comparison of classification algorithm performances for highly imbalanced datasets

Authors
Oreški, Goran ; Oreški, Stjepan

Type, subtype and category of work
Papers in conference proceedings, full paper (in extenso), scientific

Source
Proceedings of the 25th Central European Conference on Information and Intelligent Systems / Hunjak, Tihomir ; Lovrenčić, Sandra ; Tomičić, Igor (eds.) - Varaždin : Fakultet organizacije i informatike Sveučilišta u Zagrebu, 2014, pp. 4-11

Conference
25th Central European Conference on Information and Intelligent Systems

Place and date
Varaždin, Croatia, 17-19 September 2014

Type of participation
Lecture

Type of review
International peer review

Keywords
imbalanced data ; classification learning algorithm ; re-sampling technique ; reduction of class imbalance

Abstract
Imbalanced learning data often emerge during knowledge discovery in data and present a significant challenge for data mining methods. In this paper we investigate the influence of class-imbalanced data on artificial intelligence methods, i.e. neural networks and support vector machines, and on classical classification methods, represented by RIPPER and the Naïve Bayes classifier. The research is conducted on classification problems, and classification quality is measured by accuracy and by the area under the ROC curve (AUC). To reduce the negative influence of imbalanced data, the SMOTE oversampling technique is used. All experiments use 30 different datasets obtained from the KEEL (Knowledge Extraction based on Evolutionary Learning) repository; they are conducted on the original datasets and repeated on balanced datasets generated with SMOTE. The results indicate that imbalanced data have a significant negative influence on the AUC of the neural network and the support vector machine. The same methods show improved AUC when applied to balanced data but, at the same time, deteriorated classification accuracy. RIPPER results are similar, but the changes are of smaller magnitude, while the Naïve Bayes classifier shows an overall deterioration of results on balanced distributions.
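
The two ideas the abstract relies on, namely that accuracy and AUC can disagree on imbalanced data and that SMOTE creates synthetic minority examples by interpolation, can be illustrated with a minimal sketch. This is not the authors' code or experimental setup: the function names (accuracy, auc, smote), the toy 95:5 label split and the five minority points are all hypothetical, and the smote function is a simplified, standard-library-only version of the interpolation step rather than a full implementation.

```python
import math
import random


def accuracy(y_true, y_pred):
    """Fraction of labels predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def auc(y_true, scores):
    """Area under the ROC curve, computed as the probability that a random
    positive is scored above a random negative (ties count as 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def smote(minority, n_new, k=3, seed=0):
    """Simplified SMOTE-style oversampling (illustrative only): each synthetic
    point lies on the segment between a randomly chosen minority point and one
    of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


# Hypothetical toy problem: 95 majority (0) and 5 minority (1) examples.
y_true = [0] * 95 + [1] * 5

# A classifier that always predicts the majority class with a constant score.
y_pred = [0] * 100
scores = [0.0] * 100

acc = accuracy(y_true, y_pred)   # 0.95: looks good
area = auc(y_true, scores)       # 0.5: no better than chance

# Oversample the minority class until both classes have 95 examples.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3), (0.9, 0.8)]
synthetic = smote(minority, n_new=90)
balanced_minority = minority + synthetic

print(acc, area, len(balanced_minority))
```

In this toy case the majority-class predictor reaches 95% accuracy while its AUC stays at chance level, which is why the two measures are worth reporting side by side on imbalanced data. In practice a maintained library implementation of SMOTE would normally be preferred over a hand-written one.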

Original language
English

Scientific fields
Computing, Information and communication sciences

Note
BEST PAPER AWARD




Cite this publication:

Oreški, G. & Oreški, S. (2014) An experimental comparison of classification algorithm performances for highly imbalanced datasets. In: Hunjak, T., Lovrenčić, S. & Tomičić, I. (eds.) Proceedings of the 25th Central European Conference on Information and Intelligent Systems.
@article{article,
  author = {Ore\v{s}ki, Goran and Ore\v{s}ki, Stjepan},
  year = {2014},
  pages = {4-11},
  keywords = {imbalanced data, classification learning algorithm, re-sampling technique, reduction of class imbalance},
  title = {An experimental comparison of classification algorithm performances for highly imbalanced datasets},
  publisher = {Fakultet organizacije i informatike Sveu\v{c}ili\v{s}ta u Zagrebu},
  publisherplace = {Vara\v{z}din, Hrvatska}
}



