Bibliographic record no. 791025

Two Stage Comparison of Classifier Performances for Highly Imbalanced Datasets


Oreški, Goran; Oreški, Stjepan
Two Stage Comparison of Classifier Performances for Highly Imbalanced Datasets // Journal of information and organizational sciences, 39 (2015), 2; 209-222 (international peer review, article, scientific)


CROSBI ID: 791025

Title
Two Stage Comparison of Classifier Performances for Highly Imbalanced Datasets

Authors
Oreški, Goran ; Oreški, Stjepan

Source
Journal of information and organizational sciences (ISSN 1846-3312) 39 (2015), 2; 209-222

Type, subtype and category of work
Journal papers, article, scientific

Keywords
imbalanced data ; classification algorithm ; re-sampling technique ; dataset cardinality ; reduction of class imbalance

Abstract
During knowledge discovery in data, imbalanced learning data often arise and pose a significant challenge for data mining methods. In this paper, we investigate the influence of class-imbalanced data on the classification results of artificial intelligence methods, namely neural networks and support vector machines (SVM), and of classical classification methods, represented by RIPPER and the Naïve Bayes classifier. All experiments are conducted on 30 different imbalanced datasets obtained from the KEEL (Knowledge Extraction based on Evolutionary Learning) repository. Classification quality is measured with accuracy and the area under the ROC curve (AUC). The results indicate that the neural network and the support vector machine achieve a higher AUC when applied to balanced data, but at the same time their classification accuracy deteriorates. RIPPER behaves similarly, although the changes are smaller, while the Naïve Bayes classifier deteriorates overall on balanced distributions. The number of instances in the presented highly imbalanced datasets has a significant additional impact on the classification performance of the SVM classifier. The results show the potential of the SVM classifier for ensemble creation on imbalanced datasets.
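The contrast the abstract draws between accuracy and AUC on imbalanced data can be illustrated with a minimal sketch. This is not the authors' experimental code: the toy 95:5 labels, the function names `accuracy`, `auc` and `random_oversample`, and the use of plain random oversampling as the re-sampling step are illustrative assumptions (the abstract does not say which re-sampling technique was used). On such a split, a classifier that always predicts the majority class reaches 95 % accuracy while its AUC is only 0.5, i.e. no better than chance ranking.

```python
import random
from collections import Counter
from itertools import product


def accuracy(y_true, y_pred):
    """Share of instances whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def auc(y_true, scores):
    """Area under the ROC curve, computed as the Mann-Whitney statistic:
    the probability that a randomly chosen positive (label 1) receives a
    higher score than a randomly chosen negative (label 0); ties count 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p, n in product(pos, neg):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos) * len(neg))


def random_oversample(X, y, seed=0):
    """Hypothetical re-sampling step (assumption: plain random oversampling).
    Duplicates randomly chosen instances of each smaller class until every
    class has as many instances as the largest one."""
    rng = random.Random(seed)
    by_class = {}
    for x, t in zip(X, y):
        by_class.setdefault(t, []).append(x)
    target = max(len(items) for items in by_class.values())
    X_out, y_out = [], []
    for label, items in by_class.items():
        extra = [rng.choice(items) for _ in range(target - len(items))]
        X_out.extend(items + extra)
        y_out.extend([label] * target)
    return X_out, y_out


if __name__ == "__main__":
    # Toy, highly imbalanced data: 95 negatives, 5 positives (illustrative only).
    X = list(range(100))
    y = [0] * 95 + [1] * 5

    # A trivial model: always predicts the majority class with a constant score.
    majority_pred = [0] * len(y)
    constant_score = [0.0] * len(y)
    print("accuracy:", accuracy(y, majority_pred))  # 0.95
    print("AUC:", auc(y, constant_score))           # 0.5

    # Balance the class distribution by duplicating minority instances.
    X_bal, y_bal = random_oversample(X, y)
    print("class counts after oversampling:", Counter(y_bal))
```

Oversampling only changes the class distribution seen during training; in a typical setup both metrics would still be computed on a held-out test split that keeps the original imbalance.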

Original language
English

Scientific fields
Computing, Information and communication sciences



RELATED INFORMATION


Institutions:
Faculty of Organization and Informatics, Varaždin

Profiles:

Goran Oreški (author)

Stjepan Oreški (author)

Links to the full text of the paper:

Hrčak jios.foi.hr

Cite this publication:

Oreški, G. & Oreški, S. (2015) Two Stage Comparison of Classifier Performances for Highly Imbalanced Datasets. Journal of information and organizational sciences, 39 (2), 209-222.
@article{article, author = {Ore\v{s}ki, Goran and Ore\v{s}ki, Stjepan}, year = {2015}, pages = {209-222}, keywords = {imbalanced data, classification algorithm, re-sampling technique, dataset cardinality, reduction of class imbalance}, journal = {Journal of information and organizational sciences}, volume = {39}, number = {2}, issn = {1846-3312}, title = {Two Stage Comparison of Classifier Performances for Highly Imbalanced Datasets}, keyword = {imbalanced data, classification algorithm, re-sampling technique, dataset cardinality, reduction of class imbalance} }

Journal indexed in:


  • Web of Science Core Collection (WoSCC)
    • Emerging Sources Citation Index (ESCI)
  • Scopus


Included in other bibliographic databases:


  • INSPEC
  • LISA: Library and Information Science Abstracts
  • Scopus




