Bibliographic record no.: 1132489

An empirical study of data intrinsic characteristics that make learning from imbalanced data difficult


Dudjak, Mario; Martinović, Goran
An empirical study of data intrinsic characteristics that make learning from imbalanced data difficult // Expert systems with applications, 182 (2021), 115297, 22 doi:10.1016/j.eswa.2021.115297 (international peer review, article, scientific)


CROSBI ID: 1132489

Title
An empirical study of data intrinsic characteristics that make learning from imbalanced data difficult

Authors
Dudjak, Mario ; Martinović, Goran

Source
Expert systems with applications (0957-4174) 182 (2021); 115297, 22

Type, subtype and category of work
Journal papers, article, scientific

Keywords
Classification ; Class imbalance ; Class overlapping ; Data intrinsic characteristics ; Noise ; Small disjuncts

Abstract
Learning from data stemming from real-world problems is inherently challenging and difficult due to the numerous intrinsic characteristics present in datasets. The problem of class imbalance is known to significantly impair classification performance and has attracted increasing attention from researchers. On the other hand, some studies suggest that the detrimental effects of class imbalance occur only when the dataset encompasses other intrinsic characteristics such as small disjuncts, class overlapping, noise or data rarity. However, the literature is often ambiguous in terms of understanding and distinguishing the influence of these characteristics on the behaviour of standard classification algorithms. This paper provides a contemporary empirical study of the behaviour and performance of five well-known classifiers on a large number of imbalanced datasets exhibiting numerous combinations of the stated characteristics. The aim of the study is to identify and rank difficulty factors when learning from imbalanced data, depending on the type of classification algorithm used. In general, the obtained results suggest that if classifiers conceptually have no problem with class separation into sub-concepts, noise is the characteristic that most impairs their performance, closely followed by class overlapping and class imbalance. To alleviate these problems, oversampling and undersampling procedures were tested and directions are given for selecting appropriate techniques when dealing with the problem of class imbalance.
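The abstract mentions oversampling and undersampling without naming the specific procedures tested. As a rough illustration only, the sketch below assumes the simplest variants (random oversampling and random undersampling) on a toy two-class dataset; the function names, the toy data and the fixed seed are hypothetical and are not taken from the paper.

```python
import random
from collections import Counter


def imbalance_ratio(labels):
    """Majority-class count divided by minority-class count."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen samples of the smaller classes until
    every class matches the majority-class count."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_samples, out_labels = list(samples), list(labels)
    for cls, n in counts.items():
        pool = [s for s, y in zip(samples, labels) if y == cls]
        for _ in range(target - n):
            out_samples.append(rng.choice(pool))
            out_labels.append(cls)
    return out_samples, out_labels


def random_undersample(samples, labels, seed=0):
    """Keep a random subset of each class, sized to the minority-class count."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = min(counts.values())
    out_samples, out_labels = [], []
    for cls in counts:
        pool = [s for s, y in zip(samples, labels) if y == cls]
        for s in rng.sample(pool, target):
            out_samples.append(s)
            out_labels.append(cls)
    return out_samples, out_labels


# Toy data (assumed): 8 majority points (class 0) and 2 minority points (class 1).
X = [[float(i)] for i in range(10)]
y = [0] * 8 + [1] * 2

X_over, y_over = random_oversample(X, y)
X_under, y_under = random_undersample(X, y)
```

Both helpers only rebalance class counts. They do nothing about noise, class overlapping or small disjuncts, which the abstract reports as difficulty factors in their own right.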

Original language
English

Scientific fields
Computing



RELATED TO THIS WORK


Projects:

Institutions:
Fakultet elektrotehnike, računarstva i informacijskih tehnologija Osijek (Faculty of Electrical Engineering, Computer Science and Information Technology Osijek)

Profiles:

Goran Martinović (author)

Mario Dudjak (author)

Links to the full text:

doi:10.1016/j.eswa.2021.115297, www.sciencedirect.com

Cite this publication:

Dudjak, Mario; Martinović, Goran
An empirical study of data intrinsic characteristics that make learning from imbalanced data difficult // Expert systems with applications, 182 (2021), 115297, 22 doi:10.1016/j.eswa.2021.115297 (international peer review, article, scientific)
Dudjak, M. & Martinović, G. (2021) An empirical study of data intrinsic characteristics that make learning from imbalanced data difficult. Expert systems with applications, 182, 115297, 22 doi:10.1016/j.eswa.2021.115297.
@article{article,
  author = {Dudjak, Mario and Martinovi\'{c}, Goran},
  title = {An empirical study of data intrinsic characteristics that make learning from imbalanced data difficult},
  journal = {Expert systems with applications},
  year = {2021},
  volume = {182},
  pages = {22},
  chapter = {115297},
  chapternumber = {115297},
  issn = {0957-4174},
  doi = {10.1016/j.eswa.2021.115297},
  keywords = {Classification, Class imbalance, Class overlapping, Data intrinsic characteristics, Noise, Small disjuncts}
}

The journal is indexed in:


  • Current Contents Connect (CCC)
  • Web of Science Core Collection (WoSCC)
    • Science Citation Index Expanded (SCI-EXP)
    • SCI-EXP, SSCI and/or A&HCI
  • Scopus

