Bibliographic record number: 1275014
Impact of missing values on the performance of machine learning algorithms
Impact of missing values on the performance of machine learning algorithms // CEUR Workshop Proceedings: Recent Trends and Applications in Computer Science and Information Technology (RTA-CSIT 2023) / Xhina, Endrit ; Hoxha, Klesti (eds.).
Tirana: University of Tirana, Faculty of Natural Sciences, Department of Informatics, 2023. pp. 54-62 (lecture, international peer review, full paper (in extenso), scientific)
CROSBI ID: 1275014
Title
Impact of missing values on the performance of machine learning algorithms
Authors
Radišić, Bojan ; Seljan, Sanja ; Dunđer, Ivan
Type, subtype and category of work
Papers in conference proceedings, full paper (in extenso), scientific
Source
CEUR Workshop Proceedings: Recent Trends and Applications in Computer Science and Information Technology (RTA-CSIT 2023)
/ Xhina, Endrit ; Hoxha, Klesti - Tirana : University of Tirana, Faculty of Natural Sciences, Department of Informatics, 2023, 54-62
Conference
5th International Conference on Recent Trends and Applications in Computer Science and Information Technology (RTA-CSIT)
Place and date
Tirana, Albania, 26.04.2023. - 27.05.2023
Type of participation
Lecture
Type of review
International peer review
Keywords
machine learning ; neural network ; missing data ; confusion matrix ; accuracy
Abstract
Machine learning (ML) can be used to analyze and predict student success, so that problems can be anticipated and future actions planned to help students overcome difficulties during their studies. This paper analyzes data from a digital system on 309 students who were enrolled in the Specialist Study in Trade Business at the Faculty of Tourism and Rural Development from 2010 to 2018. The paper explores the impact of four different data sets on the performance of ML algorithms. The first data set has partially missing data on the length of study (around 7%), the second replaces the missing data with the arithmetic mean, the third with the median, and the fourth with the geometric mean. Four popular ML algorithms were considered: k-Nearest Neighbors (KNN), Naïve Bayes (NB), Random Forest (RF) and Probabilistic Neural Network (PNN). All of them are used for predicting student success based on achieved ECTS credit points. The aim of this paper is to compare and analyze the impact of missing values on the results of the individual ML algorithms.
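The three imputation variants named in the abstract (arithmetic mean, median, geometric mean) can be illustrated with a minimal sketch. The `impute` helper and the sample values are hypothetical and are not taken from the paper, whose actual tooling is not stated in this record. The sketch assumes a single numeric column in which `None` marks a missing entry.

```python
from statistics import mean, median, geometric_mean


def impute(values, strategy):
    """Return a copy of `values` with None entries replaced by strategy(observed)."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    fill = strategy(observed)
    return [fill if v is None else v for v in values]


# Hypothetical study lengths in years (illustrative only); None marks a missing entry.
study_length = [2.0, 4.0, None, 8.0, None, 2.0]

# One imputed copy of the column per strategy, mirroring data sets two to four.
# geometric_mean requires strictly positive observed values.
variants = {
    "arithmetic mean": impute(study_length, mean),
    "median": impute(study_length, median),
    "geometric mean": impute(study_length, geometric_mean),
}

for name, column in variants.items():
    print(name, column)
```

Each variant keeps the observed values unchanged and differs only in the fill value, so any difference in downstream classifier performance can be attributed to the imputation choice.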
Original language
English
Scientific fields
Computing, Information and communication sciences
RELATED PROJECTS AND INSTITUTIONS
Projects:
--11-933-1053 - Machine learning and natural language processing in the domain of computer security – Part II (Seljan, Sanja) (CroRIS)
Institutions:
Faculty of Humanities and Social Sciences, Zagreb,
Faculty of Tourism and Rural Development in Požega
Indexed in:
- Scopus