Impact of missing values on the performance of machine learning algorithms (CROSBI ID 736757)
Conference paper in proceedings | original scientific paper | international peer review
Responsibility data
Radišić, Bojan ; Seljan, Sanja ; Dunđer, Ivan
English
Machine learning (ML) can be used to analyze and predict student success, both to prevent problems and to plan actions that help students overcome difficulties during their studies. This paper analyzes data from a digital system covering 309 students enrolled in the Specialist Study in Trade Business at the Faculty of Tourism and Rural Development from 2010 to 2018. It explores how four different data sets affect the performance of ML algorithms. The first data set has partially missing data on the length of study (around 7%); the second replaces the missing values with the arithmetic mean, the third with the median, and the fourth with the geometric mean. Four popular ML algorithms are considered: k-Nearest Neighbors (KNN), Naïve Bayes (NB), Random Forest (RF) and Probabilistic Neural Network (PNN). All of them are used to predict student success based on achieved ECTS credit points. The aim of the paper is to compare and analyze the impact of missing values on the results of the individual ML algorithms.
machine learning ; neural network ; missing data ; confusion matrix ; accuracy
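The four data set variants described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `impute`, the variable `study_length`, and the sample values are hypothetical, and the record does not say which tools the authors used. The sketch only shows how the arithmetic mean, median, and geometric mean of the observed values could each stand in for a missing entry.

```python
from statistics import mean, median, geometric_mean

# Map each imputation strategy to the statistic computed on observed values.
STRATEGIES = {
    "mean": mean,
    "median": median,
    "geometric_mean": geometric_mean,  # requires strictly positive values
}


def impute(values, strategy):
    """Return a copy of `values` with None replaced by the chosen statistic
    of the non-missing entries. (Hypothetical helper, for illustration only.)"""
    observed = [v for v in values if v is not None]
    fill = STRATEGIES[strategy](observed)
    return [fill if v is None else v for v in values]


# Hypothetical lengths of study in years; None marks a missing record.
study_length = [2.0, 4.0, None, 8.0]

# One variant with the gap left in place, plus three imputed variants.
variants = {"missing": list(study_length)}
for name in STRATEGIES:
    variants[name] = impute(study_length, name)
```

Each entry of `variants` could then be passed to the same classifier, so that any difference in accuracy reflects only how the missing values were handled.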
Contribution details
54-62.
2023.
published
Host publication details
CEUR Workshop Proceedings: Recent Trends and Applications in Computer Science and Information Technology (RTA-CSIT 2023)
Xhina, Endrit ; Hoxha, Klesti
Tirana: University of Tirana, Faculty of Natural Sciences, Department of Informatics
1613-0073
Conference details
5th International Conference on Recent Trends and Applications in Computer Science and Information Technology (RTA-CSIT)
lecture
26.04.2023-27.05.2023
Tirana, Albania
Related fields
Information and communication sciences, Computing