Bibliographic record no. 1241658
Anomaly Detection in Netflow Network Traffic Using Supervised Machine Learning Algorithms
Anomaly Detection in Netflow Network Traffic Using Supervised Machine Learning Algorithms // Journal of industrial information integration, 33 (2023), 100466, 10 doi:10.1016/j.jii.2023.100466 (international peer review, article, scientific)
CROSBI ID: 1241658
Title
Anomaly Detection in Netflow Network Traffic Using Supervised Machine Learning Algorithms
Authors
Fosić, Igor ; Žagar, Drago ; Grgić, Krešimir ; Križanović, Višnja
Source
Journal of industrial information integration (2467-964X) 33 (2023); 100466, 10
Type, subtype and category of work
Journal papers, article, scientific
Keywords
supervised algorithm ; machine learning ; anomaly classification ; NetFlow ; imbalanced dataset
Abstract
Anomaly detection is an important method for monitoring network traffic, where it is essential to distinguish normal traffic from abnormal traffic. For this purpose, existing classification algorithms can be used as part of the machine learning (ML) process. In this paper, several classification algorithms (Stochastic Gradient Descent (SGD), Support Vector Machines (SVM), K-Nearest Neighbor (K-NN), Gaussian Naive Bayes (GNB), Decision Tree (DT), Random Forest (RF), and AdaBoost (AB)) were tested on the public UNSW-NB15 dataset. Different encoding methods and ratios of training to test data were evaluated to determine the optimal classifier parameters. Because normal and abnormal network traffic are imbalanced in the data, both standard performance scores and additional classification performance scores (F2-score and Area Under the ROC Curve (AUC)), which better describe the obtained results, were used. The RF classifier obtained the best results, with F2-score = 97.68% and AUC = 98.47%, using a representative subset of the original dataset chosen to shorten computation time. The features in the reference dataset were reduced by 82% and selected to follow the structure of the NetFlow data stream. Relative to similar studies, this paper compares several algorithms for anomaly detection and selects the best one for NetFlow data streams. The F2 and AUC metrics are applied and yield very high scores; unlike classic metrics, they reflect realistic performance on imbalanced datasets. Label encoding (LE) required less time than the One-hot (OH) encoding used in similar research, with the same accuracy. The novelty of this paper lies in optimizing the ML process and in analyzing how the ratio of training to test data, different encoding methods for categorical features, and feature reduction influence results on NetFlow data streams.
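The abstract relies on the F2-score, AUC, and the contrast between label encoding and one-hot encoding. The following is a minimal, dependency-free Python sketch of those ideas, offered as an illustration under stated assumptions and not as the authors' code. F2 is assumed to be the standard F-beta score with beta = 2 (recall weighted more than precision); AUC is computed with the pairwise (Mann-Whitney) formulation rather than by integrating a ROC curve. The helper names (`accuracy`, `f_beta`, `roc_auc`, `label_encode`, `one_hot_encode`) and the toy labels, scores, and protocol values are hypothetical.

```python
"""Hypothetical, dependency-free sketch of the metrics and encodings named in the abstract."""
from typing import Dict, List, Sequence, Tuple


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions equal to the true label (1 = anomaly, 0 = normal)."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f_beta(y_true: Sequence[int], y_pred: Sequence[int], beta: float = 2.0) -> float:
    """F-beta score for the positive (anomaly) class; beta = 2 gives the F2-score."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    # (1 + beta^2) * P * R / (beta^2 * P + R): recall counts beta times as much as precision
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random anomaly outscores a random normal sample."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


def label_encode(values: Sequence[str]) -> Tuple[List[int], Dict[str, int]]:
    """Map each category to one integer (a single column per categorical feature)."""
    mapping = {v: i for i, v in enumerate(sorted(set(values)))}
    return [mapping[v] for v in values], mapping


def one_hot_encode(values: Sequence[str]) -> Tuple[List[List[int]], List[str]]:
    """Map each category to its own 0/1 column (one column per distinct value)."""
    categories = sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values], categories


if __name__ == "__main__":
    # Toy imbalanced data: 2 anomalies (1) among 10 flows.
    y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
    y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
    scores = [0.9, 0.4, 0.3, 0.2, 0.1, 0.5, 0.05, 0.6, 0.0, 0.35]
    print(accuracy(y_true, y_pred))          # 0.9
    print(round(f_beta(y_true, y_pred), 3))  # 0.556
    print(roc_auc(y_true, scores))           # 0.875
    print(label_encode(["tcp", "udp", "tcp", "icmp"])[0])    # [1, 2, 1, 0]
    print(one_hot_encode(["tcp", "udp", "tcp", "icmp"])[0])  # [[0, 1, 0], [0, 0, 1], [0, 1, 0], [1, 0, 0]]
```

With only 2 anomalies among 10 samples, the toy classifier reaches 0.9 accuracy but an F2-score of only about 0.556, which illustrates why F2 and AUC describe imbalanced data more realistically. Label encoding keeps a single integer column per categorical feature, while one-hot encoding adds one column per distinct category; the smaller feature matrix is one plausible reason LE runs faster.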
Original language
English
Scientific fields
Electrical Engineering, Information and Communication Sciences
AFFILIATIONS
Institutions:
Fakultet elektrotehnike, računarstva i informacijskih tehnologija Osijek
The journal is indexed in:
- Current Contents Connect (CCC)
- Web of Science Core Collection (WoSCC)
- Science Citation Index Expanded (SCI-EXP)
- SCI-EXP, SSCI and/or A&HCI
- Scopus