Anomaly Detection in Netflow Network Traffic Using Supervised Machine Learning Algorithms

Fosić, Igor; Žagar, Drago; Grgić, Krešimir; Križanović, Višnja

Data source: CROSBI

Anomaly Detection in Netflow Network Traffic Using Supervised Machine Learning Algorithms (CROSBI ID 318605)

Journal article | original scientific paper | international peer review

Fosić, Igor ; Žagar, Drago ; Grgić, Krešimir ; Križanović, Višnja. Anomaly Detection in Netflow Network Traffic Using Supervised Machine Learning Algorithms // Journal of industrial information integration, 33 (2023), 100466, 10. doi: 10.1016/j.jii.2023.100466

Responsibility data

Authors

Fosić, Igor ; Žagar, Drago ; Grgić, Krešimir ; Križanović, Višnja

Basic data in the original language
Basic data in other languages

Language

English

Title

Anomaly Detection in Netflow Network Traffic Using Supervised Machine Learning Algorithms

Abstract

Anomaly detection is an important method for monitoring network traffic, where it is important to successfully distinguish normal traffic from abnormal traffic. For this purpose, one could use existing classification algorithms as part of the machine learning (ML) process. In this paper, several classification algorithms (Stochastic Gradient Descent (SGD), Support Vector Machines (SVM), K-Nearest Neighbor (K-NN), Gaussian Naive Bayes (GNB), Decision Tree (DT), Random Forest (RF), AdaBoost (AB)) were tested on the public UNSW-NB15 dataset. Different encoding methods and ratios of training to test data were evaluated to obtain classifiers with optimal parameters. Because normal and abnormal network traffic data are imbalanced, both standard performance scores and additional classification performance scores (F2-score, Area Under ROC Curve (AUC)), which better describe the obtained results, were used. The RF classifier obtained the best results, with F2-score = 97.68% and AUC score = 98.47%, using a representative subset of the original dataset to shorten the computations. Features in the reference dataset were reduced by 82% and selected following the structure of the NetFlow data stream. Compared with similar studies, this paper compares several algorithms for anomaly detection and selects the best one for NetFlow data streams. The F2 and AUC metrics are applied, achieving very high accuracy compared to classic metrics that do not show realistic accuracy on imbalanced datasets. Label encoding (LE) took less time than the One-hot (OH) encoding used in similar research, with the same accuracy. The novelty introduced by this paper lies in the optimization of the ML process and in the influence of the training-to-test data ratio, different encoding methods for categorical features, and feature reduction on NetFlow data streams.
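
The abstract's point about imbalanced data can be illustrated with a minimal sketch. It is not the authors' code. The confusion-matrix counts below are hypothetical and chosen only to show how plain accuracy can look high while the F2-score, which weights recall above precision, exposes missed anomalies. The function name `fbeta_score` is an illustrative helper defined here, not a library call.

```python
def fbeta_score(tp: int, fp: int, fn: int, beta: float = 2.0) -> float:
    """F-beta from confusion-matrix counts; beta > 1 weights recall above precision."""
    if tp == 0:
        # No true positives: precision and recall are zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical imbalanced test set: 950 normal flows, 50 anomalous flows.
# The assumed detector flags only 10 of the 50 anomalies and raises no false alarms.
tp, fp, fn, tn = 10, 0, 40, 950

accuracy = (tp + tn) / (tp + fp + fn + tn)
f2 = fbeta_score(tp, fp, fn, beta=2.0)

print(f"accuracy = {accuracy:.3f}, F2 = {f2:.3f}")
# prints: accuracy = 0.960, F2 = 0.238
```

With these assumed counts the detector misses 80% of the anomalies, yet accuracy stays at 96% because normal traffic dominates. The F2-score drops to about 0.24, which is the kind of gap between classic and imbalance-aware metrics that the abstract refers to.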

Keywords

supervised algorithm ; machine learning ; anomaly classification ; NetFlow ; imbalanced dataset

Note

not recorded

Language

not recorded

Title

not recorded

Abstract

not recorded

Keywords

not recorded

Note

not recorded

Publication data

Journal

Journal of industrial information integration

Volume (issue)

Year

2023

Article number

100466

Number of pages

Publication status

published

ISSN

2467-964X

e-ISSN

2452-414X

DOI

10.1016/j.jii.2023.100466

Related entities

Related persons

Drago Žagar (author)

Krešimir Grgić (author)

Višnja Križanović (author)

Related institutions

Fakultet elektrotehnike, računarstva i informacijskih tehnologija Osijek (165) (author's institution)

Field

Electrical engineering, Information and communication sciences

Links

doi.org

sciencedirect.com

papers.ssrn.com

Indexing

Scopus

Current Contents Connect (CCC)

Web of Science Core Collection, Science Citation Index Expanded (WoSCC-SCI-Exp)

Web of Science Core Collection, SCI-Exp, SSCI & A&HCI (WoSCC-SCI-Exp, SSCI, A&HCI)