Pregled bibliografske jedinice broj: 1277746
Information Extraction from Security-Related Datasets
Information Extraction from Security-Related Datasets // MIPRO Proceedings - ICT and Electronics Convention / Skala, Karolj (ur.).
Rijeka: Hrvatska udruga za informacijsku i komunikacijsku tehnologiju, elektroniku i mikroelektroniku - MIPRO, 2023. str. 595-600 (predavanje, međunarodna recenzija, cjeloviti rad (in extenso), znanstveni)
CROSBI ID: 1277746 Za ispravke kontaktirajte CROSBI podršku putem web obrasca
Naslov
Information Extraction from Security-Related
Datasets
Autori
Seljan, Sanja ; Tolj, Nevenka ; Dunđer, Ivan
Vrsta, podvrsta i kategorija rada
Radovi u zbornicima skupova, cjeloviti rad (in extenso), znanstveni
Izvornik
MIPRO Proceedings - ICT and Electronics Convention
/ Skala, Karolj - Rijeka : Hrvatska udruga za informacijsku i komunikacijsku tehnologiju, elektroniku i mikroelektroniku - MIPRO, 2023, 595-600
Skup
46th ICT and Electronic Convention - MIPRO 2023
Mjesto i datum
Opatija, Hrvatska, 22.04.2023. - 26.04.2023
Vrsta sudjelovanja
Predavanje
Vrsta recenzije
Međunarodna recenzija
Ključne riječi
information extraction ; machine learning ; corpus analysis ; security datasets ; information security ; information and communication sciences
Sažetak
There are various approaches to executing security breaches which are nowadays massively occurring in electronic communication environments, and phishing attacks are one of the most applied ones. A vast majority of phishing attacks are initiated using electronic messages, which attackers utilize to direct users to harmful or fake websites, to infect computers or to obtain personal or sensitive data for malicious purposes. Consequently, it is necessary to identify phishing messages in order to provide suitable user protection. Research and numerous studies have included machine learning algorithms and techniques from the field of artificial intelligence which predominantly depend on language-specific datasets and characteristics of phishing messages, and which have demonstrated to be effective for extracting critical information and for data-driven decision making. However, phishing datasets exist mainly for the English language. The aim of this paper is to present an information extraction pipeline that encompasses phases, such as corpus pre-processing, generating predictions of phishing messages using selected machine learning algorithms, along with a basic analysis, confusion matrices and evaluation scores for Croatian phishing messages. This type of key information can be used for teaching in higher education, e.g. in security-related courses or subjects that deal with artificial intelligence, machine learning, big data analysis, computational linguistics etc. This is essential as it can provide deeper insights into phishing attack strategies and potential countermeasures.
Izvorni jezik
Engleski
Znanstvena područja
Računarstvo, Informacijske i komunikacijske znanosti
POVEZANOST RADA
Projekti:
--11-933-1053 - Strojno učenje i obrada prirodnog jezika u domeni računalne sigurnosti – II. dio (Seljan, Sanja) ( CroRIS)
EK-EFRR-KK.01.2.1.02.0267 - Istraživanje obrade prirodnog jezika (za hrvatski jezik) i razvoj proizvoda PhisHRban za povećanje kibernetičke sigurnosti (PhisHRban) (Pejić Bach, Mirjana; Seljan, Sanja, EK - KK.01.2.1.02) ( CroRIS)
Ustanove:
Filozofski fakultet, Zagreb