Data Acquisition and Corpus Creation for Phishing Detection (CROSBI ID 737423)
Conference paper in proceedings | original scientific paper | international peer review
Responsibility details
Dunđer, Ivan ; Seljan, Sanja ; Odak, Marko
English
Data Acquisition and Corpus Creation for Phishing Detection
Detecting phishing attacks is not straightforward, since there are many obstacles that derive from language complexity and technical aspects. Studying phishing attacks and other related issues heavily relies on computer datasets, i.e. digital corpora that reflect these linguistic and technical intricacies. Diverse studies using phishing datasets have been performed, but mainly for the English language. Research for other languages is scarce, and especially for not widely spoken languages. For the Croatian language there is an evident lack of corpora that are essential for diverse analyses and for constructing models that are capable of recognizing phishing attacks and protecting users. These datasets are necessary for natural language processing and building machine learning workflows, where results largely depend on corpora that must be specifically crafted for this purpose. Therefore, creating high-quality domain-specific corpora is of great importance in the domain of information security. Such corpora can be employed for teaching purposes in various courses in higher education, and could be analyzed in numerous ways in order to understand the underlying principles of phishing attack strategies. The aim of this paper is to demonstrate the entire process of data acquisition and corpus creation for the phishing detection domain. In addition, an analysis of the corpus is presented with regard to different aspects, such as descriptive attributes, terminology characteristics, metadata and language.
data acquisition ; digital corpus creation ; computational data analysis ; natural language processing ; phishing ; information privacy ; information security
Contribution details
589-594.
2023.
published
Host publication details
MIPRO Proceedings - ICT and Electronics Convention
Skala, Karolj
Rijeka: Hrvatska udruga za informacijsku i komunikacijsku tehnologiju, elektroniku i mikroelektroniku - MIPRO
1847-3938
1847-3946
Conference details
46th ICT and Electronics Convention
lecture
22-26 May 2023
Opatija, Croatia