Bibliographic record no. 1277747
Data Acquisition and Corpus Creation for Phishing Detection
Data Acquisition and Corpus Creation for Phishing Detection // MIPRO Proceedings - ICT and Electronics Convention / Skala, Karolj (ed.).
Rijeka: Hrvatska udruga za informacijsku i komunikacijsku tehnologiju, elektroniku i mikroelektroniku - MIPRO, 2023. pp. 589-594 (lecture, international peer review, full paper (in extenso), scientific)
CROSBI ID: 1277747
Title
Data Acquisition and Corpus Creation for Phishing Detection
Authors
Dunđer, Ivan ; Seljan, Sanja ; Odak, Marko
Type, subtype and category of work
Papers in conference proceedings, full paper (in extenso), scientific
Source
MIPRO Proceedings - ICT and Electronics Convention
/ Skala, Karolj - Rijeka : Hrvatska udruga za informacijsku i komunikacijsku tehnologiju, elektroniku i mikroelektroniku - MIPRO, 2023, 589-594
Conference
46th ICT and Electronics Convention
Place and date
Opatija, Croatia, 22.05.2023 - 26.05.2023
Type of participation
Lecture
Type of review
International peer review
Keywords
data acquisition ; digital corpus creation ; computational data analysis ; natural language processing ; phishing ; information privacy ; information security
Abstract
Detecting phishing attacks is not straightforward, since there are many obstacles arising from language complexity and technical aspects. The study of phishing attacks and related issues relies heavily on datasets, i.e. digital corpora that reflect these linguistic and technical intricacies. Diverse studies using phishing datasets have been performed, but mainly for the English language. Research for other languages is scarce, especially for less widely spoken languages. For Croatian, there is an evident lack of the corpora that are essential for diverse analyses and for constructing models capable of recognizing phishing attacks and protecting users. Such datasets are necessary for natural language processing and for building machine learning workflows, whose results largely depend on corpora specifically crafted for this purpose. Therefore, creating high-quality domain-specific corpora is of great importance in the field of information security. Such corpora can be employed for teaching purposes in various higher-education courses and can be analyzed in numerous ways to understand the underlying principles of phishing attack strategies. The aim of this paper is to demonstrate the entire process of data acquisition and corpus creation for the phishing detection domain. In addition, an analysis of the corpus is presented with regard to different aspects, such as descriptive attributes, terminology characteristics, metadata and language.
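The abstract lists descriptive attributes and terminology characteristics among the aspects analysed, but this record does not say which measures the paper uses. As a minimal sketch, assuming simple regex tokenization and raw frequency counts, such descriptive statistics (document count, token and type counts, type-token ratio, most frequent terms) might be computed as follows. The names `CorpusStats`, `tokenize` and `describe_corpus` are hypothetical, and the sample messages are invented for illustration, not taken from the corpus.

```python
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class CorpusStats:
    """Hypothetical container for basic descriptive attributes of a corpus."""
    documents: int
    tokens: int
    types: int
    type_token_ratio: float
    top_terms: list[tuple[str, int]]


def tokenize(text: str) -> list[str]:
    # Lowercase, then keep runs of Unicode word characters, so Croatian
    # letters such as č, ć, đ, š and ž remain part of their tokens.
    return re.findall(r"\w+", text.lower())


def describe_corpus(messages: list[str], top_n: int = 5) -> CorpusStats:
    # Count every token occurrence across all messages.
    counts: Counter[str] = Counter()
    for message in messages:
        counts.update(tokenize(message))
    total = sum(counts.values())
    return CorpusStats(
        documents=len(messages),
        tokens=total,
        types=len(counts),
        # Type-token ratio: distinct tokens divided by all tokens.
        type_token_ratio=len(counts) / total if total else 0.0,
        top_terms=counts.most_common(top_n),
    )


if __name__ == "__main__":
    # Invented toy messages, not taken from the paper's corpus.
    sample = [
        "Vaš račun je blokiran. Potvrdite račun putem poveznice.",
        "Potvrdite podatke putem poveznice.",
    ]
    stats = describe_corpus(sample, top_n=3)
    print(stats.documents, stats.tokens, stats.types)  # → 2 12 8
```

A type-token ratio close to 1 indicates little repetition; it tends to fall as texts grow and frequent terms recur, so it is best compared across samples of similar size.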
Original language
English
Scientific fields
Computing, Information and communication sciences
RELATED PROJECTS AND INSTITUTIONS
Projects:
--11-933-1053 - Machine learning and natural language processing in the domain of computer security, Part II (Seljan, Sanja) (CroRIS)
EK-EFRR-KK.01.2.1.02.0267 - Research on natural language processing (for the Croatian language) and development of the PhisHRban product for increasing cybersecurity (PhisHRban) (Pejić Bach, Mirjana; Seljan, Sanja, EK - KK.01.2.1.02) (CroRIS)
Institutions:
Filozofski fakultet, Zagreb (Faculty of Humanities and Social Sciences)