Data source: CROSBI

Data Acquisition and Corpus Creation for Phishing Detection (CROSBI ID 737423)

Conference paper in proceedings | original scientific paper | international peer review

Dunđer, Ivan ; Seljan, Sanja ; Odak, Marko. Data Acquisition and Corpus Creation for Phishing Detection // MIPRO / Skala, Karolj (ed.). 2023. pp. 589-594

Authorship information

Dunđer, Ivan ; Seljan, Sanja ; Odak, Marko

Language: English

Detecting phishing attacks is not straightforward, since many obstacles arise from language complexity and technical aspects. Studying phishing attacks and related issues relies heavily on computer datasets, i.e. digital corpora that reflect these linguistic and technical intricacies. Various studies using phishing datasets have been carried out, but mainly for English. Research on other languages is scarce, especially for less widely spoken ones. For Croatian, there is an evident lack of the corpora needed for diverse analyses and for building models capable of recognizing phishing attacks and protecting users. Such datasets are necessary for natural language processing and for building machine learning workflows, whose results largely depend on corpora crafted specifically for this purpose. Therefore, creating high-quality domain-specific corpora is of great importance in information security. Such corpora can be used for teaching in various higher education courses and can be analyzed in numerous ways to understand the underlying principles of phishing attack strategies. The aim of this paper is to demonstrate the entire process of data acquisition and corpus creation for the phishing detection domain. In addition, an analysis of the corpus is presented with regard to different aspects, such as descriptive attributes, terminology characteristics, metadata and language.

Keywords: data acquisition ; digital corpus creation ; computational data analysis ; natural language processing ; phishing ; information privacy ; information security
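The abstract mentions analysing the corpus with regard to descriptive attributes and terminology characteristics. Purely as an illustration, a minimal Python sketch of such descriptive statistics (document, token and type counts, type-token ratio, most frequent terms) could look like the following. The function names, the simple regex tokenizer and the two Croatian sample messages are hypothetical assumptions; they are not the authors' method, tooling or data.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Hypothetical tokenizer (assumption): lowercase, then keep runs of
    # Unicode word characters, so Croatian letters (č, ć, đ, š, ž) stay in tokens.
    return re.findall(r"\w+", text.lower())


def descriptive_stats(docs: list[str], top_n: int = 5) -> dict:
    # Illustrative descriptive statistics over a list of messages.
    tokens = [tok for doc in docs for tok in tokenize(doc)]
    counts = Counter(tokens)
    n_tokens = len(tokens)
    n_types = len(counts)  # distinct word forms
    return {
        "documents": len(docs),
        "tokens": n_tokens,
        "types": n_types,
        # Type-token ratio: lexical variety (distinct forms / all tokens).
        "type_token_ratio": n_types / n_tokens if n_tokens else 0.0,
        "avg_tokens_per_doc": n_tokens / len(docs) if docs else 0.0,
        # Most frequent word forms as a rough view of recurring terminology.
        "top_terms": counts.most_common(top_n),
    }


if __name__ == "__main__":
    # Two invented phishing-style messages, not taken from the paper's corpus.
    sample_docs = [
        "Vaš račun je blokiran. Kliknite na poveznicu.",
        "Potvrdite račun klikom na poveznicu.",
    ]
    print(descriptive_stats(sample_docs, top_n=3))
```

On the two sample messages this counts 12 tokens and 9 types (a type-token ratio of 0.75), with "račun", "na" and "poveznicu" each occurring twice. A real analysis of Croatian text would likely need lemmatization, since inflected forms such as "poveznicu" and "poveznice" are counted here as different types.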

Paper details

Pages: 589-594

Year: 2023

Published

Parent publication details

MIPRO Proceedings - ICT and Electronics Convention

Editor: Skala, Karolj

Rijeka: Hrvatska udruga za informacijsku i komunikacijsku tehnologiju, elektroniku i mikroelektroniku - MIPRO

ISSN: 1847-3938

ISSN: 1847-3946

Conference details

46th ICT and Electronics Convention

Lecture

22.05.2023-26.05.2023

Opatija, Croatia

Associated research fields

Information and Communication Sciences, Computing