Methods for Automatic Sensitive Data Detection in Large Datasets: a Review

Kužina, Vjeko; Vušak, Eugen; Jović, Alan

Bibliographic record number: 1148395

Methods for Automatic Sensitive Data Detection in Large Datasets: a Review // MIPRO 2021 Proceedings / Skala, Karolj (ed.).
Rijeka: Hrvatska udruga za informacijsku i komunikacijsku tehnologiju, elektroniku i mikroelektroniku - MIPRO, 2021, pp. 213-218 (lecture, international peer review, full paper (in extenso), scientific)

CROSBI ID: 1148395

Title
Methods for Automatic Sensitive Data Detection in Large Datasets: a Review

Authors
Kužina, Vjeko ; Vušak, Eugen ; Jović, Alan

Type, subtype and category of work
Papers in conference proceedings, full paper (in extenso), scientific

Source
MIPRO 2021 Proceedings / Skala, Karolj - Rijeka : Hrvatska udruga za informacijsku i komunikacijsku tehnologiju, elektroniku i mikroelektroniku - MIPRO, 2021, 213-218

Conference
44th International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO 2021)

Place and date
Opatija, Croatia, 27.09.2021 - 01.10.2021

Type of participation
Lecture

Type of review
International peer review

Keywords
sensitive data ; detection ; de-identification ; unstructured data ; machine learning ; named entity recognition

Abstract
In recent years, the need for detection and de-identification of sensitive data in both structured and unstructured forms has increased. The methods used for these tasks have evolved accordingly and currently there are many solutions in different areas of interest. This paper describes the need for the detection of sensitive data in large datasets and describes the challenges associated with automating the detection process. It gives a brief overview of the rule-based and machine learning methods used in this area and examples of their application. The advantages and disadvantages of the described methods are also discussed. We show that the most recent detection solutions are based on the latest and most advanced models proposed in the field of natural language processing, but that there are still some rule-based methods used for certain types of sensitive data.

Original language
English

Scientific fields
Computing

LINKED TO THIS WORK

Institutions:
Fakultet elektrotehnike i računarstva, Zagreb

Profiles:

Vjeko Kužina (author)