Methods for Automatic Sensitive Data Detection in Large Datasets: a Review

Kužina, Vjeko; Vušak, Eugen; Jović, Alan

Data source: CROSBI

Methods for Automatic Sensitive Data Detection in Large Datasets: a Review (CROSBI ID 707892)

Conference paper in proceedings | original scientific paper | international peer review

Kužina, Vjeko ; Vušak, Eugen ; Jović, Alan Methods for Automatic Sensitive Data Detection in Large Datasets: a Review // MIPRO / Skala, Karolj (ed.). 2021. pp. 213-218

Responsibility data

Authors

Kužina, Vjeko ; Vušak, Eugen ; Jović, Alan

Basic data in the original language
Basic data in other languages

Language

English

Title

Methods for Automatic Sensitive Data Detection in Large Datasets: a Review

Abstract

In recent years, the need for detection and de-identification of sensitive data in both structured and unstructured forms has increased. The methods used for these tasks have evolved accordingly and currently there are many solutions in different areas of interest. This paper describes the need for the detection of sensitive data in large datasets and describes the challenges associated with automating the detection process. It gives a brief overview of the rule-based and machine learning methods used in this area and examples of their application. The advantages and disadvantages of the described methods are also discussed. We show that the most recent detection solutions are based on the latest and most advanced models proposed in the field of natural language processing, but that there are still some rule-based methods used for certain types of sensitive data.

Keywords

sensitive data ; detection ; de-identification ; unstructured data ; machine learning ; named entity recognition

Note

not recorded

Language

not recorded

Title

not recorded

Abstract

not recorded

Keywords

not recorded

Note

not recorded

Paper data

Pages

213-218.

Year of publication

2021.

Publication status

published

Parent publication data

Title

MIPRO 2021 Proceedings

Editors

Skala, Karolj

Publisher

Rijeka: Hrvatska udruga za informacijsku i komunikacijsku tehnologiju, elektroniku i mikroelektroniku - MIPRO

ISSN

1847-3938

e-ISSN

1847-3946

Conference data

Conference

MIPRO 2021

Type of participation

lecture

Conference dates

27.09.2021-01.10.2021

Conference venue

Opatija, Croatia

Related records

Related persons

Vjeko Kužina (author)

Alan Jović (author)

Related institutions

Fakultet elektrotehnike i računarstva (036) (author's institution)

Field

Computer science