Methods for Automatic Sensitive Data Detection in Large Datasets: a Review (CROSBI ID 707892)
Conference paper in proceedings | original scientific paper | international peer review
Responsibility information
Kužina, Vjeko ; Vušak, Eugen ; Jović, Alan
English
Methods for Automatic Sensitive Data Detection in Large Datasets: a Review
In recent years, the need for detection and de-identification of sensitive data in both structured and unstructured forms has increased. The methods used for these tasks have evolved accordingly, and there are currently many solutions in different areas of interest. This paper describes the need for the detection of sensitive data in large datasets and the challenges associated with automating the detection process. It gives a brief overview of the rule-based and machine learning methods used in this area and examples of their application. The advantages and disadvantages of the described methods are also discussed. We show that the most recent detection solutions are based on the latest and most advanced models proposed in the field of natural language processing, but that there are still some rule-based methods used for certain types of sensitive data.
sensitive data ; detection ; de-identification ; unstructured data ; machine learning ; named entity recognition
Contribution details
213-218.
2021.
published
Parent publication details
MIPRO 2021 Proceedings
Skala, Karolj
Rijeka: Hrvatska udruga za informacijsku i komunikacijsku tehnologiju, elektroniku i mikroelektroniku - MIPRO
1847-3938
1847-3946
Conference details
MIPRO 2021
lecture
27.09.2021-01.10.2021
Opatija, Croatia