Data source: CROSBI

CASSED: Context-based Approach for Structured Sensitive Data Detection (CROSBI ID 323821)

Journal article | original scientific paper | international peer review

Kužina, Vjeko ; Petric, Ana-Marija ; Barišić, Marko ; Jović, Alan CASSED: Context-based Approach for Structured Sensitive Data Detection // Expert systems with applications, 223 (2023), 119924, 10. doi: 10.1016/j.eswa.2023.119924

Responsibility information

Kužina, Vjeko ; Petric, Ana-Marija ; Barišić, Marko ; Jović, Alan

English

CASSED: Context-based Approach for Structured Sensitive Data Detection

The need for sensitive data detection and identification has increased in recent years. Sensitive data detection and identification are necessary steps for privacy protection. The focus in this field has been on unstructured data detection using natural language processing (NLP) approaches, while there has been little progress in the field of structured data. Most of the structured data approaches consider independent feature representations of cells, without taking potentially relevant context into account. In this work, we introduce a novel context-based approach named CASSED, which stands for Context-based Approach for Structured SEnsitive Data Detection. CASSED addresses the problem of sensitive data detection in structured data through the lens of NLP, using the transformer-based BERT method. Our approach aims to actively capture relations both within and between cells in the same column, under the assumption that the data in the same column of a table are mostly very similar. CASSED works as a classifier for columns in database tables with the task of predicting a label or multiple labels for different types of sensitive data that a column may represent. Since there is no officially recognized dataset for the task, we evaluated CASSED on datasets used for similar tasks in related work. Furthermore, we created our own dataset focused on sensitive data to evaluate CASSED. Our method outperformed methods from related work on their datasets and achieved significantly better results on our own dataset than both our baseline model and models from related work. Our research suggests that treating structured data as context-rich is a viable strategy for sensitive data detection and identification.
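
As a rough illustration of the column-level idea described in the abstract, the sketch below joins the cells of one table column into a single text sequence that a BERT-style sequence classifier could take as input. The helper name `serialize_column`, the header prefix, the `[SEP]` separator, and the cell limit are all illustrative assumptions; the abstract does not specify the input format CASSED actually uses.

```python
from typing import Iterable


def serialize_column(header: str, cells: Iterable[object],
                     max_cells: int = 5, sep: str = " [SEP] ") -> str:
    """Join a column header and its first cells into one text sequence.

    Hypothetical helper: the format is an assumption for illustration,
    not the input format used by CASSED.
    """
    cleaned = []
    for cell in cells:
        # Skip missing values.
        if cell is None:
            continue
        # Collapse runs of whitespace and trim the ends of each cell.
        text = " ".join(str(cell).split())
        if text:
            cleaned.append(text)
        # Stop once enough cells are collected to keep the sequence short.
        if len(cleaned) == max_cells:
            break
    return sep.join([header.strip()] + cleaned)


example = serialize_column("email", ["ana@example.com", None, " marko@example.org "])
print(example)
```

Keeping several cells of the same column in one sequence is what would let a model attend across cells, which is the kind of context the abstract contrasts with independent per-cell features.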

sensitive data detection ; privacy protection ; structured data ; machine learning ; transformers ; context-based detection

Publication details

Volume: 223
Year: 2023
Article number: 119924
Pages: 10
Status: published
ISSN: 0957-4174
e-ISSN: 1873-6793
DOI: 10.1016/j.eswa.2023.119924

Open access publication cost

Related fields

Computing

Links
Indexing