Bibliographic record no.: 1263868

CASSED: Context-based Approach for Structured Sensitive Data Detection


Kužina, Vjeko; Petric, Ana-Marija; Barišić, Marko; Jović, Alan
CASSED: Context-based Approach for Structured Sensitive Data Detection // Expert systems with applications, 223 (2023), 119924, 10 doi:10.1016/j.eswa.2023.119924 (international peer review, article, scientific)


CROSBI ID: 1263868

Title
CASSED: Context-based Approach for Structured Sensitive Data Detection

Authors
Kužina, Vjeko ; Petric, Ana-Marija ; Barišić, Marko ; Jović, Alan

Source
Expert systems with applications (0957-4174) 223 (2023); 119924, 10

Type, subtype and category of work
Journal papers, article, scientific

Keywords
sensitive data detection ; privacy protection ; structured data ; machine learning ; transformers ; context-based detection

Abstract
The need for sensitive data detection and identification has increased in recent years. Sensitive data detection and identification are necessary steps for privacy protection. The focus in this field has been on unstructured data detection using natural language processing (NLP) approaches, while there has been little progress in the field of structured data. Most of the structured data approaches consider independent feature representations of cells, without taking potentially relevant context into account. In this work, we introduce a novel context-based approach named CASSED, which stands for Context-based Approach for Structured SEnsitive Data Detection. CASSED addresses the problem of sensitive data detection in structured data through the lens of NLP, using the transformer-based BERT method. Our approach aims to actively capture relations both within and between cells in the same column, on the assumption that the data in a given table column are mostly very similar. CASSED works as a classifier for columns in database tables, with the task of predicting one or more labels for the types of sensitive data that a column may represent. Since there is no officially recognized dataset for the task, we evaluated CASSED on datasets used for similar tasks in related work. Furthermore, we created our own dataset focused on sensitive data to evaluate CASSED. Our method outperformed methods from related work on their datasets, and on our own dataset it achieved significantly better results than both our baseline model and the models from related work. Our research suggests that treating structured data as context-rich is a viable strategy for sensitive data detection and identification.
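As an illustration of the column-level, multi-label framing described in the abstract, the following is a minimal Python sketch. It is not the authors' implementation: the serialization format (header, colon, cells joined by " | "), the `max_cells` cut-off, the 0.5 threshold, and the function names `column_to_text` and `select_labels` are all assumptions made for this example. The scores passed to `select_labels` are made-up placeholders standing in for the per-label outputs of a BERT-style classifier, which is not included here.

```python
from typing import Dict, List, Sequence


def column_to_text(header: str, cells: Sequence[str], max_cells: int = 5) -> str:
    """Serialize one table column into a single string.

    Hypothetical format: the point is only that cells from the same column
    end up in one sequence, so a text model can use them as shared context.
    """
    # Drop empty cells, then keep at most max_cells values.
    sample = [str(c).strip() for c in cells if str(c).strip()][:max_cells]
    return f"{header.strip()}: " + " | ".join(sample)


def select_labels(scores: Dict[str, float], threshold: float = 0.5) -> List[str]:
    """Multi-label decision: keep every label whose score reaches the threshold."""
    return sorted(label for label, score in scores.items() if score >= threshold)


if __name__ == "__main__":
    text = column_to_text("contact", ["ana@example.com", " ", "marko@example.org"])
    print(text)

    # Placeholder scores; a real system would obtain these from a trained model.
    made_up_scores = {"email": 0.93, "person_name": 0.12, "phone": 0.55}
    print(select_labels(made_up_scores))
```

Thresholding each label independently is what allows a single column to receive several labels at once, or none at all.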

Original language
English

Scientific fields
Computer science



LINKS TO THIS WORK


Projects:
EK-EFRR-KK.01.2.1.02.0038 - Digital platform for privacy protection and abuse prevention through personal data life-cycle management (AIPD2) (Golub, Marin, EK) (CroRIS)

Institutions:
Faculty of Electrical Engineering and Computing, Zagreb

Profiles:

Vjeko Kužina (author)

Alan Jović (author)

Cite this publication:

Kužina, V., Petric, A., Barišić, M. & Jović, A. (2023) CASSED: Context-based Approach for Structured Sensitive Data Detection. Expert systems with applications, 223, 119924, 10 doi:10.1016/j.eswa.2023.119924.
@article{article,
  author = {Ku\v{z}ina, Vjeko and Petric, Ana-Marija and Bari\v{s}i\'{c}, Marko and Jovi\'{c}, Alan},
  title = {CASSED: Context-based Approach for Structured Sensitive Data Detection},
  journal = {Expert systems with applications},
  year = {2023},
  volume = {223},
  pages = {10},
  chapter = {119924},
  chapternumber = {119924},
  issn = {0957-4174},
  doi = {10.1016/j.eswa.2023.119924},
  keywords = {sensitive data detection, privacy protection, structured data, machine learning, transformers, context-based detection},
  keyword = {sensitive data detection, privacy protection, structured data, machine learning, transformers, context-based detection}
}

Journal indexed in:


  • Current Contents Connect (CCC)
  • Web of Science Core Collection (WoSCC)
    • Science Citation Index Expanded (SCI-EXP)
    • SCI-EXP, SSCI and/or A&HCI
  • Scopus

