Solution for detecting sensitive data inside a data lake (CROSBI ID 664358)
Conference paper in proceedings | original scientific paper | international peer review
Authorship details
Tovernić, Silvija ; Banović, Vlaho ; Hrastić, Zlatko ; Plantić, Katarina ; Šandić, Agneza ; Baranović, Mirta
English
Solution for detecting sensitive data inside a data lake
This paper is the result of a project carried out by a team of current master’s degree students. The team created an algorithm for recognizing sensitive data, primarily first names, surnames and OIB (the Croatian personal identification number). The algorithm iterates over the given unstructured texts and assigns tags to documents according to which specific sensitive data they contain. This gives companies a way to narrow down the search for personal information when a client requests the removal of their data. A similar algorithm was also implemented for server logs, which are represented as data streams and analyzed in real time. To provide insight into the quantity of sensitive information and its distribution across different document types, the team created a dashboard that shows statistics accumulated by the developed algorithms. The solution is hosted on Cloudera, an open-source platform based on Apache Hadoop and designed for data management and analytics, which is deployed on Microsoft Azure cloud infrastructure.
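The abstract does not describe how OIB values or names are recognized. The following is a minimal sketch, assuming OIB candidates are found with a regular expression and then filtered by the OIB control digit (ISO 7064, MOD 11,10), and assuming names are matched against a supplied lexicon. The functions `is_valid_oib`, `find_oibs` and `tag_document`, the tag labels `"OIB"` and `"NAME"`, and the lexicon parameter are hypothetical illustrations, not the authors' implementation.

```python
import re
from typing import Iterable, List, Set

# Hypothetical pattern: exactly 11 digits not embedded in a longer digit run.
OIB_CANDIDATE = re.compile(r"(?<!\d)\d{11}(?!\d)")
# Unicode-aware word tokens, so Croatian letters such as "š" or "ć" are kept.
WORD = re.compile(r"\w+")


def is_valid_oib(candidate: str) -> bool:
    """Check an 11-digit string against the ISO 7064 MOD 11,10 control digit."""
    if len(candidate) != 11 or not candidate.isdigit():
        return False
    remainder = 10
    for ch in candidate[:10]:
        remainder = (remainder + int(ch)) % 10
        if remainder == 0:
            remainder = 10
        remainder = (remainder * 2) % 11
    control = 11 - remainder
    if control == 10:
        control = 0
    return control == int(candidate[10])


def find_oibs(text: str) -> List[str]:
    """Return every 11-digit run in the text that passes the control-digit check."""
    return [m.group() for m in OIB_CANDIDATE.finditer(text) if is_valid_oib(m.group())]


def tag_document(text: str, known_names: Iterable[str]) -> Set[str]:
    """Assign hypothetical sensitive-data tags to one unstructured document."""
    tags: Set[str] = set()
    if find_oibs(text):
        tags.add("OIB")
    names = {n.casefold() for n in known_names}
    if any(tok.casefold() in names for tok in WORD.findall(text)):
        tags.add("NAME")
    return tags


if __name__ == "__main__":
    print(tag_document("Ivan Horvat, OIB 69435151530", ["Ivan"]))
```

Because `tag_document` works on a single string, the same check could in principle be applied to each server-log line as it arrives from a stream; the control-digit filter discards most random 11-digit numbers (for example order IDs) that a regex alone would flag.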
Data lake ; sensitive data ; GDPR ; data streams
Contribution details
pp. 1474-1478
2018.
published
DOI: 10.23919/MIPRO.2018.8400232
Parent publication details
MIPRO 2018 : 41st International Convention: proceedings
Skala, Karolj
Rijeka: Hrvatska udruga za informacijsku i komunikacijsku tehnologiju, elektroniku i mikroelektroniku - MIPRO
ISBN 978-953-233-096-0
Conference details
MIPRO 2018
lecture
21.05.2018-25.05.2018
Opatija, Croatia
Research field
Electrical engineering