Bibliographic record number: 948938
Solution for detecting sensitive data inside a data lake
Solution for detecting sensitive data inside a data lake // MIPRO 2018 : 41st International Convention: proceedings / Skala, Karolj (ed.).
Rijeka: Hrvatska udruga za informacijsku i komunikacijsku tehnologiju, elektroniku i mikroelektroniku - MIPRO, 2018. pp. 1474-1478 doi:10.23919/MIPRO.2018.8400232 (lecture, international peer review, full paper (in extenso), scientific)
CROSBI ID: 948938
Title
Solution for detecting sensitive data inside a data lake
Authors
Tovernić, Silvija ; Banović, Vlaho ; Hrastić, Zlatko ; Plantić, Katarina ; Šandić, Agneza ; Baranović, Mirta
Type, subtype and category of work
Papers in conference proceedings, full paper (in extenso), scientific
Source
MIPRO 2018 : 41st International Convention: proceedings
/ Skala, Karolj - Rijeka : Hrvatska udruga za informacijsku i komunikacijsku tehnologiju, elektroniku i mikroelektroniku - MIPRO, 2018, 1474-1478
ISBN
978-953-233-096-0
Conference
41st International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO 2018)
Place and date
Opatija, Croatia, 21.05.2018 - 25.05.2018
Type of participation
Lecture
Type of review
International peer review
Keywords
Data lake ; sensitive data ; GDPR ; data streams
Abstract
This paper is the result of a project carried out by a team of master's degree students. The team created an algorithm for recognizing sensitive data, primarily first names, surnames and the OIB (Croatian personal identification number). The algorithm iterates over given unstructured texts and assigns tags to documents according to the specific sensitive data they contain. This process gives companies a way to narrow down the search for personal information when a client demands removal of their data. A similar algorithm was implemented for server logs, which are represented as data streams and analyzed in real time. To provide insight into the quantity of sensitive information and how it is distributed across different types of documents, the team created a dashboard that shows statistics accumulated by the developed algorithms. The solution is hosted on Cloudera, an Apache Hadoop-based open source platform for data management and analytics, which is deployed on Microsoft Azure cloud infrastructure.
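The abstract does not give the algorithm itself, so the following is only a minimal Python sketch of the idea it describes: scan text for sensitive data and attach tags to each document or log line. The OIB check uses the publicly documented ISO 7064 MOD 11,10 control digit. Everything else is an assumption made for illustration and not the authors' method: the function names `is_valid_oib`, `tag_document` and `tag_stream`, the tag labels `"OIB"` and `"NAME"`, and name detection by lookup in a caller-supplied set of known names.

```python
import re
from typing import Iterable, Iterator, Set, Tuple

# Exactly 11 consecutive digits, not embedded in a longer digit run.
OIB_CANDIDATE = re.compile(r"(?<!\d)\d{11}(?!\d)")
# Runs of letters (Unicode-aware, so Croatian diacritics are kept).
WORD = re.compile(r"[^\W\d_]+")


def is_valid_oib(candidate: str) -> bool:
    """Check the OIB control digit (ISO 7064, MOD 11,10)."""
    if len(candidate) != 11 or not candidate.isdigit():
        return False
    remainder = 10
    for digit in candidate[:10]:
        remainder = (remainder + int(digit)) % 10
        if remainder == 0:
            remainder = 10
        remainder = (remainder * 2) % 11
    control = 11 - remainder
    if control == 10:
        control = 0
    return control == int(candidate[10])


def tag_document(text: str, known_names: Iterable[str]) -> Set[str]:
    """Return hypothetical tags for the sensitive data found in `text`."""
    tags: Set[str] = set()
    if any(is_valid_oib(c) for c in OIB_CANDIDATE.findall(text)):
        tags.add("OIB")
    # Assumption: names are recognized by case-insensitive dictionary lookup.
    words = {w.casefold() for w in WORD.findall(text)}
    if words & {n.casefold() for n in known_names}:
        tags.add("NAME")
    return tags


def tag_stream(lines: Iterable[str],
               known_names: Iterable[str]) -> Iterator[Tuple[str, Set[str]]]:
    """Lazily tag log lines one at a time, yielding only the flagged ones."""
    names = list(known_names)  # materialize once so an iterator is reusable
    for line in lines:
        tags = tag_document(line, names)
        if tags:
            yield line, tags
```

Validating the control digit rather than matching any 11-digit number reduces false positives from order numbers or timestamps. Because `tag_stream` is a generator, it can consume an unbounded sequence of log lines without loading them all into memory.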
Original language
English
Scientific fields
Electrical engineering
PAPER AFFILIATIONS
Institutions:
Faculty of Electrical Engineering and Computing, Zagreb
Indexed in:
- Web of Science Core Collection (WoSCC)
- Conference Proceedings Citation Index - Science (CPCI-S)
- Scopus