Bibliographic record no. 1077270

Statistical hierarchical clustering algorithm for outlier detection in evolving data streams


Krleža, Dalibor; Vrdoljak, Boris; Brčić, Mario
Statistical hierarchical clustering algorithm for outlier detection in evolving data streams // Machine learning, 1 (2020), 1, 40 doi:10.1007/s10994-020-05905-4 (international peer review, article, scientific)


CROSBI ID: 1077270

Title
Statistical hierarchical clustering algorithm for outlier detection in evolving data streams

Authors
Krleža, Dalibor ; Vrdoljak, Boris ; Brčić, Mario

Source
Machine learning (0885-6125) 1 (2020); 1, 40

Type, subtype and category of work
Journal papers, article, scientific

Keywords
Big data ; Clustering ; Anomaly detection ; Fraud detection

Abstract
Anomaly detection is a hard data analysis process that requires constant creation and improvement of data analysis algorithms. Using traditional clustering algorithms to analyse data streams is impossible due to processing power and memory issues. To solve this, the traditional clustering algorithm complexity needed to be reduced, which led to the creation of sequential clustering algorithms. The usual approach is two-phase clustering, which uses an online phase to relax data details and complexity, and an offline phase to cluster concepts created in the online phase. Detecting anomalies in a data stream is usually solved in the online phase, as it requires unreduced data. In contrast, producing good macro-clustering is done in the offline phase, which is the reason why two-phase clustering algorithms have difficulty being equally good in anomaly detection and macro-clustering. In this paper, we propose a statistical hierarchical clustering algorithm equally suitable for both detecting anomalies and macro-clustering. The proposed algorithm is single-phased and uses statistical inference on the input data stream, resulting in statistical distributions that are constantly updated. This makes the classification adaptable, allowing agglomeration of outliers into clusters, tracking of population evolution, and use without knowing the expected number of clusters and outliers. The proposed algorithm was tested against typical clustering algorithms, including two-phase algorithms suitable for data stream analysis. A number of typical test cases were selected to show the universality and qualities of the proposed clustering algorithm.
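
The abstract's idea of statistical distributions that are constantly updated from the input stream can be illustrated with a deliberately simplified sketch. This is not the algorithm proposed in the paper: it assumes a single one-dimensional Gaussian population, uses Welford's incremental mean and variance update, and flags a point as an outlier when its z-score exceeds a threshold. The names `RunningGaussian` and `scan`, the threshold of 3.0, and the warm-up of 5 points are all hypothetical choices made for illustration.

```python
import math


class RunningGaussian:
    """One-dimensional Gaussian updated one observation at a time (Welford's method)."""

    def __init__(self):
        self.n = 0        # number of observations absorbed so far
        self.mean = 0.0   # running mean
        self.m2 = 0.0     # running sum of squared deviations from the mean

    def update(self, x):
        """Absorb one observation without storing the stream."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def std(self):
        """Sample standard deviation; 0.0 until at least two points are seen."""
        return math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0

    def is_outlier(self, x, threshold=3.0, warmup=5):
        """Flag x when its z-score exceeds the threshold (assumed rule)."""
        if self.n < warmup or self.std == 0.0:
            return False  # too little evidence to judge
        return abs(x - self.mean) / self.std > threshold


def scan(stream, threshold=3.0):
    """Single pass: each point is either flagged or folded into the model."""
    model = RunningGaussian()
    outliers = []
    for x in stream:
        if model.is_outlier(x, threshold):
            outliers.append(x)
        else:
            model.update(x)
    return model, outliers


model, outliers = scan([10.0, 10.2, 9.8, 10.1, 9.9, 10.0, 50.0, 10.1])
print(outliers)
```

In this toy version a flagged point is simply set aside. The abstract describes a richer behaviour in which outliers can agglomerate into new clusters and populations are tracked as they evolve; the sketch does not attempt either.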

Original language
English

Scientific fields
Mathematics, Computing, Information and Communication Sciences



RELATED TO THIS WORK


Projects:
KK.01.1.1.01.0009 - Advanced methods and technologies in data science and cooperative systems (EK)

Institutions:
Faculty of Electrical Engineering and Computing, Zagreb

Profiles:

Mario Brčić (author)

Boris Vrdoljak (author)

Dalibor Krleža (author)

Links to the full text:

doi:10.1007/s10994-020-05905-4

Cite this publication:

Krleža, Dalibor; Vrdoljak, Boris; Brčić, Mario
Statistical hierarchical clustering algorithm for outlier detection in evolving data streams // Machine learning, 1 (2020), 1, 40 doi:10.1007/s10994-020-05905-4 (international peer review, article, scientific)
Krleža, D., Vrdoljak, B. & Brčić, M. (2020) Statistical hierarchical clustering algorithm for outlier detection in evolving data streams. Machine learning, 1, 1, 40 doi:10.1007/s10994-020-05905-4.
@article{article,
  author = {Krle\v{z}a, Dalibor and Vrdoljak, Boris and Br\v{c}i\'{c}, Mario},
  title = {Statistical hierarchical clustering algorithm for outlier detection in evolving data streams},
  journal = {Machine learning},
  year = {2020},
  volume = {1},
  chapter = {1},
  chapternumber = {1},
  pages = {40},
  issn = {0885-6125},
  doi = {10.1007/s10994-020-05905-4},
  keywords = {Big data, Clustering, Anomaly detection, Fraud detection},
  keyword = {Big data, Clustering, Anomaly detection, Fraud detection}
}

Journal indexed in:


  • Current Contents Connect (CCC)
  • Web of Science Core Collection (WoSCC)
    • Science Citation Index Expanded (SCI-EXP)
    • SCI-EXP, SSCI and/or A&HCI
  • Scopus






