Bibliographic record number: 1271176
A hybrid dissimilarity measure for mixed-type data clustering
A hybrid dissimilarity measure for mixed-type data clustering // Book of Abstracts BIOSTAT 2021 - 25th Int. Scientific Symposium on Biometrics / Jazbec, Anamarija ; Pecina, Marija ; Sonicki, Zdrenko ; Šimić, Diana ; Vedriš, Mislav ; Sović, Slavica (eds.).
Zagreb: Hrvatsko biometrijsko društvo, 2021. p. 18 (lecture, international peer review, abstract, scientific)
CROSBI ID: 1271176
Title
A hybrid dissimilarity measure for mixed-type data clustering
Authors
Perišić, Ana
Type, subtype and category of work
Conference abstracts, abstract, scientific
Source
Book of Abstracts BIOSTAT 2021 - 25th Int. Scientific Symposium on Biometrics
/ Jazbec, Anamarija ; Pecina, Marija ; Sonicki, Zdrenko ; Šimić, Diana ; Vedriš, Mislav ; Sović, Slavica - Zagreb : Hrvatsko biometrijsko društvo, 2021, 18-18
Conference
BIOSTAT 2021 - 25th International Scientific Symposium on Biometrics
Place and date
Poreč, Croatia, 08.09.2021 - 10.09.2021
Type of participation
Lecture
Type of review
International peer review
Keywords
clustering, mixed data, hybrid dissimilarity measure, robust
Abstract
One of the greatest challenges in clustering mixed-type data is finding an adequate distance function between objects. Most distance metrics work with either continuous-only or categorical-only data, yet mixed-type data are prevalent in real-world applications. Hybrid distance methods involve selecting a distance function that can accommodate mixed-type variables; a popular hybrid distance function is Gower’s distance. This work presents a hybrid dissimilarity measure for mixed-type data in which distances are calculated conditional on the feature type. The proposed dissimilarity measure is established as a normalized linear combination of distances, following the principles of Gower’s coefficient calculation. For numerical features, distances are calculated by applying a modified winsorized Huber loss, while for categorical features, a distance measure based on variable entropy is incorporated. The established measure is robust to outliers and to skewed and sparse data, and it can handle unbalanced categorical features and highly skewed numerical features.
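The abstract does not state the formulas behind the measure, so the following Python sketch is only an illustration of the general idea, not the author's definition. It assumes a Gower-style weighted average of per-feature distances. For numerical features it assumes the absolute difference is scaled by the interquartile range, clipped at a cap (the winsorizing step), and passed through a standard Huber loss. For categorical features it assumes a simple mismatch distance weighted by the normalized Shannon entropy of the feature. All function names and the parameters `delta` and `cap` are hypothetical.

```python
import math
import statistics
from collections import Counter

def huber(r, delta):
    """Standard Huber loss: quadratic up to delta, linear beyond it."""
    a = abs(r)
    return 0.5 * a * a if a <= delta else delta * (a - 0.5 * delta)

def numeric_distance(x, y, scale, delta=1.0, cap=3.0):
    """Assumed numeric distance: Huber loss of the scaled difference,
    clipped at `cap` and rescaled to [0, 1]."""
    r = min(abs(x - y) / scale, cap)  # winsorize: extreme gaps are capped
    return huber(r, delta) / huber(cap, delta)

def normalized_entropy(column):
    """Shannon entropy of a categorical column divided by log(#categories)."""
    counts = Counter(column)
    if len(counts) < 2:
        return 0.0  # a constant feature carries no information
    n = len(column)
    h = -sum((c / n) * math.log(c / n) for c in counts.values())
    return h / math.log(len(counts))

def hybrid_dissimilarity(a, b, data, numeric_idx, categorical_idx):
    """Gower-style normalized linear combination of per-feature distances.
    Numeric features get weight 1; categorical features are weighted by
    their normalized entropy (an assumption made for this sketch)."""
    num, den = 0.0, 0.0
    for k in numeric_idx:
        col = [row[k] for row in data]
        q1, _, q3 = statistics.quantiles(col, n=4)
        scale = (q3 - q1) or 1.0  # fall back to 1 if the IQR is zero
        num += numeric_distance(a[k], b[k], scale)
        den += 1.0
    for k in categorical_idx:
        w = normalized_entropy([row[k] for row in data])
        num += w * (0.0 if a[k] == b[k] else 1.0)
        den += w
    return num / den if den > 0 else 0.0

# Toy data set: one numeric feature with an outlier, one categorical feature.
data = [(1.0, "a"), (2.0, "a"), (3.0, "b"), (4.0, "b"), (100.0, "a")]
d_near = hybrid_dissimilarity(data[0], data[1], data, [0], [1])
d_far = hybrid_dissimilarity(data[0], data[4], data, [0], [1])
```

In this sketch the result always lies in [0, 1], and because of the cap, a numeric gap far beyond `cap` scale units contributes no more than a gap of exactly `cap` units, which is one way an outlier's influence can be bounded.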
Original language
English
Scientific fields
Mathematics, Interdisciplinary natural sciences