Applying subclustering and Lp distance in Weighted K-Means with distributed centroids

de Amorim, Renato Cordeiro and Makarenkov, Vladimir (2016). "Applying subclustering and Lp distance in Weighted K-Means with distributed centroids". Neurocomputing, 173(part 3), pp. 700-707.

Abstract

We consider the weighted K-Means algorithm with distributed centroids, aimed at clustering data sets with numerical, categorical and mixed types of data. Our approach allows a given feature (i.e., variable) to have different weights in different clusters. Thus, it supports the intuitive idea that features may have different degrees of relevance in different clusters. We use the Minkowski metric in such a way that feature weights become feature re-scaling factors for any considered exponent. Moreover, the traditional Silhouette clustering validity index was adapted to deal with both numerical and categorical types of features. Finally, we show that our new method usually outperforms traditional K-Means as well as the recently proposed WK-DC clustering algorithm.
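
The abstract does not give the distance formula, so the following is only a minimal sketch of what "feature weights become feature re-scaling factors" could look like for numerical features. It assumes the commonly used weighted Minkowski form, in which the weight is raised to the same exponent p as the coordinate difference; the function names, the example data and the restriction to numerical features are assumptions made for illustration, not the authors' implementation.

```python
import math
from typing import List, Sequence


def weighted_minkowski(x: Sequence[float], centroid: Sequence[float],
                       weights: Sequence[float], p: float) -> float:
    """Assumed form: sum over features v of (w_v * |x_v - c_v|) ** p.

    The p-th root is omitted because it does not change which centroid
    is nearest.
    """
    if not (len(x) == len(centroid) == len(weights)):
        raise ValueError("x, centroid and weights must have the same length")
    return sum((w * abs(a - c)) ** p for a, c, w in zip(x, centroid, weights))


def rescaled_minkowski(x: Sequence[float], centroid: Sequence[float],
                       weights: Sequence[float], p: float) -> float:
    """Same quantity, computed by first re-scaling each feature by its weight."""
    sx = [w * a for a, w in zip(x, weights)]
    sc = [w * c for c, w in zip(centroid, weights)]
    return sum(abs(a - c) ** p for a, c in zip(sx, sc))


def assign(x: Sequence[float], centroids: List[Sequence[float]],
           cluster_weights: List[Sequence[float]], p: float) -> int:
    """Index of the nearest centroid, using each cluster's own feature weights."""
    dists = [weighted_minkowski(x, c, w, p)
             for c, w in zip(centroids, cluster_weights)]
    return dists.index(min(dists))


# Hypothetical example data: two clusters, two numerical features.
point = [1.0, 0.0]
centroids = [[0.0, 0.0], [1.0, 4.0]]
uniform = [[1.0, 1.0], [1.0, 1.0]]          # every feature counts equally
cluster_specific = [[0.5, 0.5], [0.9, 0.1]]  # cluster 1 nearly ignores feature 2

label_uniform = assign(point, centroids, uniform, p=2)
label_weighted = assign(point, centroids, cluster_specific, p=2)
```

With non-negative weights the two distance functions agree for any exponent p, which is the re-scaling interpretation in this sketch. In the example, the cluster-specific weights change the assignment of the point because the second cluster gives little relevance to the second feature.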

Type: Scientific journal article
Keywords or Subjects: clustering, mixed data, feature weighting, K-Means, Minkowski metric.
Affiliated unit: Faculté des sciences > Département d'informatique
Deposited by: Vladimir Makarenkov
Date deposited: 10 Feb 2016 14:51
Last modified: 20 Apr 2016 18:43
URL: http://archipel.uqam.ca/id/eprint/7780