• Journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science

• Graphics and Images

An improved K-medoids algorithm based on density weight Canopy

CHEN Sheng-fa, JIA Rui-yu

  1. (School of Computer Science and Technology, Anhui University, Hefei 230601, China)
  • Received: 2019-02-23 Revised: 2019-04-24 Online: 2019-10-25 Published: 2019-10-25
  • Supported by:

    National Key Technology R&D Program of China (2015BAK24B01)

Abstract:

The K-medoids algorithm requires the number of clusters to be specified manually and is sensitive to the choice of initial cluster centers. To address both problems and to improve the accuracy and stability of K-medoids, we propose an improved K-medoids algorithm based on density weight Canopy. First, we calculate the density value of each sample point in the data set, select the sample point with the maximum density value as the first cluster center, and remove its density cluster from the data set. Second, we select the other cluster centers by calculating the weights of the remaining sample points. Finally, density weight Canopy is used as a preprocessing step for K-medoids, and its result supplies the number of clusters and the initial cluster centers for the K-medoids algorithm. Simulation experiments on real UCI data sets and artificial simulated data sets show that the algorithm achieves relatively high accuracy and good stability.
 

Key words: clustering, density, weight, data mining
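
The three steps described in the abstract can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the abstract does not give the density or weight formulas, so the code assumes that the density of a point is the number of samples within a fixed `radius`, that a "density cluster" is the set of remaining samples within `radius` of a chosen center, and that the weight of a remaining point is its density multiplied by its distance to the nearest center already chosen. The function names `pairwise_distances`, `density_weight_canopy`, and `k_medoids`, and the `radius` parameter, are hypothetical. The K-medoids step uses the simple alternating variant (assign points, then update each medoid) rather than PAM-style swaps.

```python
import numpy as np


def pairwise_distances(X):
    """Euclidean distance matrix between all rows of X."""
    return np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)


def density_weight_canopy(X, radius):
    """Return indices of initial cluster centers (assumed density/weight rules)."""
    dist = pairwise_distances(X)
    # Assumed density: number of samples within `radius`, the point itself included.
    density = (dist <= radius).sum(axis=1)

    remaining = np.ones(len(X), dtype=bool)
    centers = []
    while remaining.any():
        idx = np.flatnonzero(remaining)
        if not centers:
            # First center: the sample with the maximum density.
            score = density[idx].astype(float)
        else:
            # Assumed weight: density times distance to the nearest chosen center.
            score = density[idx] * dist[np.ix_(idx, centers)].min(axis=1)
        c = int(idx[np.argmax(score)])
        centers.append(c)
        # Remove the density cluster of c (this always removes c itself).
        remaining &= dist[c] > radius
    return centers


def k_medoids(X, medoids, max_iter=100):
    """Alternating K-medoids started from the given medoid indices."""
    dist = pairwise_distances(X)
    medoids = list(medoids)
    for _ in range(max_iter):
        # Assignment step: each sample joins its nearest medoid.
        labels = dist[:, medoids].argmin(axis=1)
        new_medoids = []
        for k, m in enumerate(medoids):
            members = np.flatnonzero(labels == k)
            if members.size == 0:
                # Keep the old medoid if its cluster happens to be empty.
                new_medoids.append(m)
                continue
            # Update step: the medoid minimises total distance to its cluster.
            costs = dist[np.ix_(members, members)].sum(axis=1)
            new_medoids.append(int(members[costs.argmin()]))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    # Final labels consistent with the returned medoids.
    labels = dist[:, medoids].argmin(axis=1)
    return medoids, labels
```

In this sketch the number of clusters is simply the number of centers returned by `density_weight_canopy`, which is then passed to `k_medoids` as the initial medoids. Because every leftover point eventually becomes a center, isolated outliers would each form their own cluster here, and the result depends strongly on the chosen `radius`; the published method may handle both differently.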