An improved K-medoids algorithm based on density weight Canopy

Computer Engineering & Science

CHEN Sheng-fa, JIA Rui-yu

(School of Computer Science and Technology, Anhui University, Hefei 230601, China)

Received: 2019-02-23  Revised: 2019-04-24  Online: 2019-10-25  Published: 2019-10-25
Funding: National Key Technology R&D Program of China (2015BAK24B01)

Abstract:

To improve the accuracy and stability of the K-medoids algorithm, and to address two of its weaknesses (the number of clusters must be specified manually, and the result is sensitive to the initial cluster centers), we propose an improved K-medoids algorithm based on density weight Canopy. First, the density value of each sample point in the data set is calculated; the sample point with the maximum density value is selected as the first cluster center, and its density cluster is removed from the data set. Second, the other cluster centers are selected by calculating the weights of the remaining sample points. Finally, density weight Canopy is used as a preprocessing step for K-medoids, and its result supplies the number of clusters and the initial cluster centers for the K-medoids algorithm. Simulation experiments on real data sets from UCI and on artificial data sets show that the algorithm achieves relatively high clustering accuracy and good stability.

Key words: clustering, density, weight, data mining
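The three steps summarized in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the density or weight formulas, so everything specific here is an assumption made for illustration: density is taken as the number of points within a radius, the radius defaults to the mean pairwise distance, and the weight of a remaining point is its density multiplied by its distance to the nearest center already chosen. The function names density_weight_canopy and k_medoids are hypothetical, and the K-medoids loop is a plain alternating assign/update variant, not necessarily the one used in the paper.

```python
import numpy as np

def pairwise_dist(X):
    """Euclidean distance matrix between all rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def density_weight_canopy(X, radius=None):
    """Return indices of initial cluster centers (their count gives k).

    ASSUMPTIONS, not taken from the abstract:
      - density = number of remaining points within `radius`
      - radius defaults to the mean pairwise distance
      - weight = density * distance to the nearest chosen center
    """
    D = pairwise_dist(X)
    n = len(X)
    if radius is None:
        radius = D[np.triu_indices(n, k=1)].mean()
    remaining = np.ones(n, dtype=bool)
    centers = []
    while remaining.any():
        idx = np.flatnonzero(remaining)
        density = (D[np.ix_(idx, idx)] <= radius).sum(axis=1)
        if not centers:
            score = density  # first center: maximum density
        else:
            # later centers: dense points far from existing centers
            score = density * D[np.ix_(idx, centers)].min(axis=1)
        c = int(idx[np.argmax(score)])
        centers.append(c)
        remaining &= D[c] > radius  # drop the density cluster around c
    return np.array(centers)

def k_medoids(X, init_medoids, max_iter=100):
    """Alternating K-medoids started from the given medoid indices."""
    D = pairwise_dist(X)
    medoids = np.array(init_medoids)
    for _ in range(max_iter):
        labels = D[:, medoids].argmin(axis=1)
        new = medoids.copy()
        for k in range(len(medoids)):
            members = np.flatnonzero(labels == k)
            if len(members) == 0:
                continue  # keep the old medoid for an empty cluster
            # new medoid: member with the smallest total distance to the others
            costs = D[np.ix_(members, members)].sum(axis=0)
            new[k] = members[costs.argmin()]
        if np.array_equal(new, medoids):
            break
        medoids = new
    labels = D[:, medoids].argmin(axis=1)
    return medoids, labels

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    blobs = [rng.normal(loc, 0.3, size=(20, 2)) for loc in ((0, 0), (10, 0), (0, 10))]
    X = np.vstack(blobs)
    init = density_weight_canopy(X, radius=3.0)
    medoids, labels = k_medoids(X, init)
    print("k =", len(init), "medoids:", medoids)
```

In this sketch the number of clusters is never passed in: it falls out of how many density clusters the preprocessing step removes before the data set is exhausted, which is the role the abstract assigns to the Canopy stage. The result depends on the radius, which is a tuning choice of this illustration.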

CHEN Sheng-fa, JIA Rui-yu. An improved K-medoids algorithm based on density weight Canopy[J]. Computer Engineering & Science.
