A clustering algorithm based on the multi-level density center graph

Computer Engineering & Science ›› 2025, Vol. 47 ›› Issue (02): 327-335.


Funding: National Natural Science Foundation of China (62376054); Science and Technology Research Project of the Chongqing Municipal Education Commission (KJQN202103109)

A clustering algorithm based on the multi-level density center graph

LU Jianyun (1,2), SHAO Junming (1)

(1. School of Computer Science and Engineering (School of Cybersecurity), University of Electronic Science and Technology of China, Chengdu 611731;
2. Artificial Intelligence and Big Data College, Chongqing Polytechnic University of Electronic Technology, Chongqing 401331, China)

Received: 2024-07-02; Revised: 2024-08-23; Accepted: 2025-02-25; Online: 2025-02-25; Published: 2025-02-24

Abstract

Density-based clustering groups data objects according to the density relationships among them. By determining which density-center object each low-density object belongs to, it partitions a dataset and can effectively handle clusters of various sizes, shapes, and densities. However, under varying densities, noise, and complex distributions, accurately estimating the local density of data objects and determining the number of clusters from the density centers remain open problems. To address these problems, a clustering algorithm based on the multi-level density center graph (CMDCG) is proposed. First, the local density of each data object is computed from its neighborhood using information entropy. Second, the membership relationship of each data object is determined from its local density and neighborhood space, and the density centers are identified. Finally, multi-level density centers are obtained by varying the neighborhood space, and a graph is constructed from the membership relationships among these multi-level density centers; the connected components of the graph form the initial clusters, and the remaining data objects are assigned to the corresponding initial clusters according to their membership relationships. Experimental results on synthetic and real datasets show that CMDCG accurately identifies the number of clusters and forms correct initial clusters, and that it is robust to varying densities and noise.

Key words: density clustering, multi-level density center, connected graph, information entropy, neighborhood space
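
The three steps summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract gives no formulas, so everything below is an assumption made for illustration. In particular, the density (entropy of the normalized k-nearest-neighbor distances divided by their sum), the membership rule (each object points to its nearest k-neighbor of higher density), the rule for linking centers across levels, and the names `knn`, `entropy_density`, `membership`, `root`, and `cmdcg_sketch` are all hypothetical.

```python
import math

def knn(points, k):
    """For each point, return its k nearest neighbors as (distance, index) pairs."""
    result = []
    for i, p in enumerate(points):
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        result.append(dists[:k])
    return result

def entropy_density(neighbors):
    """Assumed density: entropy of normalized neighbor distances over their sum."""
    total = sum(d for d, _ in neighbors)
    if total == 0.0:
        return math.inf  # all neighbors coincide with the point
    probs = [d / total for d, _ in neighbors if d > 0.0]
    entropy = -sum(p * math.log(p) for p in probs)
    return entropy / total

def membership(points, k):
    """parent[i] is the nearest k-neighbor that outranks i; centers point to themselves."""
    neighbors = knn(points, k)
    # Ties in density are broken by index so that the ranking is a strict order.
    rank = [(entropy_density(nb), -i) for i, nb in enumerate(neighbors)]
    parent = []
    for i, nb in enumerate(neighbors):
        higher = [j for _, j in nb if rank[j] > rank[i]]  # nb is sorted by distance
        parent.append(higher[0] if higher else i)
    return parent

def root(parent, i):
    """Follow membership links until a density center is reached."""
    while parent[i] != i:
        i = parent[i]
    return i

def cmdcg_sketch(points, ks=(2, 3, 5)):
    """Cluster labels from connected components of a multi-level center graph."""
    levels = [membership(points, k) for k in ks]  # one membership map per neighborhood size
    link = list(range(len(points)))  # union-find over density centers

    def find(i):
        while link[i] != i:
            link[i] = link[link[i]]
            i = link[i]
        return i

    # Graph edges: each center at one level is linked to the center it
    # belongs to at the next (larger-neighborhood) level.
    for fine, coarse in zip(levels, levels[1:]):
        for c in (i for i, p in enumerate(fine) if p == i):
            link[find(c)] = find(root(coarse, c))

    # Every object inherits the component of its finest-level center.
    return [find(root(levels[0], i)) for i in range(len(points))]

# Two well-separated groups of six points each (toy data, not from the paper).
blob = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.4), (0.2, 0.8)]
points = blob + [(x + 10.0, y + 10.0) for x, y in blob]
labels = cmdcg_sketch(points, ks=(2, 3, 5))
print(len(set(labels)), "clusters found")
```

In this sketch the number of clusters is never passed in; it falls out of the connected components, which mirrors the abstract's claim that the density centers determine the cluster count. How well that works on real data depends entirely on the assumed density and linking rules.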


LU Jianyun, SHAO Junming. A clustering algorithm based on the multi-level density center graph[J]. Computer Engineering & Science, 2025, 47(02): 327-335.

