• 中国计算机学会会刊
  • 中国科技核心期刊
  • 中文核心期刊

计算机工程与科学 ›› 2025, Vol. 47 ›› Issue (02): 327-335.

• 人工智能与数据挖掘 • 上一篇    下一篇

基于多层次密度中心图的聚类算法

卢建云1,2,邵俊明1   

  1. (1.电子科技大学计算机科学与工程学院(网络空间安全学院),四川 成都 611731;
    2.重庆电子科技职业大学人工智能与大数据学院,重庆  401331)

  • 收稿日期:2024-07-02 修回日期:2024-08-23 接受日期:2025-02-25 出版日期:2025-02-25 发布日期:2025-02-24
  • 基金资助:
    国家自然科学基金(62376054);重庆市教委科学技术研究项目(KJQN202103109)

A clustering algorithm based on the multi-level density center graph

LU Jianyun1,2,SHAO Junming1   

  1.  (1.School of Computer Science and Engineering(School of Cybersecurity),
    University of Electronic Science and Technology of China,Chengdu 611731;
    2.Artificial Intelligence and Big Data College,
    Chongqing Polytechnic University of Electronic Technology,Chongqing 401331,China)
  • Received:2024-07-02 Revised:2024-08-23 Accepted:2025-02-25 Online:2025-02-25 Published:2025-02-24

摘要: 密度聚类是一种依据数据对象之间的密度关系进行聚类的算法。密度聚类通过判断数据集中低密度对象与密度中心对象的隶属关系实现对数据集的划分,能够有效地处理数据集中各种大小、不同形状和密度的簇。然而,受到数据集变密度、噪声和复杂分布的影响,如何准确估计数据对象的局部密度并通过密度中心确定聚类数目仍是需要研究的问题。针对上述密度聚类问题提出一种多层次密度中心图的聚类算法CMDCG。首先,基于每个数据对象的邻域,利用信息熵计算其局部密度;其次,依据局部密度和邻域空间确定每个数据对象的隶属关系并确定密度中心;最后,通过变化邻域空间得到多层次密度中心,根据多层次密度中心的隶属关系构建图结构,得到图的连通分量即为初始聚类,其他数据对象根据隶属关系划归到对应的初始聚类。在人工和真实数据集上的实验结果表明,CMDCG算法能够准确地识别聚类数目并形成正确的初始聚类,算法对变密度和噪声情况下的数据集有很好的鲁棒性。

关键词: 密度聚类, 多层次密度中心, 连通图, 信息熵, 邻域空间

Abstract: Density-based clustering is an algorithm that partitions a dataset based on the density relationships among data objects. By determining the membership relationships between low-density objects and density-center objects within the dataset, density-based clustering can effectively handle clusters of various sizes, shapes, and densities. However, due to the impact of variable densities, noise and complex distributions within datasets, how to accurately estimate the local density of data objects and determine the number of clusters through density centers remain challenges that require further research. To address these issues in density-based clustering, a clustering algorithm based on the multi-level density center graph (CMDCG) is proposed. Firstly, the local density of each data object is calculated using information entropy based on its neighborhood. Secondly, the membership relationships of each data object are statistically analyzed according to its local density and neighborhood space, and density centers are determined. Finally, multi-level density centers are obtained by varying the neighborhood space, and a graph structure is constructed based on the membership relationships among these multi-level density centers. The connected components of the graph are identified as initial clusters, and other data objects are assigned to these initial clusters based on their membership relationships. Experimental results on both synthetic and real dataset demonstrate that the CMDCG algorithm can accurately identify the number of clusters and form correct initial clusters, with clustering results that are robust to varying densities and noise.

Key words: density clustering, multi-level density center, connected graph, information entropy, neighborhood space