• Journal of the China Computer Federation
  • China Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science ›› 2021, Vol. 43 ›› Issue (12): 2243-2252.

• Artificial Intelligence and Data Mining •

A density clustering algorithm of boundary point division based on local center measure

ZHANG Mei, CHEN Mei, LI Ming

  1. (School of Electronic and Information Engineering, Lanzhou Jiaotong University, Lanzhou 730070, China)

  • Received: 2020-05-06  Revised: 2020-08-12  Accepted: 2021-12-25  Online: 2021-12-25  Published: 2021-12-31
  • Supported by:
    National Natural Science Foundation of China (61762057)

Abstract: Existing clustering algorithms often detect arbitrary clusters with low accuracy, require many iterations, and give poor results. To address these shortcomings, this paper proposes DBLCM, a density clustering algorithm with boundary point division based on a local center measure. Under the constraint of the local center measure, each data point is assigned to either the core region or the boundary region. Points in the core region form the initial clusters, with mutual nearest neighbors grouped first; each point in the boundary region is then assigned to the cluster of the closest point among its mutual nearest neighbors, which yields the final clusters. To verify the effectiveness of the algorithm, DBLCM is compared with three classic algorithms and three well-regarded algorithms proposed in recent years, on two-dimensional datasets containing clusters of arbitrary shape and density and on multi-dimensional datasets of arbitrary dimension. In addition, to assess how sensitive DBLCM is to its parameter k, the correlation between the value of k and cluster quality is tested on the datasets used. The experimental results show that DBLCM offers high recognition accuracy, detects arbitrary clusters well, and requires no iteration, and that its overall performance is better than that of the six comparison algorithms.


Key words: local center measure, core region, boundary region, mutual nearest neighbor
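The two-stage assignment described in the abstract can be illustrated with a small sketch. The abstract does not define the local center measure, so the `center_offset` function below is a hypothetical stand-in (distance from a point to the centroid of its k nearest neighbors, scaled by the mean neighbor distance), and `threshold` is an illustrative parameter that is not taken from the paper. Forming initial clusters as connected groups of mutually neighboring core points is likewise only one plausible reading of "mutual nearest neighbors grouped first". The names `dblcm_sketch`, `k_nearest`, and `mutual_neighbors` are invented for this example.

```python
import math


def k_nearest(points, k):
    """For each point, return the indices of its k nearest neighbors, nearest first."""
    result = []
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        others.sort(key=lambda j: math.dist(p, points[j]))
        result.append(others[:k])
    return result


def mutual_neighbors(knn):
    """Keep j in i's list only if i is also among j's k nearest neighbors."""
    return [[j for j in nbrs if i in knn[j]] for i, nbrs in enumerate(knn)]


def center_offset(points, knn, i):
    """ASSUMED stand-in for the local center measure (not the paper's definition).

    Small values mean point i sits near the center of its neighborhood.
    """
    nbr_pts = [points[j] for j in knn[i]]
    if not nbr_pts:
        return 0.0
    dim = len(points[i])
    centroid = [sum(q[d] for q in nbr_pts) / len(nbr_pts) for d in range(dim)]
    mean_dist = sum(math.dist(points[i], q) for q in nbr_pts) / len(nbr_pts)
    return math.dist(points[i], centroid) / mean_dist if mean_dist > 0 else 0.0


def dblcm_sketch(points, k, threshold):
    """Return one cluster label per point; -1 marks points left unassigned."""
    knn = k_nearest(points, k)
    mutual = mutual_neighbors(knn)
    offsets = [center_offset(points, knn, i) for i in range(len(points))]
    core = {i for i, off in enumerate(offsets) if off <= threshold}

    # Stage 1: core points linked by mutual-neighbor relations share a cluster.
    labels = [-1] * len(points)
    next_label = 0
    for seed in sorted(core):
        if labels[seed] != -1:
            continue
        labels[seed] = next_label
        stack = [seed]
        while stack:
            i = stack.pop()
            for j in mutual[i]:
                if j in core and labels[j] == -1:
                    labels[j] = next_label
                    stack.append(j)
        next_label += 1

    # Stage 2: one pass over boundary points, innermost first. Each takes the
    # label of its closest already-labeled mutual neighbor (lists are nearest first).
    boundary = sorted((i for i in range(len(points)) if i not in core),
                      key=lambda i: offsets[i])
    for i in boundary:
        for j in mutual[i]:
            if labels[j] != -1:
                labels[i] = labels[j]
                break
    return labels
```

With two well-separated groups of five collinear points, `k=2`, and `threshold=0.5`, the end points of each group fall in the boundary region and are attached to the cluster formed by their group's three core points. A boundary point with no labeled mutual neighbor keeps the label -1 in this sketch; how the paper treats such points is not stated in the abstract.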