• Official journal of the China Computer Federation (CCF)
  • China Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science ›› 2022, Vol. 44 ›› Issue (02): 372-380.

A Canopy bisecting K-Means algorithm based on density and central index

SHEN Guo-xin¹, JIANG Zhong-yun²

  1. College of Information, Shanghai Ocean University, Shanghai 201306, China

  2. College of Information, Shanghai Jian Qiao University, Shanghai 201306, China

  • Received: 2020-05-26  Revised: 2020-09-21  Accepted: 2022-02-25  Online: 2022-02-25  Published: 2022-02-18

Abstract: The bisecting K-means algorithm selects its initial centers at random and requires the number of clusters to be set manually, which makes its clustering results unstable. To address this problem, a Canopy bisecting K-means algorithm based on density and central index is proposed. Firstly, the algorithm calculates the density of each data point in the sample and its neighborhood radius. Secondly, the data with the smallest density are selected and clustered using the idea of the Canopy algorithm, and the resulting number of clusters and cluster centers are used as the input parameters of the bisecting K-means algorithm. Finally, on the basis of the bisecting K-means algorithm, the exponential function and the central index are introduced to cluster the original samples. Comparative simulation experiments were carried out on UCI data sets and a self-built data set. The results show that the proposed algorithm produces more accurate clustering results, runs faster, and is more stable.
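For orientation, the following is a minimal Python sketch of the standard bisecting K-means baseline that the abstract builds on. It is not the authors' method: the density calculation, the Canopy step, the exponential function, and the central index are not specified in the abstract and are omitted. The function names, the largest-SSE rule for choosing which cluster to split, and the deterministic farthest-point seeding are all assumptions made for illustration, and the number of clusters k is still supplied by hand here.

```python
from math import dist


def centroid(points):
    """Mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def sse(points):
    """Sum of squared distances from each point to the cluster centroid."""
    c = centroid(points)
    return sum(dist(p, c) ** 2 for p in points)


def two_means(points, max_iter=100):
    """Split one cluster in two with plain K-means (k = 2).

    Assumption of this sketch: seeds are chosen deterministically as the
    point farthest from the centroid and the point farthest from that one.
    The input must contain at least two distinct points.
    """
    c = centroid(points)
    a = max(points, key=lambda p: dist(p, c))
    b = max(points, key=lambda p: dist(p, a))
    centers = [a, b]
    groups = None
    for _ in range(max_iter):
        new_groups = [[], []]
        for p in points:
            nearest = 0 if dist(p, centers[0]) <= dist(p, centers[1]) else 1
            new_groups[nearest].append(p)
        if new_groups == groups:  # assignments stopped changing
            break
        groups = new_groups
        centers = [centroid(g) for g in groups]
    return groups


def bisecting_kmeans(points, k):
    """Repeatedly split the cluster with the largest SSE until k clusters exist."""
    clusters = [list(points)]
    while len(clusters) < k:
        # Only clusters with at least two distinct points can be split.
        splittable = [c for c in clusters if len(set(c)) > 1]
        if not splittable:
            break
        worst = max(splittable, key=sse)
        clusters.remove(worst)
        clusters.extend(two_means(worst))
    return clusters
```

A call such as `bisecting_kmeans(points, 3)` takes a list of coordinate tuples and returns a list of clusters. In the approach described by the abstract, the cluster count and the starting centers would instead come from the density-based Canopy step.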


Key words: clustering, bisecting K-Means algorithm, density, neighborhood radius, exponential function, central index