• 中国计算机学会会刊
  • 中国科技核心期刊
  • 中文核心期刊

J4 ›› 2016, Vol. 38 ›› Issue (05): 839-847.

• 论文 •     Next Articles

A sparse local scaling parallel
spectral clustering algorithm based on MPI    

LI Ruilin1,2 ,ZHAO Yonghua1,HUANG Xiaolei2,3   

  1. (1.The Department of High Performance Computing,Computer Network Information Center,
    Chinese Academy of Sciences,Beijing 100190;
    2.University of Chinese Academy of Sciences,Beijing 100190;
    3.Computer Network Information Center,Chinese Academy of Sciences,Beijing 100190,China)
  • Received:2015-12-14 Revised:2016-02-17 Online:2016-05-25 Published:2016-05-25

Abstract:

The spectral clustering algorithm is widely used in many fields because of its advantages of identifying the nonconvex data distribution, and effectively avoiding the local optimal solution without the dimension limitation of data points. However, with the growth of the amount and dimension of the data, it is very necessary to reduce the algorithm’s computation time on the premise of guaranteeing the clustering accuracy. Moreover, besides the data set itself, the factors affecting the clustering quality of the spectral clustering algorithm include the method of solving distance matrix, the scale parameters of similarity matrix and the form of Laplacian matrix. Aiming at the problems mentioned above, we apply the message passing interface (MPI) parallel programming model to the spectral clustering algorithm. The tnearest neighbor method is then used in the transformation of Laplacian matrix approximation in the spectral clustering algorithm. Meanwhile, we select the local scaling parameter as the selftuning scaling parameter in the algorithm. Based on the above analysis, we propose a parallel implementation of the sparse local scaling parallel spectral clustering (SLSPSC), then conduct experiments on four different data sets, and analyze and compare the results with those of the current parallel spectral clustering (PSC) in running time and clustering quality. Experimental results show that the total computation time of the SLSPSC is greatly reduced when calculating the Laplacian matrix, and the quality of some data sets is improved.

Key words: parallel spectral clustering;sparsification;local scaling;MPI