• 中国计算机学会会刊
  • 中国科技核心期刊
  • 中文核心期刊

Computer Engineering & Science

Previous Articles     Next Articles

Design and implementation of a  parallel peer pressure
clustering algorithm based on CombBLAS

ZOU Pei-gang1,2,CHEN Jun1   

  1. (1.Institute of Applied Physics and Computational Mathematics,Beijing 100088;
    2.Graduate School,China Academy of Engineering Physics,Beijing 100088,China)
     
  • Received:2016-08-26 Revised:2016-10-21 Online:2017-03-25 Published:2017-03-25

Abstract:

Graph clustering is a problem of determining natural groups with high connectivity in a graph. This can be useful in fields such as machine learning, data mining, pattern recognition, image analysis and bioinformatics. To meet the graph-theoretic analysis demands of emerging“big data” applications, it is essential to speed up the underlying graph problems of current parallel systems. However, it is difficult to parallelize large-scale graph computation and achieve good performance using traditional approaches due to their irregular graph structure and low operation intensity. We implement a scalable distributed-memory algorithm for peer pressure graph clustering using the sparse matrix infrastructure in Combinatorial BLAS. We first convert the peer pressure graph clustering algorithm to sparse matrix computation, which allows irregular data structures and access patterns in parallel applications to be represented and can efficiently address the graph parallel challenge. Finally, the proposed algorithm is parallelized based on the MPI programming model. Experiments show that when the scale of the graph represented by a sparse matrix is up to 4.3 billion, the  parallel  peer pressure clustering algorithm based on linear algebraic has high performance and is well scalable on the Dawning Supercomputer, and the speedup can be up to 40.1x when the number of core scales to 64.
 

Key words: graph computation, peer pressure clustering, parallel, Combinatorial BLAS, sparse matrix, large-scale graph, MPI