• 中国计算机学会会刊
  • 中国科技核心期刊
  • 中文核心期刊

Computer Engineering & Science ›› 2022, Vol. 44 ›› Issue (02): 191-198.

Previous Articles     Next Articles

Analysis  and optimization of the connected component algorithm for large-scale graph computing

BAI Hao1,4,GAN Xin-biao1,YANG Wen-xiang2,JIA Meng-han1,TU Xu-ping3,ZHANG Yi-ming1,GUO Min1,LAI Le1,ZHANG Yi4,ZHU Chun-ping4#br#

#br#
  

  1. (1.College of Computer Science and Technology,National University of Defense Technology,Changsha 410073;

    2.Institute of Computational Aerodynamics,China Aerodynamics Research and Development Center,Mianyang 621000;

    3.College of Computer Science,Huanggang Normal University,Huanggang 438000;

    4.Troop  66061 of PLA,Beijing 100144,China)
  • Received:2021-02-04 Revised:2021-09-14 Accepted:2022-02-25 Online:2022-02-25 Published:2022-02-17

Abstract: In recent years, graph computing has played an increasingly important role in many fields. The connected component algorithm is an important basic algorithm for graph computing, which can be applied to multiple scenarios such as reachability query. This paper is oriented to the large-scale graph traversal Graph500 standard test, and optimizes the connected component algorithm and data structure. The main innovations are as follows: (1) A shortcutting-vector algorithm is proposed for the union-find set, and the degree of coordination between the algorithm and the data structure is tested. (2) The multi-threaded iterative rotation algorithm is used to achieve the parallel acceleration of the algorithm. (3) The advantages and disadvantages of different implementation methods are compared from multiple dimensions. 
Based on the optimization method, the performance is evaluated and analyzed. When scale =25 (including 225 vertices), the speedup ratio of the shortcutting-vector algorithm to the rank-merging algorithm based on two-dimensional vectors and linked lists is 1.38 times and 1.40 times, respectively. The speedup ratios of BFS and DFS are 4.76 times and 4.70 times respectively, and the space occupation is 4.1%~4.6% of the two algorithms. In addition, the speedup ratio of parallel to serial is 1.57 times. 

Key words: graph computing, graph traversal, connected component algorithm, Graph500, shortcutting-vector algorithm