• 中国计算机学会会刊
  • 中国科技核心期刊
  • 中文核心期刊

计算机工程与科学 ›› 2022, Vol. 44 ›› Issue (02): 191-198.

• 高性能计算 • 上一篇    下一篇

面向大规模图计算的连通分量算法分析与优化

白皓1,4,甘新标1,杨文祥2,贾孟涵1,涂旭平3,张一鸣1,郭敏1,来乐1,张意4,朱春平4   

  1. (1.国防科技大学计算机学院,湖南 长沙 410073;
    2.中国空气动力研究与发展中心计算空气动力研究所,四川 绵阳  621000;
    3.黄冈师范学院计算机学院,湖北 黄冈 438000;4.中国人民解放军66061部队,北京 100144)

  • 收稿日期:2021-02-04 修回日期:2021-09-14 接受日期:2022-02-25 出版日期:2022-02-25 发布日期:2022-02-17
  • 基金资助:
    国家数值风洞项目(NNW2019ZT6-B21);国家自然科学基金(61772541,61872376,61932001);国家重点研发计划(2018YFB0204301);湖南省自然科学基金(2020JJ4669); 并行分布处理实验室基金(6142110190206)

Analysis  and optimization of the connected component algorithm for large-scale graph computing

BAI Hao1,4,GAN Xin-biao1,YANG Wen-xiang2,JIA Meng-han1,TU Xu-ping3,ZHANG Yi-ming1,GUO Min1,LAI Le1,ZHANG Yi4,ZHU Chun-ping4#br#

#br#
  

  1. (1.College of Computer Science and Technology,National University of Defense Technology,Changsha 410073;

    2.Institute of Computational Aerodynamics,China Aerodynamics Research and Development Center,Mianyang 621000;

    3.College of Computer Science,Huanggang Normal University,Huanggang 438000;

    4.Troop  66061 of PLA,Beijing 100144,China)
  • Received:2021-02-04 Revised:2021-09-14 Accepted:2022-02-25 Online:2022-02-25 Published:2022-02-17

摘要: 近年来,图计算在诸多领域发挥着越来越重要的作用。连通分量算法是图计算的重要基础算法,可以应用于可达性查询、一致性检测等众多场景。面向大规模图遍历Graph500标准测试,对连通分量算法进行了算法和数据结构优化。主要有以下创新:(1)对并查集提出了捷径向量算法,并测试了算法和数据结构的配合程度;
(2)利用多线程迭代轮转对算法实现并行加速;(3)从多个维度比较了不同实现方法的优缺点。基于优化方法,对性能进行了评估分析,当scale=25(包含225个节点)时,捷径向量算法对基于二维向量和链表的按秩合并算法的加速比分别是1.38倍和1.40倍,对BFS和DFS的加速比分别为4.76倍和4.70倍,且空间占用为该2
个算法的4.1%~4.6%,此外,并行对串行的加速比为1.57倍。


关键词: 图计算, 图遍历, 连通分量算法, Graph500, 捷径向量算法

Abstract: In recent years, graph computing has played an increasingly important role in many fields. The connected component algorithm is an important basic algorithm for graph computing, which can be applied to multiple scenarios such as reachability query. This paper is oriented to the large-scale graph traversal Graph500 standard test, and optimizes the connected component algorithm and data structure. The main innovations are as follows: (1) A shortcutting-vector algorithm is proposed for the union-find set, and the degree of coordination between the algorithm and the data structure is tested. (2) The multi-threaded iterative rotation algorithm is used to achieve the parallel acceleration of the algorithm. (3) The advantages and disadvantages of different implementation methods are compared from multiple dimensions. 
Based on the optimization method, the performance is evaluated and analyzed. When scale =25 (including 225 vertices), the speedup ratio of the shortcutting-vector algorithm to the rank-merging algorithm based on two-dimensional vectors and linked lists is 1.38 times and 1.40 times, respectively. The speedup ratios of BFS and DFS are 4.76 times and 4.70 times respectively, and the space occupation is 4.1%~4.6% of the two algorithms. In addition, the speedup ratio of parallel to serial is 1.57 times. 

Key words: graph computing, graph traversal, connected component algorithm, Graph500, shortcutting-vector algorithm