Design and implementation of a parallel peer pressure clustering algorithm based on CombBLAS

Computer Engineering & Science

ZOU Pei-gang1,2, CHEN Jun1

(1. Institute of Applied Physics and Computational Mathematics, Beijing 100088, China; 2. Graduate School, China Academy of Engineering Physics, Beijing 100088, China)

Received: 2016-08-26  Revised: 2016-10-21  Online: 2017-03-25  Published: 2017-03-25
Funding: National Natural Science Foundation of China (61672003)

Abstract:

Graph clustering groups the relatively tightly connected vertices of a graph, together with their associated edges, into subgraphs. It is widely used in fields such as machine learning, data mining, pattern recognition, image analysis, and bioinformatics. With the arrival of the big data era, graph data has grown massively, and meeting the resulting demand for large-scale graph computation requires speeding up the underlying graph algorithms on parallel systems. However, because graph structure is irregular and operation intensity is low, single-machine algorithms run inefficiently and traditional parallel approaches struggle to achieve high performance. We implement a scalable distributed-memory algorithm for peer pressure graph clustering using the sparse matrix infrastructure in Combinatorial BLAS (CombBLAS). We first convert the clustering algorithm into sparse matrix operations, which gives the graph's irregular data structures and access patterns a structured representation, and then parallelize it based on the MPI programming model. Experiments show that, on very large graphs represented by sparse matrices at a scale of up to 4.3 billion, the linear-algebra-based parallel peer pressure clustering algorithm achieves high parallel performance and good scalability on the Dawning supercomputer, reaching a speedup of 40.1 on 64 cores.

Key words: graph computation, peer pressure clustering, parallel, Combinatorial BLAS, sparse matrix, large-scale graph, MPI
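The conversion of peer pressure clustering into sparse matrix operations mentioned in the abstract can be illustrated with a minimal single-process sketch. This is not the authors' implementation and does not use the CombBLAS API or MPI. It follows the commonly described linear-algebra formulation, in which a cluster-by-vertex assignment matrix C is multiplied by the adjacency matrix A to tally votes, and each vertex then joins the cluster with the most votes. The function names `spgemm` and `peer_pressure`, the added self-loops, the undirected treatment of edges, and the rule of breaking ties toward the smallest cluster id are all assumptions made for illustration.

```python
from collections import defaultdict


def spgemm(C, A):
    """Sparse product T = C * A; matrices are stored as {row: {col: value}}."""
    T = defaultdict(lambda: defaultdict(float))
    for c, row in C.items():
        for u, c_cu in row.items():
            for v, a_uv in A.get(u, {}).items():
                T[c][v] += c_cu * a_uv
    return T


def peer_pressure(edges, n, max_iters=100):
    """Cluster an undirected graph on vertices 0..n-1 (illustrative sketch).

    Assumptions: every vertex gets a self-loop so it votes for itself,
    and ties are broken toward the smallest cluster id.
    """
    # Adjacency matrix A with self-loops; A[u][v] is the weight of edge u -> v.
    A = {u: {u: 1.0} for u in range(n)}
    for u, v in edges:
        A[u][v] = 1.0
        A[v][u] = 1.0

    labels = list(range(n))  # start with each vertex in its own cluster
    for _ in range(max_iters):
        # C[c][v] = 1 when vertex v currently belongs to cluster c.
        C = defaultdict(dict)
        for v, c in enumerate(labels):
            C[c][v] = 1.0

        # T[c][v] = number of votes for vertex v to join cluster c.
        T = spgemm(C, A)

        # Column-wise argmax; visiting clusters in increasing id order with a
        # strict comparison keeps the smallest id on ties.
        best = {}
        for c in sorted(T):
            for v, votes in T[c].items():
                if v not in best or votes > best[v][0]:
                    best[v] = (votes, c)

        new_labels = [best[v][1] for v in range(n)]
        if new_labels == labels:  # converged: no vertex changed cluster
            break
        labels = new_labels
    return labels


if __name__ == "__main__":
    # Two triangles joined by the single edge 2-3.
    edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
    print(peer_pressure(edges, 6))  # → [0, 0, 0, 3, 3, 3]
```

In a distributed setting the vote tally would be the step that maps onto a parallel sparse matrix-matrix multiplication, which is why expressing the algorithm this way suits a library built around sparse matrix primitives. The `max_iters` bound is there because synchronous vote updates are not guaranteed to converge on every graph.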

ZOU Pei-gang, CHEN Jun. Design and implementation of a parallel peer pressure clustering algorithm based on CombBLAS[J]. Computer Engineering & Science.

