A Fine-Grained Parallel Algorithm for the Cholesky Decomposition
Received date: 2010-03-11
Revised date: 2010-06-19
Online published: 2010-09-02
Funding
Supported by the National Natural Science Foundation of China (60633050, 60833004)
Wu Guiming, Dou Yong, Wang Miao. A Fine-Grained Parallel Algorithm for the Cholesky Decomposition[J]. Computer Engineering & Science, 2010, 32(9): 102-106. DOI: 10.3969/j.issn.1007-130X.2010.
This paper presents a fine-grained pipelined parallel algorithm for the Cholesky decomposition. The algorithm handles matrices of arbitrary order and fully exploits the fine-grained parallelism offered by FPGA accelerators. Experimental results show that it scales well: 36 processing elements (PEs) can be integrated into a Xilinx XC5VLX330 FPGA, and performance reaches 14.3 GFLOPS for a matrix of order 16384 at a clock frequency of 200 MHz.
Keywords: Cholesky decomposition; fine-grained parallelism; FPGA
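For readers unfamiliar with the underlying computation, the following is a minimal sequential sketch of a column-oriented Cholesky decomposition in Python. It is not the pipelined FPGA algorithm of the paper, whose details are not given in this abstract. The function names `cholesky` and `estimated_seconds` are hypothetical, and the run-time helper assumes the standard leading-order count of about n^3/3 floating-point operations, which may differ from the operation count the authors used.

```python
import math


def cholesky(a):
    """Return lower-triangular L with L * L^T == a.

    `a` is a symmetric positive definite matrix given as a list of lists.
    """
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):
        # Diagonal entry: subtract the squares of the entries already
        # computed in row j.
        d = a[j][j] - sum(L[j][k] ** 2 for k in range(j))
        if d <= 0.0:
            raise ValueError("matrix is not positive definite")
        L[j][j] = math.sqrt(d)
        # The entries below the diagonal in column j do not depend on each
        # other, which is the kind of parallelism a hardware design can use.
        for i in range(j + 1, n):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = (a[i][j] - s) / L[j][j]
    return L


def estimated_seconds(n, gflops):
    """Rough run time, assuming about n^3 / 3 floating-point operations."""
    return (n ** 3 / 3) / (gflops * 1e9)
```

Calling `cholesky` on a small symmetric positive definite matrix returns its lower-triangular factor, and `estimated_seconds(16384, 14.3)` gives a back-of-the-envelope run time for the matrix order and throughput quoted in the abstract, under the stated operation-count assumption.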