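FPGA Design and Implementation of QR Decomposition for Large Matrices
Received date: 2009-04-13
Revised date: 2009-07-10
Online published: 2010-09-29
Funding
Supported by the National Natural Science Foundation of China (60633050, 60833004)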
The FPGA Implementation of Large-Scale QR Decomposition
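Large-scale QR decomposition is widely used in signal processing, image processing, computational structural mechanics, and other fields. QR decomposition of large matrices is mainly computed on high-performance parallel machines, and no FPGA-based accelerated implementation has been reported so far. Based on an analysis of the characteristics of the fast Givens rotation QR decomposition algorithm, this paper proposes and implements a fine-grained parallel QR decomposition algorithm, and realizes a scalable linear-array QR decomposition processor on an Altera Stratix II FPGA platform. Relative to a single processing element, the array processor achieves a near-linear speedup, showing good scalability. Performance tests at a clock frequency of 100 MHz show that the array processor achieves a 19x speedup relative to a 2.0 GHz Pentium dual-core general-purpose microprocessor.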
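ZHOU Jie¹, CHEN Xiaoyang¹, ZHAO Jianxun², DOU Yong¹. FPGA Design and Implementation of QR Decomposition for Large Matrices[J]. Computer Engineering & Science, 2010, 32(10): 34-37. DOI: 10.3969/j.issn.1007130X.2010.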
Large-scale QR decomposition is widely used in many fields, such as signal processing, large image processing, and computational structural dynamics. Traditional methods use parallel computers to accelerate large-scale QR decomposition, which is a computation-intensive algorithm. This paper presents a fine-grained parallel implementation of Givens rotation QR decomposition on an FPGA. A scalable linear array of processing elements (PEs), which is the core component of our hardware design, is proposed to implement this algorithm. To our knowledge, this is the first FPGA-based implementation of large-scale QR decomposition. A total of 15 GRPEs can be integrated into an Altera Stratix II EP2S130F1020C5 FPGA. The experimental results show that a speedup of up to 19 can be achieved relative to the Pentium Dual CPU.
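For readers unfamiliar with the underlying method, the following is a minimal software sketch of QR decomposition by standard Givens rotations. It is an illustration only: it is not the fast (square-root-free) Givens variant analyzed in the paper, and it does not model the paper's linear-array hardware. The function names `givens_qr` and `matmul` are hypothetical and chosen for this example. Each rotation touches only two adjacent rows, which is the general property that makes the method attractive for fine-grained parallel designs.

```python
import math


def matmul(x, y):
    """Plain-Python matrix product of two lists of lists."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]


def givens_qr(a):
    """Return (q, r) with a = q * r, using standard Givens rotations.

    Illustrative sketch: q is m x m orthogonal, r is m x n upper triangular.
    """
    m, n = len(a), len(a[0])
    r = [[float(v) for v in row] for row in a]
    q = [[1.0 if i == j else 0.0 for j in range(m)] for i in range(m)]

    for j in range(n):
        # Zero column j below the diagonal, working from the bottom row up.
        for i in range(m - 1, j, -1):
            x, y = r[i - 1][j], r[i][j]
            if y == 0.0:
                continue  # entry is already zero, no rotation needed
            h = math.hypot(x, y)
            c, s = x / h, y / h

            # Apply the rotation to rows i-1 and i of r.
            for k in range(n):
                u, v = r[i - 1][k], r[i][k]
                r[i - 1][k] = c * u + s * v
                r[i][k] = -s * u + c * v
            r[i][j] = 0.0  # remove rounding residue in the zeroed entry

            # Accumulate the transposed rotation into columns i-1 and i of q.
            for k in range(m):
                u, v = q[k][i - 1], q[k][i]
                q[k][i - 1] = c * u + s * v
                q[k][i] = -s * u + c * v
    return q, r


if __name__ == "__main__":
    a = [[12.0, -51.0, 4.0], [6.0, 167.0, -68.0], [-4.0, 24.0, -41.0]]
    q, r = givens_qr(a)
    for row in r:
        print(["%.4f" % v for v in row])
```

In this sketch, rotations acting on disjoint row pairs are independent of one another, so several of them could in principle be applied concurrently; how the paper's processing elements schedule that work is not described in the abstract.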