A vectorization method of QR decomposition based on Matrix

J4 ›› 2016, Vol. 38 ›› Issue (02): 210-216.

LU Qingnan, LIU Zhong

(College of Computer, National University of Defense Technology, Changsha, Hunan 410073, China)

Received: 2015-02-28; Revised: 2015-05-26; Online: 2016-02-25; Published: 2016-02-25
Funding:
Research on shared-memory architecture for thousand-core general-purpose microprocessors (61472432)

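Abstract

This paper proposes a vectorization method for QR decomposition with Givens rotations on the Matrix processor. Based on the architectural characteristics of Matrix, vector data memory access and computation are optimized so that the computation is distributed evenly across all vector processing elements. A double-buffered DMA data transfer strategy is designed so that kernel computation time and DMA data movement time overlap completely; the kernel therefore stays at peak computation and the best computational efficiency is obtained. Experimental results show that the method achieves high computational efficiency and performance speedup.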
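As a point of reference for the Givens-rotation approach named in the abstract, the following is a minimal pure-Python sketch of the textbook algorithm. It is not the paper's Matrix implementation: the function name `givens_qr` and the list-of-lists layout are assumptions made for illustration. The row update inside the loop applies the same pair (c, s) to every column, which is the data-parallel step that a vector processor could spread across its lanes.

```python
import math

def givens_qr(a):
    """Return (q, r) with a = q * r, using Givens rotations (illustrative sketch)."""
    m, n = len(a), len(a[0])
    r = [row[:] for row in a]                      # working copy that becomes R
    q = [[1.0 if i == j else 0.0 for j in range(m)] for i in range(m)]
    for j in range(n):                             # clear column j below the diagonal
        for i in range(m - 1, j, -1):              # work from the bottom row upward
            x, y = r[i - 1][j], r[i][j]
            if y == 0.0:
                continue                           # entry is already zero
            h = math.hypot(x, y)
            c, s = x / h, y / h
            # Rotate rows i-1 and i of R. Every column uses the same (c, s),
            # so this elementwise update is the vectorizable part.
            top, bot = r[i - 1], r[i]
            r[i - 1] = [c * t + s * b for t, b in zip(top, bot)]
            r[i] = [-s * t + c * b for t, b in zip(top, bot)]
            # Accumulate Q by applying the transposed rotation to columns i-1 and i.
            for k in range(m):
                qa, qb = q[k][i - 1], q[k][i]
                q[k][i - 1] = c * qa + s * qb
                q[k][i] = -s * qa + c * qb
    return q, r
```

Calling `givens_qr([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])` returns an orthogonal 3x3 `q` and an upper-triangular 3x2 `r` whose product reproduces the input up to rounding error.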

Key words: QR decomposition; vector processor; Givens rotation; software pipelining
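The double-buffered DMA strategy mentioned in the abstract can be illustrated by its scheduling order: while the kernel works on one buffer, the next tile is already being loaded into the other. The sketch below only simulates that order on a host CPU; `dma_load`, `compute`, and `process_double_buffered` are hypothetical names, a background thread stands in for the DMA engine, and the sum-of-squares kernel is a placeholder rather than the paper's computation. Python threads do not give true compute/transfer overlap here, so the code shows the ping-pong pattern, not its performance.

```python
from concurrent.futures import ThreadPoolExecutor

def dma_load(source, index, tile):
    """Hypothetical stand-in for a DMA transfer: copy one tile into a new buffer."""
    return list(source[index * tile:(index + 1) * tile])

def compute(buffer):
    """Hypothetical stand-in for the vector kernel: sum of squares of one tile."""
    return sum(v * v for v in buffer)

def process_double_buffered(source, tile):
    """Process source tile by tile, prefetching tile t+1 while tile t is computed."""
    n_tiles = (len(source) + tile - 1) // tile
    results = []
    if n_tiles == 0:
        return results
    buffers = [None, None]                            # ping-pong buffer slots
    with ThreadPoolExecutor(max_workers=1) as dma:    # one background "DMA engine"
        pending = dma.submit(dma_load, source, 0, tile)   # prefetch tile 0
        for t in range(n_tiles):
            cur = t % 2
            buffers[cur] = pending.result()           # wait until tile t has arrived
            if t + 1 < n_tiles:
                # Start loading tile t+1 before computing on tile t.
                pending = dma.submit(dma_load, source, t + 1, tile)
            results.append(compute(buffers[cur]))
    return results
```

For example, `process_double_buffered(list(range(10)), 4)` processes three tiles and returns `[14, 126, 145]`.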

LU Qingnan, LIU Zhong. A vectorization method of QR decomposition based on Matrix [J]. J4, 2016, 38(02): 210-216.
