[1] Zhang He,Chen Kesong.Design and implementation of sparse matrix vector multiplication on FPGA[J].Application Research of Computers,2014,31(6):1756-1759.(in Chinese)
[2] Shan Y,Wu T,Wang Y,et al.FPGA and GPU implementation of large scale SpMV[C]∥Proc of IEEE 8th Symposium on Application Specific Processors (SASP’10),2010:67-70.
[3] Boyer B,Dumas J G,Giorgi P.Exact sparse matrix-vector multiplication on GPU’s and multicore architectures[C]∥Proc of the 4th International Workshop on Parallel and Symbolic Computation (PASCO’10),2010:80-88.
[4] Ohshima S,Kise K,Katagiri T,et al.Parallel processing of matrix multiplication in a CPU and GPU heterogeneous environment[C]∥Proc of the 7th International Meeting on High Performance Computing for Computational Science (VECPAR’06),2006:41-50.
[5] Bell N,Garland M.Efficient sparse matrix-vector multiplication on CUDA:NVIDIA Technical Report NVR-2008-004[R].[S.l.:s.n.],2008.
[6] Qin Jin,Gong Chunye,Hu Qingfeng,et al.Optimization of sparse diagonal matrix-vector multiplication based on CUDA program model[J].Computer Engineering & Science,2012,32(7):78-83.(in Chinese)
[7] Bai Hongtao,Ouyang Dantong,Li Ximing.Optimizing of sparse matrix-vector multiplication based on GPU[J].Computer Science,2010,37(8):168-172.(in Chinese)
[8] Buatois L,Caumon G,Levy B.Concurrent number cruncher:A GPU implementation of a general sparse linear solver[J].Int J Parallel Emerg Distrib Syst,2009,24(3):205-223.
[9] Yuan E,Zhang Yunquan,Liu Fangfang,et al.Automatic performance tuning of sparse matrix-vector multiplication:Implementation techniques and its application research[J].Journal of Computer Research and Development,2009,46(7):1117-1126.(in Chinese)
[10] Belgin M,Back G,Ribbens C J.Pattern based sparse matrix representation for memory-efficient SMVM kernels[C]∥Proc of the 23rd International Conference on Supercomputing,2009:100-109.
[11] Monakov A,Lokhmotov A,Avetisyan A.Automatically tuning sparse matrix vector multiplication for GPU architectures[M]∥High Performance Embedded Architectures and Compilers.Berlin:Springer Berlin Heidelberg,2010:111-125.
[12] Yang Wangdong,Li Kenli,Shi Lin.A quasi-diagonal matrix hybrid compression algorithm and implementation for SpMV on GPU[J].Computer Science,2014,41(7):290-296.(in Chinese)
[13] Williams S,Oliker L,Vuduc R W,et al.Optimization of sparse matrix-vector multiplication on emerging multicore platforms:IBM Research Report RC24704 (W0812-047)[R].[S.l.:s.n.],2008.
[14] Baskaran M M,Bordawekar R.Optimizing sparse matrix-vector multiplication on GPUs:IBM Research Report RC24704[R].New York:IBM Corporation,2009.
[15] Wang Wei,Chen Jianping,Zeng Guosun,et al.Optimization of parallel principal eigenvectors computing for large-scale sparse matrixes[J].Journal of Frontiers of Computer Science and Technology,2012,6(2):118-124.(in Chinese)
[16] Deng Lin,Dou Yong,Zheng Yi.Memory access behavior characteristics-oriented cache partition for SpMV[J].Computer Engineering & Science,2012,32(9):64-69.(in Chinese)
[17] Computing Developer Home Page[EB/OL].[2015-03-17].http://developer.nvidia.com/object/gpucomputing.html.
[18] Chapman B,Jost G,Van Der Pas R.Using OpenMP:Portable shared memory parallel programming (Vol.10)[M].Cambridge:The MIT Press,2008.
[19] The University of Florida Sparse Matrix Collection[EB/OL].[2015-03-12].http://www.cise.ufl.edu/research/sparse/matrices/groups.html.
[20] NVIDIA Corporation.The NVIDIA CUDA sparse matrix library (cuSPARSE)[EB/OL].[2015-03-12].http://developer.nvidia.com/cuda/cusparse.

Chinese-language references:
[1] 张禾,陈客松.基于FPGA的稀疏矩阵向量乘的设计研究[J].计算机应用研究,2014,31(6):1756-1759.
[6] 秦晋,龚春叶,胡庆丰,等.基于CUDA编程模型的稀疏对角矩阵向量乘优化[J].计算机工程与科学,2012,32(7):78-83.
[7] 白洪涛,欧阳丹彤,李熙铭.基于GPU的稀疏矩阵向量乘优化[J].计算机科学,2010,37(8):168-172.
[9] 袁娥,张云泉,刘芳芳,等.SpMV的自动性能优化实现技术及其应用研究[J].计算机研究与发展,2009,46(7):1117-1126.
[12] 阳王东,李肯立,石林.一种准对角矩阵的混合压缩算法及其与向量相乘在GPU上的实现[J].计算机科学,2014,41(7):290-296.
[15] 王伟,陈建平,曾国荪.大规模稀疏矩阵的主特征向量计算优化方法[J].计算机科学与探索,2012,6(2):118-124.
[16] 邓林,窦勇,郑义.面向稀疏矩阵访存特性的Cache划分[J].计算机工程与科学,2012,32(9):64-69.