[1] Álvarez-Farré X,Gorobets A,Trias F X.A hierarchical parallel implementation for heterogeneous computing.Application to algebra-based CFD simulations on hybrid supercomputers[J].Computers & Fluids,2021,214:104768.
[2] Xu Z,Cambier L,Alonso J J,et al.Towards a scalable hierarchical high-order CFD solver[C]∥Proc of AIAA Scitech 2021 Forum,2021:0494.
[3] Xing L Y,Wang Z S,Ding Z Z,et al.An efficient sparse stiffness matrix vector multiplication using compressed sparse row storage format on AMD GPU[J].Concurrency and Computation:Practice and Experience,2022,34(23):e7186.
[4] Borrell R,Dosimont D,Garcia-Gasulla M,et al.Heterogeneous CPU/GPU co-execution of CFD simulations on the POWER9 architecture:Application to airplane aerodynamics[J].Future Generation Computer Systems,2020,107(C):31-48.
[5] O’Hearn K A,Alperen A,Aktulga H M.Performance optimization of reactive molecular dynamics simulations with dynamic charge distribution models on distributed memory platforms[C]∥Proc of the ACM International Conference on Supercomputing,2019:150-159.
[6] Muhammed T,Mehmood R,Albeshri A,et al.HPC-smart infrastructures:A review and outlook on performance analysis methods and tools[M]∥Smart Infrastructure and Applications.Cham:Springer,2020:427-451.
[7] Liu Jie,Chi Li-hua,Hu Qing-feng,et al.A load-balancing algorithm for sparse matrix-vector multiplication on parallel computers[J].Computer Engineering & Science,2006,28(3):76-77.(in Chinese)
[8] Bell N,Garland M.Implementing sparse matrix-vector multiplication on throughput-oriented processors[C]∥Proc of the Conference on High Performance Computing Networking,Storage and Analysis,2009:Article No.:18.
[9] Bell N,Garland M.Efficient sparse matrix-vector multiplication on CUDA:Technical report NVR-2008-004[R].California:NVIDIA Corporation,2008.
[10] He G X,Chen Q,Gao J Q.A new diagonal storage for efficient implementation of sparse matrix-vector multiplication on graphics processing unit[J].Concurrency and Computation:Practice and Experience,2021,33(13):e6230.
[11] Sun X Z,Zhang Y Q,Wang T,et al.CRSD:Application specific auto-tuning of SpMV for diagonal sparse matrices[C]∥Proc of European Conference on Parallel Processing,2011:316-327.
[12] Sun Xiang-zheng,Zhang Yun-quan,Wang Ting,et al.Auto-tuning of SpMV for diagonal sparse matrices[J].Journal of Computer Research and Development,2013,50(3):648-656.(in Chinese)
[13] Yuan L,Zhang Y Q,Sun X Z,et al.Optimizing sparse matrix vector multiplication using diagonal storage matrix format[C]∥Proc of 2010 IEEE 12th International Conference on High Performance Computing and Communications,2010:585-590.
[14] Barbieri D,Cardellini V,Fanfarillo A,et al.Three storage formats for sparse matrices on GPGPUs:Technical report RR-15.6[R].Roma:Università di Roma Tor Vergata,2015.
[15] Gu Yue,Zhao Yin-liang.Implementation and optimization of sparse matrix vector multiplication based on RISC-V vector instruction[J].Computer Engineering & Science,2022,44(1):1-8.(in Chinese)
[16] Gao J Q,Xia Y F,Yin R J,et al.Adaptive diagonal sparse matrix-vector multiplication on GPU[J].Journal of Parallel and Distributed Computing,2021,157(C):287-302.
[17] Choi J W,Singh A,Vuduc R W.Model-driven autotuning of sparse matrix-vector multiply on GPUs[J].ACM SIGPLAN Notices,2010,45(5):115-126.
[18] Yang W D,Li K L,Liu Y,et al.Optimization of quasi-diagonal matrix-vector multiplication on GPU[J].International Journal of High Performance Computing Applications,2014,28(2):183-195.
[19] Yang Wang-dong,Li Ken-li,Shi Lin.Quasi-diagonal matrix hybrid compression algorithm and implementation for SpMV on GPU[J].Computer Science,2014,41(7):290-296.(in Chinese)
[20] Liu W F,Vinter B.CSR5:An efficient storage format for cross-platform sparse matrix-vector multiplication[C]∥Proc of the 29th ACM International Conference on Supercomputing,2015:339-350.
[21] Yang W D,Li K L,Li K Q.A parallel computing method using blocked format with optimal partitioning for SpMV on GPU[J].Journal of Computer and System Sciences,2018,92(C):152-170.
[22] Cui H Y,Wang N B,Wang Y H,et al.An effective SpMV based on block strategy and hybrid compression on GPU[J].The Journal of Supercomputing,2022,78(5):6318-6339.
[23] Bian H D,Huang J Q,Dong R T,et al.A simple and efficient storage format for SIMD-accelerated SpMV[J].Cluster Computing,2021,24(4):3431-3448.
[24] Gao J H,Ji W X,Liu J,et al.AMF-CSR:Adaptive multi-row folding of CSR for SpMV on GPU[C]∥Proc of 2021 IEEE 27th International Conference on Parallel and Distributed Systems,2021:418-425.
[25] Davis T A,Hu Y F.The University of Florida sparse matrix collection[J].ACM Transactions on Mathematical Software,2011,38(1):1-25.
[26] Pikle N K,Sathe S R,Vyavhare A Y.GPGPU-based parallel computing applied in the FEM using the conjugate gradient algorithm:A review[J].Sādhanā,2018,43(7):1-21.
[27] Elafrou A,Goumas G,Koziris N.Performance analysis and optimization of sparse matrix-vector multiplication on modern multi- and many-core processors[C]∥Proc of 2017 46th International Conference on Parallel Processing,2017:292-301.
[28] Gao J Q,Chen Q,He G X.A thread-adaptive sparse approximate inverse preconditioning algorithm on multi-GPUs[J].Parallel Computing,2021,101:102724.
[29] Isotton G,Janna C,Bernaschi M.A GPU-accelerated adaptive FSAI preconditioner for massively parallel simulations[J].International Journal of High Performance Computing Applications,2022,36(2):153-166.