• Journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science ›› 2025, Vol. 47 ›› Issue (9): 1544-1554.

• High Performance Computing •

Optimization of ILU decomposition parallel algorithm on MIMD many-core architecture

SHI Yongzhen¹,², MO Haotian¹,², HU Xingyu¹,², LIU Jie¹,², WANG Qinglin¹,²

  1. Laboratory of Digitizing Software for Frontier Equipment, National University of Defense Technology, Changsha 410073, China
  2. National Key Laboratory of Parallel and Distributed Computing, National University of Defense Technology, Changsha 410073, China
  • Received: 2024-05-21 Revised: 2024-09-15 Online: 2025-09-25 Published: 2025-09-22

Abstract: Incomplete LU (ILU) factorization is widely used in solving large-scale sparse linear systems, where it can effectively reduce the number of iterations and improve solving efficiency. However, the data dependences of the linear system and the irregular computation and memory access during factorization make efficient parallel optimization difficult. On a multiple instruction, multiple data (MIMD) many-core architecture, numerous parallel threads can execute different instructions, which makes the architecture naturally suited to algorithms with irregular control flow. This paper studies the parallel optimization of ILU factorization on the PEZY-SC3s processor, an MIMD many-core architecture. It proposes a parallel ILU algorithm for the MIMD architecture and optimizes its performance with measures such as graph-coloring-based parallelism optimization, vector-unit-based memory access optimization, thread-grouping-based load balancing, and data locality optimization based on on-chip local storage. Experimental results show that the proposed parallel ILU factorization algorithm achieves average speedups of 16.70× and 1.39× over the MKL implementation on an Intel Xeon 4314 CPU and the cuSPARSE implementation on an NVIDIA A30 GPU, respectively.

Key words: incomplete LU factorization, MIMD many-core architecture, parallel computing
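
For readers unfamiliar with the two central ideas named in the abstract, the following is a minimal serial sketch in Python, not the paper's PEZY-SC3s implementation. It assumes a small matrix stored as a dense list of lists, the zero-fill variant ILU(0), and a generic greedy coloring; the function names `ilu0` and `greedy_coloring` are hypothetical, and the abstract does not state which ILU variant or coloring scheme the authors use.

```python
def ilu0(a):
    """ILU(0): LU factorization restricted to the nonzero pattern of `a`.

    Returns one matrix holding U on and above the diagonal and the
    strictly lower part of L (unit diagonal implied) below it.
    Assumes nonzero pivots; no pivoting is performed.
    """
    n = len(a)
    lu = [row[:] for row in a]                      # work on a copy
    pattern = [[a[i][j] != 0 for j in range(n)] for i in range(n)]
    for i in range(1, n):
        for k in range(i):
            if not pattern[i][k]:
                continue                            # entry outside the pattern
            lu[i][k] /= lu[k][k]                    # multiplier l_ik
            for j in range(k + 1, n):
                if pattern[i][j]:                   # drop any fill-in
                    lu[i][j] -= lu[i][k] * lu[k][j]
    return lu


def greedy_coloring(a):
    """Greedy coloring of the adjacency graph of `a` (pattern symmetrized).

    Rows that receive the same color are not directly coupled, so after
    reordering rows by color they can be processed in parallel.
    """
    n = len(a)
    colors = [-1] * n
    for i in range(n):
        used = {
            colors[j]
            for j in range(n)
            if j != i and colors[j] >= 0 and (a[i][j] != 0 or a[j][i] != 0)
        }
        c = 0
        while c in used:                            # smallest free color
            c += 1
        colors[i] = c
    return colors


if __name__ == "__main__":
    # Small tridiagonal test matrix (illustrative data only).
    A = [[4.0, -1.0, 0.0, 0.0],
         [-1.0, 4.0, -1.0, 0.0],
         [0.0, -1.0, 4.0, -1.0],
         [0.0, 0.0, -1.0, 4.0]]
    print(greedy_coloring(A))
    print(ilu0(A))
```

In this sketch the row loop of `ilu0` is strictly sequential because row `i` depends on every earlier row `k` it is coupled to. Grouping rows by color is one common way to expose the independence that a many-core processor needs: rows within a color class depend only on rows in other classes.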