  • Journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science


A combined storage structure for image processing algorithms on GPU

ZUO Xian-yu (1,2), ZHANG Zhe (1,5), HUANG Xiang-zhi (4,5), GE Qiang (1,2), ZHANG Li-tao (3), ZANG Wen-qian (4,5)

  1. Henan Key Laboratory of Big Data Analysis and Processing, Kaifeng 475004;
  2. Institute of Data and Knowledge Engineering, College of Computer and Information Engineering, Henan University, Kaifeng 475004;
  3. College of Science, Zhengzhou University of Aeronautics, Zhengzhou 450015;
  4. Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094;
  5. Zhongke Langfang Institute of Spatial Information Application, Langfang 065000, China

     
  • Received: 2019-08-13 Revised: 2019-11-05 Online: 2020-02-25 Published: 2020-02-25

Abstract:

Most image processing algorithms run faster when optimized for the GPU, but the scheduling of data transfers and kernel execution remains the main bottleneck to further efficiency gains. The usual remedy is to use streams to overlap data transfers with kernel execution, which hides part of the transfer and execution time. However, because of the characteristics of the CUDA programming model and the hardware-level limits on GPU resources, operations are still serialized when many of them are queued, even if numerous streams are created. This paper proposes a new data storage structure, the Combined Storage Structure (CSS). CSS improves performance by merging the small data transfers on a single stream into one large transfer, which reduces the fixed cost of each transfer and kernel-execution operation and the gap between successive calls. Experimental results show that CSS improves the performance of GPU-based image processing algorithms with a single stream and also improves the speedup obtained with multiple streams. CSS is practical and scalable, and it suits image processing workloads that chain many operators or handle a large number of small images. The proposed method also offers a new direction for research on GPU acceleration of image processing algorithms.

 

Key words: image processing; GPU; CUDA stream; Combined Storage Structure (CSS); overlap
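
The merging idea described in the abstract can be illustrated with a small host-side sketch. It is not the authors' CSS layout, which the abstract does not specify: the names `PackedBatch`, `pack_images`, `unpack_image`, and `transfer_time_us`, as well as the fixed cost and bandwidth figures, are assumptions chosen only for illustration. The sketch packs many small images into one contiguous buffer with an offset table, then uses a toy cost model (a fixed cost per call plus a bandwidth term) to show why one large transfer may be cheaper than many small ones.

```python
from dataclasses import dataclass


@dataclass
class PackedBatch:
    """Hypothetical container: one contiguous buffer plus per-image offsets."""
    buffer: bytearray
    offsets: list  # list of (start, length) tuples, one per image


def pack_images(images):
    """Concatenate small images into one buffer and record where each starts."""
    buffer = bytearray()
    offsets = []
    for img in images:
        offsets.append((len(buffer), len(img)))
        buffer.extend(img)
    return PackedBatch(buffer, offsets)


def unpack_image(batch, index):
    """Recover a single image from the combined buffer via its offset entry."""
    start, length = batch.offsets[index]
    return bytes(batch.buffer[start:start + length])


def transfer_time_us(num_calls, total_bytes,
                     fixed_cost_us=10.0, bytes_per_us=10_000.0):
    """Toy model (assumed numbers): every call pays a fixed cost,
    and the payload is charged at a constant bandwidth."""
    return num_calls * fixed_cost_us + total_bytes / bytes_per_us


# 64 small "images" of 4 KiB each, filled with a per-image byte value.
images = [bytes([i]) * 4096 for i in range(64)]
batch = pack_images(images)
total_bytes = len(batch.buffer)

separate_us = transfer_time_us(len(images), total_bytes)  # one call per image
combined_us = transfer_time_us(1, total_bytes)            # one merged call

print(f"separate transfers: {separate_us:.1f} us")
print(f"combined transfer:  {combined_us:.1f} us")
```

Under this model the bandwidth term is identical in both cases, so the whole difference comes from the per-call fixed cost, which is the overhead the abstract says CSS reduces. In a real CUDA program the combined buffer would be copied with a single asynchronous transfer on the stream, and kernels would index into it through the offset table.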