• Journal of the China Computer Federation (CCF)
  • China Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science (计算机工程与科学)

• High Performance Computing

OpenCV parallel optimization on the TI 6678 multi-core DSP

LI Jin1, LUO Xin-jie2, HU Xiao1, CHEN Yue-yue1

  1. (1. College of Computer, National University of Defense Technology, Changsha, Hunan 410073;
    2. School of Electronics Engineering and Computer Science, Peking University, Beijing 100871, China)
  • Received: 2017-11-03 Revised: 2018-02-10 Online: 2018-05-25 Published: 2018-05-25

Abstract:

Digital signal processors (DSPs) are widely used in industrial applications and military equipment, and OpenCV is the industry's general-purpose open-source image processing library. However, few ports or optimized implementations of OpenCV exist for DSP platforms. In this paper, we port OpenCV to the TI TMS320C6678 (TI 6678) platform and build a low-level support library for the TI 6678 that supports the vast majority of OpenCV's functionality. On this basis, we analyze in depth the computational characteristics and data flow of one class of OpenCV library functions running on the TI 6678 hardware, and propose an optimization method for this class of functions. The method combines the DMA and cache operations supported by the TI 6678 architecture with the OpenMP parallel framework to optimize these functions and parallelize them across cores. With this method, the optimized OpenCV library functions are sped up by up to 3.6 times on a single core of the TI 6678. Building on the single-core optimization, the parallelized functions achieve speedups of 2.55 to 7.06 on 8 cores.

Key words: TMS320C6678, OpenCV, OpenMP, multi-core parallelism
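
The abstract does not give code, so the following is only a minimal sketch of the general pattern it describes: staging image tiles through a small local buffer and distributing independent tiles across cores with OpenMP. Everything named here is an assumption for illustration. `threshold_tiled`, `dma_copy`, `TILE_ROWS` and `MAX_COLS` are hypothetical, the per-pixel threshold is just a stand-in for "a class of OpenCV library functions", and `memcpy` stands in for the DMA transfer and cache maintenance that the paper performs on the TI 6678. It is not the authors' implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TILE_ROWS 16    /* hypothetical tile height; would be sized to on-chip SRAM */
#define MAX_COLS  1024  /* hypothetical upper bound on row width for the local buffer */

/* Stand-in for a DMA transfer (assumption). On the DSP this would be a DMA
 * submit and wait, plus the matching cache invalidate or write-back. */
static void dma_copy(uint8_t *dst, const uint8_t *src, size_t bytes)
{
    memcpy(dst, src, bytes);
}

/* Binary threshold over an 8-bit single-channel image, processed tile by tile.
 * Returns 0 on success, -1 if the dimensions do not fit the local buffer. */
int threshold_tiled(const uint8_t *src, uint8_t *dst,
                    int rows, int cols, uint8_t thresh)
{
    assert(src != NULL && dst != NULL);
    if (rows < 0 || cols <= 0 || cols > MAX_COLS)
        return -1;

    int tiles = (rows + TILE_ROWS - 1) / TILE_ROWS;

    /* Tiles are independent, so they can be split across cores. If the
     * compiler is run without OpenMP support, the pragma is ignored and
     * the loop simply runs serially. */
    #pragma omp parallel for schedule(static)
    for (int t = 0; t < tiles; ++t) {
        uint8_t local[TILE_ROWS * MAX_COLS];   /* private per-thread staging buffer */
        int r0 = t * TILE_ROWS;
        int n  = (rows - r0 < TILE_ROWS) ? rows - r0 : TILE_ROWS;
        size_t bytes = (size_t)n * (size_t)cols;

        dma_copy(local, src + (size_t)r0 * cols, bytes);   /* bring the tile in */
        for (size_t i = 0; i < bytes; ++i)                 /* compute on local data */
            local[i] = (local[i] > thresh) ? 255 : 0;
        dma_copy(dst + (size_t)r0 * cols, local, bytes);   /* write the tile out */
    }
    return 0;
}
```

In this sketch each tile is copied, processed and written back in sequence. A real implementation on the DSP would more likely overlap transfers with computation (for example with two alternating buffers), which is omitted here to keep the example short.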