• Journal of the China Computer Federation
  • China Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science ›› 2021, Vol. 43 ›› Issue (04): 586-593.

• High Performance Computing •

Design of a Soft-NMS-based accelerator for removing redundant candidate boxes

李景琳,姜晶菲,窦勇,许金伟,温冬   

  1. (College of Computer Science and Technology, National University of Defense Technology, Changsha 410073, Hunan, China)
  • Received: 2020-06-11 Revised: 2020-07-21 Accepted: 2021-04-25 Online: 2021-04-25 Published: 2021-04-21
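  • Funding:
    National Science and Technology Major Project on Core Electronic Devices, High-end Generic Chips and Basic Software (2018ZX01028101)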

A redundancy-reduced candidate box accelerator based on soft-non-maximum suppression

LI Jing-lin, JIANG Jing-fei, DOU Yong, XU Jin-wei, WEN Dong

  1. (College of Computer Science and Technology, National University of Defense Technology, Changsha 410073, China)

  • Received: 2020-06-11 Revised: 2020-07-21 Accepted: 2021-04-25 Online: 2021-04-25 Published: 2021-04-21

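Abstract: Object detection tasks usually use the non-maximum suppression (NMS) algorithm to delete the redundant candidate boxes output by a convolutional neural network. Soft-NMS gradually decays the scores of candidate boxes instead of directly deleting boxes that exceed a predefined threshold, as Hard-NMS does. This avoids mistakenly deleting the candidate boxes of overlapping objects in an image and improves the accuracy of object detection. However, frequently changing the candidate box scores makes Soft-NMS more complex than Hard-NMS. To remove redundant candidate boxes with high accuracy, low latency, and low power consumption, an architecture based on Soft-NMS is proposed. It uses logarithmic functions to optimize complex floating-point computation, and it combines fine-grained pipelining with coarse-grained parallelism in a two-level optimization structure to further raise the throughput of the algorithm. The architecture was evaluated on a Xilinx KU-115 FPGA development board. The results show that its power consumption is 6.107 W and that its latency for processing 992 candidate boxes is 168.95 μs. Compared with a CPU implementation of Soft-NMS, the architecture achieves a 36-fold performance improvement, and its performance-to-power ratio is 264 times that of the CPU implementation.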

Key words: reconfigurable computing, object detection, non-maximum suppression

Abstract: Object detection tasks usually use the non-maximum suppression (NMS) algorithm to remove redundant candidate boxes from the output of a convolutional neural network. Instead of directly deleting candidate boxes that exceed a predefined threshold, as Hard-NMS does, Soft-NMS gradually attenuates their scores. This avoids mistakenly deleting overlapping objects in the image and improves the accuracy of object detection. However, the frequent score updates make Soft-NMS more complex than Hard-NMS. To remove redundant candidate boxes with high accuracy, low latency, and low power consumption, this paper proposes a Soft-NMS-based architecture that uses logarithmic functions to optimize complex floating-point calculations and a two-level optimization structure, combining fine-grained pipelining with coarse-grained parallelism, to improve the throughput of the algorithm. Experiments on a Xilinx KU-115 FPGA show a power consumption of 6.107 W and a latency of 168.95 μs for processing 1000 boxes. Compared with a CPU implementation of Soft-NMS, the architecture achieves a 36-fold performance improvement, and its performance-to-power ratio is 264 times that of the CPU implementation.
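
The difference between Hard-NMS and Soft-NMS described in the abstract can be illustrated with a small software sketch. This is not the accelerator design from the paper. It is a plain Python reference that assumes the Gaussian score decay of the general Soft-NMS algorithm, and the function names, the `sigma` value, and the `score_floor` cutoff are illustrative choices.

```python
import math

def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def hard_nms(boxes, scores, iou_threshold=0.5):
    """Greedy Hard-NMS: drop any box that overlaps a kept box too much."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in kept):
            kept.append(i)
    return kept

def soft_nms(boxes, scores, sigma=0.5, score_floor=0.001):
    """Gaussian Soft-NMS: decay the scores of overlapping boxes instead of
    deleting them. Returns (index, final_score) pairs in selection order.
    sigma and score_floor are assumed, illustrative defaults."""
    scores = list(scores)  # work on a copy so the caller's list is untouched
    remaining = list(range(len(boxes)))
    kept = []
    while remaining:
        # Select the highest-scoring box that is still a candidate.
        best = max(remaining, key=lambda i: scores[i])
        remaining.remove(best)
        kept.append((best, scores[best]))
        survivors = []
        for i in remaining:
            overlap = iou(boxes[best], boxes[i])
            # Attenuate the score according to the overlap with the selected box.
            scores[i] *= math.exp(-(overlap ** 2) / sigma)
            if scores[i] >= score_floor:
                survivors.append(i)
        remaining = survivors
    return kept

if __name__ == "__main__":
    boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
    scores = [0.9, 0.8, 0.7]
    print("hard:", hard_nms(boxes, scores))
    print("soft:", soft_nms(boxes, scores))
```

In this toy input the second box overlaps the first heavily. Hard-NMS discards it, while Soft-NMS keeps it with a reduced score, which is the behavior the abstract credits for avoiding wrongly deleted overlapping objects. The repeated score updates inside the loop are also the extra work that makes Soft-NMS costlier than Hard-NMS.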


Key words: reconfigurable computing, object detection, non-maximum suppression
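
The abstract states that logarithmic functions are used to simplify floating-point computation but gives no details. The sketch below shows only one way such a simplification could work, and it is an assumption rather than the paper's method: if scores are stored as logarithms and the Gaussian Soft-NMS decay is used, the multiply-and-exponentiate update becomes a single subtraction. The function names and the `sigma` value are hypothetical.

```python
import math

def decay_linear(score, overlap, sigma=0.5):
    """Gaussian decay in the linear domain: one exp and one multiply."""
    return score * math.exp(-(overlap ** 2) / sigma)

def decay_log(log_score, overlap, sigma=0.5):
    """The same decay applied to log(score): a single subtraction.
    Assumed reformulation for illustration, not taken from the paper."""
    return log_score - (overlap ** 2) / sigma

if __name__ == "__main__":
    score, overlap = 0.8, 0.6
    linear = decay_linear(score, overlap)
    log_domain = decay_log(math.log(score), overlap)
    # Both routes describe the same decayed score.
    print(linear, math.exp(log_domain))
```

Because the logarithm is monotonic, choosing the maximum log score selects the same box as choosing the maximum score, so under this assumption the selection step would not need to convert back to the linear domain.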