A redundancy-reduced candidate box accelerator based on soft-non-maximum suppression

Computer Engineering & Science ›› 2021, Vol. 43 ›› Issue (04): 586-593.

LI Jing-lin, JIANG Jing-fei, DOU Yong, XU Jin-wei, WEN Dong

(College of Computer Science and Technology, National University of Defense Technology, Changsha 410073, China)

Received: 2020-06-11 Revised: 2020-07-21 Accepted: 2021-04-25 Online: 2021-04-25 Published: 2021-04-21
Funding:
National Major Science and Technology Project on Core Electronic Devices, High-end Generic Chips and Basic Software (2018ZX01028101)

Abstract

Object detection tasks usually use the non-maximum suppression (NMS) algorithm to remove redundant candidate boxes output by a convolutional neural network. Hard-NMS directly deletes candidate boxes whose overlap exceeds a predefined threshold. Soft-NMS instead gradually attenuates their scores, which avoids mistakenly deleting the candidate boxes of overlapping objects in an image and improves detection accuracy. However, the frequent score updates make Soft-NMS more complex than Hard-NMS. To remove redundant candidate boxes with high accuracy, low latency, and low power consumption, this paper proposes a Soft-NMS-based architecture that uses logarithmic functions to simplify complex floating-point calculations, together with a two-level optimization structure that combines fine-grained pipelining and coarse-grained parallelism to further improve throughput. Evaluation on a Xilinx KU-115 FPGA development board shows that the architecture consumes 6.107 W and processes 992 candidate boxes in 168.95 μs. Compared with a CPU implementation of Soft-NMS, the architecture achieves a 36-fold performance improvement, and its performance-per-watt is 264 times that of the CPU implementation.

Key words: reconfigurable computing, object detection, non-maximum suppression
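
The difference between Hard-NMS and Soft-NMS described in the abstract can be illustrated with a minimal software sketch. It assumes the Gaussian score-decay variant of Soft-NMS, boxes given as (x1, y1, x2, y2) tuples, and illustrative parameter values (`iou_thresh`, `sigma`, `score_thresh`) that are not taken from the paper. The function names are hypothetical, and the sketch does not represent the accelerator's hardware datapath.

```python
import math


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def hard_nms(boxes, scores, iou_thresh=0.5):
    """Hard-NMS: drop every box whose IoU with a kept box exceeds iou_thresh."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_thresh for k in keep):
            keep.append(i)
    return keep


def soft_nms(boxes, scores, sigma=0.5, score_thresh=0.001):
    """Gaussian Soft-NMS: decay the scores of overlapping boxes instead of
    deleting them. Returns the selection order and the updated scores."""
    scores = list(scores)  # work on a copy; the caller's list is untouched
    remaining = list(range(len(boxes)))
    keep = []
    while remaining:
        # Select the box with the highest current score.
        best = max(remaining, key=lambda i: scores[i])
        remaining.remove(best)
        keep.append(best)
        survivors = []
        for i in remaining:
            overlap = iou(boxes[best], boxes[i])
            # Assumed Gaussian decay: s_i <- s_i * exp(-IoU^2 / sigma).
            scores[i] *= math.exp(-(overlap * overlap) / sigma)
            # Discard a box only once its score has decayed below the floor.
            if scores[i] >= score_thresh:
                survivors.append(i)
        remaining = survivors
    return keep, scores


# Two heavily overlapping boxes plus one isolated box (illustrative data).
boxes = [(0, 0, 10, 10), (1, 0, 11, 10), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
hard_keep = hard_nms(boxes, scores)
soft_keep, soft_scores = soft_nms(boxes, scores)
```

On this input, `hard_nms` discards the second box because it overlaps the first too strongly, whereas `soft_nms` keeps it with a reduced score. The per-box score update inside the loop is the extra work that makes Soft-NMS costlier than Hard-NMS. Taking logarithms turns the Gaussian update into a subtraction, log s_i ← log s_i − IoU²/σ. Whether the paper's logarithm-based optimization takes this form is not stated in the abstract.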

LI Jing-lin, JIANG Jing-fei, DOU Yong, XU Jin-wei, WEN Dong. A redundancy-reduced candidate box accelerator based on soft-non-maximum suppression[J]. Computer Engineering & Science, 2021, 43(04): 586-593.
