• Journal of the China Computer Federation (CCF)
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

计算机工程与科学 (Computer Engineering & Science)


基于多编码特征融合的三维目标检测及其并行训练

朱虎明,程新跃,舒德龙,蒋翔宇   

  1. (西安电子科技大学人工智能学院,陕西 西安 710126)
  • 出版日期:2026-03-12 发布日期:2026-03-12

3D Object Detection Algorithm Based on Multi-encoding Feature Fusion and Its Parallel Training

ZHU Huming, CHENG Xinyue, SHU Delong, JIANG Xiangyu

  1. (School of Artificial Intelligence, Xidian University, Xi'an 710126, China)

  • Online: 2026-03-12 Published: 2026-03-12

摘要: 针对复杂场景下三维目标检测存在的误检漏检率高和单卡模型训练时间长等问题,本文提出基于多编码特征融合的两阶段算法 (3D object detection based on Multi-encoding Feature Fusion,MFF)。MFF利用三维稀疏卷积体素编码器与多尺度支柱编码器分层提取点云特征,并通过 Transformer 注意力机制捕获全局语义依赖关系。在 nuScenes 数据集上,MFF 实现了 64.7% mAP 与 69.7% NDS,检测精度显著优于同类算法。针对训练效率瓶颈,MFF 采用分布式数据并行(DDP)框架,结合 Ring-Allreduce 通信策略与 Warmup 学习率优化策略,在 64 卡并行训练场景下,相较于 8 卡训练获得了 6.89 倍加速比,训练时间大幅缩短,有效克服了大规模模型训练的资源限制。实验充分验证了 MFF 在检测精度与训练效率上的双重突破性提升,为自动驾驶环境感知提供了高性能与高效率并重的解决方案。

关键词: 三维目标检测;多编码特征融合;并行训练;Transformer;自动驾驶

Abstract: To address two problems of 3D object detection in complex scenes, namely high false-positive and false-negative (missed-detection) rates and long single-GPU training times, this paper proposes a two-stage algorithm, 3D object detection based on Multi-Encoding Feature Fusion (MFF). MFF uses a 3D sparse-convolution voxel encoder and a multi-scale pillar encoder to extract point cloud features hierarchically, and captures global semantic dependencies through a Transformer attention mechanism. On the nuScenes dataset, MFF achieves 64.7% mAP and 69.7% NDS, a detection accuracy significantly higher than that of comparable algorithms. To address the training-efficiency bottleneck, MFF is trained with a distributed data parallel (DDP) framework combined with a Ring-Allreduce communication strategy and a learning-rate warmup strategy. With 64 GPUs, training achieves a 6.89× speedup over 8-GPU training, which substantially shortens training time and eases the resource constraints of large-scale model training. The experiments show that MFF improves both detection accuracy and training efficiency, providing an accurate and efficient solution for environment perception in autonomous driving.
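The abstract names Ring-Allreduce as the gradient-synchronization strategy and reports a 6.89× speedup from 8 to 64 GPUs. The following is a minimal single-process sketch of how a ring all-reduce sums gradients across workers, together with the scaling efficiency implied by the reported speedup. It is an illustration only, not the authors' implementation: the function names `ring_allreduce_sum` and `scaling_efficiency` and the toy gradient values are assumptions introduced here, and a real training run would normally delegate this step to a communication library rather than Python lists.

```python
def ring_allreduce_sum(worker_grads):
    """Simulate a ring all-reduce (sum) across n workers in one process.

    worker_grads[r] is the gradient vector held by worker r. All vectors must
    have the same length. Returns the vector held by each worker afterwards.
    """
    n = len(worker_grads)
    length = len(worker_grads[0])
    data = [list(g) for g in worker_grads]
    # Chunk c covers indices bounds[c] .. bounds[c + 1] - 1.
    bounds = [(c * length) // n for c in range(n + 1)]

    def exchange(first_chunk_offset, accumulate):
        for step in range(n - 1):
            # Snapshot all payloads first so every send in a step is simultaneous.
            sends = []
            for r in range(n):
                c = (r + first_chunk_offset - step) % n
                lo, hi = bounds[c], bounds[c + 1]
                sends.append(((r + 1) % n, lo, data[r][lo:hi]))
            for dst, lo, payload in sends:
                for i, v in enumerate(payload, start=lo):
                    data[dst][i] = data[dst][i] + v if accumulate else v

    # Phase 1 (reduce-scatter): worker r ends up owning the full sum of chunk r + 1.
    exchange(first_chunk_offset=0, accumulate=True)
    # Phase 2 (all-gather): completed chunks travel around the ring unchanged.
    exchange(first_chunk_offset=1, accumulate=False)
    return data


def scaling_efficiency(speedup, base_gpus, gpus):
    """Measured speedup divided by the ideal (linear) speedup gpus / base_gpus."""
    return speedup / (gpus / base_gpus)


# Toy gradients for three hypothetical workers (values are illustrative only).
grads = [[1, 2, 3, 4], [10, 20, 30, 40], [100, 200, 300, 400]]
reduced = ring_allreduce_sum(grads)
# Data-parallel training usually averages the summed gradients over workers.
averaged = [[v / len(grads) for v in vec] for vec in reduced]
# Speedup figure taken from the abstract: 6.89x when going from 8 to 64 GPUs.
efficiency = scaling_efficiency(6.89, base_gpus=8, gpus=64)
```

In this scheme each worker sends 2(n - 1) chunks of size 1/n of the gradient, so the per-worker traffic stays close to twice the gradient size regardless of the number of workers, which is why the ring layout is a common choice for scaling data-parallel training. For the figures in the abstract, the ideal speedup from 8 to 64 GPUs is 8×, so 6.89× corresponds to a scaling efficiency of about 86%.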


Key words: 3D Object Detection, Multi-Encoding Feature Fusion, Parallel Training, Transformer, Autonomous Driving