• Official journal of the China Computer Federation (CCF)
• Chinese Science and Technology Core Journal
• Chinese Core Journal

Computer Engineering & Science, 2023, Vol. 45, Issue (06): 1097-1105.

• Artificial Intelligence and Data Mining

A sound event localization and detection algorithm based on feature fusion and Transformer model

PU Zi-jun, ZHANG Shou-ming

  1. (Faculty of Information Engineering and Automation,
    Kunming University of Science and Technology, Kunming 650500, China)
  • Received: 2021-08-02; Revised: 2021-12-13; Accepted: 2023-06-25; Online: 2023-06-25; Published: 2023-06-16

Abstract: To address multi-channel environmental sound event detection, this paper proposes TBCF-MTNN, a feature fusion network model that incorporates a Transformer. The network takes the log-Mel spectrogram and the generalized cross-correlation (GCC) spectrum as input. First, a CNN extracts local spectral features and a GRU extracts temporal context features; the two feature maps are then fused by a Cross-stitch module, which addresses a limitation of conventional models in which information from different features cannot be shared within the network. Second, the fused feature map is fed into a Transformer for further feature extraction. Finally, a fully connected layer outputs the classification and localization results. Experiments on the TAU-NIGENS 2020 dataset show that, compared with the Baseline model, TBCF-MTNN reduces the error rate to 0.26 on the sound event detection task and the localization error to 4.7° on the sound source localization task. Compared with the Baseline, FPN, EIN, and other models, the proposed model achieves better recognition performance.

Key words: sound event localization and detection, deep learning, Transformer model, Cross-stitch, feature fusion
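The Cross-stitch fusion step can be illustrated with the cross-stitch unit as originally defined by Misra et al. (2016) for multi-task networks, in which each output map is a learned linear combination of the two input maps. The following is a minimal sketch under stated assumptions, not the authors' TBCF-MTNN code: the CNN and GRU feature maps are assumed to be already aligned to the same shape and flattened to plain Python lists, a single 2×2 mixing matrix `alpha` is shared by all elements (the original formulation also allows one matrix per channel), and `alpha` is fixed here rather than learned by backpropagation. The function name `cross_stitch` is hypothetical.

```python
from typing import List, Sequence, Tuple

# 2x2 mixing matrix: ((alpha_AA, alpha_AB), (alpha_BA, alpha_BB)).
Alpha = Tuple[Tuple[float, float], Tuple[float, float]]


def cross_stitch(
    x_a: Sequence[float], x_b: Sequence[float], alpha: Alpha
) -> Tuple[List[float], List[float]]:
    """Hypothetical cross-stitch unit for two equally shaped, flattened feature maps.

    out_a[i] = alpha_AA * x_a[i] + alpha_AB * x_b[i]
    out_b[i] = alpha_BA * x_a[i] + alpha_BB * x_b[i]
    In a trained network the four alpha values are learnable parameters;
    in this sketch they are passed in as fixed numbers.
    """
    if len(x_a) != len(x_b):
        raise ValueError("cross-stitch expects feature maps of the same size")
    (a_aa, a_ab), (a_ba, a_bb) = alpha
    out_a = [a_aa * a + a_ab * b for a, b in zip(x_a, x_b)]
    out_b = [a_ba * a + a_bb * b for a, b in zip(x_a, x_b)]
    return out_a, out_b


if __name__ == "__main__":
    cnn_features = [1.0, 2.0, 3.0]   # toy stand-in for the CNN branch output
    gru_features = [0.5, 0.0, -1.0]  # toy stand-in for the GRU branch output
    # Near-identity mixing: each branch mostly keeps its own features
    # and borrows 10% from the other branch.
    alpha = ((0.9, 0.1), (0.1, 0.9))
    fused_a, fused_b = cross_stitch(cnn_features, gru_features, alpha)
    print(fused_a, fused_b)
```

With `alpha = ((1.0, 0.0), (0.0, 1.0))` the two branches stay independent; the off-diagonal values control how much each branch receives from the other, which is the kind of cross-feature sharing the abstract attributes to the Cross-stitch module.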
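The generalized cross-correlation input can be illustrated with GCC-PHAT for a single microphone pair and a single frame. This sketch assumes the widely used phase transform (PHAT) weighting, which the abstract does not specify, and uses a naive O(n²) DFT so that it runs with the standard library only; a practical feature extractor would use an FFT and, as is common in SELD pipelines, compute the correlation per STFT frame for every microphone pair and keep a fixed window of lags. The names `gcc_phat`, `estimate_delay`, and `max_lag` are hypothetical.

```python
import cmath
import math
from typing import List, Sequence


def _dft(x: Sequence[complex], inverse: bool = False) -> List[complex]:
    """Naive discrete Fourier transform (O(n^2)); inverse=True applies 1/n scaling."""
    n = len(x)
    sign = 1.0 if inverse else -1.0
    out = [
        sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    return [v / n for v in out] if inverse else out


def gcc_phat(sig: Sequence[float], ref: Sequence[float], max_lag: int,
             eps: float = 1e-12) -> List[float]:
    """GCC-PHAT of `sig` against `ref` for lags -max_lag..max_lag (in samples).

    A peak at a positive lag means `sig` arrives later than `ref`.
    """
    n = len(sig) + len(ref)  # zero-pad so positive and negative lags do not overlap
    if not 0 <= max_lag < n // 2:
        raise ValueError("max_lag must be in [0, n // 2)")
    spec_sig = _dft([complex(v) for v in sig] + [0j] * (n - len(sig)))
    spec_ref = _dft([complex(v) for v in ref] + [0j] * (n - len(ref)))
    cross = [s * r.conjugate() for s, r in zip(spec_sig, spec_ref)]
    # PHAT weighting: keep only the phase of the cross-power spectrum.
    phat = [c / (abs(c) + eps) for c in cross]
    corr = [v.real for v in _dft(phat, inverse=True)]
    # Negative lags wrap around to the end of the circular correlation.
    return [corr[lag % n] for lag in range(-max_lag, max_lag + 1)]


def estimate_delay(sig: Sequence[float], ref: Sequence[float], max_lag: int) -> int:
    """Return the lag (in samples) at which the GCC-PHAT curve peaks."""
    values = gcc_phat(sig, ref, max_lag)
    return values.index(max(values)) - max_lag


if __name__ == "__main__":
    ref = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
    sig = [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0]  # same impulse, 2 samples later
    print(estimate_delay(sig, ref, max_lag=4))  # prints 2
```

The PHAT weighting discards magnitude and keeps only phase, which sharpens the correlation peak; the vector returned by `gcc_phat` (rather than only its peak) is the kind of per-frame feature a network can take as input alongside the log-Mel spectrogram.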
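The localization error reported in degrees is built on the angular (great-circle) distance between an estimated and a reference direction of arrival. A minimal sketch of that per-pair distance follows, assuming directions are given as azimuth and elevation in degrees; the DCASE 2020 SELD evaluation, with which TAU-NIGENS 2020 is usually scored, additionally associates predictions with references per class and frame before averaging, and that step is not reproduced here. The function names `angular_error_deg` and `_unit_vector` are hypothetical.

```python
import math
from typing import Tuple


def _unit_vector(azimuth_deg: float, elevation_deg: float) -> Tuple[float, float, float]:
    """Convert azimuth/elevation in degrees to a Cartesian unit vector."""
    azi = math.radians(azimuth_deg)
    ele = math.radians(elevation_deg)
    return (math.cos(ele) * math.cos(azi),
            math.cos(ele) * math.sin(azi),
            math.sin(ele))


def angular_error_deg(azi_est: float, ele_est: float,
                      azi_ref: float, ele_ref: float) -> float:
    """Angle in degrees between estimated and reference DOAs on the unit sphere."""
    u = _unit_vector(azi_est, ele_est)
    v = _unit_vector(azi_ref, ele_ref)
    dot = sum(a * b for a, b in zip(u, v))
    # Clamp to [-1, 1] to guard acos against floating-point round-off.
    dot = max(-1.0, min(1.0, dot))
    return math.degrees(math.acos(dot))


if __name__ == "__main__":
    # An estimate that is 5 degrees off in azimuth at 0 degrees elevation.
    print(angular_error_deg(35.0, 0.0, 30.0, 0.0))
```

Working with unit vectors rather than subtracting angles directly avoids azimuth wrap-around problems (for example, 359° versus 1°) and handles elevation consistently.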