• Official journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science ›› 2026, Vol. 48 ›› Issue (4): 743-751.

• Artificial Intelligence and Data Mining •

Sound event detection & localization based on saliency detector and decay mask self-attention module

WANG Chunli, CHEN Shanli, LIU Suqian, ZHAO Xiaochun

  (1. School of Electronic and Information Engineering, Lanzhou Jiaotong University, Lanzhou 730070;
    2. Department of Rehabilitation Medicine, Gansu Provincial Maternity and Child-Care Hospital
    (Gansu Provincial Central Hospital), Lanzhou 730050, China)
  • Received: 2024-05-21 Revised: 2024-08-16 Online: 2026-04-25 Published: 2026-04-30

Abstract: A novel acoustic module is proposed that combines a saliency detector with multi-head self-attention equipped with a decay mask, helping the model focus on spatial information in sound event localization & detection tasks. First, the saliency detector concentrates on highly salient regions within local information, so that the model pays more attention to categories with rich information content. Second, a decay mask is introduced into the multi-head self-attention module, enabling the model to focus more on local information. In addition, adaptive constraints are incorporated to diversify the attention heads. Experimental results show that the proposed model outperforms the baseline models and achieves better detection & localization performance than models that fuse Transformer and multi-scale architectures. Finally, when video information is used as additional data, the model demonstrates strong overall performance.

Key words: sound event detection & localization; saliency detector; multi-head self-attention; adaptive constrained decay mask
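
The abstract does not give the exact form of the decay mask, so the following is only an illustrative sketch of how a distance-based decay could bias self-attention toward local frames. It assumes a linear penalty `-rate * |i - j|` added to the attention scores before the softmax, and it shows a single head in NumPy; the function names, the `rate` parameter, and the penalty form are all hypothetical and are not taken from the paper.

```python
import numpy as np

def decay_mask(seq_len, rate):
    """Additive bias of shape (seq_len, seq_len): -rate * |i - j|.

    Assumed form for illustration only; the paper's mask may differ.
    """
    idx = np.arange(seq_len)
    return -rate * np.abs(idx[:, None] - idx[None, :])

def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def decay_masked_attention(q, k, v, rate):
    """Single-head scaled dot-product attention with a distance-decay bias.

    q, k, v: arrays of shape (seq_len, d). Returns (output, weights).
    """
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)
    # Distant positions receive a larger negative bias, so their
    # attention weights shrink relative to nearby positions.
    weights = softmax(scores + decay_mask(q.shape[0], rate))
    return weights @ v, weights
```

With `rate = 0` this reduces to ordinary scaled dot-product attention. In a multi-head setting, one plausible design would give each head its own `rate` so that heads attend over different temporal ranges, though whether the proposed module does this is not stated in the abstract.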