• Journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science ›› 2025, Vol. 47 ›› Issue (11): 2038-2044.

• Artificial Intelligence and Data Mining

Environmental sound classification based on spatial attention mechanism and multi-feature data enhancement

LIU Xiang, LI Chuankun, GUO Jinming, LIU Yu

  (1. School of Information and Communication Engineering, North University of China, Taiyuan 030051;
   2. School of Mechanical and Electrical Engineering, China University of Mining and Technology, Xuzhou 221116;
   3. Shanxi Institute of Technology, Yangquan 045000, China)
  • Received: 2024-03-13 Revised: 2024-08-08 Online: 2025-11-25 Published: 2025-12-08

Abstract: Environmental sound classification (ESC) suffers from two problems: dataset samples have a low signal-to-noise ratio (SNR), and Log-Mel spectrograms alone have limited feature representation capability. To address both, this paper proposes an improved environmental sound classification model based on high- and low-frequency separation. First, the phase spectrum is added to the input features as a supplement to the Log-Mel spectrogram, giving a multi-feature input composed of the phase spectrum, the Log-Mel spectrogram, and the spectrogram, which strengthens the expressive power of the input features. Second, an attention mechanism is added to the input section of the neural network to improve its resistance to noise interference, which enhances the network's robustness and generalization capability. Experiments show that the proposed model improves the recognition accuracy of environmental sounds, reaching classification accuracies of 97.25%, 89.00%, and 83.45% on the ESC-10, ESC-50, and UrbanSound8K datasets, respectively. Compared with the original model, accuracy improves by 2.25%, 10.50%, and 2.22%, respectively.

Key words: environmental sound classification, multi-feature, attention mechanism
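
The multi-feature input described in the abstract can be illustrated with a minimal NumPy sketch that derives a log-power spectrogram, a phase spectrum, and a Log-Mel spectrogram from one short-time Fourier transform. This is not the authors' implementation: the function names (`stft`, `mel_filterbank`, `multi_feature_input`), the parameter values (`n_fft=512`, `hop=256`, `n_mels=64`), and the Hann window are all assumptions chosen for illustration. The abstract does not say how the three features are combined before entering the network, so the sketch returns them separately.

```python
import numpy as np


def stft(signal, n_fft=512, hop=256):
    """Complex STFT with a Hann window; shape (n_fft // 2 + 1, n_frames)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    return np.fft.rfft(frames, axis=1).T


def mel_filterbank(sr, n_fft, n_mels):
    """Triangular mel filters; shape (n_mels, n_fft // 2 + 1)."""
    def hz_to_mel(f):
        return 2595.0 * np.log10(1.0 + f / 700.0)

    def mel_to_hz(m):
        return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

    # n_mels + 2 band edges, equally spaced on the mel scale.
    hz_points = mel_to_hz(
        np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    )
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    bank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(n_mels):
        lo, center, hi = hz_points[m], hz_points[m + 1], hz_points[m + 2]
        rising = (fft_freqs - lo) / (center - lo)
        falling = (hi - fft_freqs) / (hi - center)
        bank[m] = np.maximum(0.0, np.minimum(rising, falling))
    return bank


def multi_feature_input(signal, sr, n_fft=512, hop=256, n_mels=64):
    """Return the three features named in the abstract as a dict of 2-D arrays.

    Assumption: all three are derived from the same STFT. How the paper
    aligns or stacks them is not stated, so they are returned separately.
    """
    eps = 1e-10  # avoids log(0) on silent bins
    spec = stft(signal, n_fft=n_fft, hop=hop)
    power = np.abs(spec) ** 2
    return {
        "spectrogram": np.log(power + eps),            # log-power spectrogram
        "phase": np.angle(spec),                       # phase in [-pi, pi]
        "log_mel": np.log(mel_filterbank(sr, n_fft, n_mels) @ power + eps),
    }


if __name__ == "__main__":
    sr = 16000
    t = np.arange(sr) / sr
    tone = np.sin(2.0 * np.pi * 440.0 * t)  # one second of a 440 Hz tone
    for name, feature in multi_feature_input(tone, sr).items():
        print(name, feature.shape)
```

The spectrogram and phase features share the linear-frequency axis, while the Log-Mel feature has `n_mels` rows, so a model consuming all three would need either separate input branches or a resampling step; which one the paper uses is not stated in the abstract.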