• Journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science ›› 2025, Vol. 47 ›› Issue (11): 2038-2044.

• Artificial Intelligence and Data Mining •

  • Funding:
    National Natural Science Foundation of China (62101512)

Environmental sound classification based on spatial attention mechanism and multi-feature data enhancement

LIU Xiang, LI Chuankun, GUO Jinming, LIU Yu

  (1. School of Information and Communication Engineering, North University of China, Taiyuan 030051;
   2. School of Mechanical and Electrical Engineering, China University of Mining and Technology, Xuzhou 221116;
   3. Shanxi Institute of Technology, Yangquan 045000, China)
  • Received: 2024-03-13 Revised: 2024-08-08 Online: 2025-11-25 Published: 2025-12-08


Abstract: To address the low signal-to-noise ratio (SNR) of dataset samples and the limited representational capacity of Log-Mel spectrogram features in environmental sound classification (ESC), this paper proposes an improved ESC model based on high- and low-frequency separation. First, the phase spectrum is added to the input features as a supplement to the Log-Mel spectrogram, forming a multi-feature input that comprises the phase spectrum, the Log-Mel spectrogram, and the time-frequency spectrogram, which strengthens the expressive power of the input features. Second, an attention mechanism is added at the input stage of the neural network to improve its resistance to noise interference, as well as its robustness and generalization ability. Experiments show that the proposed model improves recognition accuracy for environmental sounds, reaching classification accuracies of 97.25%, 89.00%, and 83.45% on the ESC10, ESC50, and UrbanSound8K datasets, respectively, which is 2.25%, 10.50%, and 2.22% higher than the original model.
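The three input features named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`stft`, `mel_filterbank`, `multi_feature`), the naive DFT, the HTK-style mel formula, and all parameter values (`n_fft=64`, `hop=32`, `n_mels=8`, `sr=8000`) are assumptions chosen to keep the example small and dependency-free. How the paper fuses the features and where it applies attention is not stated in the abstract, so the sketch only returns the three feature maps.

```python
import cmath
import math


def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def stft(signal, n_fft=64, hop=32):
    """Naive STFT (illustrative only): frames[t][k], k = 0..n_fft/2."""
    window = hann(n_fft)
    frames = []
    for start in range(0, len(signal) - n_fft + 1, hop):
        seg = [signal[start + i] * window[i] for i in range(n_fft)]
        bins = []
        for k in range(n_fft // 2 + 1):
            acc = 0j
            for n, x in enumerate(seg):
                acc += x * cmath.exp(-2j * math.pi * k * n / n_fft)
            bins.append(acc)
        frames.append(bins)
    return frames


def hz_to_mel(f):
    # HTK-style mel scale (an assumed convention, not taken from the paper).
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_mels, n_fft, sr):
    """Triangular filters with centres evenly spaced on the mel scale."""
    top = hz_to_mel(sr / 2)
    hz_pts = [mel_to_hz(top * i / (n_mels + 1)) for i in range(n_mels + 2)]
    bin_hz = [k * sr / n_fft for k in range(n_fft // 2 + 1)]
    bank = []
    for m in range(1, n_mels + 1):
        lo, mid, hi = hz_pts[m - 1], hz_pts[m], hz_pts[m + 1]
        row = [
            max(0.0, min((f - lo) / (mid - lo), (hi - f) / (hi - mid)))
            for f in bin_hz
        ]
        bank.append(row)
    return bank


def multi_feature(signal, sr=8000, n_fft=64, hop=32, n_mels=8, eps=1e-10):
    """Return the log-power spectrogram, phase spectrum, and Log-Mel map."""
    frames = stft(signal, n_fft, hop)
    power = [[abs(c) ** 2 for c in frame] for frame in frames]
    # Time-frequency spectrogram in dB.
    log_spec = [[10 * math.log10(p + eps) for p in frame] for frame in power]
    # Phase spectrum in radians, in [-pi, pi].
    phase = [[cmath.phase(c) for c in frame] for frame in frames]
    # Log-Mel spectrogram: mel-weighted power per band, in dB.
    bank = mel_filterbank(n_mels, n_fft, sr)
    log_mel = [
        [10 * math.log10(sum(w * p for w, p in zip(row, frame)) + eps)
         for row in bank]
        for frame in power
    ]
    return {"log_spec": log_spec, "phase": phase, "log_mel": log_mel}
```

With these defaults, a 256-sample signal yields 7 frames, each with 33 frequency bins for the two spectra and 8 mel bands for the Log-Mel map. In practice an FFT-based library would replace the naive DFT, and the three maps would need to be brought to a common shape before being stacked as network input channels.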

Key words: environmental sound classification, multi-feature, attention mechanism