Sound event detection & localization based on saliency detector and decay mask self-attention module

doi:10.3969/j.issn.1007-130X.2026.04.018

Computer Engineering & Science, 2026, Vol. 48, Issue 4: 743-751.

WANG Chunli, CHEN Shanli, LIU Suqian, ZHAO Xiaochun

(1. School of Electronic and Information Engineering, Lanzhou Jiaotong University, Lanzhou 730070, China;
2. Department of Rehabilitation Medicine, Gansu Provincial Maternity and Child-Care Hospital (Gansu Provincial Central Hospital), Lanzhou 730050, China)

Received: 2024-05-21 Revised: 2024-08-16 Online: 2026-04-25 Published: 2026-04-30

Supported by:
Inner Mongolia Key Research and Development and Achievement Transformation Program (2023YFSH0043, 2023YFDZ0043); the Gansu Province Key Talent Project and the Lanzhou Jiaotong University Youth Fund Project (LH2019005)

Abstract

An acoustic model is proposed that combines a saliency detector with multi-head self-attention equipped with a decay mask, allowing the model to attend more closely to spatial information when performing sound event detection and localization. First, the saliency detector focuses on the highly salient parts of local information, so the model pays more attention to categories with rich information content. Second, a decay mask is introduced into the multi-head self-attention module so that the model concentrates more on local information, and an adaptive constraint is added to diversify the attention heads. Experimental results show that the proposed model outperforms the baseline model and achieves better detection and localization performance than models that fuse Transformer and multi-scale architectures. Finally, video information is used as additional data to improve performance, and the resulting model performs well.

Key words: sound event detection and localization; saliency detector; multi-head self-attention; adaptive constrained decay mask
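
The abstract does not give a formula for the decay mask, so the following is only an illustrative sketch of the general idea of biasing self-attention toward local context. It assumes an additive mask of the form -gamma * |i - j| applied to the scaled dot-product logits of a single head; the function names, the mask form, and the parameter `gamma` are assumptions made for illustration and are not taken from the paper. The adaptive constraint that diversifies attention heads is not modeled here.

```python
import math

def decay_mask(seq_len, gamma):
    """Additive mask (assumed form): entry (i, j) is -gamma * |i - j|."""
    return [[-gamma * abs(i - j) for j in range(seq_len)]
            for i in range(seq_len)]

def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def masked_attention_weights(queries, keys, gamma):
    """Single-head self-attention weights with a distance-decay mask.

    queries and keys are equal-length lists of equal-dimension vectors.
    A larger gamma penalizes distant frames more strongly.
    """
    d = len(queries[0])
    mask = decay_mask(len(queries), gamma)
    weights = []
    for i, q in enumerate(queries):
        logits = [
            sum(a * b for a, b in zip(q, k)) / math.sqrt(d) + mask[i][j]
            for j, k in enumerate(keys)
        ]
        weights.append(softmax(logits))
    return weights

# Five identical frames: without the mask, attention would be uniform,
# so any locality in the weights comes from the decay mask alone.
frames = [[1.0, 0.0]] * 5
w = masked_attention_weights(frames, frames, gamma=1.0)
print([round(x, 3) for x in w[2]])  # weights of the middle frame
```

In a multi-head setting, one hypothetical design would give each head its own `gamma`, so that some heads attend narrowly and others more broadly; whether the paper does this is not stated in the abstract.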

WANG Chunli, CHEN Shanli, LIU Suqian, ZHAO Xiaochun. Sound event detection & localization based on saliency detector and decay mask self-attention module[J]. Computer Engineering & Science, 2026, 48(4): 743-751.

