
Computer Engineering & Science (计算机工程与科学)



TransGNN: A spatial-temporal-frequency dual-branch fusion network for fast and efficient decoding of EEG auditory attention

WANG Chunli, GAO Yuxin, LI Jinxu, ZHANG Jiahao, WANG Chenming   

  1. (College of Electronic and Information Engineering, Lanzhou Jiaotong University, Lanzhou 730070, Gansu, China)

Abstract: Listeners with normal hearing can focus on a specific speaker in multi-speaker environments. Auditory attention detection (AAD) models this attention-selection mechanism by analyzing the listener's electroencephalogram (EEG) to decode speech features of the attended speaker. However, existing AAD methods are mostly limited to single-domain analysis of the time or frequency domain, neglecting the intrinsic relationship between the time and frequency domains as well as spatial-domain information, which limits decoding accuracy. Given the strength of graph neural networks (GNNs) in processing spatially non-Euclidean data, this paper proposes a fast and efficient AAD model: a dual-branch spatial-temporal-frequency fusion network. The spatial-temporal attention branch uses a Transformer to capture global contextual information and a GNN to model the local spatial topology of the electrodes; the frequency attention branch uses a residual convolutional network to extract multi-band EEG spectral features. The two branches are then fused so that temporal, spatial, and frequency features are considered jointly in the final AAD classification. Validation on the public KUL dataset shows that the method achieves decoding accuracies of 88.75% and 95.31% within 0.1 s and 1 s decision windows, significant improvements of 14.45% and 14.51% over the baseline model, and 94.88% within the 5 s window. Further ablation experiments confirm the effectiveness and necessity of the model's components.
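
To make the dual-branch design described above concrete, here is a minimal, hypothetical PyTorch sketch: channels are treated as Transformer tokens for global context, one round of adjacency-weighted message passing stands in for the GNN, and a residual convolution processes band-power features before fusion by concatenation. All layer sizes, the learnable adjacency, the five-band frequency input, and the fusion rule are illustrative assumptions, not the authors' published implementation.

import torch
import torch.nn as nn

class TransGNNSketch(nn.Module):
    # Hypothetical sketch of the dual-branch idea; shapes and layers are assumptions.
    def __init__(self, n_channels=64, win_len=128, d_model=64, n_bands=5, n_classes=2):
        super().__init__()
        # Spatial-temporal branch: per-channel temporal embedding -> Transformer -> graph step.
        self.embed = nn.Linear(win_len, d_model)
        enc_layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.transformer = nn.TransformerEncoder(enc_layer, num_layers=2)  # global context
        self.adj = nn.Parameter(torch.eye(n_channels))  # learnable electrode graph (assumption)
        # Frequency branch: residual convolution over per-band power features.
        self.freq_conv = nn.Sequential(
            nn.Conv1d(n_channels, n_channels, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv1d(n_channels, n_channels, kernel_size=3, padding=1),
        )
        self.freq_proj = nn.Linear(n_bands, d_model)
        self.classifier = nn.Linear(2 * d_model, n_classes)  # fused AAD decision

    def forward(self, eeg, band_power):
        # eeg:        (batch, n_channels, win_len)  one decision window of raw EEG
        # band_power: (batch, n_channels, n_bands)  e.g. delta..gamma log-power
        h = self.transformer(self.embed(eeg))           # channels as tokens
        h = torch.softmax(self.adj, dim=-1) @ h         # GNN-style message passing
        st = h.mean(dim=1)                              # pooled spatial-temporal feature
        f = band_power + self.freq_conv(band_power)     # residual spectral feature
        f = self.freq_proj(f).mean(dim=1)
        return self.classifier(torch.cat([st, f], dim=-1))

model = TransGNNSketch()
eeg = torch.randn(8, 64, 128)    # eight 1 s windows at an assumed 128 Hz
bands = torch.randn(8, 64, 5)    # hypothetical band-power features
print(model(eeg, bands).shape)   # torch.Size([8, 2])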

Key words: auditory attention detection, electroencephalogram (EEG), graph neural network, spatial-temporal-frequency fusion mechanism, decoding accuracy
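
The decision windows quoted in the abstract (0.1 s, 1 s, 5 s) are the EEG segment lengths on which each AAD decision is made. The sketch below shows that segmentation step, assuming EEG downsampled to 128 Hz, a common preprocessing choice for the KUL dataset; the paper's actual sampling rate is not stated here.

import numpy as np

def segment_windows(eeg, fs=128, win_sec=1.0):
    # Split (n_channels, n_samples) EEG into non-overlapping decision windows
    # of shape (n_windows, n_channels, samples_per_window); one AAD decision each.
    win = int(round(fs * win_sec))
    n = eeg.shape[1] // win
    return eeg[:, : n * win].reshape(eeg.shape[0], n, win).transpose(1, 0, 2)

eeg = np.random.randn(64, 60 * 128)   # one minute of 64-channel EEG (assumed 128 Hz)
for sec in (0.1, 1.0, 5.0):           # the abstract's three window lengths
    print(sec, segment_windows(eeg, win_sec=sec).shape)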