
Computer Engineering & Science (计算机工程与科学)



TransGNN: A spatial-temporal-frequency dual-branch fusion network for fast and efficient decoding of EEG auditory attention

WANG Chunli, GAO Yuxin, LI Jinxu, ZHANG Jiahao, WANG Chenming   

  1. (College of Electronic and Information Engineering, Lanzhou Jiaotong University, Lanzhou 730070, Gansu, China)

Abstract: Listeners with normal hearing can focus on a specific speaker in multi-speaker environments. Auditory attention detection (AAD) models this attention-selection mechanism by analyzing the listener's electroencephalogram (EEG) to decode speech features of the attended speaker. However, existing AAD methods are mostly limited to single-domain analysis of the time or frequency domain, neglecting the intrinsic relationship between the time and frequency domains as well as spatial-domain information, which limits decoding accuracy. Given the strength of graph neural networks (GNNs) in processing spatially non-Euclidean data, this paper proposes a fast and efficient AAD model: a dual-branch spatial-temporal-frequency fusion network. The spatial-temporal attention branch uses a Transformer to capture global contextual information and a GNN to model the local spatial topology of the electrodes; the frequency attention branch uses a residual convolutional network to extract multi-band EEG spectral features. The two branches are then fused so that temporal, spatial, and frequency features are considered jointly in the final AAD classification. Validation on the public KUL dataset shows that the method achieves decoding accuracies of 88.75% and 95.31% within 0.1 s and 1 s decision windows, significant improvements of 14.45% and 14.51% over the baseline model, and 94.88% within the 5 s window. Further ablation experiments confirm the effectiveness and necessity of the model's components.
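
To make the dual-branch design described above concrete, here is a minimal, hypothetical PyTorch sketch: channels are treated as Transformer tokens for global context, one round of adjacency-weighted message passing stands in for the GNN, and a residual convolution processes band-power features before fusion by concatenation. All layer sizes, the learnable adjacency, the five-band frequency input, and the fusion rule are illustrative assumptions, not the authors' published implementation.

import torch
import torch.nn as nn

class TransGNNSketch(nn.Module):
    # Hypothetical sketch of the dual-branch idea; shapes and layers are assumptions.
    def __init__(self, n_channels=64, win_len=128, d_model=64, n_bands=5, n_classes=2):
        super().__init__()
        # Spatial-temporal branch: per-channel temporal embedding -> Transformer -> graph step.
        self.embed = nn.Linear(win_len, d_model)
        enc_layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.transformer = nn.TransformerEncoder(enc_layer, num_layers=2)  # global context
        self.adj = nn.Parameter(torch.eye(n_channels))  # learnable electrode graph (assumption)
        # Frequency branch: residual convolution over per-band power features.
        self.freq_conv = nn.Sequential(
            nn.Conv1d(n_channels, n_channels, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv1d(n_channels, n_channels, kernel_size=3, padding=1),
        )
        self.freq_proj = nn.Linear(n_bands, d_model)
        self.classifier = nn.Linear(2 * d_model, n_classes)  # fused AAD decision

    def forward(self, eeg, band_power):
        # eeg:        (batch, n_channels, win_len)  one decision window of raw EEG
        # band_power: (batch, n_channels, n_bands)  e.g. delta..gamma log-power
        h = self.transformer(self.embed(eeg))           # channels as tokens
        h = torch.softmax(self.adj, dim=-1) @ h         # GNN-style message passing
        st = h.mean(dim=1)                              # pooled spatial-temporal feature
        f = band_power + self.freq_conv(band_power)     # residual spectral feature
        f = self.freq_proj(f).mean(dim=1)
        return self.classifier(torch.cat([st, f], dim=-1))

model = TransGNNSketch()
eeg = torch.randn(8, 64, 128)    # eight 1 s windows at an assumed 128 Hz
bands = torch.randn(8, 64, 5)    # hypothetical band-power features
print(model(eeg, bands).shape)   # torch.Size([8, 2])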

Key words: auditory attention detection, electroencephalogram (EEG), graph neural network, spatial-temporal-frequency fusion mechanism, decoding accuracy
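
The decision windows quoted in the abstract (0.1 s, 1 s, 5 s) are the EEG segment lengths on which each AAD decision is made. The sketch below shows that segmentation step, assuming EEG downsampled to 128 Hz, a common preprocessing choice for the KUL dataset; the paper's actual sampling rate is not stated here.

import numpy as np

def segment_windows(eeg, fs=128, win_sec=1.0):
    # Split (n_channels, n_samples) EEG into non-overlapping decision windows
    # of shape (n_windows, n_channels, samples_per_window); one AAD decision each.
    win = int(round(fs * win_sec))
    n = eeg.shape[1] // win
    return eeg[:, : n * win].reshape(eeg.shape[0], n, win).transpose(1, 0, 2)

eeg = np.random.randn(64, 60 * 128)   # one minute of 64-channel EEG (assumed 128 Hz)
for sec in (0.1, 1.0, 5.0):           # the abstract's three window lengths
    print(sec, segment_windows(eeg, win_sec=sec).shape)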