A speech emotion recognition method using mixed distributed attention mechanism and hybrid neural network

Computer Engineering & Science, 2022, Vol. 44, Issue (12): 2246-2254.

CHEN Qiao-hong, YU Ze-yuan, JIA Yu-bo

(School of Information Science and Technology, Zhejiang Sci-Tech University, Hangzhou 310018, China)

Received: 2021-03-19 Revised: 2021-06-25 Online: 2022-12-25 Published: 2023-01-05

Abstract

Existing speech emotion recognition methods suffer from many irrelevant features and low accuracy. To address these problems, a speech emotion recognition method based on a mixed distributed attention mechanism and a hybrid neural network is proposed. The method works in two channels: a convolutional neural network and a bidirectional long short-term memory network extract the spatial and temporal features of speech, respectively, and the outputs of the two networks are then used together as the input matrix of the multi-head attention mechanism. In addition, considering the low-rank distribution problem of the existing multi-head attention mechanism, the way attention is computed is improved: the low-rank distribution and the similarity between the output features of the two neural networks are superimposed as a mixed distribution. After normalization, the results of all subspaces are concatenated, and a fully connected layer finally produces the classification output. Experimental results show that the proposed method achieves higher accuracy than other existing methods, which verifies its effectiveness.

Key words: speech emotion recognition, Mel frequency cepstral coefficient, bidirectional long short-term memory network, convolutional neural network, multi-head attention mechanism
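
The abstract describes the pipeline only in words, so the following is a minimal sketch of one way to read it, not the authors' implementation. It assumes that queries come from the CNN branch and keys and values from the BiLSTM branch, that the "mixed distribution" is formed by adding a full-width similarity between the two branch outputs to each head's low-rank score matrix before a softmax, and that classification uses mean pooling over time followed by one fully connected layer. All function names, dimensions, and the random weights are hypothetical stand-ins for trained parameters.

```python
import math
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def softmax_rows(m):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    out = []
    for row in m:
        peak = max(row)
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out

def random_matrix(rows, cols, rng):
    """Hypothetical stand-in for learned weights or extracted features."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

def mixed_attention(cnn_feats, lstm_feats, num_heads, rng):
    """Return (concatenated head outputs, per-head attention maps)."""
    d_model = len(cnn_feats[0])
    d_head = d_model // num_heads
    # Assumed mixing term: similarity of the two branch outputs at full width.
    sim = matmul(cnn_feats, transpose(lstm_feats))
    sim = [[v / math.sqrt(d_model) for v in row] for row in sim]
    head_outputs, attn_maps = [], []
    for _ in range(num_heads):
        q = matmul(cnn_feats, random_matrix(d_model, d_head, rng))
        k = matmul(lstm_feats, random_matrix(d_model, d_head, rng))
        v = matmul(lstm_feats, random_matrix(d_model, d_head, rng))
        # q @ k^T has rank at most d_head: the low-rank part of the scores.
        scores = matmul(q, transpose(k))
        mixed = [[s / math.sqrt(d_head) + m for s, m in zip(s_row, m_row)]
                 for s_row, m_row in zip(scores, sim)]
        attn = softmax_rows(mixed)  # normalization after superposition
        head_outputs.append(matmul(attn, v))
        attn_maps.append(attn)
    # Concatenate the subspace (head) results along the feature axis.
    concat = [[x for head in head_outputs for x in head[t]]
              for t in range(len(cnn_feats))]
    return concat, attn_maps

def classify(fused, num_classes, rng):
    """Mean-pool over time, apply one fully connected layer, then softmax."""
    steps = len(fused)
    pooled = [sum(col) / steps for col in zip(*fused)]
    w_out = random_matrix(len(pooled), num_classes, rng)
    return softmax_rows(matmul([pooled], w_out))[0]

rng = random.Random(0)
cnn_out = random_matrix(6, 8, rng)   # stand-in for CNN spatial features (6 frames, 8 dims)
lstm_out = random_matrix(6, 8, rng)  # stand-in for BiLSTM temporal features
fused, attn_maps = mixed_attention(cnn_out, lstm_out, num_heads=2, rng=rng)
probs = classify(fused, num_classes=4, rng=rng)
print(len(fused), len(fused[0]), len(probs))
```

The point of adding the full-width similarity is that each head's own score matrix is limited to rank `d_head`, whereas the added term is not tied to the head dimension. Whether the paper mixes before or after the softmax, and with what weighting, cannot be determined from the abstract alone.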

CHEN Qiao-hong, YU Ze-yuan, JIA Yu-bo. A speech emotion recognition method using mixed distributed attention mechanism and hybrid neural network[J]. Computer Engineering & Science, 2022, 44(12): 2246-2254.
