• Journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science ›› 2022, Vol. 44 ›› Issue (12): 2246-2254.

• Artificial Intelligence and Data Mining •

A speech emotion recognition method using mixed distributed attention mechanism and hybrid neural network

CHEN Qiao-hong, YU Ze-yuan, JIA Yu-bo

  1. (School of Information Science and Technology, Zhejiang Sci-Tech University, Hangzhou 310018, China)
  • Received: 2021-03-19 Revised: 2021-06-25 Accepted: 2022-12-25 Online: 2022-12-25 Published: 2023-01-05

Abstract: To address the large number of irrelevant features and the low accuracy of existing speech emotion recognition methods, a speech emotion recognition method based on a mixed distributed attention mechanism and a hybrid neural network is proposed. The method uses two channels: a convolutional neural network extracts the spatial features of speech, and a bidirectional long short-term memory network extracts its temporal features. The outputs of the two networks are then used together as the input matrix of a multi-head attention mechanism. To address the low-rank distribution problem of existing multi-head attention, the attention computation is modified: the low-rank distribution is superimposed, as a mixed distribution, with the similarity between the output features of the two networks. After normalization, the results of all subspaces are concatenated and passed through a fully connected layer for classification. Experimental results show that the proposed method achieves higher accuracy than other existing methods, which verifies its effectiveness.

Key words: speech emotion recognition, Mel frequency cepstral coefficient, bidirectional long short-term memory network, convolutional neural network, multi-head attention mechanism
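The attention step summarized in the abstract can be illustrated with a small, dependency-free sketch. It is not the authors' implementation: the abstract gives no formulas, so the sketch assumes that queries come from the CNN branch, that keys and values come from the BiLSTM branch, that the "low-rank distribution" is the usual per-head score matrix QK^T (its rank is bounded by the head size), and that the "similarity" term is a scaled dot product of the two branches' raw outputs. All names, dimensions, and the random stand-in features are hypothetical.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def mixed_attention_head(cnn_feats, lstm_feats, w_q, w_k, w_v):
    """One head: add a feature-similarity term to the usual QK^T scores.

    Assumption (not from the paper): queries use the CNN branch, keys and
    values use the BiLSTM branch, and similarity is a scaled dot product.
    """
    q = matmul(cnn_feats, w_q)
    k = matmul(lstm_feats, w_k)
    v = matmul(lstm_feats, w_v)
    d_k = len(w_q[0])
    d_model = len(cnn_feats[0])
    low_rank = matmul(q, transpose(k))                      # rank is at most d_k
    similarity = matmul(cnn_feats, transpose(lstm_feats))   # not limited by d_k
    weights = []
    for lr_row, sim_row in zip(low_rank, similarity):
        mixed = [a / math.sqrt(d_k) + b / math.sqrt(d_model)
                 for a, b in zip(lr_row, sim_row)]
        weights.append(softmax(mixed))                      # normalize after mixing
    return matmul(weights, v), weights


def classify(cnn_feats, lstm_feats, heads, w_out):
    """Concatenate all heads, mean-pool over time, apply one dense layer."""
    outputs = [mixed_attention_head(cnn_feats, lstm_feats, *h)[0] for h in heads]
    concat = [sum((out[t] for out in outputs), []) for t in range(len(cnn_feats))]
    pooled = [sum(col) / len(concat) for col in zip(*concat)]
    logits = matmul([pooled], w_out)[0]
    return softmax(logits)


rng = random.Random(0)
frames, d_model, d_k, n_heads, n_classes = 6, 8, 2, 4, 4
cnn_feats = rand_matrix(frames, d_model, rng)    # stand-in for the CNN output
lstm_feats = rand_matrix(frames, d_model, rng)   # stand-in for the BiLSTM output
heads = [tuple(rand_matrix(d_model, d_k, rng) for _ in range(3))
         for _ in range(n_heads)]
w_out = rand_matrix(n_heads * d_k, n_classes, rng)
probs = classify(cnn_feats, lstm_feats, heads, w_out)
print(probs)  # one probability per (hypothetical) emotion class
```

Adding the similarity term before the softmax keeps each attention row a valid probability distribution while letting the score matrix exceed the rank that a small per-head projection allows. How the paper actually weights or combines the two terms may differ from this sketch.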