• Official journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science ›› 2022, Vol. 44 ›› Issue (06): 1105-1113.

• Artificial Intelligence and Data Mining •

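  • Funding:
    National Natural Science Foundation of China (61601271); Shaanxi Provincial Science and Technology Program (2017GY-063); 2020 Science and Technology Program of Yulin City, Shaanxi Province (CXY-2020-090)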

Robust speech recognition based on adaptive deep neural network in complex environment

ZHANG Kai-sheng,ZHAO Xiao-fen   

  1.  (School of Electrical and Control Engineering,Shaanxi University of Science and Technology,Xi’an 710021,China)
  • Received:2020-05-22 Revised:2020-12-28 Accepted:2022-06-25 Online:2022-06-25 Published:2022-06-17

Abstract: In continuous speech recognition systems, complex environments (including variability in speakers and in environmental noise) cause a mismatch between training data and test data, which lowers the recognition rate. To address this problem, a speech recognition algorithm based on an adaptive deep neural network is proposed. An improved regularized adaptation criterion is combined with feature-space adaptation of the deep neural network to improve the match between training and test data. Speaker identity vectors (i-vectors) are fused with noise-aware training to counter the effects of speaker and environmental noise variation, and the classification function of the output layer of the conventional deep neural network is modified to ensure intra-class compactness and inter-class separation. Tests were carried out on the TIMIT English speech dataset and the Microsoft Chinese speech dataset with various background noises superimposed. The results show that, compared with the currently popular GMM-HMM and conventional DNN acoustic models, the proposed algorithm reduces the word error rate by 5.151% and 3.113%, respectively, which improves the generalization performance and robustness of the model to a certain extent.
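
The abstract mentions fusing i-vectors with noise-aware training but does not give the input layout. A common way to realize this kind of fusion is to append an utterance-level noise estimate and the speaker i-vector to every acoustic frame before it enters the network. The sketch below illustrates that idea only; the function name `augment_frames`, the `noise_frames` parameter, and the choice of averaging the leading frames as the noise estimate are assumptions for illustration, not details taken from the paper.

```python
def augment_frames(frames, ivector, noise_frames=10):
    """Append a noise estimate and a speaker i-vector to every frame.

    frames: list of per-frame feature vectors (each a list of numbers).
    ivector: speaker identity vector, assumed to be extracted elsewhere.
    noise_frames: number of leading frames assumed to contain only noise
        (hypothetical parameter).
    """
    if not frames:
        return []
    # Assumption: the first few frames are speech-free, so their mean
    # serves as a crude estimate of the background noise.
    head = frames[:max(1, noise_frames)]
    dim = len(frames[0])
    noise_est = [sum(f[d] for f in head) / len(head) for d in range(dim)]
    # The same utterance-level side information is attached to each frame.
    side = noise_est + list(ivector)
    return [list(f) + side for f in frames]


if __name__ == "__main__":
    toy_frames = [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10, 11]]
    augmented = augment_frames(toy_frames, [1.0, 2.0], noise_frames=2)
    print(len(augmented), len(augmented[0]))
```

Each augmented frame has the original feature dimension, plus one copy of it for the noise estimate, plus the i-vector dimension, so the network's input layer would need to be sized accordingly.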


Key words: speech recognition, deep neural network, improved adaptive criterion, feature space
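
The results in the abstract are reported as word error rate. The abstract does not say how the metric was scored, so the sketch below assumes the standard definition: the word-level edit distance (substitutions, deletions, and insertions) between the reference and the recognized transcript, divided by the number of reference words. The function name `word_error_rate` is chosen here for illustration.

```python
def word_error_rate(reference, hypothesis):
    """Word error rate: word-level edit distance / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Because insertions count as errors, this ratio can exceed 1 when the hypothesis is much longer than the reference.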