Robust speech recognition based on adaptive deep neural network in complex environment

Computer Engineering & Science, 2022, Vol. 44, Issue (06): 1105-1113.

ZHANG Kai-sheng, ZHAO Xiao-fen

(School of Electrical and Control Engineering, Shaanxi University of Science and Technology, Xi'an 710021, China)

Received: 2020-05-22 Revised: 2020-12-28 Accepted: 2022-06-25 Online: 2022-06-25 Published: 2022-06-17
Funding:
National Natural Science Foundation of China (61601271); Shaanxi Province Science and Technology Program (2017GY-063); 2020 Science and Technology Program of Yulin City, Shaanxi Province (CXY-2020-090)

Abstract

Abstract: In continuous speech recognition systems, complex environments (including variability in speakers and environmental noise) cause a mismatch between training data and test data, which lowers the speech recognition rate. To address this problem, a speech recognition algorithm based on an adaptive deep neural network is proposed. An improved regularized adaptation criterion is combined with a feature-space adaptive deep neural network to improve data matching. Speaker identity vectors (i-vectors) are fused with noise-aware training to cope with changes in speakers and environmental noise, and the classification function of the output layer of the traditional deep neural network is improved to ensure intra-class compactness and inter-class separation. Tests were carried out on the TIMIT English speech data set and the Microsoft Chinese speech data set with various background noises superimposed. The results show that, compared with the currently popular GMM-HMM and traditional DNN acoustic models, the proposed algorithm reduces the recognition word error rate by 5.151% and 3.113%, respectively, which improves the generalization performance and robustness of the model to a certain extent.

Key words: speech recognition, deep neural network, improved adaptive criterion, feature space
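
The abstract does not specify how the i-vector and noise-aware training are fused into the network input. As a rough illustration only, one common form of this idea appends a per-utterance noise estimate and a speaker i-vector to every acoustic frame before it is fed to the DNN. In the sketch below, the function name `augment_features`, the parameter `n_noise_frames`, and the use of the leading frames as a noise estimate are all assumptions made for illustration, not the authors' implementation.

```python
def augment_features(frames, ivector, n_noise_frames=10):
    """Append a noise estimate and a speaker i-vector to every frame.

    frames:  non-empty list of equal-length feature vectors (one per frame).
    ivector: speaker identity vector for the utterance (assumed precomputed).
    """
    # Assumption: the first few frames contain background noise only,
    # so their per-dimension mean serves as a crude noise estimate.
    head = frames[:n_noise_frames]
    dim = len(frames[0])
    noise_estimate = [sum(f[d] for f in head) / len(head) for d in range(dim)]

    # Every frame gets the same utterance-level noise estimate and i-vector.
    return [list(f) + noise_estimate + list(ivector) for f in frames]


# Hypothetical toy input: 3 frames of 3-dimensional features, 2-dimensional i-vector.
frames = [[0.0, 1.0, 2.0], [3.0, 4.0, 5.0], [6.0, 7.0, 8.0]]
augmented = augment_features(frames, [0.5, -0.5], n_noise_frames=2)
print(len(augmented), len(augmented[0]))
```

Each augmented frame has the original feature dimension, plus the same dimension again for the noise estimate, plus the i-vector dimension, so the DNN input layer would need to be widened accordingly.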

ZHANG Kai-sheng, ZHAO Xiao-fen. Robust speech recognition based on adaptive deep neural network in complex environment[J]. Computer Engineering & Science, 2022, 44(06): 1105-1113.

