Computer Engineering & Science ›› 2022, Vol. 44 ›› Issue (06): 1105-1113.
• Artificial Intelligence and Data Mining •
ZHANG Kai-sheng,ZHAO Xiao-fen
Abstract: In continuous speech recognition systems operating in complex environments, speaker variability and environmental noise cause a mismatch between training data and test data, which lowers recognition accuracy. To address this, a speech recognition method based on an adaptive deep neural network (DNN) is studied. An improved regularized adaptation criterion is combined with feature-space adaptation of the DNN to improve the match between training and test data. The speaker identity vector (i-vector) is fused with noise-aware training to cope with changes in speaker and environmental noise, and the classification function of the output layer of the traditional DNN is improved so that features are compact within each class and separated between classes. Experiments were carried out on the TIMIT English speech data set and the Microsoft Chinese speech data set with various background noises superimposed. The results show that, compared with the currently popular GMM-HMM and traditional DNN acoustic models, the proposed method decreases the word error rate by 5.151% and 3.113%, respectively, which improves the generalization performance and robustness of the model to a certain extent.
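The abstract does not say how the i-vector and the noise information are fused with the acoustic features. As a rough illustration only, the sketch below assumes the common convention of concatenating a per-utterance noise estimate and the speaker i-vector onto every input frame, and assumes the noise estimate is the mean of the first few frames. The function names, the `n_lead` parameter, and the toy values are hypothetical and are not taken from the paper.

```python
from typing import List

Vector = List[float]


def estimate_noise(frames: List[Vector], n_lead: int = 6) -> Vector:
    """Average the first n_lead frames as a crude noise estimate.

    Assumption: the opening frames of an utterance contain mostly
    background noise. The paper may use a different estimator.
    """
    lead = frames[:n_lead]
    if not lead:
        raise ValueError("utterance has no frames")
    dim = len(lead[0])
    return [sum(frame[d] for frame in lead) / len(lead) for d in range(dim)]


def fuse_inputs(frames: List[Vector], ivector: Vector, n_lead: int = 6) -> List[Vector]:
    """Append the noise estimate and the speaker i-vector to every frame.

    Assumption: fusion is plain concatenation at the DNN input.
    """
    noise = estimate_noise(frames, n_lead)
    return [frame + noise + ivector for frame in frames]


# Toy utterance: 3 frames of 2-dimensional features (made-up numbers).
frames = [[0.0, 2.0], [2.0, 4.0], [5.0, 5.0]]
# Toy 3-dimensional i-vector (real i-vectors are typically much larger).
ivector = [0.5, -0.5, 1.0]

fused = fuse_inputs(frames, ivector, n_lead=2)
print(len(fused), len(fused[0]))
```

Under these assumptions each fused frame has dimension 2 + 2 + 3 = 7, and the same noise estimate and i-vector are repeated for all frames of the utterance, so the network sees speaker and noise context alongside every acoustic frame.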
Key words: speech recognition, deep neural network, improved adaptive criterion, feature space
ZHANG Kai-sheng, ZHAO Xiao-fen. Robust speech recognition based on adaptive deep neural network in complex environment[J]. Computer Engineering & Science, 2022, 44(06): 1105-1113.
URL: http://joces.nudt.edu.cn/EN/
http://joces.nudt.edu.cn/EN/Y2022/V44/I06/1105