[1] Qin Y,Carlini N,Cottrell G,et al.Imperceptible,robust,and targeted adversarial examples for automatic speech recognition[C]∥Proc of International Conference on Machine Learning,2019:5231-5240.
[2] Prasad A,Jyothi P,Velmurugan R.An investigation of end-to-end models for robust speech recognition[C]∥Proc of 2021 IEEE International Conference on Acoustics,Speech and Signal Processing,2021:6893-6897.
[3] Zhang H,Liu C,Inoue N,et al.Multi-task autoencoder for noise-robust speech recognition[C]∥Proc of 2018 IEEE International Conference on Acoustics,Speech and Signal Processing,2018:5599-5603.
[4] Hu H,Tan T,Qian Y.Generative adversarial networks based data augmentation for noise robust speech recognition[C]∥Proc of 2018 IEEE International Conference on Acoustics,Speech and Signal Processing,2018:5044-5048.
[5] Lee K F,Hon H W,Reddy R.An overview of the SPHINX speech recognition system[J].IEEE Transactions on Acoustics,Speech,and Signal Processing,1990,38(1):35-45.
[6] Graves A,Fernández S,Gomez F,et al.Connectionist temporal classification:Labelling unsegmented sequence data with recurrent neural networks[C]∥Proc of the 23rd International Conference on Machine Learning,2006:369-376.
[7] Chorowski J K,Bahdanau D,Serdyuk D,et al.Attention-based models for speech recognition[C]∥Proc of the 28th International Conference on Neural Information Processing Systems,2015:577-585.
[8] Amodei D,Ananthanarayanan S,Anubhai R,et al.Deep speech 2:End-to-end speech recognition in English and Mandarin[C]∥Proc of International Conference on Machine Learning,2016:173-182.
[9] Kannan A,Wu Y,Nguyen P,et al.An analysis of incorporating an external language model into a sequence-to-sequence model[C]∥Proc of 2018 IEEE International Conference on Acoustics,Speech and Signal Processing,2018:5824-5828.
[10] Gulati A,Qin J,Chiu C C,et al.Conformer:Convolution-augmented transformer for speech recognition[C]∥Proc of the 21st Annual Conference of the International Speech Communication Association,2020:5036-5040.
[11] Yao Z,Wu D,Wang X,et al.WeNet:Production oriented streaming and non-streaming end-to-end speech recognition toolkit[C]∥Proc of the 22nd Annual Conference of the International Speech Communication Association,2021:4054-4058.
[12] Li Sen.Design and implementation of noise robust speech recognition algorithm based on deep learning[D].Chengdu:University of Electronic Science and Technology of China,2021.(in Chinese)
[13] Moore A H,Xue W,Naylor P A,et al.Noise covariance matrix estimation for rotating microphone arrays[J].IEEE/ACM Transactions on Audio,Speech,and Language Processing,2018,27(3):519-530.
[14] Hsu W N,Zhang Y,Glass J.Unsupervised domain adaptation for robust speech recognition via variational autoencoder-based data augmentation[C]∥Proc of 2017 IEEE Automatic Speech Recognition and Understanding Workshop,2017:16-23.
[15] Liu B,Nie S,Liang S,et al.Jointly adversarial enhancement training for robust end-to-end speech recognition[C]∥Proc of the 20th Annual Conference of the International Speech Communication Association,2019:491-495.
[16] Pujari S,Sneha S K,Vinusha R,et al.A survey on deep learning based lip-reading techniques[C]∥Proc of the 3rd International Conference on Intelligent Communication Technologies and Virtual Mobile Networks,2021:1286-1293.
[17] Makino T,Liao H,Assael Y,et al.Recurrent neural network transducer for audio-visual speech recognition[C]∥Proc of 2019 IEEE Automatic Speech Recognition and Understanding Workshop,2019:905-912.
[18] MacKenzie I S,Soukoreff R W.A character-level error analysis technique for evaluating text entry methods[C]∥Proc of the 2nd Nordic Conference on Human-Computer Interaction,2002:243-246.
Appendix (Chinese-language reference in the original):
[12] 李森.基于深度学习的噪声鲁棒性语音识别算法设计与实现[D].成都:电子科技大学,2021.