• Journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science ›› 2022, Vol. 44 ›› Issue (10): 1869-1876.

• Artificial Intelligence and Data Mining •

Converting sign language to emotional speech

WANG Wei-zhe1, GUO Wei-tong2,3, YANG Hong-wu1,2,3

  (1. College of Physics and Electronic Engineering, Northwest Normal University, Lanzhou 730070, China;
    2. School of Educational Technology, Northwest Normal University, Lanzhou 730070, China;
    3. National and Provincial Joint Engineering Laboratory of
    Learning Analysis Technology in Online Education, Lanzhou 730070, China)
  • Received:2020-08-16 Revised:2021-03-02 Accepted:2022-10-25 Online:2022-10-25 Published:2022-10-28

Abstract: To address the communication barrier between speech-impaired and unimpaired people, a neural-network-based method for converting sign language to emotional speech is proposed. First, a gesture corpus, a facial expression corpus, and an emotional speech corpus are established. Then, a deep convolutional neural network is used to recognize gestures and facial expressions. With Mandarin vowels and consonants as synthesis units, two speaker-adaptive emotional speech acoustic models are trained: one based on a deep neural network and one based on a mixed long short-term memory network. Finally, the context-dependent labels derived from the gesture semantics, together with the emotion labels corresponding to the recognized facial expressions, are input into the emotional speech synthesis model to synthesize the corresponding emotional speech. Experimental results show that the gesture recognition accuracy and facial expression recognition accuracy reach 95.86% and 92.42%, respectively, and the synthesized emotional speech achieves an average mean opinion score of 4.15. The synthesized speech also conveys emotion well, so the method can support communication between speech-impaired and unimpaired people.
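The abstract's pipeline (recognize gesture and expression, map the semantics to Mandarin initial/final synthesis units with context-dependent labels, then pass labels plus an emotion tag to the acoustic model) can be sketched as follows. This is a minimal illustrative sketch, not the authors' code: the recognizers are stubbed, and the lexicon, label format, and function names are all assumptions.

```python
# Illustrative sketch of the sign-language-to-emotional-speech pipeline.
# All names, mappings, and the label format are assumptions for clarity.

def recognize_gesture(gesture_frames):
    """Stand-in for the deep CNN gesture recognizer: returns a semantic word."""
    return "ni3hao3"  # hypothetical output, e.g. the sign for "hello"

def recognize_expression(face_frames):
    """Stand-in for the deep CNN facial-expression recognizer."""
    return "happy"  # hypothetical emotion label

# The paper uses Mandarin vowels (finals) and consonants (initials) as
# synthesis units; this toy lexicon entry is an assumption.
LEXICON = {"ni3hao3": ["n", "i3", "h", "ao3"]}

def to_context_labels(word):
    """Build simple context-dependent labels (previous-current+next unit)."""
    units = LEXICON[word]
    padded = ["sil"] + units + ["sil"]  # silence padding at both ends
    return [f"{padded[i-1]}-{padded[i]}+{padded[i+1]}"
            for i in range(1, len(padded) - 1)]

def synthesize(labels, emotion):
    """Placeholder for the speaker-adaptive DNN/LSTM acoustic model:
    pairs each context label with the emotion tag instead of producing audio."""
    return [(label, emotion) for label in labels]

labels = to_context_labels(recognize_gesture(None))
speech = synthesize(labels, recognize_expression(None))
print(labels)     # ['sil-n+i3', 'n-i3+h', 'i3-h+ao3', 'h-ao3+sil']
print(speech[0])  # ('sil-n+i3', 'happy')
```

In a real system the stubs would be replaced by the trained CNN recognizers and the acoustic model, and the labels would carry richer context (tone, position in syllable and phrase) than this three-unit window.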


Key words: gesture recognition, facial expression recognition, emotional speech synthesis, neural network, sign language to speech conversion, speech-impaired people