Converting sign language to emotional speech

Computer Engineering & Science, 2022, Vol. 44, Issue (10): 1869-1876.

WANG Wei-zhe1, GUO Wei-tong2,3, YANG Hong-wu1,2,3

(1. College of Physics and Electronic Engineering, Northwest Normal University, Lanzhou 730070;
2. School of Educational Technology, Northwest Normal University, Lanzhou 730070;
3. National and Provincial Joint Engineering Laboratory of Learning Analysis Technology in Online Education, Lanzhou 730070, China)

Received: 2020-08-16 Revised: 2021-03-02 Accepted: 2022-10-25 Online: 2022-10-25 Published: 2022-10-28

Supported by:
National Natural Science Foundation of China (62067008, 31860285); Natural Science Foundation of Gansu Province (21JR7RA117); 2020 Key Project of the Gansu Province Education Science "13th Five-Year Plan" (GS[2020]GHBZ190)

Abstract

To address the communication barrier between speech-impaired people and healthy people, a neural network-based method for converting sign language to emotional speech is proposed. First, a gesture corpus, a facial expression corpus, and an emotional speech corpus are built. Then, deep convolutional neural networks are used for gesture recognition and facial expression recognition, and, with Mandarin initials and finals as the synthesis units, two emotional speech acoustic models are trained: a speaker-adaptive deep neural network model and a speaker-adaptive hybrid long short-term memory network model. Finally, the context-dependent labels of the gesture semantics and the emotion labels corresponding to the facial expressions are fed into the emotional speech synthesis model to synthesize the corresponding emotional speech. Experimental results show that the gesture recognition rate and the facial expression recognition rate reach 95.86% and 92.42%, respectively, and that the synthesized emotional speech achieves an EMOS score of 4.15. The synthesized speech expresses emotion to a high degree and can be used for everyday communication between speech-impaired people and healthy people.

Key words: gesture recognition, facial expression recognition, emotional speech synthesis, neural network, sign language to speech conversion, speech-impaired people
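The data flow summarized in the abstract (two recognizers whose outputs jointly drive one synthesizer) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the label sets `GESTURE_TEXT` and `EMOTIONS`, the `SynthesisRequest` type, and the `build_request` helper are all hypothetical names introduced here, and the score vectors stand in for the outputs of the two convolutional networks. The sketch also skips the conversion of text into context-dependent labels and the acoustic models themselves.

```python
from dataclasses import dataclass
from typing import Sequence

# Hypothetical label sets (assumptions for illustration only); the paper's
# actual gesture vocabulary and emotion classes are not listed on this page.
GESTURE_TEXT = ["你好", "谢谢", "再见"]   # Mandarin text associated with each gesture class
EMOTIONS = ["neutral", "happy", "sad"]    # emotion label for each expression class


@dataclass(frozen=True)
class SynthesisRequest:
    """Input handed to an emotional speech synthesizer: what to say and how."""
    text: str
    emotion: str


def argmax(scores: Sequence[float]) -> int:
    """Return the index of the highest score."""
    if not scores:
        raise ValueError("scores must not be empty")
    return max(range(len(scores)), key=lambda i: scores[i])


def build_request(gesture_scores: Sequence[float],
                  expression_scores: Sequence[float]) -> SynthesisRequest:
    """Combine the two recognizers' class scores into one synthesis request."""
    if len(gesture_scores) != len(GESTURE_TEXT):
        raise ValueError("gesture_scores must have one entry per gesture class")
    if len(expression_scores) != len(EMOTIONS):
        raise ValueError("expression_scores must have one entry per emotion class")
    return SynthesisRequest(
        text=GESTURE_TEXT[argmax(gesture_scores)],
        emotion=EMOTIONS[argmax(expression_scores)],
    )


# Stand-in scores, as if produced by the gesture and facial expression networks.
request = build_request([0.1, 0.7, 0.2], [0.05, 0.9, 0.05])
print(request)
```

Keeping the spoken content and the emotion label as separate fields mirrors the abstract's description, in which gesture semantics and facial expression are recognized independently and only meet at the input of the synthesis model.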

WANG Wei-zhe, GUO Wei-tong, YANG Hong-wu. Converting sign language to emotional speech[J]. Computer Engineering & Science, 2022, 44(10): 1869-1876.
