Computer Engineering & Science, 2023, Vol. 45, Issue (04): 665-673.

• Graphics and Images •

Research on dynamic gesture recognition based on multimodal fusion

HU Zong-cheng¹, DUAN Xiao-wei², ZHOU Ya-tong¹, HE Hao¹

  1. School of Electronic and Information Engineering, Hebei University of Technology, Tianjin 300401, China
  2. The 29th Research Institute of China Electronics Technology Group Corporation, Chengdu, Sichuan 610036, China
  • Received: 2021-05-21 Revised: 2021-11-02 Accepted: 2023-04-25 Online: 2023-04-25 Published: 2023-04-13

Abstract: To address the low accuracy and weak robustness of dynamic gesture recognition in complex environments, a dynamic gesture recognition algorithm based on multimodal fusion, named TF-MG, is proposed. TF-MG combines depth information with 3D hand skeleton information, extracts features from each modality with a separate network, and then fuses the extracted features and feeds them into a classification network to recognize dynamic gestures. For the depth information, the motion history image method compresses the motion trajectory into a single-frame image, from which MobileNetV2 extracts features. For the 3D hand skeleton information, DeepGRU, a network built from gated recurrent units, extracts features. Experimental results on the DHG-14/28 dataset show a recognition accuracy of 93.29% on the 14 gesture classes and 92.25% on the 28 gesture classes, higher than that of the other algorithms compared.

Key words: multimodality, dynamic gesture recognition, gated recurrent unit, convolutional neural network
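
The abstract describes compressing a depth sequence into a single frame with the motion history image method. The sketch below illustrates that general technique only; it is not the authors' implementation. The function name `motion_history_image`, the frame-differencing motion test, and the parameters `threshold`, `tau`, and `decay` are assumptions made for illustration. Frames are plain nested lists so the example runs without dependencies; a real pipeline would more likely use NumPy or OpenCV.

```python
def motion_history_image(frames, threshold=10, tau=None, decay=1):
    """Compress a sequence of 2-D frames into one motion history image.

    A pixel is set to `tau` whenever it changes by more than `threshold`
    between consecutive frames; otherwise its value fades by `decay`.
    Recent motion therefore appears brighter than older motion.
    """
    if len(frames) < 2:
        raise ValueError("need at least two frames")
    if tau is None:
        tau = len(frames) - 1  # assumption: one intensity level per step
    rows, cols = len(frames[0]), len(frames[0][0])
    mhi = [[0] * cols for _ in range(rows)]
    for prev, curr in zip(frames, frames[1:]):
        for r in range(rows):
            for c in range(cols):
                if abs(curr[r][c] - prev[r][c]) > threshold:
                    mhi[r][c] = tau  # motion detected: reset to maximum
                else:
                    mhi[r][c] = max(0, mhi[r][c] - decay)  # fade old motion
    return mhi


# Toy 1x4 "depth" sequence: one bright pixel moving left to right.
frames = [
    [[100, 0, 0, 0]],
    [[0, 100, 0, 0]],
    [[0, 0, 100, 0]],
    [[0, 0, 0, 100]],
]
print(motion_history_image(frames))  # [[1, 2, 3, 3]]
```

The output encodes the trajectory in one frame: values grow toward the most recent positions of the moving pixel. A single image of this kind is what a 2-D convolutional network such as MobileNetV2 could then take as input.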