
Computer Engineering & Science (计算机工程与科学)

• Artificial Intelligence and Data Mining


UAV gesture control system based on computer vision and deep learning

MA Le-le, LI Zhao-yang, DONG Jia-rong, HOU Yong-hong

  1. (School of Electronic Information Engineering, Tianjin University, Tianjin 300072, China)
  • Received: 2016-08-16  Revised: 2016-12-07  Online: 2018-05-25  Published: 2018-05-25


Abstract:

Traditional human-machine interaction with unmanned aerial vehicles (UAVs) requires specialized equipment and professional training, so convenient and novel interaction methods are often preferred. Using an ordinary camera, we study a UAV gesture control system based on computer vision and deep learning. The system first uses a fast tracking algorithm to extract the region containing the operator from the video sequence, which greatly reduces the subsequent video-processing load while removing the influence of complex backgrounds and camera drift. Second, according to the temporal information of the action, optical flow features are encoded in different colors and superimposed onto a single image, converting the video into a color texture image that contains both temporal and spatial features. Finally, a convolutional neural network (CNN) learns and classifies the color texture images, and UAV control commands are generated from the classification results. The system makes one decision on the preceding 1.6 s of motion every 0.4 s, so CNN-based image classification achieves real-time human-machine interaction. Recognition accuracy exceeds 93% within a range of 60 m, and in both indoor and outdoor environments the operator can conveniently control the UAV by imitating the command actions.
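
The pipeline's central step is collapsing a short video window into one color texture image. As a rough sketch only (the abstract does not specify the flow algorithm or the exact color coding, so Farneback flow and the hue-for-time mapping below are assumptions, not the authors' method), each frame pair's dense optical flow is tinted by its temporal position and the tinted layers are superimposed into a single RGB image:

    import cv2
    import numpy as np

    def flow_texture(frames):
        # frames: list of grayscale uint8 images covering one 1.6 s window.
        h, w = frames[0].shape
        texture = np.zeros((h, w, 3), np.float32)
        n = len(frames) - 1
        for i in range(n):
            # Dense optical flow between consecutive frames (Farneback is
            # an assumed stand-in for the paper's unspecified flow method).
            flow = cv2.calcOpticalFlowFarneback(
                frames[i], frames[i + 1], None,
                0.5, 3, 15, 3, 5, 1.2, 0)
            mag = np.linalg.norm(flow, axis=2)
            # Hue encodes time (early pairs red, late pairs green/blue);
            # value encodes motion strength. OpenCV uint8 hue range is 0-179.
            hue = np.full((h, w), 120.0 * i / max(n - 1, 1), np.float32)
            hsv = np.stack([hue,
                            np.full((h, w), 255.0, np.float32),
                            np.clip(mag * 16.0, 0.0, 255.0)], axis=2)
            rgb = cv2.cvtColor(hsv.astype(np.uint8), cv2.COLOR_HSV2RGB)
            # Superimpose by per-pixel maximum so later motion does not
            # erase earlier motion.
            texture = np.maximum(texture, rgb.astype(np.float32))
        return texture.astype(np.uint8)

The resulting image carries the trajectory (spatial features) and its timing (color), so an ordinary image CNN can classify the whole gesture in one forward pass.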
 

Key words: human-machine interaction, deep learning, convolutional neural network (CNN), UAV, gesture control
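
The stated timing (one decision on the preceding 1.6 s of motion every 0.4 s) corresponds to a fixed-stride sliding window over the tracked frames. A minimal sketch, assuming a 20 fps camera and the hypothetical helpers capture_frames, track_operator, cnn_classify and send_to_uav (none of these names come from the paper; flow_texture is the sketch above):

    from collections import deque

    FPS = 20                    # assumed camera frame rate
    WINDOW = int(1.6 * FPS)     # 32 frames per decision window
    STRIDE = int(0.4 * FPS)     # a new decision every 8 frames

    buffer = deque(maxlen=WINDOW)
    for t, frame in enumerate(capture_frames()):   # hypothetical video source
        buffer.append(track_operator(frame))       # hypothetical operator crop
        if len(buffer) == WINDOW and t % STRIDE == 0:
            command = cnn_classify(flow_texture(list(buffer)))  # hypothetical CNN
            send_to_uav(command)                                # hypothetical UAV link

Overlapping windows (1.6 s of context refreshed every 0.4 s) are what let a per-image classifier behave as a real-time gesture recognizer.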