• Journal of the China Computer Federation (CCF)
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science ›› 2023, Vol. 45 ›› Issue (10): 1822-1829.

• Graphics and Images •

A grasp pose estimation method combining semantic instance reconstruction

HAN Hui-yan (韩慧妍)1,2,3, WANG Wen-jun (王文俊)1,2,3, HAN Xie (韩燮)1,2,3, KUANG Li-qun (况立群)1,2,3, XUE Hong-xin (薛红新)1,2,3

  1. School of Computer Science and Technology, North University of China, Taiyuan, Shanxi 030051, China
  2. Shanxi Key Laboratory of Machine Vision and Virtual Reality, Taiyuan, Shanxi 030051, China
  3. Shanxi Province's Vision Information Processing and Intelligent Robot Engineering Research Center, Taiyuan, Shanxi 030051, China
  • Received: 2022-06-09 Revised: 2022-10-25 Accepted: 2023-10-25 Online: 2023-10-25 Published: 2023-10-17
  • Funding:
    National Natural Science Foundation of China (62106238); Natural Science Foundation of Shanxi Province (202303021211153); Shanxi Province Special Guidance Project for the Transformation of Scientific and Technological Achievements (202104021301055); Shanxi Province Graduate Innovation Project (2021Y626)

Abstract: In grasping tasks, multiple closely adjacent objects are hard to tell apart, and high-dimensional pose learning has poor accuracy. To address these problems, a grasp pose estimation method combining semantic instance reconstruction is proposed. A semantic instance reconstruction branch is added to perform implicit 3D reconstruction of the foreground, and a voting scheme predicts, for each foreground point, the center coordinates of the instance it belongs to, so that adjacent objects can be distinguished. A dimensionality-reduction learning method for high-dimensional poses is proposed, in which the 3D rotation matrix is decomposed into two orthogonal unit vectors to improve the accuracy of pose learning. A grasp detection network combining semantic instance reconstruction, SIRGN, is built and trained on the VGN simulated grasping dataset. Experimental results show that SIRGN achieves grasp success rates of 89.5% and 78.1% in the Packed and Pile scenes, respectively, and that it has good applicability in real environments.

Key words: grasp pose estimation, implicit 3D reconstruction, voting, dimensionality reduction, rotation matrix
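
The abstract states that the 3D rotation matrix is decomposed into two orthogonal unit vectors but does not give the exact parameterisation. The sketch below illustrates one common way such a representation can work, and it is an assumption rather than the authors' implementation: the two vectors are taken to be the first two columns of the rotation matrix, and a (possibly noisy) predicted pair is mapped back to a valid rotation by Gram-Schmidt orthogonalisation plus a cross product. The function names `rotation_to_two_vectors` and `two_vectors_to_rotation` are hypothetical.

```python
import numpy as np


def rotation_to_two_vectors(R):
    """Return the first two columns of a 3x3 rotation matrix (6 numbers in total).

    Assumption: the paper's two orthogonal unit vectors are modelled here
    as two columns of R; the abstract does not confirm this choice.
    """
    R = np.asarray(R, dtype=float)
    return R[:, 0].copy(), R[:, 1].copy()


def two_vectors_to_rotation(a, b, eps=1e-8):
    """Rebuild a rotation matrix from two 3-vectors that may be noisy."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # First axis: normalise a.
    x = a / max(np.linalg.norm(a), eps)
    # Second axis: remove the component of b along x, then normalise.
    b_perp = b - np.dot(x, b) * x
    y = b_perp / max(np.linalg.norm(b_perp), eps)
    # Third axis: the cross product completes a right-handed frame.
    z = np.cross(x, y)
    return np.stack([x, y, z], axis=1)


# Example: a rotation of 30 degrees about the z-axis.
theta = np.pi / 6
R = np.array([
    [np.cos(theta), -np.sin(theta), 0.0],
    [np.sin(theta),  np.cos(theta), 0.0],
    [0.0,            0.0,           1.0],
])
a, b = rotation_to_two_vectors(R)

# An exact pair reproduces R; a perturbed pair still yields a valid rotation.
R_exact = two_vectors_to_rotation(a, b)
R_noisy = two_vectors_to_rotation(a + 0.01, b - 0.02)
```

Under this assumption a network would regress six numbers instead of a full nine-entry matrix, and the orthogonalisation step guarantees that the recovered matrix is orthonormal with determinant +1 even when the raw predictions are not.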