• Official journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science ›› 2023, Vol. 45 ›› Issue (11): 1982-1990.

• Graphics and Images •

An efficient and high-precision 3D gaze estimation method based on MLP

WU Zhi-hao¹, ZHANG De-jun¹, WU Yi-qi¹, CHEN Yi-lin²

  1. School of Computer Science, China University of Geosciences (Wuhan), Wuhan 430078, Hubei, China
  2. Hubei Key Laboratory of Intelligent Robot (Wuhan Institute of Technology), Wuhan 430205, Hubei, China

  • Received: 2022-06-14  Revised: 2023-01-31  Accepted: 2023-11-25  Online: 2023-11-25  Published: 2023-11-16
  • Supported by:
    National Natural Science Foundation of China (61802355); Open Fund of Hubei Key Laboratory of Intelligent Robot (HBIR 202105)

Abstract: With the wide adoption of convolutional neural networks (CNNs) in computer vision and the public release of many 3D gaze datasets, appearance-based 3D gaze estimation using deep learning has attracted growing attention. Because CNNs are structurally complex, however, such methods still need improvement for applications with strict real-time requirements. Recent studies have shown that multi-layer perceptron (MLP) models, despite their simpler structure, can achieve performance comparable to the current best CNN and Transformer models. Inspired by this, an efficient and high-precision 3D gaze estimation method based on MLP is proposed: MLP models extract features from the two eye images and the face image, and these features are then fused to derive the 3D gaze. Experiments on the 31 subjects of differing appearance in the MPIIFaceGaze and EyeDiap datasets show that the proposed method, UM-Net, achieves gaze estimation accuracy comparable to that of CNN-based methods while being clearly faster at inference. It therefore has good application prospects in fields with strict real-time requirements.

Key words: 3D gaze estimation, appearance, multi-layer perceptron (MLP), real-time
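
The pipeline summarized in the abstract (per-image MLP feature extraction for the two eyes and the face, followed by fusion into a 3D gaze) can be illustrated with a minimal sketch. This is not the UM-Net architecture: the abstract gives no layer sizes, fusion scheme, or output parameterization, so the two-layer perceptrons, the fusion by concatenation, the unit-vector output, and all names and dimensions below are assumptions chosen only for illustration, and the weights are random rather than trained.

```python
import math
import random


def make_linear(n_in, n_out, rng):
    """Create a dense layer as (weights, biases) with small random weights."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def linear(layer, x):
    """Apply a dense layer: y = W x + b."""
    weights, biases = layer
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]


def make_mlp(n_in, n_hidden, n_out, rng):
    """Create a two-layer perceptron (hypothetical sizes, not taken from the paper)."""
    return make_linear(n_in, n_hidden, rng), make_linear(n_hidden, n_out, rng)


def mlp(layers, x):
    """Dense -> tanh -> dense."""
    hidden = [math.tanh(v) for v in linear(layers[0], x)]
    return linear(layers[1], hidden)


def estimate_gaze(params, left_eye, right_eye, face):
    """Extract one feature vector per input, fuse by concatenation (an assumption),
    and regress a unit-length 3D gaze vector."""
    fused = (mlp(params["left"], left_eye)
             + mlp(params["right"], right_eye)
             + mlp(params["face"], face))
    g = mlp(params["fusion"], fused)
    norm = math.sqrt(sum(v * v for v in g)) or 1.0  # guard against a zero vector
    return [v / norm for v in g]


rng = random.Random(0)
EYE_DIM, FACE_DIM, FEAT_DIM = 16, 32, 8  # toy sizes standing in for flattened images
params = {
    "left": make_mlp(EYE_DIM, 12, FEAT_DIM, rng),
    "right": make_mlp(EYE_DIM, 12, FEAT_DIM, rng),
    "face": make_mlp(FACE_DIM, 12, FEAT_DIM, rng),
    "fusion": make_mlp(3 * FEAT_DIM, 12, 3, rng),
}
left_eye = [rng.random() for _ in range(EYE_DIM)]
right_eye = [rng.random() for _ in range(EYE_DIM)]
face = [rng.random() for _ in range(FACE_DIM)]

gaze = estimate_gaze(params, left_eye, right_eye, face)
print(gaze)  # weights are untrained, so the direction itself carries no meaning
```

Every stage here is a plain dense layer, with no convolution or attention, which is the structural simplicity the abstract credits for the speed advantage. A practical model would use far larger inputs and weights trained on a gaze dataset.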