• Journal of the China Computer Federation
  • China Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science ›› 2026, Vol. 48 ›› Issue (3): 521-530.

• Graphics and Images

A 3D human pose estimation method integrating semantic graph convolutional network and self-attention mechanism

TONG Lijing, YING Yizhuo, CAO Nan

  1. (School of Artificial Intelligence and Computer Science, North China University of Technology, Beijing 100144, China)
  • Online:2026-03-25 Published:2026-03-25

Abstract: To address the difficulty of capturing the global characteristics of human joint sequences and the resulting low estimation accuracy, a 3D human pose estimation method that combines a semantic graph convolutional network with a self-attention mechanism is proposed. First, to improve feature extraction when mapping a 2D human pose sequence to a 3D human pose sequence, the self-attention mechanism is integrated into the semantic graph convolutional network, so that spatial features are extracted from a combination of local and global features. Second, the channel-mixing module of the MLP-Mixer network is improved by introducing a semantic graph convolutional network and a U-shaped MLP structure for temporal feature extraction. Finally, 3D human pose estimation is performed using the fused features from the 2D human images together with the extracted temporal features. Experiments on the Human3.6M dataset show that, among current mainstream 3D human pose estimation methods, the proposed method reduces the average error metrics MPJPE and PA-MPJPE by approximately 4.5 mm and 0.2 mm, respectively, relative to the second-best method. These results validate the effectiveness of the proposed method.
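
For readers unfamiliar with the two error metrics named in the abstract, the following is a minimal NumPy sketch of how MPJPE and PA-MPJPE are commonly computed for a single pose. It is not taken from the paper: the function names (`mpjpe`, `procrustes_align`, `pa_mpjpe`) and the (J, 3) array layout are assumptions made for illustration, and the authors' evaluation code may differ in details such as root-joint centering or per-action averaging.

```python
import numpy as np


def mpjpe(pred, gt):
    """Mean per-joint position error: mean Euclidean distance over joints.

    pred, gt: arrays of shape (J, 3), in the same unit (mm on Human3.6M).
    """
    return float(np.mean(np.linalg.norm(pred - gt, axis=-1)))


def procrustes_align(pred, gt):
    """Align pred to gt with the least-squares similarity transform
    (rotation, uniform scale, translation). Illustrative helper."""
    mu_p, mu_g = pred.mean(axis=0), gt.mean(axis=0)
    p, g = pred - mu_p, gt - mu_g

    # SVD of the 3x3 cross-covariance gives the optimal rotation.
    u, s, vt = np.linalg.svd(p.T @ g)

    # Flip the last axis if needed so the result is a rotation, not a reflection.
    d = 1.0 if np.linalg.det(u @ vt) >= 0 else -1.0
    flip = np.array([1.0, 1.0, d])
    rot = (u * flip) @ vt

    # Optimal uniform scale for the centered point sets.
    scale = float((s * flip).sum() / (p ** 2).sum())

    return scale * (p @ rot) + mu_g


def pa_mpjpe(pred, gt):
    """MPJPE measured after Procrustes alignment of pred to gt."""
    return mpjpe(procrustes_align(pred, gt), gt)
```

MPJPE penalizes any offset in position, orientation, or scale, whereas PA-MPJPE removes a global similarity transform first and so measures only the error in the pose's shape. This is one reason the two metrics can improve by quite different amounts for the same method.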

Key words: 3D human pose estimation; semantic graph convolutional network; MLP-Mixer model; self-attention mechanism