• Journal of the China Computer Federation
  • China Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science ›› 2024, Vol. 46 ›› Issue (09): 1616-1524.

• Graphics and Images •

Mobile monocular depth estimation based on multi-scale feature fusion

CHEN Lei1, LIANG Zheng-you1,2, SUN Yu1, CAI Jun-min1

  (1. School of Computer and Electronics Information, Guangxi University, Nanning 530004, China;
   2. Guangxi Key Laboratory of Multimedia Communications and Network Technology, Nanning 530004, China)
  • Received:2023-05-10 Revised:2023-11-21 Accepted:2024-09-25 Online:2024-09-25 Published:2024-09-23

Abstract: Current deep-learning-based depth estimation models contain large numbers of parameters, which makes them difficult to adapt to mobile devices. To address this issue, a lightweight depth estimation method with multi-scale feature fusion that can be deployed on mobile devices is proposed. First, MobileNetV2 is used as the backbone to extract features at four scales. Then, skip connection paths from the encoder to the decoder are constructed to fuse the four-scale features, fully exploiting the positional information of the lower layers together with the semantic information of the higher layers. Finally, the fused features are passed through convolutional layers to produce high-precision depth images. After training and testing on the NYU Depth Dataset V2, the experimental results show that the proposed model achieves advanced performance, reaching a δ1 accuracy of 0.812 with only 1.6×10⁶ parameters. In addition, it takes only 0.094 seconds to infer a single image on the Kirin 980 CPU of a mobile device, demonstrating its practical application value.
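The decoder's multi-scale fusion described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the channel counts correspond to MobileNetV2's stride-4/8/16/32 stages, but the interpolation method and fusion layers here are assumptions for demonstration.

```python
import numpy as np

def upsample2x(feat):
    """Nearest-neighbour 2x upsampling of a (C, H, W) feature map."""
    return feat.repeat(2, axis=1).repeat(2, axis=2)

def fuse_scales(features):
    """Fuse a fine-to-coarse list of (C_i, H_i, W_i) feature maps.

    Each coarser map is upsampled until it matches the finest
    resolution, then all maps are concatenated along the channel
    axis, combining lower-layer positional information with
    higher-layer semantic information as the abstract describes.
    """
    target_h = features[0].shape[1]
    fused = []
    for f in features:
        while f.shape[1] < target_h:
            f = upsample2x(f)
        fused.append(f)
    return np.concatenate(fused, axis=0)

# Four scales from a 224x224 input at strides 4/8/16/32
# (channel counts follow MobileNetV2's intermediate stages)
feats = [np.random.rand(c, 224 // s, 224 // s)
         for c, s in [(24, 4), (32, 8), (96, 16), (320, 32)]]
fused = fuse_scales(feats)
print(fused.shape)  # (472, 56, 56)
```

In the full model, the fused tensor would then pass through convolutional layers to regress the final depth map; here only the fusion step is shown.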

Key words: deep learning, depth estimation, multi-scale feature, lightweight network, mobile terminal model