
Computer Engineering & Science, 2025, Vol. 47, Issue (5): 864-874.

• Graphics and Images •

Multi-scale fully aggregated network for spatiotemporal fusion of remote sensing images

YU Zhiyuan1,2,3, SONG Huihui2,3,4

  (1. School of Computer Science, School of Cyber Science and Engineering, Nanjing University of Information Science & Technology, Nanjing 210044;
   2. Jiangsu Key Laboratory of Big Data Analysis Technology, Nanjing 210044;
   3. Collaborative Innovation Center on Atmospheric Environment and Equipment Technology, Nanjing 210044;
   4. School of Automation, Nanjing University of Information Science & Technology, Nanjing 210044, China)
  • Received: 2024-01-24  Revised: 2024-03-13  Online: 2025-05-25  Published: 2025-05-27

Abstract: Spatiotemporal fusion aims to generate remote sensing images with both high spatial and high temporal resolution. Most existing spatiotemporal fusion models rely on convolution operations for feature extraction and cannot model correlations among global features, which limits their ability to capture long-range dependencies. At the same time, the large difference in spatial resolution between the input images makes it difficult to reconstruct detailed textures. To address these problems, this paper proposes a multi-scale fully aggregated network for spatiotemporal fusion of remote sensing images. Firstly, an improved Transformer encoder is introduced to learn the local and global temporal features in the images, effectively extracting the temporal and spatial texture information they contain by modeling pixel interactions along the spatial and channel dimensions. Secondly, a multi-scale hierarchical aggregation module, comprising local convolution, mesoscale self-attention, and global self-attention, is designed to provide full-scale feature extraction and to compensate for the feature loss incurred during reconstruction. Finally, adaptive instance normalization and a weight fusion module are used to learn the texture transfer and local changes from the coarse image to the fine image, generating a fused image with global spatiotemporal correlation. The proposed model is compared with five representative spatiotemporal fusion models on two benchmark datasets, CIA and LGC. Experimental results demonstrate that it outperforms all baseline models on five evaluation metrics.
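As context for the fusion step described above, adaptive instance normalization (AdaIN) transfers the per-channel feature statistics of one image onto another. The minimal sketch below illustrates that standard operation; the tensor names, shapes, and PyTorch framing are illustrative assumptions, not the authors' released code:

```python
import torch

def adaptive_instance_norm(content, style, eps=1e-5):
    """AdaIN: align the per-channel statistics of `content` to those of `style`.

    content, style: tensors of shape (N, C, H, W), e.g. fine-image and
    coarse-image feature maps in a spatiotemporal fusion network.
    """
    # Per-instance, per-channel mean and std over the spatial dimensions.
    c_mean = content.mean(dim=(2, 3), keepdim=True)
    c_std = content.std(dim=(2, 3), keepdim=True) + eps
    s_mean = style.mean(dim=(2, 3), keepdim=True)
    s_std = style.std(dim=(2, 3), keepdim=True) + eps
    # Normalize the content features, then re-scale and shift them
    # with the style statistics.
    return s_std * (content - c_mean) / c_std + s_mean
```

In the fusion setting sketched here, the coarse-image features would plausibly play the role of `style` and the fine-image features the role of `content`, so that fine spatial detail is preserved while the coarse image's radiometric characteristics are transferred.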

Key words: remote sensing, spatiotemporal fusion, Transformer, multi-scale feature extraction