• A journal of the China Computer Federation
  • China Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science ›› 2025, Vol. 47 ›› Issue (5): 864-874.

• Graphics and Images •


Multi-scale fully aggregated network for spatiotemporal fusion of remote sensing images

YU Zhiyuan1,2,3, SONG Huihui2,3,4   

  (1. School of Computer Science, School of Cyber Science and Engineering, Nanjing University of Information Science & Technology, Nanjing 210044;
    2. Jiangsu Key Laboratory of Big Data Analysis Technology, Nanjing 210044;
    3. Collaborative Innovation Center on Atmospheric Environment and Equipment Technology, Nanjing 210044;
    4. School of Automation, Nanjing University of Information Science & Technology, Nanjing 210044, China)
  • Received: 2024-01-24  Revised: 2024-03-13  Online: 2025-05-25  Published: 2025-05-27
  • Supported by: the Natural Science Foundation (Grant No. 61872189)

Abstract: Spatiotemporal fusion aims to generate remote sensing images with both high spatial and high temporal resolution. Currently, most spatiotemporal fusion models rely on convolution operations for feature extraction and cannot model correlations among global features, which limits their ability to capture long-range dependencies. At the same time, the large difference in spatial resolution between the input images makes it very difficult to reconstruct detailed textures. To address these problems, this paper proposes a multi-scale fully aggregated network for spatiotemporal fusion of remote sensing images. First, an improved Transformer encoder is introduced to learn both local and global temporal features; by modeling pixel interactions along the spatial and channel dimensions, it effectively extracts the temporal and spatial texture information contained in the images. Second, a multi-scale hierarchical aggregation module, comprising local convolution, mid-scale self-attention, and global self-attention, is designed to provide full-scale feature extraction and to compensate for feature loss during reconstruction. Finally, adaptive instance normalization and a weight fusion module learn texture transfer and local changes from the coarse images to the fine images, generating fused images with global spatiotemporal correlation. Comparative experiments against five representative spatiotemporal fusion models were conducted on two benchmark datasets, CIA and LGC. The results show that the proposed model achieves the best scores on all five evaluation metrics.
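To make the multi-scale hierarchical aggregation idea concrete, the sketch below shows one plausible way to combine a local convolution branch, a windowed (mid-scale) self-attention branch, and a global self-attention branch. This is an illustrative reconstruction only, not the authors' code: the module name, channel widths, head count, window size, and pooling resolution are all assumptions.

```python
# Hypothetical PyTorch sketch of a multi-scale aggregation block:
# local convolution, windowed (mid-scale) self-attention, and global
# self-attention, fused by a 1x1 convolution. All names and
# hyperparameters are illustrative assumptions, not the paper's code.
import torch
import torch.nn as nn

class MultiScaleAggregation(nn.Module):
    def __init__(self, dim: int, heads: int = 4, window: int = 8):
        super().__init__()
        self.window = window
        # Local branch: 3x3 convolution captures fine texture.
        self.local = nn.Conv2d(dim, dim, kernel_size=3, padding=1)
        # Mid-scale branch: self-attention inside non-overlapping windows.
        self.mid_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        # Global branch: self-attention over a pooled (coarse) feature map.
        self.pool = nn.AdaptiveAvgPool2d(16)
        self.glob_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        # Fuse the three scales back to `dim` channels.
        self.fuse = nn.Conv2d(3 * dim, dim, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        local = self.local(x)

        # Mid scale: partition into (window x window) patches, attend within each.
        ws = self.window  # assumes h and w are divisible by ws
        t = x.reshape(b, c, h // ws, ws, w // ws, ws)
        t = t.permute(0, 2, 4, 3, 5, 1).reshape(-1, ws * ws, c)
        t, _ = self.mid_attn(t, t, t)
        t = t.reshape(b, h // ws, w // ws, ws, ws, c)
        mid = t.permute(0, 5, 1, 3, 2, 4).reshape(b, c, h, w)

        # Global scale: attend over the pooled map, then upsample back.
        g = self.pool(x).flatten(2).transpose(1, 2)   # (b, 256, c) tokens
        g, _ = self.glob_attn(g, g, g)
        g = g.transpose(1, 2).reshape(b, c, 16, 16)
        glob = nn.functional.interpolate(
            g, size=(h, w), mode="bilinear", align_corners=False)

        return self.fuse(torch.cat([local, mid, glob], dim=1))
```

For example, MultiScaleAggregation(dim=64)(torch.randn(1, 64, 64, 64)) returns a tensor of the same shape, provided the spatial size is divisible by the window size.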
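Similarly, the final fusion stage can be illustrated with a minimal sketch of adaptive instance normalization (AdaIN) for transferring fine-image texture statistics onto coarse-image features, followed by a learned per-pixel weight fusion. Again, the function and module names and the layer configuration are assumptions for illustration, not the paper's implementation.

```python
# Hypothetical sketch: AdaIN-based texture transfer plus a learned
# per-pixel weight fusion of two feature maps. Illustrative only.
import torch
import torch.nn as nn

def adain(coarse: torch.Tensor, fine: torch.Tensor, eps: float = 1e-5):
    """Re-normalize `coarse` features to match the per-channel
    mean/std statistics of `fine` features."""
    c_mean = coarse.mean(dim=(2, 3), keepdim=True)
    c_std = coarse.std(dim=(2, 3), keepdim=True) + eps
    f_mean = fine.mean(dim=(2, 3), keepdim=True)
    f_std = fine.std(dim=(2, 3), keepdim=True) + eps
    return f_std * (coarse - c_mean) / c_std + f_mean

class WeightFusion(nn.Module):
    """Predict a per-pixel weight map in [0, 1] and blend two inputs."""
    def __init__(self, dim: int):
        super().__init__()
        self.weight = nn.Sequential(
            nn.Conv2d(2 * dim, dim, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(dim, 1, 1), nn.Sigmoid())

    def forward(self, a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
        w = self.weight(torch.cat([a, b], dim=1))  # (B, 1, H, W)
        return w * a + (1 - w) * b                 # convex per-pixel blend
```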

Key words: remote sensing, spatiotemporal fusion, Transformer, multi-scale feature extraction