• Official journal of the China Computer Federation
  • Core Journal of Chinese Science and Technology
  • Chinese Core Journal

Computer Engineering & Science ›› 2023, Vol. 45 ›› Issue (11): 1911-1921.

• High Performance Computing •

Distributed Kriging interpolation algorithm optimization for large region carbon satellite data

ZHOU Xiao-hua¹,², WANG Xue-zhi¹,², ZHOU Yuan-chun¹,², MENG Zhen¹,²

  1. Computer Network Information Center, Chinese Academy of Sciences, Beijing 100083, China
  2. University of Chinese Academy of Sciences, Beijing 100049, China

  • Received: 2022-12-05 Revised: 2023-02-05 Accepted: 2023-11-25 Online: 2023-11-25 Published: 2023-11-16
  • Supported by:
    Key Research Program of Frontier Sciences, Chinese Academy of Sciences (ZDBS-LY-DQC016)

Abstract: Interpolating carbon satellite data at a large regional scale with the original Kriging algorithm is slow and hard to accelerate in parallel. To address these problems, the Kriging interpolation algorithm is adjusted and its key computational steps are optimized. The interpolation process is then split and regrouped according to data characteristics and temporal dependencies, which refines the interpolation granularity, and is organized into a DAG-structured workflow that can be executed in parallel in a distributed environment. Finally, a DAG task scheduling engine with a two-tier architecture accelerates the whole interpolation workflow in parallel on a distributed computing cluster. Experimental results show that the methods and framework described above achieve high interpolation efficiency at different regional scales and, compared with Spark, have a clear speed advantage at the large regional scale.

Key words: carbon satellite, distributed interpolation, Kriging algorithm acceleration, workflow scheduling
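
The idea of splitting an interpolation into fine-grained tasks and running them as a DAG can be illustrated with a minimal Python sketch. It is not the authors' implementation or scheduling engine: the exponential variogram and its parameters, the use of all samples for every tile, the thread pool, and the names `variogram`, `krige_tile`, `build_workflow`, and `run_dag` are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from graphlib import TopologicalSorter

import numpy as np


def variogram(h, sill=1.0, corr_range=10.0):
    """Exponential variogram model (illustrative parameters, not from the paper)."""
    return sill * (1.0 - np.exp(-h / corr_range))


def krige_tile(xy, z, targets):
    """Ordinary Kriging of `targets` (m, 2) from samples xy (n, 2) with values z (n,)."""
    n = len(xy)
    d = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=-1)
    # Kriging system; the extra row/column is the Lagrange multiplier
    # that forces the weights to sum to 1.
    a = np.ones((n + 1, n + 1))
    a[:n, :n] = variogram(d)
    a[n, n] = 0.0
    d0 = np.linalg.norm(xy[:, None, :] - targets[None, :, :], axis=-1)
    b = np.ones((n + 1, len(targets)))
    b[:n, :] = variogram(d0)
    weights = np.linalg.solve(a, b)[:n, :]  # drop the multiplier row
    return weights.T @ z


def build_workflow(xy, z, grid, n_tiles=4):
    """Split the target grid into tiles; each tile becomes one independent DAG node."""
    chunks = np.array_split(grid, n_tiles)
    tasks, deps = {}, {"merge": set()}
    for i, chunk in enumerate(chunks):
        name = f"tile_{i}"
        tasks[name] = lambda _results, c=chunk: krige_tile(xy, z, c)
        deps[name] = set()           # tiles have no prerequisites
        deps["merge"].add(name)      # merge waits for every tile
    tasks["merge"] = lambda results: np.concatenate(
        [results[f"tile_{i}"] for i in range(n_tiles)]
    )
    return tasks, deps


def run_dag(tasks, deps, workers=4):
    """Run tasks wave by wave in dependency order; ready tasks run concurrently."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    results = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while sorter.is_active():
            ready = sorter.get_ready()
            futures = {name: pool.submit(tasks[name], results) for name in ready}
            for name, future in futures.items():
                results[name] = future.result()
                sorter.done(name)
    return results


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    samples = rng.uniform(0, 100, size=(50, 2))   # synthetic sample locations
    values = rng.normal(size=50)                  # synthetic observations
    grid = rng.uniform(0, 100, size=(20, 2))      # synthetic target points
    tasks, deps = build_workflow(samples, values, grid)
    print(run_dag(tasks, deps)["merge"].shape)
```

In this sketch the tile tasks have no mutual dependencies, so they all become ready in the first wave and only the merge step waits. A real distributed scheduler would dispatch tasks to cluster nodes rather than threads and would likely restrict each tile to nearby samples.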