• Journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science (计算机工程与科学) ›› 2025, Vol. 47 ›› Issue (4): 706-717.

• Artificial Intelligence and Data Mining •

BigFlow:科学数据跨中心协同分析服务系统

朱小杰(1,2),程振京(1),王华进(1),杨刚(1),田尧(1),樊东卫(3),米琳莹(3),梁兆基(1,2)

  1. 中国科学院计算机网络信息中心,北京 100083
  2. 中国科学院大学,北京 100049
  3. 中国科学院国家天文台,北京 100101

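  • Received: 2024-07-04 Revised: 2024-08-23 Published: 2025-04-25 Online: 2025-04-17
  • Funding:
    National Key Research and Development Program of China (2021YFF0703900); Chinese Academy of Sciences 14th Five-Year Plan Network Security and Informatization Special Project "Scientific Big Data Engineering (Phase III)" (CAS-WX2022GC-02); National Natural Science Foundation of China (12273077)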

BigFlow: A service system for cross-center collaborative analysis of scientific data

ZHU Xiaojie (1,2), CHENG Zhenjing (1), WANG Huajin (1), YANG Gang (1), TIAN Yao (1), FAN Dongwei (3), MI Linying (3), LIANG Zhaoji (1,2)

  1. Computer Network Information Center, Chinese Academy of Sciences, Beijing 100083, China
  2. University of Chinese Academy of Sciences, Beijing 100049, China
  3. National Astronomical Observatories, Chinese Academy of Sciences, Beijing 100101, China
  • Received: 2024-07-04 Revised: 2024-08-23 Published: 2025-04-25 Online: 2025-04-17

Abstract: The integration of big data technology and scientific data has given rise to many new paradigms of scientific research and created a widespread need for cross-center collaborative analysis of scientific data. Such analysis faces technical challenges, including poor cross-center data flow, difficult cross-framework heterogeneous computing, and inefficient cross-center job scheduling, and it must also ensure the trustworthiness of the analysis process. To address these challenges, BigFlow, a service system for cross-center collaborative analysis of scientific data, was developed. The system adopts a cross-center distributed architecture, is equipped with a cross-framework workflow execution engine, and achieves trusted cross-domain execution of workflows. The system's cross-center collaborative analysis capabilities were tested and validated in application scenarios including large-scale astronomical star catalog cross-matching and the identification of check dam locations in the Yellow River basin.

Key words: integrated analysis, cross-center collaborative analysis, cross-framework workflow, trustworthy analysis