• Journal of the China Computer Federation
  • China Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science

• High Performance Computing

An aggregated I/O method of sampled data for parallel computing applications

CAO Liqiang, LUO Hongbing

  1. (Institute of Applied Physics and Computational Mathematics, Beijing 100088, China)
  • Received: 2017-06-14 Revised: 2017-10-17 Online: 2018-09-25 Published: 2018-09-25
  • Supported by:

    Key Program of the Major Research Plan of the National Natural Science Foundation of China (9143028); General Program of the National Natural Science Foundation of China (11372049)

Abstract:

Parallel I/O of sampled data limits the running efficiency of some parallel applications. We design and implement an aggregated parallel I/O method for sampled data. The method first uses a sampled-data cache deployed on the client to reduce the number of I/O operations, then gathers the cached data to an output process through aggregated communication, and finally stores it in a file. To guarantee the storage consistency of sampled data over long runs of a parallel program, the method monitors the running state of the application within the JASMIN framework and flushes or restores the data when the program performs load balancing or restarts. During I/O, it further uses HDF5 chunked I/O to improve the read and write efficiency of column-stored data. Test results show that the new method not only scales well but also improves the parallel I/O efficiency of sampled data by more than 7.5 times in parallel applications with complex features such as load balancing and restart.

Key words: scientific computing, sampled data, parallel I/O, performance optimization, aggregated buffer
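
The cache-then-aggregate idea described in the abstract can be illustrated with a small single-process sketch. This is not the authors' implementation: it simulates the ranks inside one Python process, uses a plain list in place of the HDF5 file, and does not involve MPI or JASMIN. The names `SampleCache`, `AggregatedWriter`, `end_step`, and `on_load_balance` are hypothetical and exist only for this illustration.

```python
class SampleCache:
    """Hypothetical per-rank (client-side) buffer for sampled records."""

    def __init__(self, rank, capacity):
        self.rank = rank
        self.capacity = capacity
        self.records = []

    def add(self, step, value):
        # Buffer the sample locally instead of writing it immediately.
        self.records.append((step, self.rank, value))

    def full(self):
        return len(self.records) >= self.capacity

    def drain(self):
        # Hand over the buffered records and start with an empty buffer.
        out, self.records = self.records, []
        return out


class AggregatedWriter:
    """Simulates gathering all rank caches into one output process."""

    def __init__(self, n_ranks, capacity):
        self.caches = [SampleCache(r, capacity) for r in range(n_ranks)]
        self.file = []        # stands in for the output file
        self.write_calls = 0  # number of simulated file writes

    def sample(self, rank, step, value):
        self.caches[rank].add(step, value)

    def end_step(self):
        # Assumed collective point: flush once any cache is full.
        if any(cache.full() for cache in self.caches):
            self.flush()

    def flush(self):
        # Merge every rank's cache, then issue a single write.
        merged = []
        for cache in self.caches:
            merged.extend(cache.drain())
        if merged:
            merged.sort()  # order by (step, rank, value)
            self.file.extend(merged)
            self.write_calls += 1

    def on_load_balance(self):
        # Flush before data moves between ranks so nothing is lost
        # or attributed to the wrong owner.
        self.flush()


def demo():
    writer = AggregatedWriter(n_ranks=4, capacity=3)
    for step in range(6):
        for rank in range(4):
            writer.sample(rank, step, float(step * 10 + rank))
        writer.end_step()
    # One more step, then a simulated load-balancing event forces a flush.
    for rank in range(4):
        writer.sample(rank, 6, float(60 + rank))
    writer.on_load_balance()
    return writer


if __name__ == "__main__":
    w = demo()
    print(w.write_calls, "writes for", len(w.file), "records")
```

In this toy run, 28 sampled records reach the simulated file in 3 writes rather than 28. In a real parallel code the merge step would presumably be a collective gather to the output process, and the single write would target a chunked HDF5 dataset; both are left out here to keep the sketch self-contained.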