• Journal of the China Computer Federation (CCF)
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science (计算机工程与科学) ›› 2022, Vol. 44 ›› Issue (04): 584-593.

• High Performance Computing •

Architecture combining active & passive performance evaluation methods and its implementation for big data storage

LIU Shi-yuan (刘世缘)1, LI Yun-chun (李云春)1,2, CHEN Chen (陈晨)2, YANG Hai-long (杨海龙)1

  1. School of Computer Science and Engineering, Beihang University, Beijing 100191, China
  2. School of Cyber Science and Technology, Beihang University, Beijing 100191, China

  • Received: 2021-09-14 Revised: 2021-12-07 Accepted: 2022-04-25 Online: 2022-04-25 Published: 2022-04-20
  • Supported by:
    National Key Research and Development Program of China (2016YFB1000304); National Natural Science Foundation of China (62072018)

Abstract: As data volumes keep growing, big data storage occupies an important position in the overall big data application framework. Evaluating the performance of big data storage systems helps application developers locate performance bottlenecks and optimize big data systems. Previous work has typically either used benchmarks to evaluate the performance of different big data frameworks, or instrumented distributed file systems and analyzed the resulting trace files. These two approaches take different analytical perspectives, and neither has produced a coherent evaluation system for big data distributed storage systems. This paper proposes an architecture, together with its concrete implementation, for evaluating big data storage performance by combining active and passive methods. For active evaluation, it provides benchmark programs covering more than 20 applications in 6 domains, which initiate performance tests against the storage system and measure its baseline performance metrics. For passive evaluation, it provides methods for analyzing and locating inefficient tasks, inefficient operators, and inefficient functions: by analyzing the big data applications running on top of the storage system, it identifies the causes of their inefficiency. Experiments show that the proposed architecture can comprehensively evaluate the performance of big data storage systems.

Key words: big data storage, performance evaluation method, benchmark suite, inefficient behavior analysis