• Official journal of the China Computer Federation (CCF)
• Chinese Science and Technology Core Journal
• Chinese Core Journal

Computer Engineering & Science, 2022, Vol. 44, Issue (04): 584-593.

• High Performance Computing

Architecture combining active & passive performance evaluation methods and its implementation for big data storage

LIU Shi-yuan(1), LI Yun-chun(1,2), CHEN Chen(2), YANG Hai-long(1)

  (1. School of Computer Science and Engineering, Beihang University, Beijing 100191;
    2. School of Cyber Science and Technology, Beihang University, Beijing 100191, China)
  • Received: 2021-09-14 Revised: 2021-12-07 Accepted: 2022-04-25 Online: 2022-04-25 Published: 2022-04-20

Abstract: As data volumes continue to grow, big data storage plays an increasingly important role in the overall big data application framework. Performance evaluation of big data storage helps application developers analyze performance bottlenecks and optimize the performance of big data systems. Previous work has typically either used benchmarking to compare the performance of different big data frameworks, or analyzed the performance of distributed file systems by instrumenting them and analyzing the resulting trace files. These two methods take different analytical perspectives, but no sound evaluation framework has yet been established for big data distributed storage systems. This paper proposes an architecture for big data storage performance evaluation that combines active and passive methods, together with its implementation. For active evaluation, the paper provides a benchmark suite of more than 20 applications across 6 domains that issues performance tests against big data storage systems, and analyzes the resulting benchmark performance indicators. For passive evaluation, the paper provides methods for analyzing and locating inefficient tasks, inefficient operators, and inefficient functions; by analyzing the big data applications running on a storage system, these methods identify the causes of their inefficiency. Experiments show that the proposed architecture can evaluate big data storage performance comprehensively.
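The abstract does not specify the paper's indicators or detection rules. The Python sketch below is therefore only a hedged illustration of the two perspectives under stated assumptions. For the active side, it computes a throughput figure (MB/s) from a timed benchmark run. For the passive side, it flags "inefficient tasks" using a common straggler heuristic: a task is flagged when its duration exceeds 1.5 times the median duration of its stage. The names `TaskRecord`, `throughput_mb_s`, and `find_inefficient_tasks`, as well as the 1.5 threshold, are hypothetical and do not represent the authors' method.

```python
from __future__ import annotations

from dataclasses import dataclass
from statistics import median


@dataclass
class TaskRecord:
    """One task of a big data job, as it might appear in a hypothetical execution log."""
    task_id: str
    stage: str
    duration_s: float


def throughput_mb_s(bytes_moved: int, elapsed_s: float) -> float:
    """Active side (assumed indicator): MB/s achieved by a timed benchmark run."""
    if elapsed_s <= 0:
        raise ValueError("elapsed time must be positive")
    return bytes_moved / (1024 * 1024) / elapsed_s


def find_inefficient_tasks(tasks: list[TaskRecord], factor: float = 1.5) -> list[TaskRecord]:
    """Passive side (assumed heuristic): flag tasks slower than `factor` x their stage median."""
    by_stage: dict[str, list[TaskRecord]] = {}
    for t in tasks:
        by_stage.setdefault(t.stage, []).append(t)
    flagged: list[TaskRecord] = []
    for group in by_stage.values():
        stage_median = median(t.duration_s for t in group)
        flagged.extend(t for t in group if t.duration_s > factor * stage_median)
    return flagged


if __name__ == "__main__":
    # 512 MiB read in 4 s by a hypothetical benchmark run
    print(throughput_mb_s(512 * 1024 * 1024, 4.0))  # 128.0
    log = [
        TaskRecord("t1", "map", 10.0),
        TaskRecord("t2", "map", 11.0),
        TaskRecord("t3", "map", 30.0),
        TaskRecord("t4", "reduce", 5.0),
        TaskRecord("t5", "reduce", 6.0),
    ]
    for t in find_inefficient_tasks(log):
        print(t.task_id, t.stage, t.duration_s)  # t3 map 30.0
```

Comparing each task against the median of its own stage, rather than a global average, avoids flagging every task of an inherently heavier stage. A real passive analysis would go further and drill down from tasks to operators and functions, as the abstract describes.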

Key words: big data storage, performance evaluation method, benchmark suite, inefficient behavior analysis