• Journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

J4 ›› 2016, Vol. 38 ›› Issue (01): 11-19.

• Papers •

Parameter optimization for Spark jobs based on runtime data analysis

CHEN Qiaoan1,LI Feng1,CAO Yue1,LONG Mingsheng1,2   

  • (1. School of Software, Tsinghua University, Beijing 100084; 2. National Laboratory for Information Science and Technology (TNList), Tsinghua University, Beijing 100084, China)
  • Received: 2015-10-02  Revised: 2015-12-10  Online: 2016-01-25  Published: 2016-01-25

Abstract:

Fast-growing runtime data is among the most complex and valuable data resources in big data systems. Based on runtime data, developers can analyze software quality and gain insight into software development models. As a distributed system, Spark generates a large amount of runtime data while running user applications, including log data, monitoring data, and graph representations of jobs. Developers can use this runtime data to optimize system parameters. However, Spark has many different types of parameters, and their individual effects are hard to identify, which makes them difficult to tune. In this paper we propose the concept of a runtime data historical database and a parameter optimization model based on searching that database. Experimental results show that the proposed optimization model performs well at recommending system parameters.

Key words: big data; runtime data; data analysis; parameter optimization; Spark