J4 ›› 2016, Vol. 38 ›› Issue (01): 11-19.
• Papers •
CHEN Qiaoan1, LI Feng1, CAO Yue1, LONG Mingsheng1,2
Abstract:
Rapidly growing runtime data is among the most complex and valuable data resources in big data systems. By analyzing runtime data, developers can assess software quality and gain insight into the software development model. As a distributed system, Spark generates a large amount of runtime data while running user applications, including log data, monitoring data, and graph representations of jobs. Developers can use this data to optimize system parameters. However, Spark exposes many different types of parameters, and the effect of each is difficult to isolate, which makes them hard to tune. In this paper, we propose the concept of a historical database of runtime data and a parameter optimization model based on searching that database. Experimental results show that the proposed model performs well in recommending system parameters.
Key words: big data; runtime data; data analysis; parameter optimization; Spark
CHEN Qiaoan1, LI Feng1, CAO Yue1, LONG Mingsheng1,2. Parameter optimization for Spark jobs based on runtime data analysis [J]. J4, 2016, 38(01): 11-19.
URL: http://joces.nudt.edu.cn/EN/
http://joces.nudt.edu.cn/EN/Y2016/V38/I01/11