• Official journal of the China Computer Federation (CCF)
  • China Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science, 2013, Vol. 35, Issue (10): 25-35.


Reviewing the big data solution based on Hadoop ecosystem    

CHEN Ji-rong, LE Jia-jin

  1. (School of Computer Science and Technology, Donghua University, Shanghai 201620, China)
  • Received: 2013-02-25  Revised: 2013-05-29  Online: 2013-10-25  Published: 2013-10-25

Abstract:

A big data solution must address three crucial problems: big data storage, big data analysis, and big data management. First, the definitions of big data and of the Hadoop ecosystem are summarized. Second, approaches to big data are discussed from two perspectives: commercial products and the Hadoop ecosystem. The paper focuses on reviewing the big data solution based on the Hadoop ecosystem: (1) HDFS, HBase, and OpenTSDB address storage problems; (2) Hadoop MapReduce (together with Hive) and HadoopDB address analysis problems; and (3) Sqoop and Ganglia address management problems. For each component, its architecture, principles, and features are analyzed. For defects and problems found in some key components, we propose solutions, ideas, and viewpoints based on our research progress. We predict that the Hadoop ecosystem is the preferable solution for small and medium-sized enterprises.
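The abstract names Hadoop MapReduce as the core analysis component. As a rough illustration of the programming model only, the following Python sketch runs the map, shuffle (group by key), and reduce phases in memory on a word-count example. It is not Hadoop's Java API and omits distribution, fault tolerance, and HDFS I/O; the function names `map_reduce`, `word_count_mapper`, and `word_count_reducer` are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple


def map_reduce(
    records: Iterable[str],
    mapper: Callable[[str], Iterable[Tuple[str, int]]],
    reducer: Callable[[str, List[int]], int],
) -> Dict[str, int]:
    """Hypothetical in-memory sketch of the MapReduce model (not Hadoop's API)."""
    intermediate: Dict[str, List[int]] = defaultdict(list)
    # Map phase: each input record yields zero or more (key, value) pairs.
    for record in records:
        for key, value in mapper(record):
            # Shuffle phase: collect all values emitted under the same key.
            intermediate[key].append(value)
    # Reduce phase: combine each key's value list, visiting keys in sorted order
    # (Hadoop likewise sorts keys before they reach a reducer).
    return {key: reducer(key, values) for key, values in sorted(intermediate.items())}


def word_count_mapper(line: str) -> Iterable[Tuple[str, int]]:
    # Emit (word, 1) for every whitespace-separated word, case-folded.
    for word in line.lower().split():
        yield word, 1


def word_count_reducer(word: str, counts: List[int]) -> int:
    # Sum the partial counts collected for one word.
    return sum(counts)


if __name__ == "__main__":
    lines = ["HDFS stores data", "MapReduce analyzes data"]
    print(map_reduce(lines, word_count_mapper, word_count_reducer))
```

In a real cluster the shuffle step moves intermediate pairs across the network between map and reduce tasks; here it is just a dictionary, which is why the sketch only conveys the data flow, not the performance characteristics the paper discusses.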

Key words: big data, Hadoop ecosystem, MapReduce, HDFS, column-oriented database