• Journal of the China Computer Federation
• China Science and Technology Core Journal
• Chinese Core Journal

Computer Engineering & Science


A multi-source streaming data real-time storage system based on load balance

GUO Hui-yun^1,2, FANG Jun^1,2, LI Dong^1,2

  (1. Beijing Key Laboratory on Integration and Analysis of Large-scale Stream Data, Beijing 100144;
   2. Institute of Data Engineering, North China University of Technology, Beijing 100144, China)

     
  • Received: 2016-12-24  Revised: 2017-02-04  Online: 2017-04-25  Published: 2017-04-25

Abstract:

The perceptual streaming data of the Internet of Things consists mainly of time-series data and is characterized by large volume, continuous arrival, and multiple sources. Under highly concurrent writes, the existing HBase-based traffic streaming data storage system still suffers from problems of storage efficiency and system availability. To solve these problems, we design and implement a multi-source streaming data real-time storage system based on load balance. The system extends the data proxy into a cluster architecture and presents a task scheduling algorithm based on load balance that performs sequence matching between tasks and data proxy servers. As a result, the data proxy cluster processes tasks in a balanced manner and stores data into the HBase database in parallel. Experimental results show that the system keeps the data distribution ratio of each data proxy between 0.3 and 0.4, and writes data to the HBase database at about 1.5 times the speed of a single data proxy.
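The abstract does not spell out the scheduling algorithm, so the following is only a minimal sketch of what "matching tasks to data proxy servers in a balanced manner" can look like. It swaps in the well-known longest-processing-time-first (LPT) greedy rule: tasks are sorted by estimated load and each goes to the currently least-loaded proxy. It also computes a per-proxy "data distribution ratio", assumed here to mean the share of total load handled by each proxy. The names `Task`, `schedule`, `distribution_ratios`, the proxy names, and the use of a record count as the load metric are all hypothetical; the actual parallel writes to HBase are not shown.

```python
import heapq
from dataclasses import dataclass


@dataclass
class Task:
    # Hypothetical task: a batch of stream records destined for HBase.
    task_id: str
    size: int  # assumed load metric, e.g. number of records in the batch


def schedule(tasks, proxies):
    """Assign each task to the currently least-loaded proxy.

    Uses the LPT greedy heuristic (largest tasks first) as an
    illustrative stand-in; it is not the paper's algorithm.
    """
    # Min-heap of (current load, proxy name); ties go to the smaller name.
    heap = [(0, name) for name in proxies]
    heapq.heapify(heap)
    assignment = {name: [] for name in proxies}
    for task in sorted(tasks, key=lambda t: t.size, reverse=True):
        load, name = heapq.heappop(heap)
        assignment[name].append(task.task_id)
        heapq.heappush(heap, (load + task.size, name))
    return assignment


def distribution_ratios(tasks, assignment):
    """Fraction of the total load handled by each proxy (assumed definition)."""
    sizes = {t.task_id: t.size for t in tasks}
    total = sum(sizes.values())
    return {
        name: sum(sizes[tid] for tid in ids) / total
        for name, ids in assignment.items()
    }


if __name__ == "__main__":
    # Illustrative input only; these are not measurements from the paper.
    tasks = [Task(f"t{i}", s) for i, s in enumerate([50, 40, 30, 30, 20, 20, 10])]
    assignment = schedule(tasks, ["proxy-a", "proxy-b", "proxy-c"])
    print(distribution_ratios(tasks, assignment))
    # → {'proxy-a': 0.35, 'proxy-b': 0.35, 'proxy-c': 0.3}
```

A heap keeps each assignment at O(log p) for p proxies. Sorting largest-first matters because placing big tasks early leaves the small ones to even out the remaining imbalance.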
 

Key words: multi-source streaming data, HBase, real-time storage system, data proxy, load balance, task scheduling