  • Official journal of the China Computer Federation (CCF)
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science

A multi-source streaming data real-time storage system based on load balance

GUO Hui-yun (1, 2), FANG Jun (1, 2), LI Dong (1, 2)

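  1. Beijing Key Laboratory on Integration and Analysis of Large-scale Stream Data, Beijing 100144, China;
  2. Institute of Data Engineering, North China University of Technology, Beijing 100144, China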
  • Received: 2016-12-24  Revised: 2017-02-04  Published: 2017-04-25  Online: 2017-04-25
  • Funding:

    Beijing Natural Science Foundation (4131001)


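Abstract:

Sensing stream data from the Internet of Things (IoT) consists mostly of time-series data and is characterized by large volume, continuous arrival, and multiple sources. When the number of concurrent writes is high, existing HBase-based traffic stream data storage systems still suffer from low storage efficiency and limited system availability. To address these problems, we design and implement a load-balanced real-time storage system for multi-source stream data. The system extends the data proxy into a cluster architecture and introduces a load-balancing task scheduling algorithm that matches tasks to data proxies in sequence, so that the data proxy cluster processes tasks with a balanced load and stores data into the HBase database in parallel. Comparative experiments show that the system keeps the share of data assigned to each data proxy between 0.3 and 0.4, while writing data into HBase at about 1.5 times the speed of a single data proxy.

Key words: multi-source streaming data, HBase, real-time storage system, data proxy, load balance, task scheduling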
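The abstract does not spell out the scheduling algorithm itself. As a minimal illustrative sketch, assuming a cluster of three hypothetical data proxies and made-up task sizes, the Python code below shows one standard way to spread tasks across a proxy cluster: a greedy least-loaded assignment that takes the largest task first (the LPT heuristic), followed by the per-proxy data distribution ratio that the abstract reports. This is a substitute technique for illustration, not the authors' algorithm; the names `Proxy`, `schedule`, `distribution_ratios`, and `proxy-a` are hypothetical.

```python
import heapq
from dataclasses import dataclass, field


@dataclass
class Proxy:
    """A hypothetical data proxy that would forward its tasks to HBase."""
    name: str
    load: int = 0                                   # total records assigned so far
    tasks: list = field(default_factory=list)       # ids of tasks assigned to this proxy


def schedule(task_sizes, proxy_names):
    """Assign each task to the currently least-loaded proxy.

    Tasks are taken largest-first (the LPT greedy heuristic), which keeps the
    per-proxy totals close to each other. Illustrative stand-in only, not the
    paper's exact scheduling algorithm.
    """
    proxies = [Proxy(name) for name in proxy_names]
    # Min-heap of (current load, proxy index); the index breaks ties deterministically.
    heap = [(0, i) for i in range(len(proxies))]
    heapq.heapify(heap)
    for task_id, size in sorted(enumerate(task_sizes), key=lambda t: -t[1]):
        load, i = heapq.heappop(heap)               # least-loaded proxy
        proxies[i].load = load + size
        proxies[i].tasks.append(task_id)
        heapq.heappush(heap, (proxies[i].load, i))
    return proxies


def distribution_ratios(proxies):
    """Fraction of all records handled by each proxy (the ratios sum to 1)."""
    total = sum(p.load for p in proxies)
    return {p.name: p.load / total for p in proxies}


if __name__ == "__main__":
    # Hypothetical batch sizes (records per task) arriving from several sources.
    sizes = [120, 80, 75, 60, 55, 40, 30, 25, 10]
    cluster = schedule(sizes, ["proxy-a", "proxy-b", "proxy-c"])
    for name, ratio in distribution_ratios(cluster).items():
        print(f"{name}: {ratio:.2f}")
```

With these made-up inputs the three proxies end up with 170, 165, and 160 records, so each handles roughly one third of the data. In the setting the abstract describes, each proxy would then write its own tasks to HBase concurrently; that write path is omitted here.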
