• Journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

J4 ›› 2014, Vol. 36 ›› Issue (10): 1846-1853.

• Article

  • Supported by: National Basic Research Program of China (973 Program), grant 2014CB340300

Dynamic data distribution for stream processing systems

WANG Chengzhang1,LIN Xuelian1,TAN Jingfang2   

  1. School of Computer Science and Engineering, Beihang University, Beijing 100191, China
  2. School of Physics and Electronic Engineering, Taishan University, Taian 271021, China
  • Received: 2014-06-11 Revised: 2014-08-24 Online: 2014-10-25 Published: 2014-10-25

Abstract:

In stream processing systems, data skew often leads to load imbalance among computing nodes, which increases the response time of data processing. Traditional load balancing methods such as operator distribution, operator migration, and load shedding have not been widely adopted in stream processing systems because of their relatively high performance overhead. Considering the characteristics of stream processing systems, a new load balancing mechanism is proposed. In this mechanism, the data on each computing unit are split into several partitions, and each partition can be allocated to and migrated among computing units dynamically. Load balance is achieved by adjusting the partitions assigned to each computing unit, which balances the input streams and utilization across computing units with little disturbance to the running system. Based on this, we design and implement a load balancing algorithm as well as an online data migration method. The experimental results show that our mechanism can significantly reduce the average latency of data processing and improve system throughput.

Key words: data stream; stream processing; load balancing; data distribution; data migration
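
The partition-based idea summarized in the abstract can be illustrated with a small sketch. This is not the algorithm from the paper, whose details are not given here. It is a simple greedy heuristic, assuming each computing unit holds a set of partitions with a known input rate, that repeatedly moves one partition from the most loaded unit to the least loaded one. The function name `rebalance`, the `tolerance` parameter, and the data layout are all hypothetical.

```python
from typing import Dict, List, Tuple


def rebalance(assignment: Dict[str, Dict[str, float]],
              tolerance: float = 0.1) -> List[Tuple[str, str, str]]:
    """Greedy illustration only (not the paper's algorithm).

    assignment maps a unit name to {partition id: observed input rate}.
    The mapping is updated in place; the return value lists the
    migrations as (partition, source unit, target unit).
    """
    moves: List[Tuple[str, str, str]] = []
    if not assignment:
        return moves
    while True:
        # Load of a unit is the sum of the input rates of its partitions.
        loads = {unit: sum(parts.values()) for unit, parts in assignment.items()}
        src = max(loads, key=loads.get)
        dst = min(loads, key=loads.get)
        gap = loads[src] - loads[dst]
        mean = sum(loads.values()) / len(loads)
        # Stop once the spread between extremes is within the tolerance.
        if gap <= tolerance * mean:
            break
        # Only a partition with 0 < rate < gap narrows the imbalance
        # between src and dst; anything larger would overshoot.
        candidates = {p: r for p, r in assignment[src].items() if 0 < r < gap}
        if not candidates:
            break
        # Prefer the partition whose rate is closest to half the gap.
        part = min(candidates, key=lambda p: abs(candidates[p] - gap / 2))
        assignment[dst][part] = assignment[src].pop(part)
        moves.append((part, src, dst))
    return moves
```

Calling `rebalance` on a mapping such as `{"u1": {"p1": 50, "p2": 30, "p3": 20}, "u2": {"p4": 10}}` returns the list of migrations and leaves the mapping updated. A real system would also have to move each partition's state online while tuples keep arriving, which this sketch does not model.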