
Computer Engineering & Science


A data flow programming model and compiler optimization for Storm

YANG Qiuji, YU Junqing, MO Binsheng, HE Yunfeng

  1. (Center of Network and Computation, Huazhong University of Science and Technology, Wuhan 430074, China)
  • Received: 2016-08-15  Revised: 2016-10-09  Online: 2016-12-25  Published: 2016-12-25
  • Supported by:

    National Key Research and Development Program of China (2016YFB1000204); National Natural Science Foundation of China (61572211)

Abstract:

The data flow programming model separates a program's computation from its communication, exposing the application's latent parallelism and simplifying programming. Distributed computing frameworks build multi-core clusters from inexpensive PCs to solve large-scale parallel computing problems, but the hierarchical storage structure and processing units of a multi-core cluster pose new challenges to the performance of data flow programs. To address the problems that data flow programs face on a distributed architecture, we design and implement a combination of the data flow programming model and a distributed computing framework: building on COStream, we propose a compiler optimization framework for Storm. The framework consists of two modules: hierarchical task partitioning and scheduling for Storm, and hierarchical software pipelining and code generation for Storm. Hierarchical task partitioning uses Storm's task scheduling mechanism to assign all subtasks of a program to the cores within the nodes of a Storm cluster. Hierarchical software pipelining and code generation organizes the subtasks into a software pipeline across cluster nodes and a software pipeline across the cores within each node, and generates the corresponding target code. The experiments take a multi-core cluster as the target platform, deploy the Storm distributed architecture on the cluster, and select typical programs from the digital media processing domain as benchmarks to evaluate the programs optimized by the Storm-oriented compiler. The results demonstrate the effectiveness of the combined approach.

Key words: multi-core cluster, data flow programming, compiler, pipeline, COStream
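
As a rough, hypothetical illustration of the mapping sketched above, a two-stage pipeline can be written by hand against the Storm 1.x Java API (org.apache.storm.*); this is not output of the COStream compiler, and all class names, stream fields, and figures are placeholders: each data flow actor becomes a spout or bolt, a bolt's parallelism hint stands in for the per-core replicas inside a node, and Config.setNumWorkers stands in for the number of cluster nodes.

    import java.util.Map;

    import org.apache.storm.Config;
    import org.apache.storm.LocalCluster;
    import org.apache.storm.spout.SpoutOutputCollector;
    import org.apache.storm.task.OutputCollector;
    import org.apache.storm.task.TopologyContext;
    import org.apache.storm.topology.OutputFieldsDeclarer;
    import org.apache.storm.topology.TopologyBuilder;
    import org.apache.storm.topology.base.BaseRichBolt;
    import org.apache.storm.topology.base.BaseRichSpout;
    import org.apache.storm.tuple.Fields;
    import org.apache.storm.tuple.Tuple;
    import org.apache.storm.tuple.Values;
    import org.apache.storm.utils.Utils;

    public class DataFlowTopologySketch {

        // Source actor of the data flow graph: produces a stream of frame ids.
        public static class FrameSpout extends BaseRichSpout {
            private SpoutOutputCollector collector;
            private long frameId = 0;

            @Override
            public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
                this.collector = collector;
            }

            @Override
            public void nextTuple() {
                collector.emit(new Values(frameId++));  // unanchored emit: transport is left to Storm
                Utils.sleep(10);
            }

            @Override
            public void declareOutputFields(OutputFieldsDeclarer declarer) {
                declarer.declare(new Fields("frame"));
            }
        }

        // Compute actor: the "computation" half of the model; Storm's tuple
        // transport between bolts plays the role of the communication edges.
        public static class ProcessBolt extends BaseRichBolt {
            private OutputCollector collector;

            @Override
            public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
                this.collector = collector;
            }

            @Override
            public void execute(Tuple input) {
                long frame = input.getLongByField("frame");
                collector.emit(new Values(frame));  // placeholder work: forward the frame downstream
                collector.ack(input);
            }

            @Override
            public void declareOutputFields(OutputFieldsDeclarer declarer) {
                declarer.declare(new Fields("frame"));
            }
        }

        public static void main(String[] args) throws Exception {
            TopologyBuilder builder = new TopologyBuilder();
            builder.setSpout("source", new FrameSpout(), 1);
            // Parallelism hint 4: four executors per stage, i.e. the per-core
            // replicas that a hierarchical partitioner could create in a node.
            builder.setBolt("stage1", new ProcessBolt(), 4).shuffleGrouping("source");
            builder.setBolt("stage2", new ProcessBolt(), 4).shuffleGrouping("stage1");

            Config conf = new Config();
            conf.setNumWorkers(3);  // roughly one worker process per cluster node

            LocalCluster cluster = new LocalCluster();  // local test run; a real deployment would use StormSubmitter
            cluster.submitTopology("dataflow-sketch", conf, builder.createTopology());
            Utils.sleep(10000);
            cluster.shutdown();
        }
    }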

Abstract:

As a domain-specific programming model, data flow programming combines the features of media applications and programming languages and offers an attractive way to express parallelism. However, the hierarchical storage structure of the multi-core cluster architecture poses new challenges to the performance of data flow applications, and programmability remains a significant challenge for the compiler. To address the problems that the data flow programming model faces when processing big data in the digital media field, we design and implement an integration of a data flow programming model and a distributed computing framework, and propose a compiler optimization framework for Storm based on COStream. The compiler optimization method for Storm consists of two steps: hierarchical task partitioning and scheduling for Storm, and pipeline scheduling and code generation for Storm. Hierarchical task partitioning and scheduling assigns tasks to the multi-core nodes of the cluster, ensuring a workload balance across cores with low inter-node communication overhead. Pipeline scheduling and code generation build software pipelines between cluster nodes and between the cores within a node, and generate the corresponding object code. We conduct experiments on a multi-core cluster as the target platform, deploy the Storm distributed architecture on the cluster, choose typical digital media processing programs as benchmarks, and evaluate and analyze the performance of the Storm-oriented optimization. Experimental results verify the effectiveness of the proposed approach.

Key words: multi-core cluster, data flow programming, compiler, pipeline, COStream
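
The abstract above credits the balance between per-core workload and inter-node traffic to the hierarchical partitioner itself. Independently of that mechanism, Storm exposes a few standard run-time settings that interact with such a software pipeline; the snippet below is only a hedged aside with hypothetical values, not part of the paper's framework. Grouping choices such as localOrShuffleGrouping, which prefers tasks in the same worker process, are another stock way of keeping tuples inside a node.

    import org.apache.storm.Config;

    // Hypothetical run-time settings for a pipelined topology (Storm 1.x API).
    public class PipelineConfigSketch {
        public static Config tuned() {
            Config conf = new Config();
            conf.setNumWorkers(3);        // one worker process per cluster node (placeholder figure)
            conf.setMaxSpoutPending(64);  // cap on in-flight tuples per spout task, so stages stay busy
                                          // without unbounded queueing (only effective when the spout
                                          // emits tuples with message ids)
            conf.setNumAckers(1);         // acker tasks that track tuple completion
            return conf;
        }
    }

Such a Config would be passed to StormSubmitter.submitTopology together with the topology built from the compiled pipeline.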