Multigranularity partition and scheduling for stream programs based on multi-CPU and multi-GPU heterogeneous architectures

Computer Engineering & Science

CHEN Wenbin, YANG Ruirui, YU Junqing

(School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China)

Received: 2016-09-05 Revised: 2016-11-01 Online: 2017-01-25 Published: 2017-01-25
Funding:
National Key Research and Development Program of China (2016YFB1000204); National Natural Science Foundation of China (61572211)

Abstract:

Dataflow programming languages simplify programming in their target domains by cleanly separating task computation from data communication, which exposes parallelism at both the task level and the data level. To handle the abundant data parallelism, task parallelism, and pipeline parallelism found on hybrid GPU/CPU architectures, we propose and implement a task partitioning method and a multigranularity scheduling strategy for dataflow programs on such architectures. The framework takes a synchronous dataflow graph as input and comprises three steps: task classification, horizontal splitting of GPU-side tasks, and balancing of discrete CPU-side tasks. It then constructs a software-pipelined schedule and, after compiler optimization, generates OpenCL target code. Task classification assigns each task to a suitable platform (GPU or CPU) according to its computational characteristics and the volume of communication between tasks. Horizontal splitting uses the parallelism of GPU-side tasks to divide them evenly into blocks, one per GPU, so that costly GPU-to-GPU communication does not degrade overall execution performance. CPU-side balancing selects appropriate CPU cores and distributes the CPU-side tasks evenly among them, which keeps the load balanced and raises core utilization. Experiments on a hybrid platform with multiple NVIDIA Tesla C2050 GPUs and a multicore CPU, using typical algorithms from the multimedia domain as benchmarks, demonstrate the effectiveness of the partitioning method and the scheduling strategy.

Key words: heterogeneous architecture, dataflow programming, task partition, storage optimization
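
The abstract describes balancing CPU-side tasks across cores but does not state the algorithm used. As a rough illustration only, the sketch below uses a standard greedy heuristic (longest processing time first), which is an assumption and not necessarily the authors' method. The function name `balance_tasks` and the per-task cost estimates are hypothetical.

```python
import heapq


def balance_tasks(workloads, num_cores):
    """Assign tasks to CPU cores with a greedy LPT heuristic.

    workloads: dict mapping a task name to an estimated cost
               (hypothetical estimates, e.g. from profiling).
    Returns (assignment, loads): the task names per core and
    the total estimated cost placed on each core.
    """
    if num_cores <= 0:
        raise ValueError("num_cores must be positive")

    # Min-heap of (current load, core index): least-loaded core on top.
    heap = [(0, core) for core in range(num_cores)]
    heapq.heapify(heap)
    assignment = [[] for _ in range(num_cores)]

    # Place the heaviest tasks first; ties are broken by task name
    # so that the result is deterministic.
    ordered = sorted(workloads.items(), key=lambda kv: (-kv[1], kv[0]))
    for name, cost in ordered:
        load, core = heapq.heappop(heap)
        assignment[core].append(name)
        heapq.heappush(heap, (load + cost, core))

    # Read the final per-core loads back out of the heap.
    loads = [0] * num_cores
    for load, core in heap:
        loads[core] = load
    return assignment, loads
```

Calling `balance_tasks({"a": 7, "b": 5, "c": 4, "d": 3, "e": 1}, 2)` places each task on whichever core is currently least loaded. A real compiler pass would also have to weigh inter-task communication, which this sketch ignores.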

CHEN Wenbin, YANG Ruirui, YU Junqing. Multigranularity partition and scheduling for stream programs based on multi-CPU and multi-GPU heterogeneous architectures[J]. Computer Engineering & Science.
