[1] Lee V, Hammarlund P, Singhal R, et al. Debunking the 100x GPU vs. CPU myth: An evaluation of throughput computing on CPU and GPU[J]. ACM SIGARCH Computer Architecture News, 2010, 38(3): 451-460.
[2] Fraire J A, Ferreyra A, Marques C. OpenCL overview, implementation, and performance comparison[J]. IEEE Latin America Transactions, 2013, 11(1): 274-280.
[3] Alvarez P L, Yamagiwa S. Invitation to OpenCL[C]∥Proc of 2011 2nd International Conference on Networking and Computing, 2011: 8-16.
[4] Grossman M, Sbirlea A S, Budimlic Z, et al. CnC-CUDA: Declarative programming for GPUs[M]∥Cooper K, Mellor-Crummey J, Sarkar V, ed. Languages and Compilers for Parallel Computing. Houston: Springer Berlin Heidelberg, 2011: 230-245.
[5] Han T D, Abdelrahman T S. hiCUDA: High-level GPGPU programming[J]. IEEE Transactions on Parallel and Distributed Systems, 2011, 22(1): 78-90.
[6] Wei Haitao, Qin Mingkang, Zhang Weiwei, et al. StreamTMC: Stream compilation for tiled multicore architectures[J]. Journal of Parallel and Distributed Computing, 2013, 73(4): 484-494.
[7] Mu S, Li D D, Chen Y B, et al. Exploiting the task-pipelined parallelism of stream programs on many-core GPUs[J]. IEICE Transactions on Information and Systems, 2013, E96-D(10): 2194-2207.
[8] Schneider S, Hirzel M, Gedik B, et al. Safe data parallelism for general streaming[J]. IEEE Transactions on Computers, 2015, 64(2): 504-517.
[9] Wei Haitao, Yu Junqing, Yu Huafei, et al. Minimizing communication in rate-optimal software pipelining for stream programs[C]∥Proc of the 8th International Symposium on Code Generation and Optimization, 2010: 210-217.
[10] Beaumont O, Legrand A, Robert Y. The master-slave paradigm with heterogeneous processors[J]. IEEE Transactions on Parallel and Distributed Systems, 2003, 14(9): 897-908.
[11] Markatos E P, LeBlanc T J. Using processor affinity in loop scheduling on shared-memory multiprocessors[J]. IEEE Transactions on Parallel and Distributed Systems, 1994, 5(4): 379-400.
[12] Batcher K W, Walker R A. Dynamic round-robin task scheduling to reduce cache misses for embedded systems[C]∥Proc of Design Automation and Test in Europe, 2008: 260-263.
[13] METIS[EB/OL]. [2013-07-21]. http://glaros.dtc.umn.edu/gkhome/metis/metis/overview.
[14] Huynh H P, Hagiescu A, Wong W F, et al. Scalable framework for mapping streaming applications onto multi-GPU systems[J]. ACM SIGPLAN Notices, 2012, 47(8): 1-10.
[15] Zhang Weiwei, Wei Haitao, Yu Junqing, et al. COStream: A language for dataflow application and compiler[J]. Chinese Journal of Computers, 2013, 36(10): 1993-2006. (in Chinese)
[16] Wei Haitao, Yu Junqing, Yu Huafei, et al. Software pipelining for stream programs on resource-constrained multicore architectures[J]. IEEE Transactions on Parallel and Distributed Systems, 2012, 23(12): 2338-2349.
[17] Thiele L, Bacivarov I, Haid W, et al. Mapping applications to tiled multiprocessor embedded systems[C]∥Proc of the 7th International Conference on Application of Concurrency to System Design, 2007: 29-40.
[18] Yu Junqing, Zhang Weiwei, Chen Wenbin, et al. Multilevel pipelining parallelism for dataflow programs on multicore cluster[J]. Chinese Journal of Computers, 2014, 37(10): 2071-2083. (in Chinese)
Appended Chinese-language references (original citations for [15] and [18]):
[15] 张维维, 魏海涛, 于俊清, 等. COStream: 一种面向数据流的编程语言和编译器实现[J]. 计算机学报, 2013, 36(10): 1993-2006.
[18] 于俊清, 张维维, 陈文斌, 等. 面向多核集群的数据流程序层次流水线并行优化方法[J]. 计算机学报, 2014, 37(10): 2071-2083.