[1] |
http://www.nscctj.gov.cn/resources/resource_1.asp.
|
[2] |
McCalpin J D.Stream:Sustainable memory bandwidth in high performance computers[EB/OL].[20130516].http://www.cs.virginia.edu/stream/.
|
[3] |
Gong Chunye,Liu Jie,Chi Lihua, et al. GPU accelerated simulations of 3D deterministic particle transport using discrete ordinates method [J].Journal of Computational Physics,2011,230(15):60106022.
|
[4] |
Petrini F, Fossum G, Fernandez J,et al.Multicore surprise lessons learned from optimizing sweep3D on the cell broadband engine [C]∥Proc of International Parallel and Distributed Processing Symposim,2007:110.
|
[5] |
Gan Xinbiao,Wang Zhiying, Shen Li,et al.abStream:A framework for programming manycore[J].Electrical Review,2012,88(7b):341344.
|
[6] |
Molka D,Hackenberg D,Schone R, et al.Memory performance and cache coherency effects on an Intel Nehalem multiprocessor system[C]∥Proc of the 18th International Conference on Parallel Architectures and Compilation Techniques,2009:261270.
|
[7] |
Preeti R P,Hiroshi N.Augmenting loop tiling with data alignment for improved cache performance[J]. IEEE Transactions on Computers,1999,48(2):142149.
|
[8] |
Fraboulet A,Kodary K,Mignotte A.Loop fusion for memory space optimization[C]∥Proc of IEEE International Symposium on System Synthesis,2001:95100.
|
[9] |
Alvin R,Chatterjee L S,Praveen K,et al.Recursive array layouts and fast matrix multiplication [J].IEEE Transactions on Parallel and Distributed Systems,2002,13(11):11051123.
|
[10] |
Pike G,Hilnger P N.Better tiling and array contraction for compiling scientic programs[C]∥Proc of the IEEE/ACM Conference on Supercomputing,2002:112.
|
[11] |
Liu Jie,Chi Lihua,Gong Chunye,et al.Highperformance matrix multiply on a massively multithreaded fiteng1000 processor[C]∥Proc of the 12th International Conference on Algorithms and Architectures for Parallel Processing,2012:166176.
|
[12] |
Wonnacott D.Using time skewing to eliminate idle time due to memory bandwidth and network limitations [C]∥Proc of International Parallel and Distributed Processing Symposim,2000:171180.
|