Parallel computation and performance optimization of STREAM on FT1000 processors

J4, 2014, Vol. 36, Issue (12): 2267-2271.



CHI Lihua，HU Qingfeng，LIU Jie，GAN Xinbiao，JIANG Jie，YAN Yihui

(National Key Laboratory of Parallel and Distributed Processing, National University of Defense Technology, Changsha 410073, China)

Received: 2013-12-10; Revised: 2014-02-21; Online: 2014-12-25; Published: 2014-12-25

Abstract

The STREAM benchmark measures the sustainable memory bandwidth of microprocessors. Obtaining high STREAM performance on the massively multithreaded FT1000 processors is challenging. Based on the hierarchical cache, the instruction pipelines of the four STREAM routines are optimized. A multilevel loop-unrolling method is then proposed according to the number of registers, the prefetch data sizes are determined by the instruction latency and the cache-line size, and the optimized subroutines are written in assembly language. Under the OpenMP parallel computing environment, parallel STREAM codes are given that use local data optimization methods. Test results show that the performance optimizations improve sequential STREAM performance by 19.2% to 64.2%. The highest memory bandwidth achieved by the parallel optimized codes is 8.5 GB/s. Compared with the original parallel codes, the performance of the parallel optimized codes is improved by 22.7%.

Key words: multithreaded processor; STREAM benchmark; performance optimization
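The abstract names the four STREAM routines and an OpenMP version with local data optimization but gives no code. The C sketch below is one plain reading of that setup, not the authors' assembly-language implementation. The kernel bodies follow the standard STREAM definitions (Copy, Scale, Add, Triad); the function names, the use of a `schedule(static)` first-touch initialization to keep each thread's data local, and the `stream_gbps` helper are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes moved per iteration with 8-byte doubles, counted as STREAM does:
   Copy and Scale read one array and write one; Add and Triad read two
   arrays and write one. */
enum { COPY_BYTES = 16, SCALE_BYTES = 16, ADD_BYTES = 24, TRIAD_BYTES = 24 };

/* Parallel first-touch initialization (assumed reading of "local data
   optimization"): with schedule(static), each thread first writes the same
   index range it later processes, so under a first-touch page policy those
   pages tend to be placed close to that thread. */
void stream_init(double *a, double *b, double *c, size_t n)
{
    #pragma omp parallel for schedule(static)
    for (size_t i = 0; i < n; i++) {
        a[i] = 1.0;
        b[i] = 2.0;
        c[i] = 0.0;
    }
}

/* The four STREAM kernels, parallelized with the same static schedule. */
void stream_copy(double *restrict c, const double *restrict a, size_t n)
{
    #pragma omp parallel for schedule(static)
    for (size_t i = 0; i < n; i++) c[i] = a[i];
}

void stream_scale(double *restrict b, const double *restrict c, double q, size_t n)
{
    #pragma omp parallel for schedule(static)
    for (size_t i = 0; i < n; i++) b[i] = q * c[i];
}

void stream_add(double *restrict c, const double *restrict a,
                const double *restrict b, size_t n)
{
    #pragma omp parallel for schedule(static)
    for (size_t i = 0; i < n; i++) c[i] = a[i] + b[i];
}

void stream_triad(double *restrict a, const double *restrict b,
                  const double *restrict c, double q, size_t n)
{
    #pragma omp parallel for schedule(static)
    for (size_t i = 0; i < n; i++) a[i] = b[i] + q * c[i];
}

/* Bandwidth in GB/s (1 GB = 1e9 bytes) for one timed kernel run. */
double stream_gbps(size_t bytes_per_iter, size_t n, double seconds)
{
    assert(seconds > 0.0);
    return (double)bytes_per_iter * (double)n / seconds / 1e9;
}
```

To use it, time each kernel (for example with `omp_get_wtime()` when compiled with OpenMP) and pass the elapsed seconds to `stream_gbps` together with the matching byte count. Built without an OpenMP flag, the pragmas are ignored and the kernels run sequentially, which gives a single-threaded run to compare against.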

CHI Lihua，HU Qingfeng，LIU Jie，GAN Xinbiao，JIANG Jie，YAN Yihui. Parallel computation and performance optimization
of STREAM on FT1000 processors [J]. J4, 2014, 36(12): 2267-2271.
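The abstract sizes loop unrolling by the register count and prefetching by the instruction latency and cache-line size, but it does not give the FT1000 figures or the exact formulas. As a hedged sketch, the C code below assumes two simple rules: the unroll factor is the largest power of two whose live values fit in the floating-point registers, and the prefetch distance is ceil(latency / cycles per cache line) lines ahead. It uses the GCC/Clang `__builtin_prefetch` builtin as a C-level stand-in for the hand-written assembly the paper describes. The names `unroll_factor`, `prefetch_distance`, and `triad_unrolled_prefetch` are hypothetical, and the written-out loop fixes the unroll factor at 4.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__GNUC__)
/* GCC/Clang builtin: read prefetch (rw = 0), no temporal locality (0). */
#define PREFETCH_R(p) __builtin_prefetch((p), 0, 0)
#else
#define PREFETCH_R(p) ((void)(p))
#endif

/* Hypothetical rule: largest power-of-two unroll factor whose live values
   (regs_per_elem registers per unrolled element) fit in fp_regs. */
size_t unroll_factor(unsigned fp_regs, unsigned regs_per_elem)
{
    assert(regs_per_elem > 0);
    size_t limit = fp_regs / regs_per_elem;
    size_t u = 1;
    while (u * 2 <= limit)
        u *= 2;
    return u;
}

/* Hypothetical rule: prefetch ceil(latency / cycles_per_line) cache lines
   ahead, converted from bytes to array elements (doubles). */
size_t prefetch_distance(unsigned latency_cycles, unsigned cycles_per_line,
                         unsigned line_bytes)
{
    assert(cycles_per_line > 0);
    size_t lines = (latency_cycles + cycles_per_line - 1) / cycles_per_line;
    return lines * line_bytes / sizeof(double);
}

/* Triad a[i] = b[i] + q*c[i], unrolled by 4 with software prefetch pf
   elements ahead; a scalar loop handles the n % 4 remainder. */
void triad_unrolled_prefetch(double *restrict a, const double *restrict b,
                             const double *restrict c, double q,
                             size_t n, size_t pf)
{
    size_t main_end = n - n % 4;
    size_t i = 0;
    for (; i < main_end; i += 4) {
        if (i + pf < n) {            /* keep prefetch addresses in bounds */
            PREFETCH_R(&b[i + pf]);
            PREFETCH_R(&c[i + pf]);
        }
        /* Four independent updates that the pipeline can overlap. */
        a[i]     = b[i]     + q * c[i];
        a[i + 1] = b[i + 1] + q * c[i + 1];
        a[i + 2] = b[i + 2] + q * c[i + 2];
        a[i + 3] = b[i + 3] + q * c[i + 3];
    }
    for (; i < n; i++)               /* remainder iterations */
        a[i] = b[i] + q * c[i];
}
```

For instance, with a placeholder 200-cycle latency and 25 cycles of loop work per 64-byte line, `prefetch_distance(200, 25, 64)` gives 64 elements (8 lines) ahead. These inputs are illustrative only and are not FT1000 measurements.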
