• Official journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science


Modeling and evaluating Intel IMCI vgather instruction using stencils

James Lin1,2, WANG Yi-chao1, QIN Qiang1, LI Shuo3, WEN Min-hua1, Satoshi Matsuoka2

  1. Center for High Performance Computing, Shanghai Jiao Tong University, Shanghai 200240, China;
    2. Tokyo Institute of Technology, Tokyo 152-8550, Japan;
    3. Intel Corporation, Portland, OR 97124, USA
  • Received:2015-12-11 Revised:2016-03-21 Online:2016-09-25 Published:2016-09-25

Abstract:

Vgather is a hardwareimplemented vector instruction introduced by Intel Initial ManyCore Instructions (IMCI) for Xeon Phi. Its target is to help SIMD registers access data from noncontiguous memory locations. However, experimental results show that it can also be one of the key performance bottlenecks on Xeon Phi. We model the performance of Vgather by using the profiling tool PAPI to directly collect the results of hardware performance counters. Address Generation Interlock (AGI) events are profiled as the number of Vgather and the average latency of Vgather are estimated with VPU_DATA_READ events based on which we model the total latencies of Vgather instructions. 3D7P stencils are used to evaluate our model and the results show that Vgather spents nearly 40% of total kernel time. We implement a Vgatherfree version with intrinsic instruction to validate this model. Our contribution includes modeling Intel IMCI vgather instruction with hardware counters and validating it by stencils. The model can also be applicable on CPUs.

Key words: performance modeling, vgather, Xeon Phi, hardware performance counters