Computer Engineering & Science

Sixty Years of Parallel Computing

YANG Xuejun

2012, 34(8): 1-10. doi:

Parallel computing is the main technology for implementing high performance computing. This paper reviews the history of parallel computing over the past 60 years and reaffirms that the measurement equations for parallel scalability have played an important role in its development. Based on an analysis of the challenges facing future exascale computing, a new scalability measurement model for parallel computing is built that accounts for factors affecting exascale performance, including memory access, communication, reliability, and power consumption. Quantitative analysis shows that parallel computing meets several scalability "walls" as it advances to higher performance. Finally, in consideration of national conditions, the author offers suggestions for the development of high performance computing in our country.

Review and Perspectives of 10 Years’ China HPC TOP100

ZHANG Yunquan1,YUAN Guoxing2,SUN Jiachang1,ZHANG Linbo3

2012, 34(8): 11-16. doi:

After 10 years of development, the China HPC TOP100 list has become the de facto industry standard for ranking the fastest supercomputers in mainland China. Prof. Yunquan Zhang of the Institute of Software, CAS, and co-authors give a detailed analysis and review of how performance, manufacturers, and application areas have developed over the 10 years of the China HPC TOP100 list. They discuss whether it is reasonable to keep Linpack as the benchmarking standard, the possibility of adding the success rate of Linpack runs as a reliability metric for supercomputing systems, and the importance of the rules for submitting and verifying benchmarking results for the HPC TOP100.

Trend Study of Extreme Large Scale Parallel Computing Systems

LU Yutong

2012, 34(8): 17-23. doi:

High performance computing technology is developing at an accelerating pace. After the emergence of petaflops systems, supercomputer performance rose rapidly to tens of petaflops within a very short time. The general expectation in international academia and industry is that exascale computing systems will emerge around 2018. This paper analyzes the trend of supercomputing development based on the 39th TOP500 list and then discusses future exascale computing trends and related key issues, including power consumption, scalability, reliability, and programmability.

Research on the Theory and Practice of Real-Time Energy Control Technology in Supercomputer Systems

JIN Shiyao1, ZHANG Dongsong1, WU Fei2

2012, 34(8): 24-31. doi:

This paper presents a system-level energy-saving solution for supercomputer systems. It uses the probability that real-time tasks execute on shared computing resources, together with a safety mechanism for switching nodes on and off, to perform load detection, statistics, and forecasting for supercomputer systems and to make safe energy-saving decisions. Preliminary experiments show that the proposed safe energy-saving decision-making method achieves greater energy savings, although the savings are limited by the specific system model, the overhead model, and the results of load forecasting.

Quantum Computers: Algorithms and Physical Implementations

FANG Liang,LIU Rulin,TANG Zhensen,SUI Bingcai,CHI Yaqing

2012, 34(8): 32-43. doi:

Quantum algorithms and physical implementations are two basic problems in quantum computer research. First, we summarize the major progress in the relevant areas and discuss some representative quantum algorithms, especially the one for solving systems of linear equations. The factors that influence the proposal of new quantum algorithms are also analyzed. Then the DiVincenzo criteria are discussed, along with some typical practical implementations and a comparison of their performance. The views of skeptics of quantum computers are also considered. Finally, we investigate some new research directions.

Review of the Gordon Bell Prize

ZHANG Lilun1，DENG Xiaogang1,2

2012, 34(8): 44-52. doi:

The ACM Gordon Bell Prize is the top academic award in the field of HPC applications. It recognizes outstanding achievement, with particular emphasis on innovation in applying HPC to scientific applications, and is now regarded as a measure of the progress of HPC applications over time. This paper summarizes and analyzes the award-winning research of the last thirteen years, especially the peak performance and special awards. Some views on how to promote research on HPC applications are presented for the reference of domestic colleagues.

Implementation and Optimization of the Spectral-Finite Difference Hybrid Scheme for 3D Navier-Stokes Equations on GPU

XU Ying, XU Lei

2012, 34(8): 53-58. doi:

Accelerating applications with GPUs already delivers impressive computational performance compared with traditional CPUs. The hardware architecture of the GPU departs significantly from that of the CPU, so the numerical algorithm must be redesigned and validated. This paper studies the spectral-finite-difference scheme commonly used in direct numerical simulation (DNS) of turbulent channel flows. To validate the numerical accuracy, the scalar diffusion equation is first solved with this scheme, and the GPU and CPU results are checked against the analytical solution. The performance study of the scalar diffusion equation shows at least a 20X speedup. For the full 3D Navier-Stokes equations, the GPU shows a 24X speedup.

An Optimized Power Supply Architecture for Supercomputers

HU Shiping，YAO Xinan，SONG Fei

2012, 34(8): 59-62. doi:

This paper proposes an optimized power supply architecture for blade-type supercomputers. Instead of the two-level DC bus, three-stage conversion architecture commonly used in blade-type supercomputers, it uses a one-level DC bus with two-stage conversion and retains the DC interface to the backup batteries. The difficulties of implementing the proposed architecture are discussed and solutions are provided in detail. The architecture has been implemented in a supercomputer system. Its performance in operation shows that, compared with other existing architectures, it improves power efficiency, reduces component cost and the space required on the plug-in boards, and improves the overall reliability of the power supply system. Other possible methods of reducing supercomputer power consumption are also discussed.

A Study of the Massively Parallel Computation Based on Structured Grids

WANG Yuntao1，WANG Guangxue1，XU Qinxin1，DENG Xiaogang1,2

2012, 34(8): 63-68. doi:

To improve the parallel efficiency of the CFD software, the RANS equations with Menter's k-omega SST two-equation model, multigrid technology, a general data transfer method, and a region-based load balancing method are used. The software has passed testing on the Galaxy of Stars system at the Research Center of Supercomputing Application of NUDT. More than 2048 CPUs were used to compute a wing/body configuration with 100 million cells, and the parallel efficiency reached 48%. This research thus improves the efficiency of massively parallel computation of accurate force coefficients and benefits massively parallel computing applications in engineering.

A Parallel Module for the Multiblock Structured Mesh in JASMIN and Its Applications

GUO Hong, MO Zeyao, ZHANG Aiqing

2012, 34(8): 69-74. doi:

A parallel module for multiblock structured mesh computing has been designed and implemented in the JASMIN infrastructure. The module adopts an algorithm for describing the relationships between blocks and a unified communication schedule, which effectively resolve the communication bottleneck that is common in multiblock structured mesh parallel computing. By encapsulating parallel computing strategies such as distributed storage and data communication and by providing standard interfaces, the module also lets users implement multiblock structured mesh parallel computing conveniently. Our test results show that applications based on this module run efficiently on thousands of processors, which demonstrates the module's satisfactory parallel performance.

Transparent Virtualization of Power Management

LIU Yongyan1，LIU Yongpeng2，LU Kai2

2012, 34(8): 75-80. doi:

Server consolidation based on virtual machines is employed to improve system utilization and energy efficiency. However, the isolation effect of virtualization poses a challenge to power management. A two-level model, i.e., inside and outside the virtual machine, is proposed for transparent power management in virtual machine environments. A multi-level power behavior statistics framework is introduced to support power profiling of the virtual devices, the virtual machine, and the host. Power management mechanisms are virtualized, and a dynamic speed scaling algorithm is proposed to map power management actions between virtual devices and physical devices. Experiments demonstrate that our transparent virtualization of power management causes a negligible decline in system performance of less than 1%.

Parallel Computation for Hydrodynamic Instability Based on the SAMR Mesh

LIU Qingkai，ZHAO Weibo，XU Xiaowen

2012, 34(8): 81-85. doi:

Computation on the SAMR mesh yields the same simulation results as computation on the globally refined mesh, but needs fewer cells and less computation time. A parallel SAMR program for simulating hydrodynamic instability in inertial confinement fusion is developed on JASMIN, and an implosion experiment is simulated on hundreds of processors. Both the simulation results and the performance analysis demonstrate the correctness and high efficiency of the code.

The Energy Efficiency of Dynamic Voltage Scaling in High-Performance Computers

YI Huizhan, LUO Zhaocheng

2012, 34(8): 86-92. doi:

Effectively controlling power consumption will become one of the key factors in increasing the performance of high-performance computers. We investigate whether dynamic voltage scaling, a typical low-power technique, can effectively reduce their energy consumption. We present an energy model of dynamic voltage scaling in high-performance computers and put forward the concepts of wall-time energy consumption and real energy consumption. Using an intelligent power meter, we measure the energy consumption of three computer systems at multiple voltages and frequencies and then analyze the energy efficiency of dynamic voltage scaling in high-performance computers. We conclude that dynamic voltage scaling can effectively reduce the energy consumption of high-performance computers.

A Study of the Hot Swap Control Strategy for the Blade Plug-ins

SONG Fei, YAO Xinan, HU Shiping

2012, 34(8): 93-98. doi:

After studying a variety of hot swap control strategies, this paper designs a hot swap circuit for blade plug-ins with a 12 V power bus structure in a new-generation high performance computer system. Using the PMBus™ communication interface of the latest control technology, it realizes the measurement, protection, control, and testing of the hot swap circuit for the blade plug-in power bus. The design techniques will be applied to the plug-in boards of a new generation of supercomputers.

A Direct Numerical Simulation of the Complex Multi-Scale Flow with Shock, Vortex and Sound Wave

ZHANG Shuhai, LI Hu, LIU Xuliang

2012, 34(8): 99-107. doi:

This paper gives an overview of our recent DNS studies of the interaction between a shock wave and a single planar vortex, a pair of planar vortices, or a longitudinal vortex, and of compressible isotropic turbulence. The studies directly solve the two- and three-dimensional unsteady compressible Navier-Stokes equations with a fifth-order weighted essentially non-oscillatory (WENO) finite difference scheme on the YH parallel computer. Their main purpose is to reveal the features of shock dynamics, vortex deformation or vortex breakdown, and the mechanism of sound generation in the interaction between a shock wave and vortices, as well as to explore the flow structure and mechanism of turbulence. The studies demonstrate the excellent resolution and stability of high-order WENO schemes, which makes them an ideal numerical tool for studying shock-vortex interactions in which strong discontinuities and complex flow structures coexist. The interaction between a shock wave and a strong vortex is found to be multistage: the shock wave interacts with the initial vortex, the reflected shock wave with the deformed vortex, and the shocklets with the deformed vortex. The sound generated by the interaction between a shock wave and a vortex pair falls into two regimes, linear and nonlinear. In the linear regime, the sound wave generated by the interaction of a shock wave with a vortex pair equals the linear combination of the sound waves generated by the interactions between the same shock wave and each vortex. The nonlinear regime corresponds to the shock interaction with a coupled vortex pair. In the interaction between the shock and a longitudinal vortex, a multi-helix structure is found in the region of vortex breakdown. Our simulation of compressible isotropic turbulence also confirms the existence of shocklets at sufficiently high turbulent Mach number, which is the most noticeable influence of compressibility on the structure of turbulence.

Parallel Implementation of the SelfConsistent Mean Field Theory on Copolymer Research

KOU Dazhi1,LIANG Haojun2

2012, 34(8): 108-113. doi:

Selfconsistent Mean Field Theory (SCFT) has been applied widely in polymer thermodynamics,and especially achieves progress in copolymer’s microphases research［1～6］.This is a flexible theory. It’s parameter space has a wide setting range, and various application samples can be used under this theory.So it is difficult to build a steady software package,and also rare reference regarding the parallel implement about this theory.This article focuses on the parallel implementations of the copolymer selfassembly application. The performance is theoretically analyzed,and the experimental results show that our algorithm has good parallel performance and scalability.This algorithm is also helpful to the theoretical research of polymer and the soft matter field.

Parallel Computation for the Finite Element Method Based on a Hierarchical Data Structure

ZHAO Weibo,LIU Qingkai,QIN Guiming

2012, 34(8): 114-118. doi:

Effective data structures and parallel algorithms play key roles in large-scale parallel computation with the finite element method (FEM). This paper proposes a hierarchical data structure for unstructured meshes. Based on this structure, we design algorithms for the parallel computation of FEM. Numerical results are presented to show the validity and scalability of the data structure and algorithms.

A Design of the Reliable Link Layer for Virtual Cut-Through Switching

WANG Yongqing, ZHANG Minxuan

2012, 34(8): 119-124. doi:

One of the primary objectives in designing high performance interconnection networks is to reduce communication latency. Virtual cut-through is an effective latency-tolerant technique, but with limited input buffering it is a challenge to implement a reliable link layer at high speed. We present a method that overcomes this problem and supports virtual cut-through switching at the link layer, reducing unnecessary buffering in intermediate nodes. It combines the packet format, the sender manager, and the receiver manager. First, an extra cyclic redundancy code is included in the packet header to protect its information. Second, a link-level retry is provided to reduce the time and protocol overhead involved in end-to-end retry. Third, with an effective implementation of the receiver, especially the input buffer, we avoid buffer overflow and flow control corruption.

A Study of the Massively Parallel Computation Based on Structured Grids

WANG Yuntao1，WANG Guangxue1，XU Qinxin1，DENG Xiaogang1,2

2012, 34(8): 125-130. doi:

To improve the parallel efficiency of the CFD software, the RANS equations with Menter's k-omega SST two-equation model, multigrid technology, a general data transfer method, and a region-based load balancing method are used. The software has passed testing on the Galaxy of Stars system at the Research Center of Supercomputing Application of NUDT. More than 2048 CPUs were used to compute a wing/body configuration with 100 million cells, and the parallel efficiency reached 48%. This research thus improves the efficiency of massively parallel computation of accurate force coefficients and benefits massively parallel computing applications in engineering.

Resource Status Management Based on Historical Idle Records

CHEN Haitao,LU Yutong

2012, 34(8): 131-134. doi:

A poweraware resource state control algorithmPARC is proposed to relieve the energy crisis of highperformance computing.The PARC algorithm records the idle logs of the nodes’ power state, and dynamically hibernate the idle nodes to save power.The simulation results show that the PARC algorithm can achieve effective energy savings with good control on the switching frequency of the nodes’ power states.
