Computer Engineering & Science

State of the art analysis of China HPC in 2014

YUAN Guoxing1,YAO Jifeng2

2014, 36(12): 2239-2241.

Based on the latest China HPC TOP100 list released by SAMSS in early November, this paper presents the overall performance trends of the China HPC TOP100 and TOP 5 of 2014. The characteristics of performance, manufacturers, and application areas are then analyzed separately in detail.

High performance computing in computational biology (Ⅰ): molecular dynamics

WANG Tao

2014, 36(12): 2242-2250.

Molecular dynamics is an important application area of high performance computing, and a large share of high performance computing resources and computing cycles is allocated to molecular dynamics simulations. The computing methods and features of molecular dynamics, including commonly used parallel algorithms and performance improvement approaches, are described. Several of the most widely used massively parallel simulation packages for molecular dynamics and their features are introduced. Finally, the developments and challenges of molecular dynamics simulations are reviewed.

Design and implementation of supporting platform construction oriented to numerical marine environment forecast

ZHANG Zhiyuan1,2,ZHANG Xiang2,ZHANG Quan2,WANG Yan2,YANG Guangwen1

2014, 36(12): 2251-2256.

Numerical marine environment forecasting has five characteristics: multiple numerical models, multiple coupling processes, multiple results, high throughput, and multiple users. Based on these characteristics, the construction of a supporting platform for marine environment forecasting is proposed. The design comprises an online coupler (HYCPL) for multiple coupling processes; a model library (HYLIB) for multiple numerical models, parameterization schemes, and statistical methods; a test and evaluation tool (HYET) for multiple results; a parallel access and storage tool (HYI/O) for high throughput; and a one-click installation package (HY1KEY) for the multi-user running environment. Together these provide an effective and suitable supporting platform for numerical environment forecast systems.

Change-oriented service evolution consistency checking

LI Bing,GAO Yan,WANG Bin,YANG Xiaochun

2014, 36(12): 2257-2266.

In an open, dynamic, and changeable network environment, web services must be able to evolve in order to respond effectively to changes in user needs, platforms, and the external environment. Existing methods for evolution consistency checking are mostly based on fixed standards; they lack fine-grained, adjustable quantitative analysis and cannot explicitly reflect the changes between service versions. To address these shortcomings, the paper focuses on the changes caused by service evolution and proposes a service description model covering the structural layer and the non-functional layer of services. Based on this model, a change extraction algorithm and a change union algorithm are proposed, and the degree of evolution consistency is introduced to analyze evolution consistency quantitatively. Finally, a practical tool for service evolution consistency checking is designed and implemented, and it is used to verify the validity of the proposed evolution consistency determination method.

Parallel computation and performance optimization of STREAM on FT1000 processors

CHI Lihua,HU Qingfeng,LIU Jie,GAN Xinbiao,JIANG Jie,YAN Yihui

2014, 36(12): 2267-2271.

The STREAM benchmark measures the memory bandwidth of microprocessors. Achieving high STREAM performance on the massively multithreaded FT1000 processors is a challenge. Based on the hierarchical cache, the instruction pipelines of the four STREAM routines are optimized. A multilevel loop unrolling method is then proposed according to the number of registers, the prefetched data sizes are determined by the instruction delay and the cache line size, and the optimized subroutines are written in assembly language. In the OpenMP parallel computing environment, parallel codes for the STREAM benchmark are given with local data optimization methods. Tests of the optimized STREAM codes show that performance increases by 19.2%~64.2% for sequential computation. The highest memory bandwidth of the optimized parallel codes is 8.5 GB/s, an improvement of 22.7% over the original parallel codes.
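For readers unfamiliar with the benchmark, the following is a minimal Python sketch of the four standard STREAM kernels (Copy, Scale, Add, Triad) and of what unrolling a loop by a factor of four looks like. It is illustrative only: the paper's optimized routines are written in assembly for FT1000, and the function names, array sizes, and unroll factor here are assumptions, not taken from the paper.

```python
def run_stream_kernels(n=8, scalar=3.0):
    """Run the four STREAM kernels once on small Python lists."""
    a = [1.0] * n
    b = [2.0] * n
    c = [0.0] * n
    for i in range(n):          # Copy:  c = a
        c[i] = a[i]
    for i in range(n):          # Scale: b = scalar * c
        b[i] = scalar * c[i]
    for i in range(n):          # Add:   c = a + b
        c[i] = a[i] + b[i]
    for i in range(n):          # Triad: a = b + scalar * c
        a[i] = b[i] + scalar * c[i]
    return a, b, c


def triad_unrolled4(a, b, c, scalar):
    """Triad with the loop body unrolled four times, plus a remainder loop."""
    n = len(a)
    main = n - n % 4            # largest multiple of 4 not exceeding n
    for i in range(0, main, 4):
        a[i] = b[i] + scalar * c[i]
        a[i + 1] = b[i + 1] + scalar * c[i + 1]
        a[i + 2] = b[i + 2] + scalar * c[i + 2]
        a[i + 3] = b[i + 3] + scalar * c[i + 3]
    for i in range(main, n):    # leftover elements when n is not a multiple of 4
        a[i] = b[i] + scalar * c[i]
    return a
```

In a compiled or assembly implementation, unrolling exposes several independent loads and stores per iteration, which is what allows prefetching and register allocation to be tuned; in Python the sketch only shows the loop structure.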

Network calculus for fattree network

Qin Guangjun,ZHU Mingfa,XIAO Limin,RUAN Li

2014, 36(12): 2272-2279.

Network calculus is one of the most important performance analysis tools in network research. Classical network calculus focuses mainly on QoS, and its key metrics are the maximum backlog, the maximum network delay, and the service curve. However, researchers in high-performance computing systems are more concerned with throughput, communication latency, the network saturation point, and similar metrics, which cannot be inferred from classical network calculus. Queuing theory is introduced into network calculus, formulas for the communication latency and the throughput are derived, and a network calculus analysis method for high-performance networks is proposed. Using this method, the uniform traffic pattern of the fat-tree network is analyzed. The results indicate that the method can efficiently describe the network communication process and capture the saturation point, and that it is largely consistent with simulation results.
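As background, the classical worst-case bounds and a textbook queuing formula can be sketched as follows. This is not the paper's derivation: it assumes a token-bucket arrival curve (burst b, rate r) and a rate-latency service curve (rate R, latency T), and uses the M/M/1 mean sojourn time only to illustrate how average latency grows as load approaches saturation. All function and parameter names are hypothetical.

```python
def nc_bounds(burst, rate, service_rate, latency):
    """Classical network calculus bounds for a token-bucket flow
    served by a rate-latency server (requires rate <= service_rate)."""
    if rate > service_rate:
        raise ValueError("bounds are infinite when rate exceeds service_rate")
    delay_bound = latency + burst / service_rate     # T + b/R
    backlog_bound = burst + rate * latency           # b + r*T
    return delay_bound, backlog_bound


def mm1_mean_latency(arrival_rate, service_rate):
    """Mean time in an M/M/1 queue, 1/(mu - lambda); diverges at saturation."""
    if arrival_rate >= service_rate:
        raise ValueError("queue is saturated")
    return 1.0 / (service_rate - arrival_rate)
```

The first function gives worst-case figures, while the second gives an average that rises sharply near the saturation point, which is the kind of behavior the abstract says classical network calculus alone does not expose.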

Effect of intra-pair skew between differential pair on signal integrity in 25Gbps backplane

HU Jun,LI Jinwen,CAO Yuesheng,YANG Anyi,ZHANG Wei

2014, 36(12): 2280-2285.

Faced with the challenge of 25 Gbps signal transmission, digital system designers and signal integrity engineers must pay attention to the critical issue of intra-pair skew in differential pairs. First, the causes of intra-pair skew are introduced. Second, the effects of intra-pair skew on signal integrity in a 25 Gbps backplane are analyzed in the frequency domain and the time domain. The margin of intra-pair skew for 25 Gbps signal transmission is also evaluated by test-based channel simulation. Finally, engineering methods for reducing intra-pair skew in differential pairs are provided.

A MapReduce scheduling algorithm supporting multiple priorities based on queuing network

WAN Cong,WANG Cuirong,WANG Cong,L Yanxia,JIA Shuo

2014, 36(12): 2286-2295.

MapReduce is a distributed computing framework for big data processing that has been widely used in various fields. Meeting the deadlines of users with different priorities in a cluster that provides MapReduce services is a challenge. To solve this problem, a queuing-network-based multi-priority scheduling algorithm (MPSA) is proposed. First, MapReduce-based algorithms are summarized and analyzed, three common patterns are identified, and the Jackson queuing network is used to build a mathematical model of MapReduce-based algorithms. The model can be used to find the resource demands of the different priority queues. Second, the AR(1) model is used to predict the number of accessing users, and binary search is used to calculate the number of slots assigned to users of each priority in the map phase and the reduce phase. Finally, a real-time scheduling algorithm running in the MapReduce framework is implemented. Experimental results show that, compared with the traditional FIFO and fair scheduling algorithms, the proposed algorithm meets the defined deadlines of users with different priorities more effectively when the user arrival rates and the task scales change.
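The prediction and slot-assignment steps can be illustrated with a small Python sketch. It assumes a one-step AR(1) forecast of the form x(t+1) = c + phi * x(t) and a monotone feasibility test (more slots never makes a deadline harder to meet). The completion-time model in the usage example is a toy stand-in for the paper's Jackson-network model, and all names and numbers are hypothetical.

```python
def ar1_predict(x_t, c, phi):
    """One-step AR(1) forecast: next value from the current value."""
    return c + phi * x_t


def min_slots(meets_deadline, lo, hi):
    """Smallest slot count in [lo, hi] for which meets_deadline(slots) is True.

    Assumes the predicate is monotone: once True, it stays True for
    every larger slot count. Returns None if even hi slots are not enough.
    """
    if not meets_deadline(hi):
        return None
    while lo < hi:
        mid = (lo + hi) // 2
        if meets_deadline(mid):
            hi = mid            # mid works; try fewer slots
        else:
            lo = mid + 1        # mid fails; need more slots
    return lo


# Usage with a toy model: total work spread evenly over the slots.
users = ar1_predict(100.0, c=10.0, phi=0.5)      # predicted number of users
work = users * 3.0                               # assumed work per user
slots = min_slots(lambda s: work / s <= 20.0, 1, 64)
```

Binary search needs only about log2(hi - lo) evaluations of the model, which matters if each evaluation of the queuing model is itself costly.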

A cluster based data replication strategy in cloud storage systems

FU Xiong,GONG Xiaojie,WANG Ruchuan

2014, 36(12): 2296-2304.

Cloud storage has become one of the fundamental technologies for information sharing and data services on the Internet. Data replication is widely used in cloud storage systems to improve data availability, enhance fault tolerance, and improve system performance. A cluster-based data replication strategy for cloud storage systems is proposed, which covers when to replicate data, how many replicas to create, and where the replicas should be placed. For the replica placement stage, a Cluster Based Replication Placement (CBRP) method with load balancing is proposed. Experiments demonstrate that the proposed method is practical and performs well.

An energy efficiency evaluation model based on QoS parameters reduction in cloud computing environments

CAI Xiaobo1,2,ZHANG Xuejie1

2014, 36(12): 2305-2311.

High energy efficiency, combining high performance, low energy consumption, and QoS support, is one of the research challenges in cloud computing. Current studies trade off these three aspects by fixing one factor and optimizing the others; they lack an efficient energy efficiency calculation method and evaluation model that integrates the three aspects and better describes the "degree" of energy efficiency. A QoS parameter reduction approach and a weighted energy efficiency model are proposed. They introduce system performance into QoS as a key indicator, reduce the discrete QoS parameters to a unified dimension, establish energy efficiency classification levels for cloud data centers, describe energy efficiency as a qualitative concept, and realize qualitative evaluation of energy efficiency in cloud computing environments. In addition, the energy efficiencies of cloud data centers in single-machine, homogeneous, and heterogeneous cloud computing environments are evaluated, and experimental results show that the proposed energy efficiency model and evaluation method are effective for evaluating the QoS level and energy consumption in cloud computing environments.

Research on scheduling of realtime
messages over master-slave switched Ethernet

TAN Ming

2014, 36(12): 2312-2320.

To make switched Ethernet meet the requirements of real-time communication, a novel link schedulability analysis method for both periodic and aperiodic real-time messages is proposed based on the FTT-SE (Flexible Time Triggered Switched Ethernet) paradigm. In addition, it is proved that finding the optimal schedule for a given set of periodic messages on transmission links so as to minimize the maximum finishing time on reception links is NP-complete, and a heuristic algorithm named LSHA is proposed to solve this problem. In particular, different EDF-based scheduling algorithms are designed for periodic and aperiodic real-time messages, which allow the scheduler to take full advantage of multiple transmission paths and thus enhance real-time communication over a COTS-based switched Ethernet. Simulation results show that the proposed real-time scheduling algorithm outperforms FTT-SE in terms of network bandwidth utilization and average message delay.
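The core idea of EDF (earliest deadline first) ordering on a single link can be sketched in a few lines of Python. This is a simplified illustration, not the paper's LSHA heuristic or its multi-path scheduler: it assumes all messages are ready at time 0, are sent back to back without preemption, and share one link. The tuple layout and function name are hypothetical.

```python
def edf_order(messages):
    """Order messages by deadline and report finishing times.

    messages: list of (name, tx_time, deadline), all ready at time 0.
    Returns a list of (name, finish_time, deadline_met) in send order.
    """
    schedule = []
    t = 0.0
    for name, tx_time, deadline in sorted(messages, key=lambda m: m[2]):
        t += tx_time                         # message occupies the link
        schedule.append((name, t, t <= deadline))
    return schedule


# Usage: three messages with transmission times 4, 2, 3 and deadlines 10, 3, 6.
plan = edf_order([("m1", 4, 10), ("m2", 2, 3), ("m3", 3, 6)])
```

Here the message with the tightest deadline (m2) is sent first, and every message in the example finishes before its deadline.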

GPU based parallel optimization of spatial-spectral kernel sparse representation for hyperspectral image classification

WANG Qicong1,3,WU Zebin1,2,LIU Jianjun1,WEI Zhihui1,3,YE Shun1,LIU Jiafu1

2014, 36(12): 2321-2330.

Hyperspectral image classification is a hot topic in hyperspectral remote sensing information processing. Within the framework of kernel sparse representation classification, Spatial-Spectral Kernel Sparse Representation Classification (SSKSRC) of hyperspectral images achieves better performance by jointly using spectral features and the information of spatially adjacent pixels. However, the large scale of data and computation makes it impractical in time-critical conditions. A parallel optimization method for SSKSRC based on GPU/CUDA is proposed. A memory access optimization strategy is designed to optimize the data exchange between the host and the device. The parallel computing capability of the GPU is used to accelerate the calculation of the kernel matrix during classification. Matrix operations implemented according to the parallel features of the GPU are used to optimize the solution of the classification model, which is based on the alternating direction method of multipliers. Experiments with real hyperspectral image data validate the effectiveness and efficiency of the proposed method.

Research of TSV open defects in 3D SRAM

JIANG Jianfeng,ZHAO Zhenyu,DENG Quan,ZHU Wenfeng,ZHOU Kang

2014, 36(12): 2331-2338.

In 3D SRAM based on 3DIC technology,the manufacturing process of TSV is not mature yet,thus making TSVs be prone to open defects.And the existing test methods of TSV require a specific circuit, which increases the area overhead. Derived from 2D Memory BIST,the faulty behaviors (full open defects) of TSV in 3D SRAM are modeled.Based on coupling effects between TSVs we study the behaviors of SRAM cell through simulations, analyze and verify the influence of open defects on the existing values of SRAM cells in read and write operations.The physical faults caused by TSV open defects are mapped into SRAM functional faults.It is an effective method for testing and solving the open defects of TSVs without introducing any additional testing circuit in this way.

Modeling power distribution network in TSV-based 3D-IC with silicon substrate effect

SUN Hao1,ZHAO Zhenyu1,LIU Xin2

2014, 36(12): 2339-2345.

Through Silicon Via (TSV) based Three-Dimensional Integrated Circuits (3D-IC) introduce TSVs into the Power Distribution Network (PDN), and the silicon substrate effect cannot be ignored because of 3D stacking. Therefore, modeling the PDN in a TSV-based 3D-IC must take both TSVs and the silicon substrate effect into consideration. A model for the 3D PDN in TSV-based 3D-IC with the silicon substrate effect is proposed. It is composed of a P/G (Power/Ground) TSV pair model and an on-chip PDN model. The P/G TSV pair model, which includes a bump and a contact, is built on a previously validated model and better reflects the electrical characteristics of P/G TSV pairs. The on-chip PDN model, which introduces the silicon substrate effect through the conformal mapping method, is built on the model proposed by Pak J S and reflects the silicon substrate effect on the electrical characteristics of the PDN more effectively. Experiments validate that the proposed 3D PDN model can evaluate the PDN impedance accurately and quickly.

A fast and accurate node influence sorting algorithm in online social networks

ZOU Qing1,ZHANG Yingying2,CHEN Yifan1,ZHANG Shigeng1,DUAN Guihua1

2014, 36(12): 2346-2354.

In large Online Social Networks (OSNs), sorting nodes by their influence is an important research issue. Sorting identifies the most influential node or set of nodes, which is essential for controlling information dissemination, for public opinion analysis and control, and for targeted advertising. Existing influence sorting algorithms fall into two groups. Some need global topology information of the network to calculate the influence of individual nodes (e.g., betweenness-based algorithms); they are usually time consuming and thus not applicable to large-scale networks. Others use traditional sorting algorithms designed for web page ranking (e.g., PageRank), which cannot handle properties of OSNs such as the existence of end nodes and the differing relationships between nodes. The traditional PageRank algorithm is enhanced in two respects to make it applicable to node sorting in large-scale OSNs. First, in the weight collection phase of PageRank, different residual weights are assigned to nodes according to the weights of their links, which effectively mitigates the negative influence of end nodes on sorting accuracy. Second, in the voting process, the diversity among nodes is considered and a neighborhood-based algorithm is proposed to assign weights to different nodes, which effectively improves sorting accuracy. The proposed algorithm is evaluated on a sample network constructed from 15,000 real users sampled from Sina Weibo. The enhanced algorithm finds more than 40% of the 1000 most influential nodes in the sample network, whereas the traditional PageRank algorithm finds only 11%. Compared with betweenness-based algorithms, the proposed algorithm achieves similar or even better sorting accuracy at much lower time cost.
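For context, a baseline weighted PageRank can be sketched in Python as below. This is the conventional algorithm with link weights and uniform redistribution of rank from end (dangling) nodes, not the paper's enhanced residual-weight or neighborhood-based scheme; the damping factor, iteration count, and edge format are assumptions made for illustration.

```python
def weighted_pagerank(edges, damping=0.85, iters=100):
    """Baseline weighted PageRank by power iteration.

    edges: list of (source, target, weight) with weight > 0.
    Rank held by nodes without outgoing links is spread uniformly.
    """
    nodes = sorted({u for u, _, _ in edges} | {v for _, v, _ in edges})
    n = len(nodes)
    out_weight = {u: 0.0 for u in nodes}
    for u, _, w in edges:
        out_weight[u] += w
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(iters):
        # Rank sitting on end nodes would otherwise leak out of the system.
        dangling = sum(rank[u] for u in nodes if out_weight[u] == 0.0)
        new_rank = {u: (1.0 - damping) / n + damping * dangling / n for u in nodes}
        for u, v, w in edges:
            # Each node votes for its targets in proportion to link weight.
            new_rank[v] += damping * rank[u] * w / out_weight[u]
        rank = new_rank
    return rank


# Usage: two users both point to "c", which has no outgoing links.
ranks = weighted_pagerank([("a", "c", 1.0), ("b", "c", 2.0)])
```

The uniform treatment of end nodes in this baseline is the kind of behavior the abstract identifies as a weakness when PageRank is applied directly to social networks.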

Triple modular redundancy design for VLSI gate level netlist

XU Ranran1,2,MENG Haibo1,GUI Xiaoyan2,SHEN Xiaowei1，AN Shuqian1

2014, 36(12): 2355-2360.

Particles in space may cause spacecraft to malfunction, and triple modular redundancy (TMR) is an effective fault-tolerant technology. However, existing TMR designs are usually customized for a given chip and cannot be used in general. A novel TMR design scheme is proposed for VLSI gate-level netlists that does not depend on the function of the design. The scheme contains four design methods: global sequential elements TMR, local sequential elements TMR, global combinational logic cells TMR, and local combinational logic cells TMR. The strategy also optimizes the drive capability according to different libraries. The proposed scheme is verified on a multicore processor netlist. Experimental results show that the area overhead of global sequential elements TMR is 185% of that of the original netlist, and the area overhead of local sequential elements TMR is 1%~80% of that of the original netlist. The scheme can be configured according to designers' requirements. Experimental data also show that the delay introduced by the scheme on the critical paths is about 22.15%~22.86%, which is controllable for designers, and that the scheme has relatively high reliability.
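The principle behind TMR is a majority vote over three copies of the same signal, so that an upset in any single copy is masked. A minimal Python sketch of a bitwise majority voter follows; it models the voter's logic function only and says nothing about how the paper inserts voters into a netlist. The function name is hypothetical.

```python
def majority(a, b, c):
    """Bitwise 2-of-3 majority vote over three redundant copies."""
    return (a & b) | (b & c) | (a & c)


# Usage: the third copy has two flipped bits, and the vote still
# returns the value held by the two agreeing copies.
voted = majority(0b1010, 0b1010, 0b0110)
```

The vote is correct as long as at most one of the three copies is wrong in any given bit position.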

Design of a novel current sense amplifier for passive RFID

LI Wenxiao，LI Jiancheng，LI Cong，WANG Zhen，SHANG Jing

2014, 36(12): 2361-2366.

A novel sense amplifier is proposed for the Multiple Time Programmable Nonvolatile Memory (MTPNVM) of passive Radio-Frequency Identification (RFID). The circuit offers low power, high speed, reliability, and high sensitivity without extra area overhead. Simulation based on the GSMC 0.13 μm CMOS process shows that the new sense amplifier has a high read speed and can work at low voltages (0.8 V). At a voltage of 1.2 V and a temperature of 27℃, the read delay is 10.5 ns, the power consumption is 6.1 μW at 25 MHz, and the current difference it can identify accurately is about 33 nA.

Study of rectangular-shaped resonators method to suppress crosstalk

LIU Ziyu,LI Jinwen

2014, 36(12): 2367-2372.

In high-speed digital systems, signal integrity problems are becoming more and more prominent, and increases in signal rate and design density make crosstalk one of the major factors. First, the rectangular-shaped resonators method for suppressing crosstalk is studied. Second, this method is compared with the 3W rule method and the guard trace method with shorting via. The comparison shows that the rectangular-shaped resonators method performs poorly at suppressing near-end crosstalk of coupled microstrip lines but performs well at suppressing their far-end crosstalk. Frequency-domain simulation shows that the far-end crosstalk of the rectangular-shaped resonators structure is improved by 12 dB and 8 dB, and time-domain simulation shows that its peak far-end crosstalk voltage is reduced to 18.2% and 23.1%, relative to the 3W rule method and the guard trace method with shorting via, respectively. Finally, the influence of the structural parameters of the rectangular-shaped resonators (the gap between resonators, the length, and the width) on the far-end crosstalk of coupled microstrip lines is studied. The simulation results demonstrate that there is a set of optimal values of the three structural parameters that suppresses far-end crosstalk most effectively.

Experimental study on the effect of jet distance on thermal performance of the jet cooling plate

CHAO Liangjie,XUE Jianshun,YU Hui

2014, 36(12): 2373-2377.

To better understand the factors that affect the thermal performance of the jet cooling plate, the effects of the jet distance on the thermal performance and resistance of the jet cooling plate are studied experimentally, and the uniformity of its thermal performance is analyzed. Experimental results show that: (1) at the same flow rate, as the jet distance increases, the total thermal resistance of the jet cooling plate in the existing structure first decreases and then increases, reaching a minimum at a jet distance of 1.5 mm; (2) the resistance of the jet cooling plate decreases as the jet distance increases, but the effect of the jet distance is small; (3) the thermal performance of the jet cooling plate is somewhat uneven in the direction of flow, and the cooling capacity on the inlet side is slightly higher than that on the outlet side.

A fast target center location algorithm for dynamic vision measurement based on CUDA

XU Xiaochen1,DONG Mingli1,WANG Jun1,SUN Peng1,2,YAN Bixi1

2014, 36(12): 2378-2385.

CCD resolution plays an important role in vision measurement precision, but high resolution greatly increases the amount of data and computation. As a result, the traditional serial target center location algorithm running on the CPU cannot meet the requirements of dynamic measurement. A fast target center location algorithm for dynamic vision measurement based on CUDA is therefore proposed. When the number of targets exceeds 10 000, more than 90% of the time in the serial algorithm is spent on image preprocessing, region constraint, and target center calculation. These three most time-consuming parts are analyzed and implemented with CUDA. Experimental results show that, compared with the serial algorithm running on the CPU, the processing speed for 35 000 target centers based on CUDA is improved by 11.5 times with the same location precision, and the speedup grows significantly as the number of targets increases.
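One common way to compute a target center is the gray-weighted centroid of the pixels in a target region. The abstract does not state which center formula the paper uses, so the Python sketch below is an assumption made for illustration; it is serial and does not reproduce the CUDA implementation. The function name and threshold parameter are hypothetical.

```python
def gray_centroid(region, threshold=0):
    """Gray-weighted centroid (x, y) of pixels above threshold.

    region: 2D list of gray values, indexed as region[y][x].
    Returns None when no pixel exceeds the threshold.
    """
    total = 0.0
    sum_x = 0.0
    sum_y = 0.0
    for y, row in enumerate(region):
        for x, gray in enumerate(row):
            if gray > threshold:
                total += gray
                sum_x += gray * x
                sum_y += gray * y
    if total == 0.0:
        return None
    return sum_x / total, sum_y / total


# Usage: two equally bright pixels at x = 1 and x = 2 on row y = 1.
center = gray_centroid([[0, 0, 0], [0, 10, 10], [0, 0, 0]])
```

Because each target region is processed independently of the others, this kind of computation maps naturally onto one GPU thread or thread block per target.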