  • Journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Current Issue

    • Papers
      Exploring fine-grained task parallel GEMM on
      single- and multi-GPU systems
      ZHANG Shuai,LI Tao,WANG Yifeng,JIAO Xiaofan,YANG Yulu
      2015, 37(05): 847-856. doi:

      Dense linear algebra (DLA), which is very important to many applications such as pattern recognition and bioinformatics, depends critically on the general matrix-matrix multiplication (GEMM) routine. In the current cuBLAS and MAGMA libraries, GEMM is implemented with kernel functions that achieve high performance for large GEMMs. However, they are not efficient for multiple independent small matrices, even though cuBLAS provides interfaces for batched small GEMMs. Moreover, they cannot automatically scale across multiple different GPUs with good load balancing. In this paper, we propose a task parallel GEMM (TPGEMM) that exploits fine-grained task parallelism for batched and multi-GPU GEMMs. The workloads of one or more GEMMs are decomposed into tasks that are scheduled to persistent GPU kernels at runtime. TPGEMM avoids the overhead of launching multiple kernels and achieves better performance for batched small GEMMs than the cuBLAS and MAGMA libraries. Based on its low-overhead fine-grained task scheduling, TPGEMM supports auto-parallelization across multiple GPUs and achieves an efficiency close to 100% on a workstation with four different GPUs.
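
      To make the task decomposition concrete, the following is a minimal CPU-side Python sketch of the idea rather than the authors' CUDA implementation: every small GEMM in a batch is split into independent output-tile tasks, and a fixed pool of long-lived workers (the analogue of persistent kernels) drains one shared task queue. The tile size and worker count are illustrative.

```python
# Conceptual sketch only (not the authors' CUDA code): split a batch of small
# GEMMs into output-tile tasks and let a fixed pool of long-lived workers
# -- the CPU analogue of persistent kernels -- drain one shared queue.
import queue
import threading
import numpy as np

TILE = 32  # illustrative tile size


def make_tasks(batch):
    """Decompose every GEMM (A, B) in the batch into independent C-tile tasks."""
    tasks = []
    for idx, (A, B) in enumerate(batch):
        for i in range(0, A.shape[0], TILE):
            for j in range(0, B.shape[1], TILE):
                tasks.append((idx, i, j))
    return tasks


def worker(tasks, batch, results):
    # Persistent worker: keeps fetching tile tasks instead of being
    # launched once per small GEMM.
    while True:
        try:
            idx, i, j = tasks.get_nowait()
        except queue.Empty:
            return
        A, B = batch[idx]
        results[idx][i:i + TILE, j:j + TILE] = A[i:i + TILE, :] @ B[:, j:j + TILE]


def task_parallel_batched_gemm(batch, n_workers=4):
    results = [np.zeros((A.shape[0], B.shape[1])) for A, B in batch]
    tasks = queue.Queue()
    for t in make_tasks(batch):
        tasks.put(t)
    threads = [threading.Thread(target=worker, args=(tasks, batch, results))
               for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

      On a GPU, the same structure replaces per-GEMM kernel launches with a small number of resident kernels whose thread blocks repeatedly fetch tile tasks from a device-side queue.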

      Technologies of real-time processor architecture: a survey
      SHI Wei,ZHANG Ming,GUO Yufeng,GONG Rui
      2015, 37(05): 857-864. doi:

      As realtime applications have become a kind of important embedded applications,the realtime microprocessor architecture,the core component of realtime systems,attracts much more attention.Unlike generalpurpose processors,realtime processors require a tight and predictable worstcase execution time.To meet the requirement,traditional realtime processors often adopt a relatively simple structure so as to avoid uncertain execution time.To meet the increasing performance demand of realtime applications,realtime processors also gradually shift towards multithreading and multicore. In the multithreaded and multicore processors,there are a large number of shared resources leading to deterioration of deterministic realtime systems.In this paper we summarize the technologies of realtime processor architecture.We first analyze the traditional realtime processor architecture from the aspects of instruction set architecture, microarchitecture,memory,I/O,and realtime task scheduling.Then,we analyze the highperformance realtime processor architectures employing multithreaded and multicore structures.Finally,we compare two commercial realtime processors,and summarize the status and trends of realtime processors.

      A novel soft real-time VM scheduling algorithm
      based on resource pre-allocation 
      DING Xiaobo1,2,MA Zhong1,3,DAI Xinfa3,HUANG Weihua3
      2015, 37(05): 865-872. doi:

      As one of the most important technologies of cloud computing, virtual machine (VM) technology has attracted considerable attention in recent years. However, the virtual machine monitor (VMM) degrades the performance of real-time and parallel applications due to the semantic gap. In this paper we analyze the Credit scheduler of the Xen VMM, propose a novel soft real-time scheduling algorithm based on Credit, and implement a scheduler prototype based on the new algorithm, which satisfies the requirements of soft real-time VM scheduling. In the new algorithm, credit is pre-allocated to the soft real-time VMs according to their CPU occupation rates, and the scheduler adopts a dynamic time slice mechanism and works in non-work-conserving mode, so that the scheduling period meets the real-time requirements. Distinguishing parallel from non-parallel VMs and applying different strategies to them ensures the execution of real-time tasks while meeting the requirement of resource utilization. Test results show that the new algorithm performs well in scheduling both parallel and non-parallel soft real-time tasks and hence meets the requirements of soft real-time application scheduling.
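
      As a rough illustration of the pre-allocation idea described above (a toy model, not Xen code; all names and parameters are hypothetical), credits can be derived from each soft real-time VM's requested CPU occupation rate, and a non-work-conserving round then refuses to hand spare time to VMs that have exhausted their credit:

```python
# Toy model of credit pre-allocation and non-work-conserving scheduling;
# names and parameters are hypothetical, not taken from Xen.
def preallocate_credits(vms, period_ms):
    """vms: {name: requested CPU occupation rate in (0, 1]}.
    Credit is expressed directly as CPU time (ms) per scheduling period."""
    return {name: rate * period_ms for name, rate in vms.items()}


def schedule_period(credits, time_slice_ms=1.0):
    """Round-robin within one period; a VM whose credit is exhausted is
    skipped rather than given the leftover time (non-work-conserving)."""
    timeline, remaining = [], dict(credits)
    while any(c > 1e-9 for c in remaining.values()):
        for vm in remaining:
            if remaining[vm] <= 1e-9:
                continue
            run = min(time_slice_ms, remaining[vm])
            timeline.append((vm, run))
            remaining[vm] -= run
    return timeline


credits = preallocate_credits({"rt_vm": 0.3, "best_effort_vm": 0.5}, period_ms=30)
print(schedule_period(credits))
```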

      A TAM-based adoption model and key factors of cloud
      application services for enterprises
      SHI Shuangyuan1,WU Yingmin2,ZHANG Zezhong1,XU Kai3
      2015, 37(05): 873-881. doi:

      That cloud computing dominates the future of enterprise applications has become a consensus of the industry, and shifting to cloud services is a tremendous change in the enterprise application model. Enterprise users have to adopt suitable cloud solutions and need tools for assessing the risk of cloud application service adoption, while application service providers should deliver satisfactory cloud application services. Therefore, research on the factors influencing user acceptance of cloud application services is of great value in both theory and practice. Based on the technology acceptance model (TAM) and other related theories, we first set up a cloud application service technology acceptance model (CAS-TAM) and integrate the fuzzy decision making trial and evaluation laboratory (DEMATEL) method to analyze the CAS-TAM and its constructs. We then examine the constructs in CAS-TAM and explore the core influencing factors. We find that perceived usefulness (PU), application service quality (ASQ), and social influence (SI) are the key factors affecting enterprise user adoption in the case study, and that reliability (ASQ-RL), responsiveness (ASQ-RP), use & maintain ability (PBCU&M), ease of use (ASQ-EOU), and social influence (SI) are the key causes that have a critical impact on the adoption of cloud application services.

      Scheduling DAG-based tasks in distributed systems: a survey
      TIAN Guozhong1,2,XIAO Chuangbai1
      2015, 37(05): 882-894. doi:

      In recent years, with the development of distributed computing technologies such as grid and cloud workflow systems, the problem of scheduling DAG-based tasks in distributed system environments has attracted intensive attention from researchers. Following the latest research progress, we explore the problem of scheduling DAG-based tasks in distributed system environments and the related technologies in four parts: (1) describing the related concepts of distributed systems and heterogeneous distributed systems, and presenting the problem of scheduling DAG-based tasks in heterogeneous distributed system environments, its model, and its typical applications; (2) classifying the research on scheduling DAG-based tasks from different perspectives; (3) reviewing previous research on scheduling shared heterogeneous distributed resources among multiple DAGs; (4) discussing open problems in multi-DAG scheduling. Finally, we summarize the key points of this paper.

      An entity linking approach based on
      massive RDF knowledge bases       
      TANG Xiaoqin1,LIU Libo1,ZHOU Tao2
      2015, 37(05): 895-900. doi:

      More and more unstructured data are produced, distributed, and consumed over the Internet today. The Resource Description Framework (RDF) is the resource description standard proposed by the W3C, and it is well suited to describing unstructured information on the Internet. As a consequence, a large number of RDF knowledge bases have been developed, such as Freebase, Yago, and DBpedia. RDF knowledge bases contain rich semantic information, which can not only label named entities on Web pages but also enrich their semantics. The task of mapping named entities from Web pages to the corresponding entities in knowledge bases is called entity linking. Entity linking includes two main parts: mapping between entities and entity disambiguation. In this paper, we propose an efficient entity linking approach based on the properties of RDF knowledge bases. The algorithm uses a simple weighted graph approach to solve the entity disambiguation problem on a cloud platform. Experimental results show that our solution is accurate and scalable.
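
      As a minimal sketch of the weighted-graph disambiguation step (the weighting function here is a placeholder, not the paper's formula), candidate entities for each mention become graph nodes, edges between candidates of different mentions are weighted by a relatedness score drawn from the knowledge base, and each mention keeps the candidate with the highest accumulated edge weight:

```python
# Weighted-graph disambiguation sketch; the relatedness function is supplied
# by the caller (e.g., derived from shared RDF properties) and is a placeholder.
from itertools import combinations


def disambiguate(candidates, relatedness):
    """candidates: {mention: [kb_entity, ...]};
       relatedness: callable(entity_a, entity_b) -> float."""
    score = {(m, e): 0.0 for m, ents in candidates.items() for e in ents}
    for m1, m2 in combinations(candidates, 2):
        for e1 in candidates[m1]:
            for e2 in candidates[m2]:
                w = relatedness(e1, e2)        # edge weight between candidates
                score[(m1, e1)] += w
                score[(m2, e2)] += w
    # Each mention keeps its highest-scoring candidate entity.
    return {m: max(ents, key=lambda e: score[(m, e)])
            for m, ents in candidates.items()}
```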

      An improved realization method for GPU virtualization          
      CHEN Zhijia,ZHU Yuanchang,DI Yanqiang,FENG Shaochong
      2015, 37(05): 901-906. doi:

      In recent desktop cloud scenarios, a serious problem is that the 3D graphics processing performance of user virtual machines cannot satisfy users' increasingly high requirements. In this paper, the main realization modes of GPU virtualization are first surveyed and analyzed. Based on this analysis, a new virtualization scheme combining VMM pass-through and API remoting is proposed. We use the hypervisor to create two kinds of virtual machines: one root virtual machine, named the GVM, which monopolizes the GPU resources, and several child virtual machines, named DVMs, which do not interact with the GPU directly. The GVM shares GPU memory and command channels with the DVMs, so GPU calls from the child virtual machines can be transported to the GVM; the GVM then calls the physical GPU and returns the results to the DVMs. Typical virtualization methods are tested, and the results show that the proposed method can effectively improve the 3D graphics processing performance of user virtual machines.

      Wave equation prestack depth migration
      based on Xeon Phi platform  
      YANG Xiangsen1,JIN Jun2,WANG Peng1,MA Zhaogui1,KANG Yonggan1
      2015, 37(05): 907-913. doi:

      Wave equation prestack depth migration is a high-precision seismic imaging method suitable for complex media with sharp lateral velocity variations; however, its huge computational cost hinders its application. Xeon Phi is a new product for throughput computing and provides a good platform for such methods. Taking the split-step Fourier algorithm as an example, we describe the code migration and performance optimization on the Xeon Phi platform. We offload the fully threaded kernel to Xeon Phi with predefined compiler pragmas so as to take full advantage of the SIMD engine and improve parallel efficiency, and we restructure the code to ensure vectorization. Besides, by extending the parallel processing framework with a dynamic workload-balance mechanism, we develop a seismic imaging system based on the Xeon Phi platform that is suitable for large-scale, fully hybrid heterogeneous clusters. Finally, we conduct a real workload test, and the results show that the Xeon Phi platform can significantly improve seismic imaging efficiency and has good core scalability.

      TMP-TIE: an inter-domain egress selection algorithm
      based on traffic migration prediction 
      ZHAO Dan1,YANG Guigang1,HU Xiaofeng2
      2015, 37(05): 914-919. doi:

      Egress selection algorithms directly embody inter-domain routing policies and traffic engineering capability. In order to avoid the large traffic migration caused by the tunable inter-domain egress (TIE) selection algorithm, we propose an inter-domain egress selection algorithm based on traffic migration prediction, named TMP-TIE, built on a network architecture that separates control and forwarding. TMP-TIE predicts and controls the volume of large traffic migrations to decrease their impact on inter-domain traffic forwarding. The performance of Hot-Potato, TIE, and TMP-TIE is compared, and the simulation results show that TMP-TIE has the lowest routing sensitivity and traffic sensitivity. It can also reduce the network cost and the probability of congestion in the presence of failures, thus facilitating traffic engineering.

      Research on minimizing energy
      consumption schedule for divisible load WSN 
      XU Wei,LIU Duanyang,BAO Zhanbing
      2015, 37(05): 920-924. doi:

      Minimizing energy consumption and prolonging network lifetime in wireless sensor networks (WSNs) have become hot research topics. For this purpose, we design a simple sequential schedule algorithm (SSSA) and an energy-time tradeoff schedule algorithm (ETTS) based on classic divisible load scheduling in WSNs. Theoretical analysis and experiments both show that, under the constraint of minimal time, the SSSA achieves the minimum energy consumption among the classic divisible load schedules. The experimental results show that the two schedule algorithms can effectively reduce network energy consumption and prolong the network life cycle. The SSSA should be used when the number of surviving nodes is critical to the network topology, whereas the ETTS should be used when the energy of the initial rounds is of particular concern. Furthermore, the ETTS can reduce energy consumption further as the given time increases.
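
      The divisible-load idea behind both schedules can be illustrated with a toy split (a generic illustration, not the paper's SSSA or ETTS): the load is divided so that every node finishes at the same time, and the per-node energy cost follows from its assigned fraction. The node rates and energy figures below are made up.

```python
# Generic divisible-load illustration; not the paper's SSSA/ETTS.
def split_load(load, rates):
    """rates[i]: processing speed of node i (load units per second).
    Give node i a fraction proportional to its rate so all nodes finish together."""
    total_rate = sum(rates)
    fractions = [r / total_rate for r in rates]
    finish_time = load / total_rate  # identical for every node by construction
    return fractions, finish_time


def total_energy(load, fractions, joules_per_unit):
    """Energy spent when node i processes its assigned fraction of the load."""
    return sum(load * f * e for f, e in zip(fractions, joules_per_unit))


fractions, t = split_load(1000, rates=[5.0, 3.0, 2.0])
print(fractions, t, total_energy(1000, fractions, joules_per_unit=[0.8, 1.0, 1.2]))
```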

      Research on congestion feedback based on
      crosslayer in cognitive radio wireless Mesh networks  
      CHEN Yanyan
      2015, 37(05): 925-929. doi:

      Due to the differences in link load caused by varying channel status, congestion is an important factor affecting system performance in cognitive radio wireless mesh networks. To address this problem, a congestion feedback algorithm based on the max-min fairness strategy is proposed. Through a comprehensive analysis of the resource allocation constraints of multi-rate encoding/decoding modulation based on a random-search genetic algorithm, the channel allocation mechanism for multiple data flows, and the optimized routing mechanism, we build a cross-layer model and compute the network congestion. Meanwhile, through congestion feedback, the algorithm realizes joint cross-layer optimization in the physical, link, and network layers and avoids congestion to the greatest extent. The simulation results show that the algorithm converges well when congestion occurs, and it can avoid congestion and improve throughput.
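
      The max-min fairness criterion the feedback algorithm builds on can be computed by the standard progressive-filling procedure; the sketch below is a generic illustration with made-up flow demands, not the paper's cross-layer model.

```python
# Standard progressive filling for max-min fair allocation (generic sketch).
def max_min_fair(demands, capacity):
    """Allocate `capacity` among flows with the given demands, max-min fairly."""
    alloc = {f: 0.0 for f in demands}
    active = set(demands)
    remaining = capacity
    while active and remaining > 1e-12:
        share = remaining / len(active)
        satisfied = {f for f in active if demands[f] - alloc[f] <= share}
        if not satisfied:
            for f in active:          # nobody can be fully satisfied:
                alloc[f] += share     # split the remainder equally and stop
            remaining = 0.0
        else:
            for f in satisfied:       # fully satisfy the smallest demands first
                remaining -= demands[f] - alloc[f]
                alloc[f] = demands[f]
            active -= satisfied
    return alloc


print(max_min_fair({"flow1": 2.0, "flow2": 8.0, "flow3": 10.0}, capacity=12.0))
# -> flow1 gets 2.0, flow2 and flow3 get 5.0 each
```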

      Research on software defect prediction based on
      integrated sampling and ensemble learning 
      DAI Xiang1,MAO Yuguang1,2
      2015, 37(05): 930-936. doi:

      We study the class-imbalance problem in software defect prediction and propose an integrated sampling method for class-imbalanced data classification to enhance classification ability. To avoid the blindness of random sampling, the integrated sampling method balances the dataset by using SMOTE to over-sample the minority class and K-means clustering to down-sample the majority class. After obtaining a balanced dataset, we use multiple base classifiers for ensemble learning. Experimental results show that the software defect prediction algorithm combining integrated sampling and ensemble learning has better classification performance, achieving a higher true positive rate while significantly reducing the false alarm rate.
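
      A compact sketch of the integrated-sampling step, assuming scikit-learn and imbalanced-learn are available; the cluster count and the choice of base classifiers are illustrative, not the paper's configuration.

```python
# Integrated sampling sketch: K-means down-sampling of the majority class
# plus SMOTE over-sampling of the minority class, followed by a simple ensemble.
# Assumes scikit-learn and imbalanced-learn; parameter choices are illustrative.
import numpy as np
from imblearn.over_sampling import SMOTE
from sklearn.cluster import KMeans
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier


def integrated_sampling(X, y, minority=1, majority=0, n_clusters=200):
    X_min, X_maj = X[y == minority], X[y == majority]
    # Down-sample the majority class: keep one K-means centroid per cluster.
    km = KMeans(n_clusters=min(n_clusters, len(X_maj)), n_init=10).fit(X_maj)
    X_bal = np.vstack([X_min, km.cluster_centers_])
    y_bal = np.array([minority] * len(X_min) + [majority] * len(km.cluster_centers_))
    # Over-sample the minority class with SMOTE on the reduced dataset.
    return SMOTE().fit_resample(X_bal, y_bal)


def defect_prediction_ensemble():
    # Ensemble of heterogeneous base classifiers with soft voting.
    return VotingClassifier([("dt", DecisionTreeClassifier()),
                             ("lr", LogisticRegression(max_iter=1000)),
                             ("nb", GaussianNB())], voting="soft")
```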

      Automatic transformation between programs’
      flowcharts and codes based on graph grammar 
      ZHU Yun,ZENG Xiaoqin,ZHU Ning,LIU Yufeng
      2015, 37(05): 937-945. doi:

      Flowcharts are very important throughout the software engineering lifecycle. During program design, programmers usually need to draw a structured flowchart before coding, while during program analysis and maintenance it is helpful to first reverse-engineer the source code into a corresponding flowchart in order to understand the program's structure. Automatic transformation between flowcharts and source code can therefore save considerable human effort and development time. We discuss how to use the Edge-based Graph Grammar (EGG) to perform the transformation automatically, and demonstrate, through an example, grammatical analysis and reverse generation of flowcharts by applying the reduction and derivation operations of the EGG.

      A comprehensive evaluation method based on
      entropy theory and measure of medium truth scale  
      TANG Ying,ZHANG Yuping,CHEN Haiyan
      2015, 37(05): 946-950. doi:

      Many comprehensive evaluation methods exist for various comprehensive evaluation problems. The measure of medium truth scale, based on fuzzy mathematics, is one of the methods that can evaluate objects with multiple indicators and can handle fuzzy phenomena effectively. To make the evaluation results more objective, we propose a method based on entropy theory and the measure of medium truth scale. We first calculate the weights of the primary indexes with entropy theory, and use the analytic hierarchy process (AHP) to calculate the weights of the secondary indexes. Consequently, the errors caused by the subjectivity of the measure of medium truth scale in the evaluation process are effectively reduced. Finally, the method is applied to the quality assessment of residential projects. Compared with the measure of medium truth scale alone and the entropy weight method, the proposed method is more feasible and effective.
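
      The entropy-weight step for the primary indexes works as follows (a worked sketch; the decision matrix is made up, and the AHP weighting of the secondary indexes is omitted):

```python
# Entropy weight method for the primary indexes (illustrative data).
import numpy as np


def entropy_weights(X):
    """X: m alternatives x n indexes, non-negative, larger is better."""
    P = X / X.sum(axis=0)                              # share of each alternative per index
    with np.errstate(divide="ignore"):
        logP = np.where(P > 0, np.log(P), 0.0)         # treat 0*log(0) as 0
    E = -(P * logP).sum(axis=0) / np.log(X.shape[0])   # entropy of each index, in [0, 1]
    d = 1.0 - E                                        # degree of divergence
    return d / d.sum()                                 # normalized entropy weights


X = np.array([[0.8, 0.6, 0.9],
              [0.7, 0.9, 0.4],
              [0.6, 0.5, 0.8]])
print(entropy_weights(X))
```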

      Possibilistic bisimulation based on generalized
      possibility measures and its logical characterizations 
      ZHANG Xingxing,DENG Nanyi,MA Zhanyou,LI Yongming
      2015, 37(05): 951-957. doi:

      Firstly, we define the syntax and semantics of the Generalized Possibilistic Computation Tree Logic's expansion (GPoCTL*), the Generalized Possibilistic Computation Tree Logic's reduction (GPoCTL-), and the Generalized Possibilistic Reward Computation Tree Logic (GPoRCTL) in terms of generalized possibility measures. Based on classical bisimulation and generalized possibility measures, we then discuss possibilistic bisimulations and their properties. Finally, the equivalence relations among GPoCTL, GPoCTL*, GPoCTL-, and bisimilar states are proved.

      An empirical study of the combinatorial
      testing of time-shifted TV programs 
      WANG Yan,NIE Changhai,ZHANG Shaobing
      2015, 37(05): 958-966. doi:

      Time-shifted TV has become one of the typical multimedia businesses. It provides a variety of functions, but its operations are relatively complex. Since existing tests do not fully consider the interactions of parameters in the system, certain functional defects and failures are very difficult to find. We apply an adaptive combinatorial testing methodology to the testing of the set-top box time-shifted TV business. We first establish a combinatorial testing model, then design and execute test cases and locate the faults exposed in testing. The empirical study not only validates existing combinatorial testing theories and methods, but also finds many real failures in the set-top box time-shifted TV business.

      An optimal charging schedule strategy of electric vehicles based
      on partheno-genetic algorithm and dynamic programming  
      LU Jianyi1,YAN Chao1,XIAO Laiyuan2,ZHENG Rui1
      2015, 37(05): 967-973. doi:

      Battery charging scheduling is one of the most important aspects of electric vehicle operations management. An efficient charging scheme can not only help operators reduce charging costs but also relieve the stress on the power system during peak hours. Under the assumptions of real-time electricity prices and non-interruptible charging jobs, we propose a minimum-charging-cost strategy based on a partheno-genetic algorithm and dynamic programming. To evaluate its performance, we compare our algorithm with a "first come, first charge" strategy and the traditional genetic algorithm. We test the three charging strategies on the same examples, and the simulation results indicate that the proposed method is effective in saving cost while ensuring the load balance of the electric system. In addition, the Gantt chart of vehicle assignments shows that it can effectively relieve the stress on the grid.
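
      The non-interruption assumption makes the single-vehicle sub-problem easy to state: pick the contiguous block of time slots with the lowest total price. The sketch below illustrates just that sub-problem with made-up prices; the paper's full method additionally coordinates many vehicles with a partheno-genetic algorithm and dynamic programming.

```python
# Single-vehicle sub-problem under non-interruptible charging (illustrative prices).
def cheapest_window(prices, slots_needed):
    """Return (start_slot, total_cost) of the cheapest contiguous charging window."""
    best_start = 0
    cost = best_cost = sum(prices[:slots_needed])
    for s in range(1, len(prices) - slots_needed + 1):
        cost += prices[s + slots_needed - 1] - prices[s - 1]  # slide the window by one slot
        if cost < best_cost:
            best_start, best_cost = s, cost
    return best_start, best_cost


hourly_price = [0.9, 0.8, 0.5, 0.4, 0.4, 0.6, 0.9, 1.1]
print(cheapest_window(hourly_price, slots_needed=3))  # -> start slot 2, cost about 1.3
```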

      Application research of neural network ensemble method
      based on improved clustering analysis 
      YANG Hongbo1,ZHENG Jian2
      2015, 37(05): 974-978. doi:

      With the expansion of industrial production scale and the increasing complexity of production processes, the demand for process simulation is growing. In this paper, we propose a neural network ensemble method based on improved clustering analysis. Firstly, according to the density distribution of the data, the K-means method is improved to overcome the weakness in selecting the initial cluster centers; the samples are then classified and the differences among them are enlarged. Secondly, for each class of samples, a general regression neural network (GRNN), which has fast learning ability, is constructed and trained as an individual network. Thirdly, a compensation network is constructed over all the samples by GRNN so as to eliminate the output errors caused by incorrect selection. Finally, the obtained clustering centers are used to analyze the input samples numerically and to select the individual neural network; the output of the selected individual network is compared with the output of the compensation network, and the neural network ensemble is thus realized. Verification on the artificial Sinc dataset shows that the proposed method enhances model accuracy, providing a new way to increase the accuracy of process simulation.
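
      The clustering improvement can be pictured with a density-aware choice of initial centers (a sketch of the general idea, not the paper's exact rule; the radius parameter is hypothetical): centers are picked from dense regions while staying far from centers already chosen, and the result seeds scikit-learn's KMeans.

```python
# Density-aware initial centers for K-means (illustrative rule, hypothetical radius).
import numpy as np
from sklearn.cluster import KMeans


def density_init(X, k, radius):
    """Pick k initial centers that lie in dense regions and are far apart."""
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    density = (dists < radius).sum(axis=1)       # neighbours within `radius`
    centers = [X[np.argmax(density)]]
    for _ in range(k - 1):
        d_to_centers = np.min(
            np.linalg.norm(X[:, None, :] - np.array(centers)[None, :, :], axis=-1),
            axis=1)
        centers.append(X[np.argmax(density * d_to_centers)])  # dense AND far away
    return np.array(centers)


def cluster_samples(X, k, radius=1.0):
    return KMeans(n_clusters=k, init=density_init(X, k, radius), n_init=1).fit(X)
```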

      An index system establishment method based
      on screening evolution model            
      ZHANG Hengwei,HAN Jihong,ZHANG Jian,WANG Jindong,KOU Guang
      2015, 37(05): 979-985. doi:

      To solve the problems of evaluation index selection and weight distribution, we propose an index system establishment method based on a screening evolution model (SEM), drawing on the idea of genetic algorithms. In each screening round, data reliability based on Euclidean distance and index fitness based on evaluation information entropy are calculated. According to index fitness, excellent indexes are selected into the preliminary set, while inferior indexes undergo crossover and mutation operations to form a new alternative set. After several screening rounds, a preliminary index set is established, and the index weights are allocated and adjusted according to their fitness. A sample analysis demonstrates the feasibility and effectiveness of the method.

      An abandoned object detection algorithm
      in complex environments
      YE Liren,HE Shenghong,ZHAO Lianchao
      2015, 37(05): 986-992. doi:

      To detect abandoned objects in complex environments, we propose an efficient method. Firstly, the candidate object areas are obtained by comparing the foregrounds derived from a Gaussian mixture model with partial updating and from an improved three-frame-difference method, and the blobs of temporarily static objects are segmented by applying shadow elimination and connected component analysis. Then, a blob whose static duration reaches the threshold is labeled as an abandoned object after the possibility of a static human is ruled out by Histogram of Oriented Gradients (HOG) pedestrian detection. Finally, to cope with pedestrian occlusion and illumination changes, Features from Accelerated Segment Test (FAST) local feature matching is applied to the detected abandoned objects. Experimental results show that the proposed method achieves high accuracy, fast processing speed, and good anti-interference performance in complex environments.
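
      A condensed OpenCV sketch of the building blocks named above (thresholds are illustrative, and the paper's frame differencing, static-duration tracking, and matching logic are omitted):

```python
# Building blocks of the pipeline in OpenCV; thresholds are illustrative only.
import cv2

bg_model = cv2.createBackgroundSubtractorMOG2(detectShadows=True)  # GMM background model
hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())   # pedestrian detector
fast = cv2.FastFeatureDetector_create()                            # FAST keypoints


def process(frame):
    fg = bg_model.apply(frame)
    # MOG2 marks shadows as 127; keep only definite foreground (255).
    fg = cv2.threshold(fg, 200, 255, cv2.THRESH_BINARY)[1]
    n, labels, stats, _ = cv2.connectedComponentsWithStats(fg)
    blobs = [stats[i] for i in range(1, n) if stats[i][cv2.CC_STAT_AREA] > 500]
    pedestrians, _ = hog.detectMultiScale(frame)        # used to rule out static humans
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    keypoints = fast.detect(gray, None)                 # features for re-matching the object
    return blobs, pedestrians, keypoints
```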