  • Official journal of the China Computer Federation
  • China Science and Technology Core Journal
  • Chinese Core Journal

Current Issue

    • Optimized scheduling algorithms for parallel task graphs
      LI Yufeng1,2,MO Zeyao2,XIAO Yonghao1,XIONG Min1
      2019, 41(06): 955-962. doi:
      Abstract ( 252 )   PDF (670KB) ( 263 )
      Many complex application problems in scientific and engineering computing can be solved through scientific workflow technology, and in the supercomputing domain these scientific workflows are modeled as parallel task graphs. Effective scheduling of parallel task graphs is essential for the efficient execution of applications. We propose a scheduling model for parallel task graphs under resource constraints, and give optimal scheduling conclusions for Fork-Join parallel task graphs. In addition, we propose a new scheduling algorithm for general parallel task graphs that takes into account the effect of data communication overhead on resource allocation and scheduling performance. We also improve the CPA algorithm under specific conditions. Our algorithms are compared with the commonly used CPR and CPA algorithms, and experimental results show that they achieve good scheduling results. The proposed scheduling algorithm and the optimality conclusions provide a useful reference for developing high-performance scheduling functions in workflow application systems.
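      The listing above summarizes the approach without giving the algorithm itself. As a rough, hypothetical illustration of resource-constrained scheduling for the independent inner tasks of a Fork-Join graph, the following Python sketch list-schedules tasks onto p processors (generic list scheduling, not the authors' algorithm):

      import heapq

      def fork_join_makespan(task_times, p):
          # List-schedule the independent inner tasks of a Fork-Join graph on
          # p processors: always give the next task to the least-loaded processor.
          tasks = sorted(task_times, reverse=True)   # longest-processing-time first
          procs = [0.0] * p                          # per-processor finish times
          heapq.heapify(procs)
          for t in tasks:
              earliest = heapq.heappop(procs)
              heapq.heappush(procs, earliest + t)
          return max(procs)

      print(fork_join_makespan([4, 3, 3, 2, 2, 1], p=2))   # -> 8.0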
       
      A configurable convolutional neural network
      accelerator based on tiling dataflow
       
      LI Yihuang,MA Sheng,GUO Yang,CHEN Guilin,XU Rui
      2019, 41(06): 963-972. doi:
      Abstract ( 225 )   PDF (1507KB) ( 201 )
      Convolutional neural networks (CNNs) are among the most widely used deep learning algorithms, with applications in image recognition, automatic translation and advertising recommendation. As neural networks grow, the numbers of neurons and synapses increase accordingly, so using dedicated acceleration hardware to exploit the parallelism of CNNs has become a popular choice. For hardware design, the classic tiling dataflow achieves high performance, but the utilization of the processing elements (PEs) in the tiling structure is very low. As deep learning applications demand higher hardware performance, accelerators require higher PE utilization. To this end, the scheduling order can be changed to improve performance, and input feature maps and output channels can be processed in parallel to increase computational parallelism. However, as the hardware performance demanded by neural network computation increases, the PE array inevitably grows larger and larger, and beyond a certain array size a single parallelization approach causes utilization to decline gradually. The hardware therefore has to exploit more dimensions of neural network parallelism to keep PEs from idling. At the same time, to adapt to different network structures, the hardware array must be configurable, but full configurability greatly increases hardware overhead and data scheduling difficulty. We therefore propose a configurable neural network accelerator based on the tiling dataflow. To reduce hardware complexity, we propose a partial configuration technique, which improves PE utilization for large arrays while keeping hardware overhead as low as possible. When the PE array size exceeds 512, utilization stays at 82%~90% on average, and accelerator performance scales almost linearly with the number of processing elements.
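      To see why a single parallel dimension caps PE utilization as arrays grow, a back-of-the-envelope calculation helps. The sketch below (all numbers illustrative, not from the paper) computes utilization as useful work divided by PE-cycles when one parallel dimension is folded onto the array:

      import math

      def pe_utilization(work_items, array_size):
          # Utilization when a single parallel dimension of `work_items` is
          # folded onto an array of `array_size` PEs, one slice per pass.
          passes = math.ceil(work_items / array_size)
          return work_items / (passes * array_size)

      # A single parallel dimension of 600 items on growing arrays:
      for size in (128, 256, 512, 1024):
          print(size, round(pe_utilization(600, size), 2))
      # 128 -> 0.94, 256 -> 0.78, 512 -> 0.59, 1024 -> 0.59: utilization falls
      # once the array outgrows the one dimension, hence the combined
      # feature-map/output-channel parallelism and partial configurability.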
       
      A hybrid memory trace collection
      and analysis toolkit for DDR4
       
      LI Zuojun1,2,CHEN Mingyu1,2,Qin Xiaoning3
      2019, 41(06): 973-980. doi:
      Abstract ( 200 )   PDF (1572KB) ( 204 )
      With the development of multicore technology, the popularity of big data, cloud computing and artificial intelligence applications, the gradual practicality of non-volatile memory technology, and the urgent need for information security, the design of memory systems, as the core part of data processing, is extremely important. However, existing analysis tools for memory systems cannot meet researchers' needs due to a variety of shortcomings. We redesign the hardware based on the original HMTT, realize complete, efficient and distortion-free memory trace collection on the latest DDR4-1600 platform, and further enhance the portability of the toolkit relative to the original system. Finally, we use the toolkit to collect memory traces of the latest SPEC CPU 2017 applications and analyze the collected traces, which further verifies the effectiveness of our work. Our work offers researchers a powerful tool for analyzing the memory access behaviors of applications and designing memory system architectures.
       
      A static and dynamic adaptive optimization
      method based on Java virtual machine
      ZHANG Haijun1,ZHENG Yan2,YE Jun1,BAI Shujing1
      2019, 41(06): 981-986. doi:
      Abstract ( 119 )   PDF (661KB) ( 195 )
      Dynamic languages can take advantage of profiling information at runtime to guide various program optimizations. However, the existing Java virtual machine does not effectively utilize the information collected at runtime: it simply discards the information when execution ends, and re-monitors and re-collects the information needed for optimization when the program is executed again. We therefore propose a static and dynamic adaptive optimization method based on the HotSpot virtual machine, which saves the optimal parameters or information obtained by iterative search over the optimized objects at runtime into a resource library. The method can learn from the resource library to obtain the best parameters or options for the current program, thereby effectively reusing the data accumulated at runtime. Resource analysis is static and offline, and adds no overhead to the running application. In the iterative learning process, the accuracy and efficiency of resource library learning are ensured by preventing redundant instances from entering the library and removing noisy instances from it. Experiments show that the proposal is practical in guiding adaptive optimization for Java virtual machines on different platforms.
       
      GA-Sim: A job running time prediction algorithm
      based on categorization and instance learning
      XIAO Yonghao,XU Lunfan,XIONG Min
      2019, 41(06): 987-992. doi:
      Abstract ( 177 )   PDF (701KB) ( 225 )

      In high performance computing job scheduling, many scheduling algorithms, especially backfilling algorithms such as EASY, depend on accurate estimates of job running time. Using user-supplied running times usually degrades scheduling performance significantly. We propose a job running time prediction algorithm based on categorization and instance learning, named GA-Sim, which considers both prediction accuracy and the underestimation problem. Numerical experiments on two actual scheduling logs show that, compared with IRPA and TRIP, GA-Sim reduces the underestimation rate while achieving higher prediction accuracy. We also analyze the numerical results in depth and give suggestions for choosing an appropriate prediction algorithm under different circumstances.
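      The abstract does not detail GA-Sim's learning step; as a minimal sketch of the instance-learning idea (predicting a job's runtime from its most similar historical jobs, with a hypothetical inflation factor to curb underestimation), consider:

      import numpy as np

      def predict_runtime(history, query, k=5, inflation=1.1):
          # `history`: list of (feature_vector, runtime) pairs for finished jobs
          # in the same category; features might encode user, queue, requested
          # cores, etc. (hypothetical encoding). Instance learning: average the
          # runtimes of the k nearest neighbors, then inflate the estimate a
          # little to trade some accuracy for a lower underestimation rate.
          feats = np.array([f for f, _ in history], dtype=float)
          times = np.array([t for _, t in history], dtype=float)
          dists = np.linalg.norm(feats - np.asarray(query, dtype=float), axis=1)
          nearest = np.argsort(dists)[:k]
          return inflation * times[nearest].mean()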

      A new scalable parallel simulator for
      polymer flooding in naturally fractured reservoirs
      ZHONG He1,LIU Hui1,CUI Tao2,WANG Kun1,HE Ye2,SHEN Lihua1,YANG Bo1,HE Ruijian1,ZHU Zhouyuan3,HU Jinghong4,CHEN Zhangxing1
      2019, 41(06): 993-1000. doi:
      Abstract ( 135 )   PDF (1175KB) ( 230 )      Review attachment

      As one of the main energy resources, oil plays an important role in transportation, industrial production and daily life, and a large number of technologies have been developed to maximize oil production. For instance, polymer flooding has been widely applied in China. Simulating polymer flooding in naturally fractured reservoirs is vital to sustaining oil production and extending the life of oil fields. We develop a scalable parallel reservoir simulator for real oil production from naturally fractured reservoirs using polymer flooding, with which reservoir engineers can study production technologies on powerful parallel computers and optimize oil production. Comparison with commercial software shows that our simulator is correct and efficient. Moreover, numerical experiments demonstrate that it is scalable and can handle large-scale reservoir models with hundreds of millions of grid blocks using thousands of CPU cores.

      CoFM: A hardware/software co-processing flow
      management mechanism for middlebox acceleration
      SHUI Xiaoyu,LI Junnan,SUN Zhigang,HUANG Jinfeng
      2019, 41(06): 1001-1008. doi:
      Abstract ( 140 )   PDF (706KB) ( 149 )     
      Flow management is the basis for middleboxes to implement stateful packet processing: it manages the connections established by end-to-end communication and their state. To meet the diverse flow management needs of different middleboxes, flow management functions are usually implemented in software frameworks such as Bro and mOS to ensure flexibility. However, due to the limited capability of CPUs, software flow management suffers from low throughput and high processing delay.
      To achieve both performance and flexibility, we propose a hardware/software co-processing flow management mechanism, called CoFM, to accelerate middleboxes. The basic idea of CoFM is to decouple flow management into application-independent mapping management (i.e., finding connections according to flow identifiers) and application-related connection management. The mapping management can be offloaded to hardware to reduce the number of memory accesses without affecting the flexibility of connection management. Furthermore, the mapping management supports dynamic insertion of new mappings and deletion of timed-out mappings, which reduce the number of hardware/software interactions and the hardware resource consumption, respectively. Finally, we implement the mapping management on FPGA in Verilog. The results show that CoFM's mapping management achieves a throughput of 50 Gbps with low processing latency (<1 μs) at a relatively small hardware resource cost, making it suitable for accelerating flow management.
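      As a software model of the decoupling idea, the following sketch mimics the mapping-management interface (flow identifier to connection ID, with timed-out entries evicted); in CoFM this table lives in FPGA hardware, so the dict-based version below is purely illustrative:

      import time

      class MappingTable:
          # Flow identifier (5-tuple) -> connection ID, with timeout eviction.
          def __init__(self, timeout=30.0):
              self.timeout = timeout
              self.table = {}                       # 5-tuple -> (conn_id, last_seen)

          def lookup(self, five_tuple):
              now = time.time()
              entry = self.table.get(five_tuple)
              if entry and now - entry[1] <= self.timeout:
                  self.table[five_tuple] = (entry[0], now)   # refresh on hit
                  return entry[0]
              return None                           # miss -> handled by software

          def insert(self, five_tuple, conn_id):
              self.table[five_tuple] = (conn_id, time.time())

          def evict_expired(self):
              now = time.time()
              dead = [k for k, (_, seen) in self.table.items() if now - seen > self.timeout]
              for k in dead:
                  del self.table[k]
              return len(dead)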
       
      Design and optimization of an N-state
      binary consensus algorithm
      LIU Hua1,YANG Chunxi1,HAN Guangsong2,XIE Kexin1
      2019, 41(06): 1009-1015. doi:
      Abstract ( 138 )   PDF (828KB) ( 150 )     
      Given the poor scalability and strong dependence on experience of existing binary consensus algorithms, we propose an N-state distributed binary consensus algorithm. Firstly, based on the average-consensus idea of the Gossip algorithm and the roulette-wheel idea, each node's state is updated according to its degree of deviation from the current state average of the wireless sensor network, and the initial probability distribution over all possible update states is calculated. Secondly, a genetic algorithm is applied to optimize the initial probability distribution and obtain an optimal probability distribution with better accuracy. The results show that the proposed algorithm achieves higher accuracy and shorter convergence time under the same number of states.
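      A minimal sketch of the roulette-wheel update over N candidate states (the probability values are hypothetical; in the algorithm they come from the initial calculation and the genetic algorithm's optimization):

      import random

      def roulette_select(states, probs):
          # Roulette-wheel selection: the update state is drawn according to
          # the current probability distribution over the N candidate states.
          r, acc = random.random(), 0.0
          for s, p in zip(states, probs):
              acc += p
              if r <= acc:
                  return s
          return states[-1]

      # Example with N = 4 quantized states; the distribution is hypothetical.
      states = [0.0, 1/3, 2/3, 1.0]
      probs = [0.1, 0.5, 0.3, 0.1]
      next_state = roulette_select(states, probs)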
       
      A formal engineering method for requirement
      modeling of airborne embedded control software
      HUANG Yihao,FENG Jincao,ZHENG Hanyue,MIAO Weikai,PU Geguang
      2019, 41(06): 1016-1025. doi:
      Abstract ( 188 )   PDF (860KB) ( 225 )     
      Embedded control software is one of the core components of modern aircraft. Constructing formal requirement specifications that accurately describe the intended functions and operational scenarios is recognized as a fundamental way to ensure the quality of such safety-critical software. However, industrial practitioners still face the big challenge of applying formal modeling in large-scale software projects due to the lack of a systematic engineering approach. Such an approach should provide practical technologies that guide practitioners in constructing formal specifications from scratch and in validating whether the specifications conform to the intended functions and scenarios. To tackle this challenge, we propose a formal engineering method called aero control software description language based modeling and validation (ACSDL-MV) for the requirement modeling of airborne control software. ACSDL-MV combines formal methods with traditional software engineering technologies by guiding practitioners to build formal specifications from the original requirement document in an evolutionary process. For building formal specifications, we design a formal description language for airborne control software, named the aero control software description language (ACSDL). To confirm that the requirement specifications accurately and sufficiently describe the software capabilities users expect, we provide diagram-based static review and formal-model-based dynamic simulation. Experimental results show that the approach can detect more potential errors than traditional review methods.
       
      A hazard analysis method based
      on failure propagation model
      GE Xiaoyu1,SHEN Guohua1,2,HUANG Zhiqiu1,2,DENG Liumeng1,WAN Weijian1
      2019, 41(06): 1026-1033. doi:
      Abstract ( 160 )   PDF (680KB) ( 239 )     
      Embedded realtime systems are extensively used in safetycritical environments, such as transportation, aerospace  and nuclear power systems. Although system design may not have any defects, random failures due to wear of physical components or sudden changes in the environment can  cause system hazards during operation. Currently, the hazard analysis methods based on failure propagation model either only consider failure propagation time or just failure probability, and do not comprehensively analyze the impact of the failure propagation time and the failure probability on the hazard analysis. Timed failure propagation graphs (TFPGs) are usually used to model the failure propagation process in the design phase of a safetycritical system, which includes failure propagation delay modeling. Considering the effect of the uncertainty of failure propagation path on the probability of the hazard occurrence, we propose a hazard analysis method, which uses the probabilisticTFPGs to model the failure propagation process. We also design an analysis algorithm to obtain the correlation between occurrence time and occurrence probability. Finally, a case is given to demonstrate the feasibility of the proposed approach.
       
      A monic execution sequence generation
      method for web service testing
      HE Juanjuan1,LIU Dongmei1,ZHU Hong2,DU Yining1,ZHOU Zijian1,ZHENG Xiaoyu1
      2019, 41(06): 1034-1043. doi:
      Abstract ( 121 )   PDF (897KB) ( 198 )     
      Automatic test case generation plays a key role in automatic testing for web services. Existing testing techniques based on algebraic specifications rely on the creation, initialization, and replication of the objects under test to verify the correctness of test results. However, because web services are third-party, such operations are usually unavailable, which makes it hard to transform test cases into executable sequences of operations. One possible solution is to convert a test case into a linear execution sequence that contains only one instance of the service under test, does not include instance initialization, and only modifies and checks the instance's states. We improve on the original work and propose the test execution graph with inverse terms (TEGI) to describe state changes during test case execution. We design a TEGI construction algorithm and a monic execution sequence generation algorithm, and implement prototype tools. A case study demonstrates that the proposal is feasible for automatic test case generation and can improve the testability of web services.
       
      A weight-guided filtering matching
      algorithm combining three measures
      GUO Xin1,2,WANG Yanjie1,FU Donghui1,2,FAN Bo1,2
      2019, 41(06): 1044-1049. doi:
      Abstract ( 169 )   PDF (561KB) ( 197 )     
      The census transform in existing stereo matching algorithms performs well in weak-texture regions, but it neglects the gray-level information of the image, leading to unsatisfactory matching in repetitive-texture regions. We therefore propose an improved census transform. In the initial matching cost stage, we design a similarity measure that combines the census transform, mutual information and gradient information. In the cost aggregation stage, we adopt an adaptive weight guided filtering aggregation strategy. Finally, the disparity map is obtained by disparity computation and optimization. We test the proposed algorithm on the standard image pairs provided on the Middlebury website, on the VS2015 software platform. Experimental results show that it obtains an accurate disparity map with an average mismatch rate of 5.29%, which meets the requirements of 3D reconstruction.
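      For reference, the baseline census transform that the paper improves on encodes each pixel by comparing its neighborhood with the center pixel and matches windows by Hamming distance; a minimal NumPy sketch (borders handled crudely by wrap-around) is:

      import numpy as np

      def census_transform(img, w=3):
          # Encode each pixel by comparing its w*w neighborhood with the
          # center pixel (1 if neighbor < center). Baseline transform only;
          # the paper fuses it with mutual information and gradient cues.
          r = w // 2
          out = np.zeros(img.shape, dtype=np.uint64)
          for dy in range(-r, r + 1):
              for dx in range(-r, r + 1):
                  if dy == 0 and dx == 0:
                      continue
                  shifted = np.roll(np.roll(img, dy, axis=0), dx, axis=1)  # wraps at borders
                  out = (out << np.uint64(1)) | (shifted < img).astype(np.uint64)
          return out

      def hamming_cost(c_left, c_right):
          # Matching cost between census codes = Hamming distance (popcount of XOR).
          x = np.bitwise_xor(c_left, c_right)
          return np.array([bin(int(v)).count("1") for v in x.ravel()]).reshape(x.shape)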
       
      A fast object motion direction recognition
      method based on SIFT algorithm
      LI Hongfeng1,WEI Jingtao1,FU Yawei2,WANG Jiatao1
      2019, 41(06): 1050-1056. doi:
      Abstract ( 156 )   PDF (828KB) ( 190 )     
      Due to the limitations of objective conditions, imaging devices can only image at a low resolution and cannot distinguish the motion direction. We propose a fast object motion direction recognition method based on SIFT. Firstly, following the block matching algorithm, we use a corner feature extraction method to determine the position of a block, named the detection block. The next frame is traversed with this block, the matching block is found and named the target block, and the motion direction between the detection block and the target block is determined. Finally, the SIFT algorithm is used to match the feature points of the two images; three corresponding feature points are selected in each image to construct a triangle, and the transformation relationship between the two corresponding triangles yields a mathematical model that identifies the motion direction. Experimental results show that, compared with block matching, the motion direction recognition speed of our proposal is 3.3 times higher and the matching accuracy is 2.9 times higher, achieving faster and more accurate judgments. By comparison, the SIFT-based search algorithm is more feasible and can determine the moving direction of objects faster in multi-frame images.
       
      Face recognition by support vector machine
      optimized by an improved grey wolf algorithm
      FENG Zhang,PEI Dong,WANG Wei
      2019, 41(06): 1057-1063. doi:
      Abstract ( 168 )   PDF (784KB) ( 221 )     
      To address the low efficiency of combining two-dimensional principal component analysis (2DPCA) and principal component analysis (PCA) for face feature extraction, we propose a face recognition method based on a support vector machine (SVM) optimized by an improved grey wolf algorithm (EGWO), with 2DPCA plus fast PCA for feature extraction. Combining 2DPCA with fast PCA reduces the dimensionality of the extracted features and the extraction time, thus reducing the recognition time required by the SVM. To improve the global search capability of the grey wolf optimizer (GWO), an elite opposition-based learning strategy is used to initialize the population, which effectively enhances GWO's exploration and exploitation capabilities. The improved algorithm then iteratively finds the best kernel and penalty parameters of the SVM, and the final trained classifier is applied to face recognition. The EGWO is compared with the GWO and the opposition-based learning grey wolf algorithm (OGWO) on six benchmark test functions in terms of convergence accuracy and speed, and it outperforms both. Experiments on face images from the ORL and Yale datasets show that the improved algorithm combined with the SVM model is better and more stable than the GWO, particle swarm optimization (PSO) and differential evolution (DE) algorithms combined with SVM.
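      A compressed sketch of GWO-style hyperparameter search for an SVM (plain GWO with random initialization; the paper's EGWO additionally seeds the population with elite opposition-based learning, and all settings below are illustrative):

      import numpy as np
      from sklearn.datasets import load_digits
      from sklearn.model_selection import cross_val_score
      from sklearn.svm import SVC

      def gwo_svm(X, y, n_wolves=6, iters=10, bounds=((0.01, 100.0), (1e-4, 1.0))):
          # Positions are [C, gamma]; fitness is 3-fold cross-validated accuracy.
          lo = np.array([b[0] for b in bounds]); hi = np.array([b[1] for b in bounds])
          wolves = lo + np.random.rand(n_wolves, 2) * (hi - lo)
          fitness = lambda w: cross_val_score(SVC(C=w[0], gamma=w[1]), X, y, cv=3).mean()
          for t in range(iters):
              scores = np.array([fitness(w) for w in wolves])
              alpha, beta, delta = wolves[np.argsort(scores)[::-1][:3]]
              a = 2 - 2 * t / iters                      # coefficient decreases linearly
              for i in range(n_wolves):
                  new = np.zeros(2)
                  for leader in (alpha, beta, delta):    # pull toward the 3 best wolves
                      r1, r2 = np.random.rand(2), np.random.rand(2)
                      A, C = 2 * a * r1 - a, 2 * r2
                      new += leader - A * np.abs(C * leader - wolves[i])
                  wolves[i] = np.clip(new / 3, lo, hi)
          return max(wolves, key=fitness)

      X, y = load_digits(return_X_y=True)
      best_C, best_gamma = gwo_svm(X[:300], y[:300])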
       
      A new automatic summarization method
      based on paragraph vector
      SHEN Qiangqiang,XIONG Zeyu,XIONG Yueshan
      2019, 41(06): 1064-1070. doi:
      Abstract ( 124 )   PDF (1107KB) ( 184 )     
      Automatic text summarization has very broad application prospects in many fields, such as web search and browsing recommendation. Classic text summarization algorithms use statistical methods to extract article keywords and topic sentences, which to some extent ignores the semantic and grammatical information of the text. Since distributed word vector embedding has been widely used in text summarization in recent years, we propose an automatic text summarization method based on word vector generation. The method comprises four modules: word vector generation, word-vector-based paragraph vector generation, keyword extraction, and topic sentence extraction, through which an automatic summary of a document is produced. Experimental results show that the improved method extracts topic sentences effectively.
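      A minimal sketch of the vector-based topic sentence selection: average word vectors into sentence/paragraph vectors and rank sentences by cosine similarity to the whole-document vector (simple averaging stands in here for the paper's paragraph vector module; the vocabulary below is hypothetical):

      import numpy as np

      def summarize(sentences, word_vecs, k=2):
          # Sentence/paragraph vector = mean of its word vectors; topic sentences
          # are those closest (cosine) to the whole-document vector.
          dim = next(iter(word_vecs.values())).shape
          def vec(text):
              ws = [word_vecs[w] for w in text.split() if w in word_vecs]
              return np.mean(ws, axis=0) if ws else np.zeros(dim)
          def cos(a, b):
              n = np.linalg.norm(a) * np.linalg.norm(b)
              return (a @ b) / n if n else 0.0
          doc = vec(" ".join(sentences))
          return sorted(sentences, key=lambda s: cos(vec(s), doc), reverse=True)[:k]

      # Tiny hypothetical vocabulary; real use would load trained embeddings.
      word_vecs = {"cat": np.array([1.0, 0.0]), "dog": np.array([0.9, 0.1]),
                   "car": np.array([0.0, 1.0])}
      print(summarize(["cat dog", "car", "dog"], word_vecs, k=1))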
       
      A local similarity prediction recommendation
      model based on improved CNN
       
      WU Guodong1,2,SONG Fugen1,TU Lijing2,SHI Mingzhe2
      2019, 41(06): 1071-1077. doi:
      Abstract ( 27 )   PDF (734KB) ( 78 )     
      To alleviate the data sparsity problem in recommendation systems, and exploiting the strength of convolutional neural networks (CNNs) in capturing local features, we propose a local similarity prediction recommendation model based on an improved CNN (LSPCNN), which adds an adjustment layer. The new model iteratively adjusts the initial user-item rating matrix to localize users' interest preferences, and then a CNN predicts the missing ratings to achieve personalized recommendation. Experimental results show that under different degrees of data sparsity, the MAE of the LSPCNN model decreases by 4% on average compared with traditional recommendation methods, effectively alleviating data sparsity and improving recommendation performance.
       
      An improved ant colony algorithm for multi-agent
      path planning in dynamic environments
       
      ZHENG Yanbin1,2,WANG Linlin1,XI Pengxue1,FAN Wenxin1,HAN Mengyun1
      2019, 41(06): 1078-1085. doi:
      Abstract ( 161 )   PDF (1001KB) ( 205 )     
      Aiming at the problem of multi-agent path planning in dynamic environments, we propose an improved dynamic path planning method that combines the ant colony algorithm with the fireworks algorithm. The method accelerates the iteration of the algorithm by adaptively setting the pheromone intensity value and the pheromone reduction factor, and uses the fireworks algorithm to resolve deadlocks in the path planning process and avoid falling into local optima. For multi-agent dynamic collision avoidance, corresponding avoidance strategies are chosen according to whether the trajectories of dynamic obstacles and agents intersect, and a path collision function is used to resolve head-on collisions between agents. Simulation results show that the proposed algorithm is superior to the traditional ant colony algorithm: it effectively solves the collision problem in multi-agent path planning and quickly finds the optimal collision-free path.
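      The core ant colony ingredient being adapted is the pheromone update; a generic sketch with intensity q and reduction (evaporation) factor rho, both of which the improved algorithm tunes adaptively, looks like:

      def update_pheromone(tau, ant_paths, q, rho):
          # tau: {edge: pheromone}; q is the pheromone intensity, rho the
          # reduction (evaporation) factor -- both adapted over iterations in
          # the improved algorithm, fixed here for illustration.
          for edge in tau:
              tau[edge] *= (1.0 - rho)              # evaporation
          for path, length in ant_paths:
              for edge in zip(path, path[1:]):      # consecutive node pairs
                  tau[edge] = tau.get(edge, 0.0) + q / length
          return tau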
       
      An imbalanced data classification method
      based on probability threshold Bagging
      ZHANG Zhonglin,WU Dangping
      2019, 41(06): 1086-1094. doi:
      Abstract ( 196 )   PDF (962KB) ( 198 )     

      The class imbalance problem exists widely in real life. Most traditional classifiers assume a balanced class distribution or equal misclassification costs, so their classification performance suffers severely on imbalanced data. Aiming at the classification of imbalanced data sets, we propose a probability-threshold Bagging classification algorithm, called PT-Bagging. The algorithm combines the threshold-moving technique with the Bagging ensemble: it trains on training sets drawn from the original distribution in the training phase, introduces decision threshold moving in the prediction phase, and uses calibrated posterior probability estimates to maximize the average performance measure of imbalanced data classification. Experimental results show that PT-Bagging classifies imbalanced data better.
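      A minimal scikit-learn sketch of the threshold-moving idea behind PT-Bagging: train Bagging on the original distribution, then move the decision threshold to the value that maximizes a validation metric over the estimated posterior probabilities (F1 is used here for concreteness; the metric and settings are illustrative, not the paper's exact measure):

      import numpy as np
      from sklearn.ensemble import BaggingClassifier
      from sklearn.metrics import f1_score
      from sklearn.tree import DecisionTreeClassifier

      def pt_bagging_predict(X_train, y_train, X_val, y_val, X_test):
          # Train Bagging on the original (imbalanced) distribution.
          clf = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50)
          clf.fit(X_train, y_train)
          # Move the decision threshold to maximize a validation metric over
          # the estimated posterior probability of the minority (positive) class.
          probs_val = clf.predict_proba(X_val)[:, 1]
          grid = np.linspace(0.05, 0.95, 19)
          best_t = max(grid, key=lambda t: f1_score(y_val, probs_val >= t))
          return (clf.predict_proba(X_test)[:, 1] >= best_t).astype(int), best_t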

      A spectral clustering algorithm based on Canopy clustering
      ZHOU Wei,XIAO Yang
      2019, 41(06): 1095-1100. doi:
      Abstract ( 136 )   PDF (710KB) ( 185 )     
      The traditional spectral clustering algorithm is sensitive to initialization. Aiming at this defect, we introduce the Canopy algorithm to perform coarse clustering and obtain initial cluster centers as the input of the K-Means step, and propose a spectral clustering algorithm based on Canopy clustering (Canopy-SC), which reduces the blind selection of initial centers in traditional spectral clustering. We apply the new algorithm to face image clustering. Compared with the traditional spectral clustering algorithm, Canopy-SC obtains better cluster centers and results as well as higher clustering accuracy. Experiments demonstrate its effectiveness and feasibility.
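      For reference, a minimal Canopy pre-clustering sketch that produces initial centers for K-Means (thresholds t1 > t2 are data-dependent and illustrative; only the tight threshold t2 matters for seeding centers here):

      import numpy as np

      def canopy_centers(X, t1, t2):
          # t1 > t2 are the loose/tight Canopy thresholds. Points within t2 of
          # a chosen center leave the candidate pool; the surviving centers
          # seed K-Means instead of random initialization. (t1 would govern
          # canopy membership; only t2 matters for seeding centers here.)
          assert t1 > t2
          pool = list(range(len(X)))
          centers = []
          rng = np.random.default_rng(0)
          while pool:
              i = pool[rng.integers(len(pool))]     # random remaining candidate
              centers.append(X[i])
              d = np.linalg.norm(X[pool] - X[i], axis=1)
              pool = [p for p, dist in zip(pool, d) if dist > t2]
          return np.array(centers)

      # e.g. init = canopy_centers(features, t1=2.0, t2=1.0)
      #      KMeans(n_clusters=len(init), init=init, n_init=1).fit(features)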
       
      A collaborative filtering recommendation algorithm
      based on a bee colony K-means clustering model
      LI Yanjuan,NIU Mengting,LI Linhui
      2019, 41(06): 1101-1109. doi:
      Abstract ( 194 )   PDF (813KB) ( 212 )
      To address the low recommendation quality and efficiency of current collaborative filtering recommendation algorithms, we propose a collaborative filtering recommendation algorithm based on an improved bee colony K-means clustering model. Firstly, based on user attribute information, the algorithm uses the improved bee colony K-means algorithm to cluster users and build a user clustering model. Secondly, we calculate the distance between the target user and the cluster centers in the model, and the cluster with the minimal distance is taken as the target user's retrieval space. Finally, we search for the target user's nearest neighbors by similarity calculation on the user-item rating matrix, and generate a recommendation list from the nearest neighbors' information. Experimental results show that the proposed algorithm achieves a lower MAE and shorter running time, enhancing both the quality and the efficiency of recommendation.