Computer Engineering & Science

A survey of performance improvement methods

for multi-core cache sparse directory

WU Jianguo，CHEN Haiyan，LIU Sheng，DENG Rangyu，CHEN Junjie

2019, 41(03): 385-392. doi:

Abstract ( 318 )

PDF (917KB) ( 459 ) 　　

Review attachment

Due to limited power consumption, the general-purpose processor stopped pursuing higher frequency more than a decade ago and moved towards integrating more processor cores on a single chip. At the same time, with the increasing density of transistors according to the law of Moore, the number of processor cores integrated on a single chip has been doubled and redoubled, thus multicore and manycore processors have become the mainstream of high-performance processors. It is an inevitable trend for the future kilo-core general processor to support shared memory programming model. However, the traditional cache coherence directory structure is confronted with the problems of high latency, frequent replacement of directory entries, limited scalability for the hardware cost and power consumption. The sparse directory realizes the tradeoff between the hardware cost of the traditional directory structure and the coherence maintenance efficiency, and is considered as an energy-efficient and scalable structure for many-core processors to maintain cache coherence. We review related research and methods for improving the performance of sparse directory in recent years, analyze their characteristics in terms of area, access delay, power consumption and implementation complexity, and summarize the merits and shortcomings of these directory schemes. It has certain reference significance for designing novel scalable shared memory architectures for future many-core processors.

A Spark-based parallel brainstorm optimization algorithm and

its application in optimizing complex multimodal functions

YANG Guangming，ZHANG Tao，TRUONG Thanhtung，WANG Rui，MA Lianbo

2019, 41(03): 393-399. doi:

Abstract ( 191 )

PDF (956KB) ( 301 ) 　　

The brain storm optimization (BSO) algorithm is a new type of swarm intelligence optimization algorithm, which is inspired by the brainstorm problemsolving model and is suitable for solving complex multimodal function optimization problems. However, when solving multimodal extremum, the BSO requires iterative operations. When computing large data sets, its computation efficiency and accuracy are too low. In order to solve the above problems, we design and implement a parallel brainstorm optimization algorithm based on Spark. By parallelizing the clustering with the highest computational complexity in the BSO algorithm and the generation process of new populations, the speedup and efficiency of the algorithm are improved. In particular, based on the idea of parallelization, the population is divided into multiple subgroups for co evolution, and each subgroup produces new populations to maintain the diversity of the population and improve the convergence speed of the algorithm. Finally, the parallel BSO algorithm is used to solve the multimodal function. Experiments show that, when the total number of cores of the parallel nodes is 10, the computation time of the parallel BSO algorithm is saved by 50%, the computational accuracy is basically equal to the serial BSO algorithm, and the convergence speed is improved obviously. The results prove the validity of the parallelization of the BSO.

A L2 cache partitioning mechanism for

multithreaded array-based many-core processors

CHEN Yifei,ZHU Lei,LI Hongliang

2019, 41(03): 400-408. doi:

Abstract ( 195 )

PDF (1151KB) ( 246 ) 　　

Review attachment

Because of its high computational performance and energy efficiency ratio, array-based many-core processors have been widely used in the high performance computing field. To build future high performance computing systems, processor must solve the severe challenge of ‘memory wall’ and core synergy problem. In a typical array-based many-core processor, the core adopts the single-threaded structure to reduce overhead. However, the demand for memory access is higher. We introduce the hardware simultaneous multithreading technology into the single core structure. Aiming at the problem that the utilization rate of the singlecore multithreaded L2 cache is significantly low, we present a L2 cache partitioning mechanism (thread-based cache partitioning) for the array-based manycore processor. Experimental results demonstrate that, based on the L2 cache partition mechanism, the miss rate of the L2 instruction cache is decreased by 18.59%, the miss rate of the L2 data cache is decreased by 6.60% and the CPI performance is increased by 10.1%.

A memristor neural network circuit based on temporal rule

HUANG Chenglong，HAO Dongdong,FANG Liang

2019, 41(03): 409-416. doi:

Abstract ( 195 )

PDF (983KB) ( 259 ) 　　

Review attachment

The memristor is the resistor with dynamic characteristics. Its resistance value can be changed according to the variation of the external field, and it can maintain the original resistance value after the external field is removed. Memristors can be used to store synaptic weights thanks to their similar characteristics to the connection strength of biological synapses. In order to realize the function of recognition and learning of the IRIS dataset based on the temporal rule, we design a SPICE simulation circuit of neural networks which takes the bridge memristor as synapses. The circuit uses the single pulse encoding method, and the time of the pulse represents data information. The neural network circuit consists of 48 pulse input ports, 144 synapses and three output ports. The synaptic weights are modified based on the learning rules of the temporal rule. Simulation results show that the classification accuracy on the IRIS dataset can reach 93.33%, which proves that the proposed neural network circuit can be used in brain-like pulse neural networks.

A load balancing strategy on heterogeneous

CPU-GPU data analytic systems

SUN Tingting，HUANG Hao，WANG Jialun，WENG ChuliangSUN Tingting，HUANG Hao，WANG Jialun，WENG Chuliang

2019, 41(03): 417-423. doi:

Abstract ( 213 )

PDF (798KB) ( 461 ) 　　

Review attachment

With strong parallel computation power, GPU-based data analytic systems can achieve better performance than traditional CPU-based data analytic systems. However, how to leverage the resources of CPU and GPU to dispatch workloads appropriately in massive data scenarios remains to be solved. We propose a load balancing strategy on heterogeneous CPU-GPU data analytic systems. It adopts a pipeline model to split workloads, and a load balancing model based on pipeline is proposed to dispatch workloads to heterogeneous processors reasonably and appropriately, which reduces the total execution time of the system and enhances the performance. Experimental results show that the load balancing model based on pipeline is suitable for various queries with different dataset volumes and has great performance.

A heterogeneous computing oriented

structural parallel programming framework

LI Anmin,JI Weixing,LIAO Xinyi,GAO Jianhua,TAN Zhaonian,WANG Yizhuo，SHI Feng

2019, 41(03): 424-432. doi:

Abstract ( 496 )

PDF (1180KB) ( 483 ) 　　

Review attachment

With the advent of artificial intelligence era, heterogeneous computing has been playing a more and more important role in deep learning and scientific computing. One of the bottlenecks that limit the application of heterogeneous computing systems is a lack of efficient software development framework. Existing programming frameworks like OpenCL and CUDA, base on C/C++ language and traditional parallel programming methods, and support hardware like GPU, DSP and FPGA, which are complained due to their low efficiency in software development as well as the difficulties in software reasoning and debugging, leading to clumsy handling of the cooperation and scheduling between computing devices. We introduce a scriptbased structural parallel programming framework for heterogeneous computing platforms, which provides a structural parallel programming interface to support the mapping of computing tasks to heterogeneous computing devices, and facilitate the reasoning and verification of parallel programs. We also design and implement a structural scheduling algorithm based on the genetic algorithm, which fully utilizes the computing capability of heterogeneous systems and enhances the efficiency of software development. Experimental results show that the proposed programming framework achieves 1.5× ~ 2.5× speedup in comparison to a single processor on the CPU+GPU platform.

An approach to optimizing the execution plan

of scientific workflows in cloud environment

GUO Hongle，CHEN Wanghu，MA Shengjun，LI Xintian，QIAO Baomin

2019, 41(03): 433-439. doi:

Abstract ( 143 )

PDF (703KB) ( 260 ) 　　

Review attachment

In order to reduce the cost of scientific workflow execution in cloud environment, we propose an approach to optimizing the execution plans of scientific workflows in cloud environment. It introduces the monkey group algorithm and relies on the intra-level and inter-level optimization of the current execution plan. Under the premise of ensuring the global deadline of the workflow, through the logical aggregation of the same-level tasks and the inter-level adjustment of the tasks, the difference in the number of tasks at each level is minimized to avoid waste of resources and reduce the waiting time of tasks. Experiments show that compared with the BTS algorithm and the SPSWVC algorithm, the proposed method can reduce resource consumption and the total delay time of tasks.

An HDFS storage optimization strategy based on Cauchy code

XIE Guojun,SHEN Jiquan,YANG Huanhuan

2019, 41(03): 440-445. doi:

Abstract ( 165 )

PDF (523KB) ( 312 ) 　　

Review attachment

With the advent of the big data era, data storage is facing severe challenges. The traditional Hadoop distributed file system (HDFS) has problems such as high storage redundancy and insufficient load balancing. Aiming at these problems, based on Cauchy code, we propose a Cauchy dynamic decentralized storage (CDDS) strategy. For the data blocks in the system, this strategy can generate different storage schemes based on their heat levels while ensuring data availability. For the cold data and hot data in the system, we adopt the Cauchy based erasure code technology to perform singlecopy storage and multicopy storage respectively, which guarantees the reliability of the data and the I/O capability of the system. Test results show that the CDDS strategy reduces data storage space to 75% of the original, and enhances the system’s reliability and load balancing capability.

An optimized heat dissipation structure

design for laptop computers

LI Hong,LI Jun,GONG Guohui

2019, 41(03): 446-451. doi:

Abstract ( 197 )

PDF (597KB) ( 472 ) 　　

Review attachment

We analyze the main factors for poor heat dissipation of an autonomous controlled laptop computer based on domestic central processing unit, and propose an optimized design according to each key factor. The overall heat dissipation performance of the whole laptop computer is improved by optimizing the design of the thermal module, air passage and rotation shaft cover. The temperature test data show that when the computer is running at full load in the working environment of 25℃, the temperature of each test point of the optimized laptop computer is obviously reduced. Therefore heat dissipation performance of the laptop computer has been greatly improved. The hottest area of the C shell keyboard is 35 degrees centigrade, which brings good user experience.

Testing for reconfigurated faulty

digital microfluidic chips

ZHANG Ling1,KUANG Jishun2

2019, 41(03): 452-457. doi:

Abstract ( 142 )

PDF (468KB) ( 275 ) 　　

Review attachment

The digital microfluidic chip (DMFC) is widely used in safety-critical biomedical applications, which has very high requirement for reliability.
Due to the reconfigurability of the DMFC, when the number of faults is less than a certain ratio, electrode arrays can skip the faulty units and be reused after
they are reconfigurated, which are called reconfigurated microchips. The irregular electrode units of the reconfigurated microchip should be robustly and completely tested to avoid any bioassay errors. We propose parallel testing on irregular electrode units after reconfiguration for the first time. The reconfigurated electrode array is divided into multiple sub-arrays with the same size. each sub-array is allocated with a test droplet for parallel test to minimize test time. The timeminimizing problem is converted to a distribution problem of test source pool, and an ILP model is designed for this NP-complete problem to achieve minimal test time. Experimental results show that the proposal avoids repeated diagnosis of faults, minimizes test time for the reconfigurated microchip after faults, and achieves a good test effect.

A wireless network anti-pollution attack scheme based on

message authentication and hybrid homomorphic signature

YANG Jing,FAN Mingyu,WANG Guangwei

2019, 41(03): 458-465. doi:

Abstract ( 145 )

PDF (983KB) ( 290 ) 　　

In order to improve the performance of wireless networks against pollution attack, we propose a wireless network anti-pollution attack scheme based on message authentication and homomorphic signature. Firstly, we use the source node, nonsource node set and link set of the directed multi-graph to construct the model of wireless network coding process, and establish a network anti-pollution model based on data pollution attack and label pollution attack. Secondly, using MACs, D-MACs and the Sign homomorphic signature scheme to establish a hybrid homomorphic signature scheme, which can improve the message verification process of the anti-pollution attack model, ensure the integrity of the content of each MAC coded packet, and improve the execution efficiency of the algorithm. Finally, experiments under the simulation environment of the ASNC mechanism verify the performance advantage of the proposed algorithm from three perspectives, which are the percentage of contaminated nodes, cumulative distribution of traffic and calculation efficiency.

A motif-based topology partitioning

method for target area networks

YANG Di，LIU Yan，CHEN Jing，ZHANG Weili

2019, 41(03): 466-478. doi:

Abstract ( 153 )

PDF (1065KB) ( 347 ) 　　

With the development of the information society, the importance of network security has become increasingly prominent, and the accurate geographical location of network entities can help better implement network management. The existing classical network entity location method based on topology heuristic clustering adopts the clustering based on network structure to cluster network entities, does not consider the specific characteristics of the network topology, and leads to final results with big error. In order to solve this problem, we propose a target area network topology partitioning method based on the motifs. According to the high clustering characteristics of local nodes in the target network topology, we innovatively introduce the concept of "motif" and analyze the motif structure in the target network topology. Learning from the idea of the initial seed expansion in local community discovery methods for the complex network, we take the motif structure as the initial seed to expand, and divide the nodes closely connected with the motif in the topology into different sets. Finally we locate the node sets according to the landmark and the public IP geo-location database, and take the location of the set as the geographic location of the nodes within it so as to achieve the bulk positioning of network entities. Experiments based on the network topologies in Hong Kong and Taiwan show that compared with the classical HCBased method and network node clustering method (NNC), the positioning accuracy of network entities of our method can be enhanced by about 25% and 16% respectively, and there are more network entities can be located in a batch way.

A QoE level evaluation model

based on user experience delay

GONG Wencong1,YUAN Jingling1,2,CHEN Mincheng1,XIANG Yao1,ZHAO Zikang1

2019, 41(03): 479-484. doi:

Abstract ( 187 )

PDF (522KB) ( 266 ) 　　

It is difficult to obtain the data reflecting user experience in Web service and evaluate the quality of user experience (QoE). Based on the analysis of Web data and existing evaluation methods, we propose a QoE level evaluation method that incorporates user experience delay (ED). This method is based on the analytic hierarchy process and combines the subjective feelings of human physiology to measure user experience more objectively. The feasibility of this method is verified through relevant cases.

An improved positioning algorithm for scenic areas

based on two times weighted LANDMARC

JIN Peng1,XI Tao1,WANG Lijing2

2019, 41(03): 485-489. doi:

Abstract ( 147 )

PDF (608KB) ( 256 ) 　　

Aiming at the problem of low positioning accuracy caused by complex terrains in scenic areas, we propose an improved two times weighted positioning algorithm based on the traditional indoor positioning algorithm LANDMARC to solve the complex positioning problem that some reference labels in scenic areas are not uniformly distributed. Firstly, the algorithm can obtain the coordinates of the labels to be located by one weighted positioning. Then, the obtained coordinates are connected with the vertexes of reference areas respectively. The reference area is divided into K triangular regions, and the centers of the inscribed circle of K triangular regions are obtained respectively. The K centers are taken as the reference coordinate of the nearest neighbor. By setting the coefficients of the second weighting, a more accurate positioning label coordinate is calculated. Experiments on a typical scenic area show that the improved positioning algorithm has a 10.6% improvement in positioning accuracy in the complex scenic environment compared with the traditional one-time weighted positioning algorithm, which proves its better applicability in complex scenic areas.

A dynamic reward and punishment based branching

heuristic algorithm for CDCL SAT solvers

CHEN Xiulan，LIU Ting

2019, 41(03): 490-497. doi:

Abstract ( 210 )

PDF (479KB) ( 287 ) 　　

Branching heuristic algorithms play an important role in CDCL SAT solvers, and the conventional branching heuristic algorithms are more concerned with the number of conflicts when calculating the activity scores of variables and do not consider the influence of the decision level or conflict decision level. In order to improve the efficiency of solving SAT problem, we propose a dynamic reward and punishment based branching method (DRPB), which is inspired by the EVSIDS and ACIDS. Whenever a conflict occurs, the DRPB updates the activity score of variables by integrating the numbers of conflicts, decision-making level, conflict decision level, and conflict frequency of variables. We improve the Glucose version 3.0 by replacing the VSIDS algorithm with the DRPB, and conduct an empirical evaluation not only on instances of SATLIB benchmarks, but also on 2015 and 2016 SAT competitions. Experimental results show that compared with the traditional and single variable branching heuristic algorithms, the proposed strategy can reduce the size of the search tree and improve the performance of the SAT solvers by reducing the branches of the search tree and the number of Boolean constraint propagation.

Core functional unit design of electronic

document archiving management system

ZOU Minglu1，LUO Yuan2

2019, 41(03): 498-504. doi:

Abstract ( 141 )

PDF (851KB) ( 299 ) 　　

Taking the design of data import unit as an example, we explore the design philosophy of the core functional unit of electronic document archiving management system. Based on the establishment of a functional unit working flow and determining the logic operation and logical processing, we discuss object model establishment, definition of a class and relationship between classes, logic implementation of operation function, and unit functional testing.

Summary on computing structure and

platform for autonomous driving

ZOU Wenchao,LI Renfa,WU Wufei

2019, 41(03): 505-512. doi:

Abstract ( 179 )

PDF (924KB) ( 412 ) 　　

With the continuous development of information and communication technology, the field of autonomous driving cars has made a great progress. Autonomous driving cars effectively solve the user’s increasing demand for safety, convenience, comfort, efficient travel, and other personalized needs. The autonomous driving car is very popular among the traditional manufacturers, emerging auto companies and people for its great commercial prospect and application value. Starting from the definition of autonomous driving cars, we classify the computing tasks of autonomous driving cars and describe the computing structures of the mobile terminal, the edge node and the cloud. Then, we discuss current mobile terminal computing platforms of autonomous driving cars, which shows heterogeneous and multi-core characteristics, and compare and analyze the current mainstream underlying computing platforms. Finally, we summarize the problems and challenges encountered in the computing structure and platform of autonomous driving cars from aspects of performance cost and information security.

A new kNN multi-label classification

algorithm with label-specific features

JIANG Yun，XIAO Xiao，HOU Jinquan，CHEN Li

2019, 41(03): 513-519. doi:

Abstract ( 122 )

PDF (498KB) ( 297 ) 　　

In a multi-label learning system, each sample is associated with multiple class labels at the same time but described by only one feature vector. The common strategy adopted by most existing multi-label classification algorithms is to predict all class labels using the same set of features. It is not the best choice as each label is probably most relevant to its own characteristics. To solve this problem, we propose an improved kNN multi-label classification algorithm with lablespecific features, named IML-kNN. Firstly, the IML-kNN preprocesses the feature vectors of multi-label data and constructs the most discriminative feature for each class of labels. Then, the IML-kNN algorithm is used to do classification based on the obtained characteristics. Experimental results show that the IMLkNN algorithm is obviously superior to the ML-kNN algorithm and other three commonly used multi-label classification algorithms on the yeast and image data sets.

Application of an improved hierarchy

interval-valued hesitant fuzzy TODIM

method in cloud manufacturing resources

WANG Tiedan, ZHAO Yang, PENG Dinghong

2019, 41(03): 520-530. doi:

Abstract ( 153 )

PDF (1130KB) ( 329 ) 　　

Aiming at the decision problem in hierarchical criteria structure, considering the decision makers’ uncertainty and hesitant preferences and the cognitive referencedependency and loss aversion behavior, we propose an intervalvalued hesitant fuzzy (IVHF) hierarchical TODIM method. Firstly, we construct an IVHF distance measure based on the verification of the traditional JACCARD distance measure. Then, combining the idea of the analytic hierarchy process method and TODIM method, we establish an IVHF decision method for solving the decision problem in hierarchical criteria structure. Finally, we apply the proposed method to the selection of cloud manufacturing resources, and comparative analysis demonstrates its effectiveness and feasibility.

Virtual arm motion planning based on

heuristic Bi-RRT algorithm

CHEN Jingjie，GENG Liqin

2019, 41(03): 531-537. doi:

Abstract ( 204 )

PDF (1248KB) ( 356 ) 　　

In order to improve the efficiency of the motion path planning of virtual human arms when it is repairing in complex multi-obstacle space, we propose a path planning method based on the heuristic Bi-RRT algorithm. Combining with the heuristic searching idea based on the Bi-RRT algorithm, we make a path planning of the wrist joint. Multiple distances between the random search point and the target point are compared, and the random search points with the shortest distance to the target point are remained. This makes the random tree have a certain direction of expansion and invalid search is reduced. Simulation results show that the heuristic Bi-RRT algorithm has better search results and a higher search efficiency than the traditional RRT algorithm and Bi-RRT algorithm. It also has a good effect on the motion path planning of 7-DOF virtual arms.

Current Issue

Author center

Review center

Online journal