Computer Engineering & Science

Design and implementation of processor pseudo-random

test generator based on models and libraries

JU Peng-jin，ZHANG Xiao-dong，LI Hui

2018, 40(1): 1-9. doi:

Abstract ( 351 )

PDF (1057KB) ( 523 ) 　　

Processor pseudo-random test generators are important and necessary in processor research and development, which can generate large numbers of tests so as to cover the huge verification space. However, some of tests in the test set may become illegal and useless when the processor design changes, especially when instruction set or architecture changes, resulting in significant verification and maintenance costs. In order to tackle the problem, a hierarchical method based on models and libraries is proposed to create a processor pseudo-random test generator. Several techniques such as instruction set tree modeling, multidimensional memory address modeling and processor professional knowledge library modeling are used to solve the problem of how to efficiently reuse test sets in processor research and development. The practical application shows that this method can well adapt to the changes of processor design and enhance the usability and reusability of processor pseudo-random test generators. The reuse rate of test set can reach more than 95%, which can significantly shorten the verification cycle when the processor design is changed and upgraded.

Orchestrating HPL between CPU and China accelerator

GAN Xin-biao1,2，SUN Liao-yuan3,LIU Jie1，XIONG Cheng-wei1,HUANG Jia-kun1

2018, 40(1): 10-14. doi:

Abstract ( 430 )

PDF (802KB) ( 526 ) 　　

Review attachment

HPL is a Linpack benchmark package widely used in high performance computing test. Matrix is divided into sub-matrix and distributed into computing elements in traditional HPL algorithm. However, it is ineffective for China Accelerator because of a specified interface on matrix multiplication built in China Accelerator. Thus, dPEM (delicate Partition and Encapsulation on Matrix) is advised to expose a friendly testing configuration environment. Furthermore, we propose OA4MM (Orchestrating Algorithm for Matrix multiplication) based on heterogeneous system composed of CPU and China Accelerator. Experimental results validate dPEM and OA4MM on CPU + China Accelerator. OA4MM can promote productivity up to 10% in comparison to heterogeneous HPL.

PFPonCanTree：A parallel frequent patterns

incremental mining algorithm based on MapReduce

XIAO Wen,HU Juan,ZHOU Xiao-feng

2018, 40(1): 15-23. doi:

Abstract ( 429 )

PDF (876KB) ( 411 ) 　　

Review attachment

Frequent pattern mining is one of the most important data mining tasks. Traditional frequent pattern mining algorithmsare executed in a "batch" mode, that is,all the data are mined in one time, so they cannotmeet the needs of the ever-growing bigdata mining. MapReduce is a popular parallel computing modeland has been widely used in the field of parallel data mining. In this paper, we migrate the traditional frequent pattern incremental mining algorithm CanTree to the MapReduce computing model,achieving a parallel frequent pattern incremental miningalgorithm. The experimental results show that the proposed algorithm achievesbetterload balancing and improvesthe execution efficiency significantly.

A parallel multiple logical columns reconfiguration

algorithm on fault-tolerant processor arrays

ZHANG Zi-kai，WU Ji-gang，JIANG Wen-chao，LIU Zhu-song

2018, 40(1): 24-33. doi:

Abstract ( 373 )

PDF (903KB) ( 388 ) 　　

Review attachment

Efficient fault-tolerant reconfiguration techniques are essential for improving the reliability of high performance architecture such as mesh-connected processor arrays. Meanwhile, reconfiguration must be achieved as fast as possible to meet the real-time constraints. Existing techniques of generating maximum logic array on parallel reconfiguration are only for the single logical column, and no maximum logic array algorithm is reported for reconfiguring multi-logical columns in parallel on processor arrays. According to the potential parallelism of mesh-connected processor arrays, based on the divide and conquer strategy, this paper proposes a parallel algorithm to reconstruct the logical array. The proposed algorithm divides processors array into subarrays, then reconfigures each subarray in parallel. After that, it merges the logical subarrays in parallel. The proposed algorithm can effectively accelerate the running speed by reducing the data redundancy in communication and calculation. Moreover, it is proved that the proposed algorithm can generate the maximum logic array. Experimental results show that the proposed algorithm is faster by nearly 39.6% than the existing parallel algorithm on a 64×64 processor array and has good scalability.

Accelerating CNN on mobile GPU

WANG Xiang-xin1，SHI Yang2，WEN Mei2

2018, 40(1): 34-39. doi:

Abstract ( 495 )

PDF (550KB) ( 477 ) 　　

Review attachment

Convolutional Neural Networks (CNNs) are playing an increasingly important role in areas such as image classification and speech recognition because of their excellent performance. Some researchers have already wanted to apply this deep learning process on mobile phones, but the performance of the porting program is unsatisfactory due to the huge amount of computation of CNN. In order to explore how to solve this problem, this paper uses a deep learning framework named MXNet to realize the forward process of CNN on mobile phones and focuses on the use of GPU that is another powerful computing device on the mobile phone. Based on the OpenCL common programming framework, we use matrix multiplication to compute the most time-consuming convolution in the forward process and move it to the GPU. Besides, serval improvements are made to achieve better performance. Finally, the experimental results show that we succeed in reducing the time of the forward process to half of the original time.

A message authentication scheme for VANET

based on certificateless proxy re-signature

YANG Xiao-dong1,2,YANG Ping1,LI Yan1,LIU Ting-ting1,WANG Cai-fen1

2018, 40(1): 40-44. doi:

Abstract ( 484 )

PDF (419KB) ( 422 ) 　　

Review attachment

Based on proxy re-signature and certificateless public key cryptography, a message authentication scheme for vehicular ad hoc network (VANET) is presented. By using the proxy re-signature technology, the trusted authority can convert a signature generated by on-board unit into one from the road-side unit on the same message. Hence, it not only effectively reduces the risk of tracking vehicle according to signature, but also ensures anonymous message communication in VANET. If the vehicle releases false news, the trusted authority can accurately track the true identity of the vehicle and recall illegal vehicles. Moreover, compared with Huang scheme, the proposal has higher security and lower communication overhead.

High reliable emergency warning based on

digital broadcast signaling aggregation

GAO Jian，CHEN Wei-gang，YANG Jin-sheng

2018, 40(1): 45-51. doi:

Abstract ( 340 )

PDF (725KB) ( 402 ) 　　

Review attachment

Terrestrial digital video broadcasting system is mainly used to provide audio and video communication services, and can be applied to information dissemination in emergency situations. However, in some special circumstances such as war or natural disasters, the established signal transmitting system tends to be damaged and it may be difficult to achieve reliable broadcasting. In view of the above problems, this paper takes DVB-T system as an example and proposes a high reliable emergency warning method based on digital broadcast signaling aggregation, which can achieve high reliable short message transmission under low transmission power. In a pre-set fixed transmission mode, the method utilizes the saved signaling part to carry the short message data, uses the non-binary Low Density Parity Check Codes (LDPC) with the rate of 1/2 in the field of GF(16) and the repeat coding as the channel coding scheme, and uses Binary Phase Shift Keying as the modulation method. Further, the proposed method is tested and verified on the software-defined radio platform in an open environment. The results show that this method can achieve high reliable transmission of emergency short message data, and can be extended to other digital broadcasting system.

An adaptive routing protocol in

intermittently connected mobile networks

JIANG Qing-feng1,2，MEN Chao-guang1，TIAN Ze-yu1,MA Ying-rui3

2018, 40(1): 52-57. doi:

Abstract ( 414 )

PDF (713KB) ( 366 ) 　　

Review attachment

In order to improve the efficiency of message transmission in intermittently connected mobile networks, an Adaptive Routing Protocol Based on OLSR (ARPBO) of mobile ad hoc network is proposed. ARPBO forwards messages rapidly by the OLSR protocol when the network is connected. When the network is interrupted, ARPBO extends the OLSR protocol, selects the next hop node effectively by the message transmission node's local connected network, and then forwards messages through the "store-carry-forward" mechanism of delay tolerant network. The experimental results show that the proposed routing protocol can achieve higher message delivery success ratio and lower delivery delay when the network is intermittently connected.

A parameterized algorithm for minimum node

weighted Steiner tree in logistics networks

LUO Yu-hong，LI Li

2018, 40(1): 58-65. doi:

Abstract ( 413 )

PDF (802KB) ( 418 ) 　　

Review attachment

By optimizing logistics transport network, logistics costs can be effectively reduced. The logistics network optimization problem of centralized distribution can be transformed into the minimum nodeweighted Steiner tree problem, which is an NP-hard problem. In this paper, a new heuristic solution algorithm P-NSMT is proposed by using parameter theory.The idea of the algorithm is as follows:the algorithm first builds a minimum spanning tree with connected terminal nodes, and then adds the Steiner node into the tree so as to reduce the total weight of the spanning tree, and lastly generates a minimum Steiner tree whose total number does not exceedparameter k. Experiments show that the P-NSMT algorithm outperforms other algorithms in terms of accuracy and time efficiency, and is especially suitable for the logistics networkswith large network size and fewer terminal delivery nodes.

GDOP study of the ray tracing

based AOA positioning algorithm

KONG Fan-zeng1,GUO Min2,REN Xiu-kun1,ZHENG Na-e1

2018, 40(1): 66-71. doi:

Abstract ( 525 )

PDF (1149KB) ( 450 ) 　　

Review attachment

GDOP is an important index to measure the precision of the positioning system. The ray tracing-based localization algorithm uses the mirror station to locate the target. In view of the problem that the existed GDOP calculation methods do not consider the correlation between the image base stations (I-BS) and hence cannot analyze the relationship between the accuracy and the I-BS layout of the ray tracing based positioning system, this paper proposes a GDOP calculation method for the ray tracing based AOA positioning algorithm (AOA-RT). In the micro-cell AOA location model, according to the relationship between the I-BSs and the base station, the correlation matrix of position errors of I-BSs is obtained, and the calculation formula of the GDOP is derived. The simulation proves that the GDOP calculation method is reasonable, and we get several related conclusions.

Image approximation based on local dictionary

searching and multi-atom matching pursuit

HUANG Ya-fei1,2,LIANG Xi-ming1,FAN Shao-sheng2

2018, 40(1): 72-78. doi:

Abstract ( 414 )

PDF (596KB) ( 370 ) 　　

Review attachment

Global searching in dictionary with single atom being selected in each iteration leads to greedy algorithms’ high complexity in sparse decomposition. Given this, we propose an improved matching pursuit (MP) algorithm named local dictionary searching and multi-atoms matching pursuit (LMMP).

Calculation showed that the order of kernel atoms in the adjacent generation of MP algorithm is basically stable, the best atom just to search in local dictionary consisting of the front order atoms. Searching for multiple incoherent atoms on single iteration to further improve the speed of MP algorithm. Reduce the approximation error by updating the residual image one by one atom in turn.

Theoretical analysis indicates that the LMMP algorithm is convergent and its time complexity is several orders of magnitude lower than the MP. Experimental results show that the LMMP algorithm outperforms other global searching methods in computational speed and approximation performance.

A traffic signs’Chinese character recognition algorithm

based on mixed optimized deep Boltzmann machine

LI Wen-xuan,SUN Ji-feng

2018, 40(1): 79-85. doi:

Abstract ( 382 )

PDF (763KB) ( 419 ) 　　

In order to improve the recognition rate of traffic signs’Chinese characters, we propose a mixed optimized deep Boltzmann machine(MDBM) algorithm to improve the approximation of probability distribution. Two sampling methods (grayscale sampling initialization and binary sampling initialization) are proposed to construct the restricted Boltzmann machines, which are overlapped to form the depth Boltzmann machine. In addition, we propose a fine-tuning algorithm, called complex conjugate gradient method, to improve the fine-tuning part in deep Boltzmann machine. Experiments on traffic signs data show that the recognition rate of the proposed algorithm is better than that of the original deep Boltzmann machine.

A low dose CT image statistical reconstruction algorithm

based on discrete shearlet regularization

ZHANG Hai-yan1,ZHANG Li-yi1,2,SUN Yun-shan1,2

2018, 40(1): 86-92. doi:

Abstract ( 737 )

PDF (726KB) ( 421 ) 　　

Review attachment

Though reducing the number of projection angles or lowering the current intensity of X-ray tube can reduce radiation dose and therefore alleviate damage to human bodies, the former measure can result in incomplete projection data while the later causes a declined signal to noise ratio of projection data. We propose a low-dose CT image statistical iterative reconstruction algorithm based on sparsity constraint in shearlet domain. The statistical weighting coefficient of data fidelity terms is introduced to reduce the influence of noise on the reconstruction results, and the sparse representation of intermediate images in shearlet domain is added into the objective function as a regularization item by means of the augmented Lagrangian method so as to narrow down the solution space and obtain stable and accurate reconstruction from incomplete projection data. According to experimental data, this algorithm can get high-quality images when projection data is far from completeness or the signal to noise ratio of projection data declines sharply. The proposed algorithm can be used for attaining reconstructed images that clearly keep structural details when the radiation dose is decreased to 10% of the filtered back projection (FBP) or even lower degrees.

Monocular vision agricultural machine

localization based on pose threshold filter

HUANG Pei-chen1,2,LUO Xi-wen 1,2,ZHANG Zhi-gang 1,2,LIU Zhao-peng 1,2

2018, 40(1): 93-100. doi:

Abstract ( 348 )

PDF (752KB) ( 506 ) 　　

Review attachment

Precise localization is an important premise of autonomous navigation for agricultural vehicles. We propose a localization algorithm based on monocular camera. After features are detected and tracked through multiple frames, vehicle poses are estimated based on 3D-2D correspondences. Furthermore, the translation absolute scale is calculated based on the assumption that the ground patches are locally flat and the camera is moving at a known and fixed height over the ground. Finally, the poses are refined by the pose threshold filter. Compared with the RTK-GPS data, the average relative position errors of the three different experimental terrains are 5.459 9%, 8.373 1% and 6.443 94%, and the average heading errors are 7.717 7°, 5.738 9° and 3.438 3°. The results show that the algorithm is feasible for agricultural vehicles localization.

A Blinn-Phong BRDF infrared reflection model

LI Min,YANG Yi-bin,WANG Ya-nan,YANG Min

2018, 40(1): 101-107. doi:

Abstract ( 603 )

PDF (767KB) ( 605 ) 　　

Review attachment

Aiming at problems of the complicated calculation of radiation reflection components and realistic deficiency in the infrared scene simulation, we present a Blinn-Phong BRDF infrared reflection model, which can be used in 3D infrared scene simulation based on Unity. Based on the threshold segmentation of measured infrared images, the target surface temperature is solved by using the simulation link inversion model, which includes simplification of radiance operation and infrared imaging process. According to the similarity between the infrared radiation theory and visible light illumination model, the modified Blinn-Phong model is transplanted into the infrared band, and the bidirectional reflectance distribution function (BRDF) is introduced to improve simulation accuracy. The Blinn-Phong BRDF infrared reflection model is presented. Finally, zero stadia simulation scenes are built based on this radiation reflection model, and simulated images and measured images are compared to verify the reliability and effectiveness of the reflection model. Experimental results indicate that the reflection model has high simulation efficiency, and can better simulate the high light phenomenon of infrared reflection, meeting the requirement of radiation reflection in the infrared scene simulation.

Face recognition based on structured low

rank representation and low rank projection

LIU Zuo-Jun，GAO Shang-bing

2018, 40(1): 108-115. doi:

Abstract ( 387 )

PDF (627KB) ( 384 ) 　　

Review attachment

Occlusion and corruption in the training images result in degraded performance of the sparse representation classification (SRC) algorithm in practical applications of face recognition. Aiming at the aforementioned problem, we propose a new face recognition method based on structured low rank representation (SLR) and low rank projection (LRP), called SLR_LRP. Firstly, the original training samples are decomposed via SLR to obtain clean training samples. And a LRP matrix is learned based on the original training samples and the recovered clean samples. Secondly, test samples are projected onto the LRP matrix. Finally, SRC is exploited to classify the corrected test samples. Experiments on the AR and the Extended Yale B face databases demonstrate that the SLR_LRP can effectively deal with the occlusion and pixel corruption in samples.

Big image processing based on

improved three level buffers technology

FENG Wei-huan,WANG Mao-zhi,ZENG Ying-chao

2018, 40(1): 116-120. doi:

Abstract ( 390 )

PDF (687KB) ( 346 ) 　　

Review attachment

Given the problem of system blocking caused by the traditional technology when processing big images and redrawing user interface, we propose an improved three level buffers technology. Firstly, a memory area compatible with the properties of the big image is allocated to load the big image data. Secondly, another allocated memory area compatible with the display device is used to execute other image processing instructions. Finally, the image data is copied to the real display device by the bit-block-transfer method in sequence. So only the memory area compatible with the display device is released and another memory area can avoid reloading the image when redrawing the user interface. The proposed technology is proved to be effective when processing the big drilling histogram in the digital core system software.

A hybrid differential shuffled frog leaping

algorithm based on selection strategy

WANG Lin1,WAN Xiao-yu1,WAN Jian-chao2

2018, 40(1): 121-127. doi:

Abstract ( 393 )

PDF (686KB) ( 369 ) 　　

Review attachment

A Selected Differential Shuffled Frog Leaping Algorithm (SDSFLA) is proposed. Compared with the classic Shuffled Frog Leaping Algorithm (SFLA), SDSFLA improves the efficiency of populationupdatingby increasingthe number of updated vectors, uses thecrossover operator and mutation operator of Differential Evolution(DE) to strengthen information exchanges between vectors, uses multiple updating strategies to improve the success rate of the trial vectors, and increases the diversity of the population viarandomly-selected control parameters.Based on 16 benchmark functions, SDSFLA is compared with an improved frog leaping algorithm and two improved differential evolution algorithms. The test results confirm the validity and stability of the SDSFLA algorithm.

An IMU-based human-machine cooperation

control algorithm of active dancing robot

LIU Zhao1,2，SONG Li-bin2，YU Tao2，GUO Kai2，WANG Zeng-xi2，GENG Mei-xiao2

2018, 40(1): 128-132. doi:

Abstract ( 521 )

PDF (599KB) ( 415 ) 　　

Review attachment

This paper proposes an IMU (Inertial Measurement Unit) -based human-machine cooperation control algorithm of active dancing robot The robot is equipped with three springs with a certain rigidity at the waist. Kalman filter is used to fuse the triaxial accelerometer data with the three-axis gyroscope data to obtain the changes of attitude angle of the robot under the influence of human partners. The threshold method is used to eliminate the jitter error of attitude angle. Combined with the current state of the robot, the mapping of the relative attitude angle to the velocity vector is obtained. Then, the desired trajectory speed and the human trajectory correction speed are fused to get the robot trajectory and the corrected coordinates of the remaining target points. The algorithm is applied to Waltz CCL (Closed Change Left) trajectory test, and the experimental results show that the algorithm works well.

based on sine function and its application

YANG Yong,ZHANG Dong-sheng,WANG Yan-ru,ZHANG Ya-nan

2018, 40(1): 133-138. doi:

Abstract ( 547 )

PDF (390KB) ( 509 ) 　　

Similarity measure is an important method to measure the similarity between two intuitionsitic fuzzy sets. Many formulas about similarity measure have been proposed, but some are unable to classify or involve complicated calculation in practical applications. To solve this problem, based on the cotangent similarity measurement method proposed by Tian and the information carried by the membership degree, nonmembership degree and hesitancy degree, we propose two sine similarity measure formulas which prove to be correct. Then we conduct comparative analysis among various similarity measures by several numerical examples to illustrate the effectiveness of the proposed sine similarity measures. Finally, the effectiveness of the method is illustrated by an example.

Current Issue

Author center

Review center

Online journal