  • Journal of the China Computer Federation (CCF)
  • China Science and Technology Core Journal
  • Chinese Core Journal

Current Issue

    • Optimization of the log pattern extraction
      algorithm for large-scale syslog files
      ZHAO Yi-ning,XIAO Hai-li
      2017, 39(05): 821-828.

      The LARGE system is a log analysis framework deployed in the supercomputing environment of the Chinese Academy of Sciences. It monitors and analyzes various log files in the environment through log collection, centralized analysis and result feedback. When monitoring system logs, system maintenance personnel need to reduce the large number of original logs to a small set of log patterns using a log pattern extraction algorithm. However, because of the rapid growth of log size and the peculiarities of messages log files, the traditional log pattern extraction algorithm fails to satisfy the requirement of rapid log processing. We propose an optimization of the log pattern extraction algorithm that introduces the idea of the MapReduce mechanism to accelerate log pattern extraction when there are multiple input log files. Evaluation results show that with a number of input files, the optimization can significantly improve the running speed of the vocabulary consistency algorithm and greatly reduce the running time. We also evaluate the time cost and the extraction effect of the optimized algorithm when the vocabulary conversion function is used.
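
      As a rough illustration of the MapReduce idea applied here, the sketch below maps each input log file to per-position word counts in parallel and reduces the partial counts into one vocabulary table. File names, tokenization and the pattern rule are hypothetical stand-ins, not the LARGE system's actual algorithm.

        from collections import Counter
        from multiprocessing import Pool

        def map_file(path):
            # Map step: count which words occur at each message position;
            # position-wise frequencies drive vocabulary-consistency checks.
            counts = Counter()
            with open(path, errors="ignore") as f:
                for line in f:
                    for pos, word in enumerate(line.split()):
                        counts[(pos, word)] += 1
            return counts

        def reduce_counts(partials):
            # Reduce step: merge the per-file counters into one global table.
            total = Counter()
            for c in partials:
                total.update(c)
            return total

        if __name__ == "__main__":
            files = ["messages.1", "messages.2", "messages.3"]  # hypothetical inputs
            with Pool() as pool:
                merged = reduce_counts(pool.map(map_file, files))
            # A simplified pattern rule: words dominating a position become
            # pattern constants, the rest collapse into wildcards.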

      Performance optimization of stencil computation
      on ARM64 multi-core microprocessor
      FENG Lu-xia,LI Chun-jiang,HUANG Ya-bin
      2017, 39(05): 829-833.
      Stencil computation is a class of important calculation kernels widely used in fields ranging from image and video processing to large-scale scientific and engineering simulation. However, evaluations of stencil computation on ARM64 high-performance processors are rare. According to the features of the AMCC X-Gene 2 and Phytium FT-1500A, we design an optimization method based on two-dimensional binding, which reduces the parallel overhead of thread scheduling and increases the cache hit rate through thread-to-CPU binding and thread-to-data-block binding. Experimental results show that this method can improve the performance of stencil computation on the ARM64 architecture, and our kernel demonstrates good scalability on the two ARM64 multi-core microprocessor platforms.
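
      For reference, a toy 5-point stencil kernel with two-dimensional blocking, the cache-friendly traversal that thread/data-block binding targets; numpy stands in for hand-tuned ARM64 code, and the block size is an assumption.

        import numpy as np

        def stencil_blocked(a, block=64):
            # 5-point Jacobi-style stencil over the interior of a 2-D grid,
            # visited tile by tile so each block stays resident in cache.
            out = a.copy()
            n, m = a.shape
            for bi in range(1, n - 1, block):
                for bj in range(1, m - 1, block):
                    i1 = min(bi + block, n - 1)
                    j1 = min(bj + block, m - 1)
                    out[bi:i1, bj:j1] = 0.25 * (
                        a[bi - 1:i1 - 1, bj:j1] + a[bi + 1:i1 + 1, bj:j1] +
                        a[bi:i1, bj - 1:j1 - 1] + a[bi:i1, bj + 1:j1 + 1])
            return out

        grid = np.random.rand(1024, 1024)
        result = stencil_blocked(grid)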
       
      Shared L1 instruction cache for manycore processors
      ZHANG Kun,LIU Xiao,ZHENG Fang,XIE Xiang-hui
      2017, 39(05): 834-840.

      Manycore processor design faces a great challenge in terms of chip area utilization. How to improve the proportion of the computing units in the limited chip area is a hot topic in the research on manycore architecture. We focus on the design of the instruction cache on the manycore processor. The instruction cache is shared among several processing cores in order to improve the pipeline performance. We propose a design of shared instruction cache and implement cycleaccurate performance simulation on it. The RTL codes of the shared instruction cache are synthesized to get the area cost and timing results. Experimental results demonstrate that the shared instruction cache can decrease 11% to 27% miss rate and improve the pipeline performance by 4% to 7%.

       

      An optimized bagging decision tree
      algorithm based on MapReduce
      ZHANG Yuan-ming,CHEN Miao,LU Jia-wei,XU Jun,XIAO Gang
      2017, 39(05): 841-848.

      In order to address the overfitting and poor scalability of the C4.5 decision tree algorithm, we propose an optimized C4.5 algorithm with the Bagging technique and then parallelize it according to the MapReduce model. The optimized algorithm obtains multiple new training sets, each equal in size to the initial training set, by sampling with replacement; multiple classifiers are obtained by training on these new training sets, and a final classifier is generated by a majority voting rule that integrates the training results. The optimized algorithm is then parallelized in three aspects: parallel processing of training sets, parallel selection of optimal decomposition attributes and optimal decomposition points, and parallel generation of child nodes. A parallel algorithm based on a job workflow is implemented to improve big data analysis capability. Experimental results show that the parallelized and optimized decision tree algorithm has higher accuracy, higher sensitivity, better scalability and higher performance.
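
      A minimal single-machine sketch of the bagging-and-voting core (bootstrap samples the size of the training set, one tree per sample, majority vote); sklearn's CART tree stands in for C4.5, and the MapReduce distribution of the three parallel steps is omitted.

        import numpy as np
        from sklearn.datasets import load_iris
        from sklearn.tree import DecisionTreeClassifier

        X, y = load_iris(return_X_y=True)
        rng = np.random.default_rng(0)

        trees = []
        for _ in range(25):
            # Bootstrap: a sample equal in size to the training set, drawn
            # with replacement, trains one independent classifier.
            idx = rng.integers(0, len(X), size=len(X))
            trees.append(DecisionTreeClassifier().fit(X[idx], y[idx]))

        # Majority vote across the classifiers gives the final prediction.
        votes = np.stack([t.predict(X) for t in trees])   # (n_trees, n_samples)
        final = np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)
        print("training accuracy:", (final == y).mean())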

      Multi-dimension browsing based on
      features in massive file system
      HE Yang,HE Lian-yue,CHEN Bo,XU Jun,XU Zhao-miao
      2017, 39(05): 849-854.

      The small files distributed file system (SMDFS) can efficiently manage tens of billions of files. However, for huge collections of data such as photos and music, users often need to browse files quickly along multiple dimensions, and traditional file organization schemes that manage massive files through a directory structure cannot easily meet this requirement. Based on the SMDFS file system, we attach features to file attributes and put forward a feature-based inverted indexing technique and a distributed indexing technique for massive small files, which enable SMDFS to browse files by multiple features. Experimental results show that SMDFS with feature support can provide efficient management and rapid multi-dimensional browsing for massive small files, while the access performance based on the file-directory structure does not decrease significantly.
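
      An illustrative in-memory version of a feature-based inverted index, mapping (feature, value) pairs to file IDs; the distributed index SMDFS actually uses is far more involved, and all names here are hypothetical.

        from collections import defaultdict

        class FeatureIndex:
            # Toy inverted index: (feature, value) -> set of file IDs.
            def __init__(self):
                self.postings = defaultdict(set)

            def add(self, file_id, features):
                for name, value in features.items():
                    self.postings[(name, value)].add(file_id)

            def browse(self, **criteria):
                # Intersect posting lists, one per requested feature dimension.
                sets = [self.postings[(k, v)] for k, v in criteria.items()]
                return set.intersection(*sets) if sets else set()

        idx = FeatureIndex()
        idx.add("photo_001.jpg", {"year": 2016, "camera": "X100", "city": "Beijing"})
        idx.add("photo_002.jpg", {"year": 2016, "camera": "X100", "city": "Xi'an"})
        print(idx.browse(year=2016, camera="X100"))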

      Parallel design and implementation of
      JPEG compression algorithm based on OpenCL
      ZHANG Min-hua,ZHANG Jian-xian,QIU Xue-hong,ZHOU Duan
      2017, 39(05): 860.
      As the scale of information data increases enormously, computing devices based on traditional single-processor or multi-processor structures are unable to meet the requirements of real-time data processing. Heterogeneous parallel computing attracts much attention and is widely applied for its computational efficiency and parallel real-time processing capability. We propose a parallel design of the JPEG compression algorithm based on OpenCL that exploits the advantages of the GPU in image processing. The JPEG algorithm is divided into multiple kernel programs, and the kernels are sequenced by event-based information transfer. The parallel algorithm is simulated and verified on a GPU+CPU platform. Experimental results show that under the same image quality, the parallel algorithm improves implementation efficiency and reduces runtime substantially, and the efficiency gain grows as the image size increases.
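
      A sketch of the per-block transform stage that such a design maps onto OpenCL kernels (level shift, 8x8 2-D DCT, quantization); numpy/scipy replace the GPU kernels here, and only the standard JPEG luminance quantization table is shown.

        import numpy as np
        from scipy.fft import dctn

        # Standard JPEG luminance quantization table.
        Q = np.array([
            [16, 11, 10, 16, 24, 40, 51, 61],
            [12, 12, 14, 19, 26, 58, 60, 55],
            [14, 13, 16, 24, 40, 57, 69, 56],
            [14, 17, 22, 29, 51, 87, 80, 62],
            [18, 22, 37, 56, 68, 109, 103, 77],
            [24, 35, 55, 64, 81, 104, 113, 92],
            [49, 64, 78, 87, 103, 121, 120, 101],
            [72, 92, 95, 98, 112, 100, 103, 99]])

        def encode_blocks(img):
            # Each 8x8 block: level shift, 2-D DCT-II, divide by Q, round.
            # In an OpenCL version every block is an independent work item,
            # which is what makes this stage embarrassingly parallel.
            h, w = (d - d % 8 for d in img.shape)
            out = np.empty((h // 8, w // 8, 8, 8), dtype=np.int32)
            for i in range(0, h, 8):
                for j in range(0, w, 8):
                    block = img[i:i + 8, j:j + 8].astype(np.float64) - 128.0
                    coeff = dctn(block, norm="ortho")
                    out[i // 8, j // 8] = np.round(coeff / Q).astype(np.int32)
            return out

        img = np.random.randint(0, 256, (64, 64))
        coeffs = encode_blocks(img)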
      An index optimization model for
      network security situation evaluation
       
      WU Guo,CHEN Lei,SI Zhi-gang,BAI Li-fang
      2017, 39(05): 861-869.
      Low accuracy and poor real-time performance are two issues in network security situation evaluation (NSSE). We present an index optimization model for NSSE based on factor analysis and principal component analysis. The model derives a class of comprehensive variables with strong independence to describe the existing index system. Experimental results show that the optimization yields more timely evaluation results without loss of accuracy for NSSE.
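
      A minimal sketch of the dimension reduction such a model rests on: PCA compresses correlated raw security indices into a few independent composite variables. sklearn, the random data and the 95% variance threshold are assumptions, not the paper's configuration.

        import numpy as np
        from sklearn.decomposition import PCA
        from sklearn.preprocessing import StandardScaler

        # Rows: observation windows; columns: raw security indices (stand-ins).
        raw = np.random.rand(200, 12)

        # Standardize, then keep enough principal components to explain
        # 95% of the variance; these become the composite indices.
        X = StandardScaler().fit_transform(raw)
        pca = PCA(n_components=0.95).fit(X)
        composite = pca.transform(X)
        print(raw.shape, "->", composite.shape)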
       
      A new key-escrow-free hierarchical
      identity-based encryption scheme
       
      TANG Xin,QI Fang
      2017, 39(05): 870-876.
      We present a new scheme to remove key escrow from hierarchical identity-based encryption (HIBE), based on the security notion of anonymous ciphertext indistinguishability against the key generation center (ACI-KGC) proposed by Chow. We first describe how to equip a modified HIBE framework with ACI-KGC security. The private key generator (PKG) and an identity certificate authority (ICA) cooperate in an anonymous private key generation protocol, such that the PKG can issue a private key to a user authenticated by the ICA without knowing the list of users' identities. We then apply the proposed scheme to the HIBE system; theoretical analysis shows that the scheme provides better security while maintaining performance.
       
      Cyber security assessment for SCADA systems
      based on attackdefense game model
      HUANG Hui-ping,XIAO Shi-de,MENG Xiang-yin
      2017, 39(05): 877-884.

      Cyber security assessment of SCADA systems is important basic work for ensuring reliable system operation. Existing evaluation methods take neither the mutual influence between attacker and defender nor the economic effect into account. To solve this problem, we propose an assessment method based on the attack-defense tree and game theory. Based on the attack-defense tree, the method calculates the expected payoff functions of the attacker and the defender, and establishes the system's attack-defense game model. The mixed-strategy Nash equilibrium of the complete-information static game is solved, yielding the probability distribution over attack and defense strategies. We describe the application of the method in a case study. The evaluation results show that the method is reasonable and feasible, and can help risk managers evaluate the return on investment of existing information security defense measures, so that they can focus defensive deployment on particular attack events and maximize the return on investment.
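
      To make the equilibrium computation concrete, a sketch for the 2x2 case: in a completely mixed equilibrium, each side randomizes so that the opponent is indifferent between its two pure strategies. The payoff numbers are invented, not taken from the paper's case study.

        import numpy as np

        def mixed_nash_2x2(A, B):
            # Completely mixed Nash equilibrium of a 2x2 bimatrix game with
            # attacker payoffs A (row player) and defender payoffs B (column
            # player): q makes the attacker indifferent between its rows,
            # p makes the defender indifferent between its columns.
            q = (A[1, 1] - A[0, 1]) / (A[0, 0] - A[0, 1] - A[1, 0] + A[1, 1])
            p = (B[1, 1] - B[1, 0]) / (B[0, 0] - B[0, 1] - B[1, 0] + B[1, 1])
            return np.array([p, 1 - p]), np.array([q, 1 - q])

        # Invented payoffs: rows = attack strategies, columns = defenses.
        A = np.array([[3.0, -1.0], [0.0, 2.0]])    # attacker's expected payoffs
        B = np.array([[-3.0, 1.0], [0.0, -2.0]])   # defender's expected payoffs
        p, q = mixed_nash_2x2(A, B)
        print("attack mix:", p, "defense mix:", q)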

      An enhanced forward error correction
      coding in high-speed link transmission
      FENG Xuan,HU Shu-kai,WANG Di,SONG Xin-liang,LI Hong-liang
      2017, 39(05): 885-891.

      In reliable transmission over high-speed links, research on forward error correction (FEC) in the physical layer focuses on improving error correction coding performance. Because all coding redundancy is used for error correction, no margin is left to carry user-customized information during transmission. We therefore design an enhanced FEC coding scheme with higher coding efficiency and greater flexibility. Coding performance and cost are evaluated with Matlab and synthesis tools. The enhanced FEC codes can carry user-customized information with no additional cost and no harm to coding performance.

       
      Implementation and optimization of 
      LTL probabilistic model checker
      LIN Zhe-chao,DONG Wei
      2017, 39(05): 892-896.
      Probabilistic model checking is a model checking technology that can not only verify a system qualitatively but also judge whether the system satisfies quantitative properties. The complexity of LTL probabilistic model checking can be doubly exponential. Existing tools such as PRISM and MRMC do not support the verification of LTL properties. We optimize the LTL probabilistic model checking algorithm and implement an efficient LTL model checker. The proposed tool is evaluated via comparative experiments.
       
      Cooperative streaming process reengineering
       based on sequential behaviors
      HUANG Li,TAN Wen-an,XU Xiao-yuan
      2017, 39(05): 897-903.

      Stream data possesses real-time, continuous and sequential features. To detect implicit information and dynamic processes in stream data, we propose a hybrid heuristic cooperative optimization algorithm based on self-adapting time series prediction for stream process reengineering. Firstly, we define the stream process model and improve the heuristic miner rules in the process logic based on the implicit dependency relations among log activities. Secondly, we define an ageing factor based on sequential behaviors and introduce a multiple particle swarm cooperation self-adapting strategy based on Gaussian mutation to improve the local and global search capacity of the PSO algorithm, so that the process model is optimized and reengineered. Comparative experiments on four benchmark functions verify the better convergence and stability of the proposed algorithm in streaming process mining.
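
      A compact sketch of the PSO-with-Gaussian-mutation ingredient: standard velocity and position updates, plus a Gaussian perturbation of the global best to escape local optima. The swarm size, coefficients and Sphere objective are illustrative assumptions, and the multi-swarm cooperation is omitted.

        import numpy as np

        def pso_gauss(f, dim=10, n=30, iters=200, w=0.7, c1=1.5, c2=1.5, sigma=0.1):
            rng = np.random.default_rng(0)
            x = rng.uniform(-5, 5, (n, dim))       # particle positions
            v = np.zeros((n, dim))                 # particle velocities
            pbest = x.copy()
            pval = np.apply_along_axis(f, 1, x)
            gbest = pbest[pval.argmin()].copy()
            for _ in range(iters):
                r1, r2 = rng.random((n, dim)), rng.random((n, dim))
                v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
                x = x + v
                val = np.apply_along_axis(f, 1, x)
                better = val < pval
                pbest[better], pval[better] = x[better], val[better]
                gbest = pbest[pval.argmin()].copy()
                # Gaussian mutation of the global best preserves diversity.
                cand = gbest + rng.normal(0.0, sigma, dim)
                if f(cand) < f(gbest):
                    gbest = cand
            return gbest, f(gbest)

        best, value = pso_gauss(lambda z: float(np.sum(z * z)))  # Sphere benchmark
        print(value)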

      An unsupervised defect prediction
      method based on probability
      LU Zheng-fa,XU Ling,ZHANG Xiao-hong,CHEN Lin,YANG Meng-ning
      2017, 39(05): 904-911.

      Software defect prediction can improve the efficiency of software development and testing and ensure software quality. Unsupervised defect prediction methods can be applied quickly in engineering practice because they need no labeled data. We propose an unsupervised defect prediction method based on probability (probabilistic clustering and labeling, PCLA). The method evaluates the probability that a class is defective by mapping the difference between each metric value and its threshold to a probability, and then predicts classes by clustering and labeling. This addresses the information loss in existing unsupervised methods, which compare metric values with thresholds directly and are therefore sensitive to the thresholds. The PCLA method is applied to seven datasets from NetGen and Relink. Experimental results show that PCLA achieves average improvements of 4.1%, 2.52% and 3.14% in recall, precision and F-measure, respectively, over existing unsupervised methods.
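
      A rough sketch of the probabilistic clustering-and-labeling idea as described above: the signed gap between each metric and its threshold is squashed into a probability (a sigmoid here, as an assumption), the per-metric probabilities are combined per module, and modules are clustered into two groups, with the higher-scoring cluster labeled defective. Data and thresholds are stand-ins.

        import numpy as np
        from sklearn.cluster import KMeans

        rng = np.random.default_rng(1)
        metrics = rng.random((100, 4)) * 10            # hypothetical module metrics
        thresholds = np.array([5.0, 4.0, 6.0, 3.0])    # hypothetical thresholds

        # Map metric-threshold gaps to probabilities instead of hard 0/1 votes,
        # so modules just below a threshold are not treated as clean outright.
        prob = 1.0 / (1.0 + np.exp(-(metrics - thresholds)))
        defect_score = prob.mean(axis=1, keepdims=True)

        # Cluster the scores; the cluster with the higher center is labeled
        # defective, the other clean.
        km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(defect_score)
        defective = km.labels_ == km.cluster_centers_.argmax()
        print(defective.sum(), "modules predicted defective")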

      A data visualization method
      based on extreme learning machine
      CHEN Wen-bing,SONG Ma-jun,WANG Ting-chun
      2017, 39(05): 912-918.
      We present a novel data visualization method based on the extreme learning machine (ELM), which uses multidimensional scaling (MDS), Pearson correlation and Spearman correlation, respectively, instead of the common MSE to project high-dimensional data onto a two-dimensional plane for visualization. Experimental results show that, compared with the recently popular stochastic neighbor embedding (SNE) and t-SNE, the proposed method outperforms them in visual effect and computational performance. The results also show that the MDS-based ELM performs best.
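
      For context, a minimal ELM regressor, the building block such a method trains toward 2-D embedding targets: a random, untrained hidden layer followed by a single least-squares solve for the output weights. The hidden-layer size, tanh activation and random targets are assumptions.

        import numpy as np

        class ELM:
            # Extreme learning machine: random hidden weights, solved output layer.
            def __init__(self, n_hidden=200, seed=0):
                self.n_hidden = n_hidden
                self.rng = np.random.default_rng(seed)

            def _hidden(self, X):
                return np.tanh(X @ self.W + self.b)

            def fit(self, X, Y):
                self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))
                self.b = self.rng.normal(size=self.n_hidden)
                # Output weights come from one least-squares solve; no iteration.
                self.beta, *_ = np.linalg.lstsq(self._hidden(X), Y, rcond=None)
                return self

            def predict(self, X):
                return self._hidden(X) @ self.beta

        # Map 10-D points to hypothetical 2-D target coordinates.
        X, Y = np.random.rand(300, 10), np.random.rand(300, 2)
        print(ELM().fit(X, Y).predict(X).shape)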
       
      An empirical mode decomposition de-noising
      method based on singular spectrum analysis
       
      XIAO Xiao-bing,LIU Hong-li,MA Zi-ji
      2017, 39(05): 919-924.

      We propose an empirical mode decomposition (EMD) de-noising method based on singular spectrum analysis (SSA). Firstly, noisy signals are decomposed into several intrinsic mode functions (IMFs) by the EMD. The first IMF is regarded as high-frequency noise, from which the noise energy contained in the other IMFs can be estimated; the ratio of signal energy in each IMF can then be calculated. Secondly, SSA is applied to each IMF with a proper window length, and a subset of singular value decomposition (SVD) components is selected to reconstruct the IMF according to its signal-energy ratio. Finally, the de-noised signal is obtained by adding all the reconstructed IMFs and the residue. Compared with the wavelet soft-threshold, EMD soft-threshold and EMD filtering methods, the proposed method is superior overall, so it is an effective signal de-noising method.
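
      To make the SSA step concrete, a minimal sketch of SSA reconstruction for a single component: Hankel embedding, SVD truncation and diagonal averaging. The window length, fixed rank and test signal are illustrative; the paper instead selects SVD components from the estimated signal-energy ratio of each IMF.

        import numpy as np

        def ssa_denoise(x, window, rank):
            # Embed the 1-D signal into a Hankel trajectory matrix.
            n = len(x)
            k = n - window + 1
            traj = np.column_stack([x[i:i + window] for i in range(k)])
            # Keep only the leading SVD components (the signal subspace).
            u, s, vt = np.linalg.svd(traj, full_matrices=False)
            approx = (u[:, :rank] * s[:rank]) @ vt[:rank]
            # Diagonal averaging (Hankelization) maps the matrix back to a signal.
            out = np.zeros(n)
            cnt = np.zeros(n)
            for j in range(k):
                out[j:j + window] += approx[:, j]
                cnt[j:j + window] += 1
            return out / cnt

        t = np.linspace(0, 1, 500)
        noisy = np.sin(2 * np.pi * 5 * t) + 0.3 * np.random.randn(500)
        clean = ssa_denoise(noisy, window=50, rank=2)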

      A face recognition algorithm based on EHMM-SVM
      LIU Huan,SU Shi-mei
      2017, 39(05): 925-930.

      The EHMM relies on the maximum output likelihood probability to classify a face image; however, because of the similarity between face images, this can lead to recognition errors. We propose a face recognition method based on the EHMM-SVM. The two-dimensional discrete cosine transform (2D-DCT) is used to extract face features and obtain the observation vector sequence. The output probability of each face image under each EHMM is obtained by the doubly nested Viterbi algorithm, and these output probabilities are fed into the SVM for classification training and recognition tests, yielding the face recognition results. The ORL and YALE face databases are used in the experiments. Experimental results show the feasibility and effectiveness of the method.
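
      A sketch of the two outer stages of such a pipeline (2D-DCT observation vectors in front, an SVM over per-model scores behind); the EHMM stage in between is reduced to a stub, since the doubly nested Viterbi computation does not fit in a few lines, and all data here are random stand-ins.

        import numpy as np
        from scipy.fft import dctn
        from sklearn.svm import SVC

        def dct_observations(img, block=8, step=4, n_coeff=6):
            # Slide a block over the face image and keep a few low-frequency
            # 2D-DCT coefficients per block as the observation vectors.
            obs = []
            h, w = img.shape
            for i in range(0, h - block + 1, step):
                for j in range(0, w - block + 1, step):
                    c = dctn(img[i:i + block, j:j + block], norm="ortho")
                    obs.append(c[:3, :3].ravel()[:n_coeff])
            return np.array(obs)

        def ehmm_scores(obs, n_models=5):
            # Stub standing in for the doubly nested Viterbi output
            # probabilities, one score per enrolled subject model.
            return np.array([-np.linalg.norm(obs.mean(0) - k) for k in range(n_models)])

        # The SVM classifies the vector of per-model scores.
        X = np.stack([ehmm_scores(dct_observations(np.random.rand(32, 32)))
                      for _ in range(40)])
        y = np.random.randint(0, 5, 40)       # stand-in subject labels
        clf = SVC(kernel="rbf").fit(X, y)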

      An image segmentation algorithm of neighborhood
      median weighted fuzzy C-means with spatial constraints
      YANG Jun,KE Yun-sheng,WANG Mao-zheng
      2017, 39(05): 931-935.

      Euclidean distance is the most commonly used distance measure in clustering analysis, but traditional Euclidean-distance image segmentation does not consider spatial information, neighborhood characteristics and other factors. In order to use more image space information to improve segmentation quality, besides incorporating spatial constraint information of pixels, we propose a neighborhood-median-weighted Euclidean distance to replace the plain Euclidean distance. Segmentation experiments on multiple images show that, compared with existing algorithms, this algorithm can not only improve the segmentation effect with better noise resistance, but also accelerate convergence and achieve high efficiency.
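
      An illustrative version of the modified distance: each pixel's distance to a cluster center is blended with the distance of its 3x3 neighborhood median, so isolated noise pixels are pulled toward their surroundings. The blend weight alpha and the gray-level centers are assumptions; a full FCM loop would update memberships and centers around this.

        import numpy as np
        from scipy.ndimage import median_filter

        def weighted_sq_distances(img, centers, alpha=0.5):
            # Squared distance of each pixel to each gray-level center,
            # mixing the pixel value with its neighborhood median.
            med = median_filter(img, size=3)
            d_pix = (img[..., None] - centers) ** 2
            d_med = (med[..., None] - centers) ** 2
            return (1 - alpha) * d_pix + alpha * d_med

        img = np.random.rand(64, 64)
        centers = np.array([0.2, 0.5, 0.8])   # hypothetical cluster centers
        labels = weighted_sq_distances(img, centers).argmin(axis=-1)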

      A dynamic weighted average pedestrian identification
      model based on adaptive feature selection
      YANG Chao,CAI Xiao-dong,GAN Kai-jin,WANG Li-juan
      2017, 39(05): 935-943.
      In order to solve the problems of low recognition rate and slow recognition speed in pedestrian identification, we propose a dynamic weighted average ranking method based on adaptive feature selection. Firstly, the GrabCut algorithm is combined with a manifold-based saliency detection algorithm to improve the accuracy of pedestrian appearance extraction. Then we propose an adaptive selection method to extract pedestrian features effectively. Finally, we design a dynamic weighted average ranking model to merge multi-dimensional dynamic features. Experimental results show that the proposed method improves the accuracy of pedestrian recognition and is robust to changes in pose.
       
      Visual tracking based on analysis of sparse
      error matrix in low rank projection
      YANG Guo-liang,TANG Jun,WANG Jian,ZHU Song-wei,LIANG Li-ming
      2017, 39(05): 944-950.

      Single-target tracking is an important part of computer vision, and its robustness is challenged by target occlusion, illumination variation and target scale variation. To deal with these problems, we propose a visual tracking algorithm based on the analysis of the sparse error matrix in low-rank projection. To overcome model drift, target templates are updated dynamically according to the similarity between target templates and candidate targets. Within the particle filter framework, the sparse error matrix of candidate targets is obtained using robust principal component analysis and low-rank projection, and the observation likelihood of the next frame is estimated from edge and smoothness information. Experimental results on multiple video sequences show that the algorithm is more robust than state-of-the-art trackers.

      An improved multi-scale Retinex
      method for tone mapping
      LU Bi-bo,CHEN Jing,WANG Jian-long,ZHENG Yan-mei
      2017, 39(05): 951-956.

      Traditional low dynamic range display devices fail to present high dynamic range image information. To solve this problem, we propose a multi-scale Retinex method for tone mapping. We employ guided filtering to decompose the original HDR image into an illumination layer and a reflectance layer. A base layer and a series of detail layers are obtained via multi-scale decomposition of the reflectance layer. Finally, the detail layers and the base layer are combined with color recovery. Experimental results show that the proposed method can faithfully restore real scenes and provide more details, better contrast and brighter colors.
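
      A compressed sketch of the multi-scale Retinex core for tone mapping: the log luminance is compared with smoothed versions of itself at several scales, and the residuals are averaged as detail. A Gaussian filter stands in for the paper's guided filter, and the scales and normalization are assumptions.

        import numpy as np
        from scipy.ndimage import gaussian_filter

        def msr_tonemap(luminance, scales=(15, 80, 250)):
            # Multi-scale Retinex on log luminance: each scale splits the image
            # into illumination (the blur) and reflectance/detail (the residual).
            log_l = np.log1p(luminance)
            detail = np.mean([log_l - gaussian_filter(log_l, s) for s in scales],
                             axis=0)
            out = detail - detail.min()
            return out / (out.max() + 1e-12)   # normalize into [0, 1]

        hdr = np.random.rand(128, 128) * 1e4   # stand-in HDR luminance
        ldr = msr_tonemap(hdr)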