  • A journal of the China Computer Federation (CCF)
  • A Chinese science and technology core journal
  • A Chinese core journal

Current Issue

    • Optimization techniques for continuous
      range query based on Storm
      WANG Botao,ZHAO Kaili,CHANG Lidong,LI Rui,HUANG Shan,LI Jing,LI Xiang
      2017, 39(01): 1-14.

      In the era of mobile big data, traditional location based service (LBS) techniques face new challenges such as limited system scalability and performance. We first propose a query framework based on Storm according to the characteristics of LBS applications. Then, we design continuous parallel range query algorithms based on Storm to optimize query performance. To address the consistency problem in the distributed environment, we design a distributed lock service based on ZooKeeper to guarantee the correctness of query results. Furthermore, for the time-consuming database accesses in the parallel continuous range query algorithm, we propose a cache-optimized algorithm based on TimeCacheMap together with two caching strategies, so as to reduce the overhead of accessing the database and improve query efficiency.
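
      To illustrate the caching idea, here is a minimal Python sketch of a TimeCacheMap-style expiring cache placed in front of the database; TimeCacheMap itself is a Storm utility class, and the names below (query_objects, fetch_from_db, ttl_seconds) are illustrative assumptions rather than the paper's API.

      import time

      class TimeCache:
          # Hypothetical Python analogue of Storm's TimeCacheMap: entries
          # older than ttl_seconds are evicted lazily on access.
          def __init__(self, ttl_seconds):
              self.ttl = ttl_seconds
              self.store = {}                      # key -> (value, insert_time)

          def get(self, key):
              hit = self.store.get(key)
              if hit is None:
                  return None
              value, t = hit
              if time.time() - t > self.ttl:       # expired: drop and miss
                  del self.store[key]
                  return None
              return value

          def put(self, key, value):
              self.store[key] = (value, time.time())

      def query_objects(cell_id, cache, fetch_from_db):
          # Range-query worker: consult the cache before the database.
          objs = cache.get(cell_id)
          if objs is None:                         # cache miss -> expensive DB read
              objs = fetch_from_db(cell_id)
              cache.put(cell_id, objs)
          return objs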

      ultigranularity partition and scheduling for stream
      programs based on multiCPU and multiGPU
      heterogeneous architectures
       
      CHEN Wenbin,YANG Ruirui,YU Junqing
      2017, 39(01): 15-26.

      Dataflow programming languages simplify domain programming and offer an attractive way to express the parallelism of computation and data communication at the task and data levels. To exploit the abundant data, task and pipeline parallelism available on multi-CPU and multi-GPU architectures, we propose an efficient dataflow compilation framework. The framework takes a synchronous dataflow graph as input and uses several partitioning methods to distribute tasks across the CPUs and GPUs. A task classification method assigns each task to a GPU or a CPU according to its computation and communication parallelism. We propose a GPU task horizontal splitting method that divides the GPU-assigned tasks into blocks, each executed by one GPU, thereby avoiding GPU-to-GPU communication. A balancing partition method for the scattered CPU tasks chooses appropriate CPU cores and balances the load among them, satisfying load balance and raising core utilization. We choose a multi-CPU and multi-GPU heterogeneous architecture as the experimental platform and common algorithms from media processing applications as benchmarks. Our experiments verify the effectiveness of the proposed methods.
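
      The CPU-side load balancing can be pictured with a classic greedy partition. The following Python sketch assigns tasks with assumed per-task cost estimates to cores in longest-processing-time order; it is an illustration only, as the paper's partition method is more elaborate.

      import heapq

      def balance_tasks(task_costs, num_cores):
          # Greedy LPT partition: repeatedly give the heaviest remaining task
          # to the least-loaded core.
          cores = [(0.0, i, []) for i in range(num_cores)]   # (load, id, tasks)
          heapq.heapify(cores)
          for task, cost in sorted(task_costs.items(), key=lambda kv: -kv[1]):
              load, i, tasks = heapq.heappop(cores)
              tasks.append(task)
              heapq.heappush(cores, (load + cost, i, tasks))
          return {i: tasks for load, i, tasks in cores}

      # hypothetical per-task cost estimates for a media pipeline
      assignment = balance_tasks({"fft": 9.0, "scale": 3.5, "dct": 7.2, "mix": 4.1}, 2)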

      Implementation and performance analysis of BFS
      algorithm on a parallel prototype system
      HENG Dongdong,TANG Yuhua,YI Xiaodong,LIU Xiangyang,ZHOU Tong
      2017, 39(01): 27-34.

      Compared with traditional applications, big data applications exhibit different characteristics, such as high parallelism, large volumes of data, irregular memory access patterns and poor temporal locality, which bring new challenges to traditional computer architectures. Graph500, which focuses on data-intensive workloads, is a ranking of supercomputer systems, and the Breadth-First Search (BFS) algorithm is the core of its benchmark. On a parallel prototype system designed for big data applications, we implement the BFS algorithm from the perspectives of 1D partitioning, an optimized hybrid algorithm and a convenient remote communication design. The performance reaches 803.8 MTEPS on an input graph of 2^22 vertices and 2^26 edges. We then analyze the experimental performance and provide a reference for future work.
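
      A hybrid BFS of the kind the abstract mentions typically switches between top-down and bottom-up traversal depending on frontier size. Below is a minimal single-node Python sketch; the switching threshold alpha and the dictionary graph representation are assumptions, and the paper's distributed version additionally splits the graph 1D (row-wise) across nodes.

      def hybrid_bfs(adj, source, alpha=4.0):
          # adj: dict mapping vertex -> list of neighbours
          parent = {source: source}
          frontier = {source}
          n = len(adj)
          while frontier:
              if len(frontier) * alpha < n:        # small frontier: top-down push
                  nxt = set()
                  for u in frontier:
                      for v in adj[u]:
                          if v not in parent:
                              parent[v] = u
                              nxt.add(v)
              else:                                # large frontier: bottom-up pull
                  nxt = set()
                  for v in adj:
                      if v not in parent:
                          for u in adj[v]:
                              if u in frontier:
                                  parent[v] = u
                                  nxt.add(v)
                                  break
              frontier = nxt
          return parent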

      Design and implementation of BIRCH algorithm
      parallelization based on Spark
      LI Shuai1,WU Bin2,DU Xiuming3,CHEN Yufeng3
      2017, 39(01): 35-41.
      In an era when distributed computing and memory matter greatly, memory-based distributed computing frameworks such as Spark have gained unprecedented attention and wide application. We design and implement a parallelization of the BIRCH algorithm on Spark that maximizes performance by reducing the frequency of shuffling and disk access. We analyze the design theoretically and describe the DAG of BIRCH on Spark. Finally, we compare the performance of the parallelized BIRCH algorithm with the single-machine BIRCH algorithm and the MLlib K-Means clustering algorithm. Experimental results show that the parallel BIRCH algorithm based on Spark achieves good running time and speedup without obvious loss of clustering quality.
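
      BIRCH's suitability for such parallelization rests on its clustering feature (CF) vectors, which merge by simple addition, so each Spark partition can build CFs locally before a global merge. A minimal sketch of the CF arithmetic (not the paper's actual Spark code):

      import numpy as np

      class CF:
          # Clustering Feature (N, LS, SS): point count, linear sum, squared sum.
          def __init__(self, n, ls, ss):
              self.n, self.ls, self.ss = n, ls, ss

          @classmethod
          def from_point(cls, x):
              x = np.asarray(x, dtype=float)
              return cls(1, x.copy(), float(x @ x))

          def merge(self, other):
              # componentwise addition: the property that makes per-partition
              # CF construction and driver-side merging possible
              return CF(self.n + other.n, self.ls + other.ls, self.ss + other.ss)

          def centroid(self):
              return self.ls / self.n

          def radius(self):
              # sqrt of mean squared distance to the centroid, derived
              # directly from (N, LS, SS)
              c = self.centroid()
              return float(np.sqrt(max(self.ss / self.n - c @ c, 0.0)))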
       
      An energy-efficient algorithm based on
      busy time in parallel scheduling
      CAI Lijun1,PAN Jiangbo1,CHEN Lei2,HE Tingqin1
      2017, 39(01): 42-48.

      Minimizing the busy time of servers is an effective way to save energy in parallel scheduling applications in cloud computing. However, most existing busy-time energy-saving strategies come at the expense of job scheduling performance and cannot cooperate with other job scheduling algorithms. We propose an effective energy-efficient optimization algorithm based on busy time in parallel scheduling, called BTEOA. Firstly, based on the available resources of the current servers, we divide the job request queue into a special job window and a general job window. Secondly, according to the completion times of the job requests in the special job window, we execute these requests on the servers so that the total busy time of all servers is locally optimal. Finally, when all job requests in the special job window have finished, BTEOA continues to divide the general job window into a new special window and a new general window, until all remaining job requests have been executed. Throughout scheduling, BTEOA keeps the original job queuing model unchanged, leaving job scheduling performance unaffected. Worked instances and experimental results demonstrate that BTEOA saves energy without affecting job scheduling performance, and that it can be used jointly with other job scheduling algorithms.
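
      Read literally, the window mechanism can be sketched as follows in Python. The job representation and the fit test are assumptions made for illustration, and run_batch stands in for the busy-time-optimal placement step that the paper actually performs.

      def run_batch(batch):
          # stand-in for placing the window's jobs on servers so that the
          # batch's total busy time is locally minimized
          print("executing window:", [name for name, slots, runtime in batch])

      def split_windows(job_queue, available_slots):
          # Special window: the longest prefix of the queue (order preserved)
          # that fits the currently available server slots; the remainder
          # forms the general window. Job = (name, slots_needed, runtime).
          special, used, i = [], 0, 0
          while i < len(job_queue) and used + job_queue[i][1] <= available_slots:
              used += job_queue[i][1]
              special.append(job_queue[i])
              i += 1
          return special, job_queue[i:]

      def bteoa_like(job_queue, available_slots):
          # Repeatedly carve off and execute a special window; queue order is
          # never changed, so the original queuing model stays intact.
          while job_queue:
              special, job_queue = split_windows(job_queue, available_slots)
              if not special:          # head job exceeds all available slots
                  special, job_queue = [job_queue[0]], job_queue[1:]
              run_batch(special)

      bteoa_like([("a", 4, 10), ("b", 2, 3), ("c", 8, 7)], available_slots=6)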

       

      Power optimization of an out-of-order superscalar processor core
      SUN Caixia,LI Wenzhe,GAO Jun,WANG Yongwen
      2017, 39(01): 49-54.

      To maximize performance, processor core frequencies become increasingly high and core designs grow much more complex. As a result, power becomes a challenge that needs special care. Besides process-level and circuit-level low-power technologies, adopting strategies suited to the features of individual modules during RTL design can also reduce power consumption. We analyze the power of an out-of-order superscalar processor core and, based on the analysis, optimize the designs of the register files and the reorder buffer. Evaluation results show that the area and power of the processor core decrease with no obvious performance penalty.

      A prognostics and health management model for
      integrated circuits based on back propagation neural network
       
      DU Tao,RUAN Aiwu,WANG Peng,LI Yongliang,LI Ping
      2017, 39(01): 55-60.
      We propose a prognostics and health management (PHM) model for integrated circuits (ICs) based on a back propagation (BP) neural network. The model is not only independent of the physical mechanisms of IC aging, but also effectively fits the nonlinear function of IC failures. We conduct a large number of experiments on a programmed FPGA and use the extracted parameter samples as training samples for the PHM model. Experimental results validate the trained model, showing that the proposed PHM model agrees with the experiments and meets the requirements of PHM for ICs.
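
      As an illustration of the model class, here is a minimal one-hidden-layer BP network trained with plain gradient descent; the layer sizes, learning rate and toy data are assumptions, not the paper's configuration.

      import numpy as np

      def train_bp(X, y, hidden=8, lr=0.1, epochs=2000, seed=0):
          # One hidden layer, sigmoid activations, squared loss.
          rng = np.random.default_rng(seed)
          W1 = rng.normal(0, 0.5, (X.shape[1], hidden)); b1 = np.zeros(hidden)
          W2 = rng.normal(0, 0.5, (hidden, 1));          b2 = np.zeros(1)
          sig = lambda z: 1.0 / (1.0 + np.exp(-z))
          for _ in range(epochs):
              h = sig(X @ W1 + b1)                  # forward pass
              out = sig(h @ W2 + b2)
              d_out = (out - y) * out * (1 - out)   # backward pass (chain rule)
              d_h = (d_out @ W2.T) * h * (1 - h)
              W2 -= lr * h.T @ d_out; b2 -= lr * d_out.sum(axis=0)
              W1 -= lr * X.T @ d_h;   b1 -= lr * d_h.sum(axis=0)
          return lambda Xn: sig(sig(Xn @ W1 + b1) @ W2 + b2)

      # toy usage: predict a health indicator in [0, 1] from two aging features
      X = np.array([[0.1, 0.2], [0.9, 0.8], [0.2, 0.1], [0.8, 0.9]])
      y = np.array([[0.0], [1.0], [0.0], [1.0]])
      predict = train_bp(X, y)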
       
      A robust header compression algorithm
      based on dynamic Bayesian network
      ZHOU Wei1,ZHAO Baokang1,LIU Bo1,WU Shaokang1,LI Yan1,LIU Hua1,2
      2017, 39(01): 61-66.

      Space flight systems use the IP protocol to carry information, which offers a higher data rate and greater application flexibility than traditional wireless communication. To cope with low bandwidth and high error rates, an efficient and reliable header compression algorithm is needed to improve payload efficiency. However, owing to the complex wireless environment and the high-speed maneuverability of space flight systems, the transmission quality of the radio channel changes dynamically, and the standard protocol is poorly suited to such time-varying characteristics. We propose a robust header compression algorithm based on a dynamic Bayesian network, named DB-ROHC. According to the decompressor's discrete historical packet-loss observation sequence, the algorithm dynamically adjusts key parameters to achieve a better balance between compression ratio and robustness. Simulation results show that, compared with the conventional robust header compression algorithm, the proposed algorithm is better suited to complex wireless links.

      Formal analysis and verification of mobile
      payment protocol PCMS
      WU Gege,ZHUANG Lei,ZHANG Kunli,WANG Guoqing
      2017, 39(01): 67-73.

      In recent years, the formal analysis and verification of mobile commerce protocols has become an important research topic. We take the Payment Centric Model using Symmetric cryptography (PCMS) protocol as the research object, model the protocol with timed automata, and use computation tree logic (CTL) to describe its properties. We then use the model checking tool UPPAAL to verify deadlock-freedom, timeliness, validity and money atomicity. Verification results show that the PCMS protocol satisfies all four requirements.
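
      Properties of this kind are conventionally written as CTL formulas of the following shape (a sketch; state names such as paid and delivered are hypothetical placeholders, not taken from the paper):

      \begin{aligned}
      &\text{deadlock-freedom:} && AG\,\neg\,\mathit{deadlock}\\
      &\text{money atomicity:} && AG\,\big(\mathit{paid} \Rightarrow AF\,(\mathit{delivered} \lor \mathit{refunded})\big)\\
      &\text{timeliness:} && AG\,\big(\mathit{requested} \Rightarrow AF_{\leq T}\,\mathit{responded}\big)
      \end{aligned}

      In UPPAAL's query language, deadlock-freedom is written as A[] not deadlock, and the liveness properties use the leads-to form p --> q.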

      A satellite network mobility handover management
      scheme based on  locator/identifier separation
       
      XIE Miao,FENG Zhenqian,YU Wanrong
      2017, 39(01): 74-80.

      Mobile satellite networks receive extensive attention due to their wide geographical coverage, low communication delay and other advantages. Tremendous research effort has been devoted to IP-based networking technology for satellites, striving to integrate them with terrestrial IP networks. One challenge of the converged network is the constant movement of the satellites: frequent handover of satellite access leads to complex mobility management issues, and existing IP mobility management technologies cannot support satellite network handover effectively. To support mobile handover efficiently, we introduce the idea of locator/identifier separation into satellite networks and study handover management under this architecture. Terminal locations are managed through a mapping service system, and a forwarding entry is installed in the previous satellite, via the new satellite gateway and the terminal's identifier, to handle the handover. Simulation results show that the proposed scheme outperforms mobile IPv4: it not only lowers handover latency and reduces binding update overhead and suboptimal routing, but also improves system performance and scalability in satellite networks.

      ECC point multiplication lightweight improvement
      for RFID applications over GF(2^m)
       
      WEI Guoheng1,2,WANG Ya2,ZHANG Huanguo1
      2017, 39(01): 81-85.

      Aiming at special resource-constrained applications such as RFID, we adopt the elliptic curve algorithm, with its high security per key bit, and make its point multiplication lightweight. We improve the modular multiplication and inversion algorithms at the core of point multiplication, and redesign the execution structure of the algorithm using an overall-serial, partially-parallel method. We implement the improved algorithm on an FPGA; experimental results show obvious advantages in speed and occupied chip area, making it suitable for resource-constrained applications such as RFID.
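
      For context, the Montgomery-ladder structure that makes point multiplication both regular and lightweight looks as follows. The sketch uses a toy short-Weierstrass curve over a small prime field for readability, whereas the paper works over GF(2^m) with its own optimized modular multiplication and inversion.

      def inv_mod(a, p):
          # field inversion via Fermat's little theorem: the costly operation
          # the paper optimizes in its GF(2^m) design
          return pow(a, p - 2, p)

      def point_add(P, Q, curve):
          # affine short-Weierstrass addition; None encodes the point at infinity
          a, p = curve
          if P is None: return Q
          if Q is None: return P
          if P[0] == Q[0] and (P[1] + Q[1]) % p == 0:
              return None
          if P == Q:
              lam = (3 * P[0] * P[0] + a) * inv_mod(2 * P[1], p) % p
          else:
              lam = (Q[1] - P[1]) * inv_mod((Q[0] - P[0]) % p, p) % p
          x = (lam * lam - P[0] - Q[0]) % p
          return (x, (lam * (P[0] - x) - P[1]) % p)

      def point_double(P, curve):
          return point_add(P, P, curve)

      def scalar_mult_ladder(k, P, curve):
          # Montgomery ladder: one add and one double per key bit regardless
          # of the bit value, the classic shape for SPA-resistant, lightweight
          # point multiplication
          R0, R1 = None, P
          for i in reversed(range(k.bit_length())):
              if (k >> i) & 1:
                  R0, R1 = point_add(R0, R1, curve), point_double(R1, curve)
              else:
                  R0, R1 = point_double(R0, curve), point_add(R0, R1, curve)
          return R0

      # toy usage: y^2 = x^3 + 2x + 3 over F_97 with P = (3, 6)
      print(scalar_mult_ladder(20, (3, 6), (2, 97)))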

      Performance evaluation of SDN
      controllers based on hybrid queuing model
      JIANG Lalin,PENG Xia,XIONG Bing
      2017, 39(01): 86-91.

      Software defined networking (SDN) expands the flexibility and programmability of networks by decoupling the control logic from data forwarding, and has become a hot topic in future networks. In practical deployments SDN faces the challenge of a controller performance bottleneck, so it is necessary to understand the performance characteristics of SDN controllers. To this end, we first analyze the arrival process and the processing time of Packet-In messages at SDN controllers. Then, based on queuing theory, we propose a capacity-limited M/M/1/m performance model of SDN controllers. As a further step, we derive its performance measures, including the average waiting queue length, the average queuing time, the average queue length and the average sojourn time. Finally, the proposed model is evaluated experimentally with the controller performance measurement tool Cbench. Experimental results indicate that the delays estimated by our model are closer to the measured delays than those of existing models, providing a more accurate description of SDN controller performance.
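
      The standard closed-form measures of a finite-capacity M/M/1/m queue can be computed directly. The following Python sketch uses the textbook formulas (the abstract does not reproduce the paper's derivation, so this is the conventional form, with illustrative arrival and service rates):

      def mm1m_metrics(lam, mu, m):
          # M/M/1/m with capacity m including the message in service.
          # State probabilities (rho != 1): P_n = rho^n (1-rho) / (1-rho^(m+1)).
          rho = lam / mu
          norm = (1 - rho) / (1 - rho ** (m + 1))
          p = [norm * rho ** n for n in range(m + 1)]
          L = sum(n * p[n] for n in range(m + 1))   # average queue length
          lam_eff = lam * (1 - p[m])                # arrivals actually admitted
          W = L / lam_eff                           # average sojourn time
          Lq = L - (1 - p[0])                       # average waiting-queue length
          Wq = Lq / lam_eff                         # average queuing (waiting) time
          return {"L": L, "W": W, "Lq": Lq, "Wq": Wq, "P_loss": p[m]}

      print(mm1m_metrics(lam=800.0, mu=1000.0, m=50))  # e.g. 800 Packet-In msg/s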

      A segmented Montgomery scalar multiplication algorithm
      with resistance to simple power analysis (SPA) attacks
      LI Yang 1,2,WANG Jinlin1,ZENG Xuewen1,YE Xiaozhou1
      2017, 39(01): 92-102.

      Based on Akishita's idea of computing the scalar multiplication kP+lQ on an elliptic curve in Montgomery form, we propose a new algorithm that reduces the computation of the scalar multiplication kP+lQ+tR by 23%. On the basis of these two algorithms, we then propose a segmentation method that enhances the efficiency of computing the scalar multiplication bP by converting bP into kP+lQ or kP+lQ+tR, and that incorporates the concept of side-channel atomicity to resist SPA attacks. Simulations on Magma demonstrate that the two-segment algorithm is the fastest and the three-segment algorithm the second fastest; both greatly improve efficiency compared with the original Montgomery algorithm.
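
      The segmentation idea can be sketched as follows: split b so that bP becomes a multi-scalar product, then evaluate it jointly so the doublings are shared. The sketch below reuses point_add and point_double from the previous section's sketch and shows a Shamir-trick style joint evaluation; the paper's Montgomery-form, SPA-protected version differs in the details.

      def split_scalar(b, h):
          # two-segment conversion: b = k + 2^h * l, so bP = kP + l(2^h P)
          return b & ((1 << h) - 1), b >> h

      def double_scalar_mult(k, l, P, Q, curve):
          # joint evaluation of kP + lQ: the doublings are shared across both
          # scalars, roughly halving the ladder length; reuses
          # point_add/point_double from the sketch above
          PQ = point_add(P, Q, curve)
          R = None
          for i in reversed(range(max(k.bit_length(), l.bit_length()))):
              R = point_double(R, curve)
              bits = ((k >> i) & 1, (l >> i) & 1)
              if bits == (1, 0):
                  R = point_add(R, P, curve)
              elif bits == (0, 1):
                  R = point_add(R, Q, curve)
              elif bits == (1, 1):
                  R = point_add(R, PQ, curve)
          return R

      # bP == kP + lQ with Q = 2^h P precomputed by h doublings:
      b, h, curve, P = 20, 3, (2, 97), (3, 6)
      k, l = split_scalar(b, h)
      Q = P
      for _ in range(h):
          Q = point_double(Q, curve)
      assert double_scalar_mult(k, l, P, Q, curve) == scalar_mult_ladder(b, P, curve)

      Since Q = 2^h P depends only on P, it can be precomputed once when P is a fixed base point, which is where the speedup comes from.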

      A robust zerowatermarking algorithm
      for spatial domain color images
      XIONG Xiangguang
      2017, 39(01): 103-110.

      Traditional transform-domain watermarking algorithms embed the watermark by modifying transform-domain coefficients, which affects the imperceptibility of the image. Exploiting the stability of the numerical relationship between the mean of the entire image and the block means, we propose a robust zero-watermarking algorithm for spatial-domain color images. A feature matrix is produced by comparing the mean of the entire image with the mean of each block. The zero-watermark is constructed by XORing the feature matrix with the preprocessed watermark; after preprocessing, it is registered in an intellectual property rights database. When extracting the watermark, the final watermark is obtained by a voting strategy. Extensive experimental results show that the proposed algorithm is strongly robust against conventional signal processing attacks and geometric attacks such as row and column shifting, scaling and rotation. Compared with similar robust watermarking algorithms, the proposed algorithm performs better against most attacks.
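
      A minimal sketch of the construction on one color channel (the block size is an assumption, and the paper additionally scrambles the watermark and votes across channels):

      import numpy as np

      def block_feature(channel, block=8):
          # binary feature: does each block's mean reach the global image mean?
          h, w = channel.shape
          gmean = channel.mean()
          return np.array([[channel[i:i + block, j:j + block].mean() >= gmean
                            for j in range(0, w - w % block, block)]
                           for i in range(0, h - h % block, block)], dtype=np.uint8)

      def register(channel, wm_bits, block=8):
          # construction: XOR the feature matrix with the (scrambled) watermark
          # bits; the result is stored in the IPR database and the image itself
          # is never modified, hence "zero-watermarking". wm_bits must have the
          # same shape as the feature matrix.
          return np.bitwise_xor(block_feature(channel, block), wm_bits)

      def extract(channel, registered, block=8):
          # verification: recompute the features on the (possibly attacked)
          # image and XOR with the registered pattern to recover the watermark
          return np.bitwise_xor(block_feature(channel, block), registered)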

      Logging function recognition based on
      machine learning technique
       
      JIA Zhouyang,LIAO Xiangke,LIU Xiaodong,LI Shanshan,ZHOU Shulin,XIE Xinwei
      2017, 39(01): 111-117.

      As software scales up continuously, the logging mechanism has become an indispensable aid to failure diagnosis. Quite similar symptoms may be caused by different software bugs, and the most direct evidence is usually the logging messages. Meanwhile, logging in most large-scale software follows developers' personal habits rather than any common specification, so log-related analysis of large-scale software suffers. Recognizing logging functions is a precondition for log analysis and directly affects its results. We propose a machine learning method to fill this gap, which most existing log-related works have overlooked. Learning from widely used software, we summarize three kinds of logging functions and extract five common features, with which we build iLog, an automated logging-function recognition tool based on machine learning. Evaluations show that the recognition ability of iLog is 76 times that of keyword matching, and 10-fold cross-validation yields an average F-score of 0.93.
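
      In sketch form, such a recognizer is a standard supervised pipeline. The five features below are illustrative guesses, since the abstract does not list the actual ones, and scikit-learn's random forest stands in for whatever learner the paper uses; the 10-fold F-score evaluation mirrors the protocol mentioned in the abstract.

      import numpy as np
      from sklearn.ensemble import RandomForestClassifier
      from sklearn.model_selection import cross_val_score

      def featurize(fn):
          # fn is a dict of per-function attributes (all hypothetical features)
          return [
              fn["num_string_args"],         # string-heavy argument lists
              fn["has_format_specifier"],    # printf-style "%d", "%s" patterns
              fn["writes_to_file_or_stderr"],
              fn["call_sites"],              # logging calls are scattered widely
              fn["name_contains_log_kw"],    # the weak keyword hint, one of five
          ]

      def train_ilog_like(functions, labels):
          # fit a classifier on function features and report the 10-fold
          # cross-validated F-score
          X = np.array([featurize(f) for f in functions], dtype=float)
          y = np.array(labels)
          clf = RandomForestClassifier(n_estimators=100, random_state=0)
          scores = cross_val_score(clf, X, y, cv=10, scoring="f1")
          return clf.fit(X, y), scores.mean()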

      C++ program memory leakage analysis
      based on static test
       
      CHEN Bei1,XU Qingguo1,2
      2017, 39(01): 118-124.

      C++ is a very popular programming language, and memory leakage is one of the most common and hardest-to-find errors in C++ programs. We design a static analyzer that detects memory leaks in C++ programs. Firstly, we extract the control flow graph from the abstract syntax tree generated from the C++ source by g++. Secondly, we build new control flow graphs by connecting the control flow graphs of the memory-related functions of the same class. Finally, we design and implement an algorithm that analyzes the new control flow graphs and reports all possible errors. Experimental results on the examined examples show that our method is effective.
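
      The core analysis can be pictured as a path traversal over the control flow graph that tracks live allocations. A toy Python sketch follows; real analyzers bound path explosion and model aliasing, which this does not.

      def find_leaks(cfg, entry, exits):
          # cfg maps node -> (action, successors), where action is
          # ('new', var), ('delete', var) or None. A path reaching an exit
          # with a live allocation is reported. Acyclic paths only: back
          # edges are skipped instead of unrolled.
          leaks = []
          def dfs(node, live, path):
              action, succs = cfg[node]
              live = set(live)
              if action:
                  kind, var = action
                  if kind == "new":
                      live.add(var)
                  elif kind == "delete":
                      live.discard(var)
              if node in exits and live:
                  leaks.append((path + [node], sorted(live)))
              for s in succs:
                  if s not in path:
                      dfs(s, live, path + [node])
          dfs(entry, set(), [])
          return leaks

      # p leaks on the branch that returns early without delete:
      cfg = {
          "entry":     (("new", "p"), ["cond"]),
          "cond":      (None, ["early_ret", "use"]),
          "early_ret": (None, []),
          "use":       (("delete", "p"), ["ret"]),
          "ret":       (None, []),
      }
      print(find_leaks(cfg, "entry", {"early_ret", "ret"}))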

      Airline baggage classification based on
      the mean square error of the surface depth
      GAO Qingji,WEI Yuanyuan
      2017, 39(01): 125-130.

      Following the international standard for airline passenger baggage consignment, we study a classification method based on the variance of the surface depth of the baggage, attempting to provide a research basis for consignment methods. A depth image above the baggage conveyor belt is acquired by a Kinect sensor, and the pixel values of the baggage area are extracted so that their variance can be computed for a coarse classification. Combining prior knowledge of the baggage's three-dimensional surface morphology, and according to the distance between grids and the variance of the depth-value differences, we design a self-adaptive clustering algorithm based on grid similarity, which fits the area of each top unit as well as the number of top units from the clustering results. The three-dimensional morphological characteristics of the baggage are then analyzed and its class determined. Experimental results indicate that the algorithm has low complexity and classifies baggage accurately and effectively.
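
      The coarse step reduces to one variance computation over the depth pixels of the baggage region. A small sketch with hypothetical thresholds and class labels (the paper calibrates its decision boundaries from samples):

      import numpy as np

      def coarse_classify(depth_roi, flat_thresh=4.0, soft_thresh=25.0):
          # variance of valid depth pixels over the baggage region; the two
          # thresholds (in mm^2) and the class names are illustrative only
          var = float(np.var(depth_roi[depth_roi > 0]))   # ignore invalid zeros
          if var < flat_thresh:
              return "regular (hard, flat top)"
          elif var < soft_thresh:
              return "intermediate"
          return "irregular (soft or stacked)"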

      A face recognition algorithm based on low-rank subspace
      projection and Gabor feature via sparse representation
      YANG Fangfang1,WU Xisheng1,GU Biaozhun2
      2017, 39(01): 131-137.
      Current face recognition algorithms often ignore the influence of noise during training, so when both the training data and the testing data are corrupted, recognition performance may degrade significantly. To handle face recognition under illumination variation, occlusion, camouflage and expression variation, we propose a new face recognition algorithm based on low-rank subspace projection and Gabor features via sparse representation. Firstly, we recover the potential low-rank structure and the sparse error structure of the training samples with a low-rank matrix recovery algorithm, and obtain the transformation matrix of the Gabor features of the low-rank structure by principal component analysis. Then the Gabor feature vectors of the training and testing samples are projected into the low-rank subspace through the transformation matrix. Finally, we apply a sparse representation classification algorithm in the low-rank subspace. Experiments on the Extended Yale B and AR databases show that the proposed method achieves a high recognition rate and strong robustness.
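
      The sparse representation classification (SRC) step in the final stage is conventionally formulated as follows (standard SRC notation, not reproduced from the paper):

      \hat{x} = \arg\min_{x}\ \|x\|_1 \quad \text{s.t.}\ \|y - Ax\|_2 \le \varepsilon, \qquad \mathrm{identity}(y) = \arg\min_{i}\ \|y - A\,\delta_i(\hat{x})\|_2

      Here A stacks the projected Gabor features of the training samples column-wise, y is a projected test feature, and \delta_i keeps only the coefficients associated with class i.
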
      An image feature matching
      algorithm based on spatial structure
      ZHOU Lili1,JIANG Feng2
      2017, 39(01): 138-144.

      Binary image keypoint descriptors provide an efficient alternative to their floating-point competitors, enabling faster processing while requiring less memory. We analyze common binary keypoint descriptors, modify the Fast Retina Keypoint (FREAK) descriptor by using the spatial structure information between keypoints, and propose a multi-point FREAK (MP-FREAK) descriptor with improved descriptive ability. To address the slow speed of nearest-neighbor search, we propose a fast feature matching algorithm for Hamming space, called MLSH, which modifies locality-sensitive hashing (LSH) to create a much smaller candidate list. Experimental results show that the proposed descriptor is more accurate than other algorithms, and that the matching algorithm is effective and faster than its competitors.
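
      A Hamming-space LSH of the kind MLSH builds on samples random bit positions as hash keys, so near-identical binary descriptors collide in at least one table. A minimal sketch follows; the table count, key width and the 512-bit descriptor length are illustrative, and MLSH's actual candidate pruning differs.

      import random

      def build_lsh_tables(descriptors, num_tables=4, bits_per_key=16, dim=512, seed=0):
          # each table hashes a descriptor (a dim-bit integer) by a random
          # subset of its bit positions
          rng = random.Random(seed)
          tables = []
          for _ in range(num_tables):
              positions = rng.sample(range(dim), bits_per_key)
              table = {}
              for idx, d in enumerate(descriptors):
                  key = tuple((d >> p) & 1 for p in positions)
                  table.setdefault(key, []).append(idx)
              tables.append((positions, table))
          return tables

      def query(q, descriptors, tables):
          # collect colliding candidates, then rank the short list by exact
          # Hamming distance (popcount of XOR) instead of scanning everything
          cand = set()
          for positions, table in tables:
              key = tuple((q >> p) & 1 for p in positions)
              cand.update(table.get(key, []))
          return sorted(cand, key=lambda i: bin(q ^ descriptors[i]).count("1"))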

      An improved ray casting algorithm based on GPU
      ZHANG Aguan1,4,JIANG Huiqin1,4,MA Ling1,4,YANG Xiaopeng2,4,LIU Yumin3,4
      2017, 39(01): 145-150.

      Ray casting algorithms reconstruct 2D tomographic CT images into 3D renderings, providing an important means for accurate diagnosis and quantitative analysis of disease. Because of the large amount of computation involved, traditional ray casting methods can hardly achieve real-time rendering without hardware acceleration. We propose an improved ray casting algorithm based on the GPU. Firstly, we obtain the ray ending points and directions with a dedicated GPU program. Secondly, we determine the positions of the resampling points using an accelerated stepping method and compute their values by fast compound interpolation. Thirdly, we further accelerate the reconstruction with an opacity-based early ray termination method. Experimental results show that the proposed method has low complexity and high efficiency. While preserving reconstructed image quality, its rendering speed is 6 times that of an existing CPU-based ray casting method and 2 times that of the traditional GPU-based ray casting algorithm. It therefore provides an effective means for the reconstruction of CT images.
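
      Early ray termination is the standard opacity-based cutoff during front-to-back compositing. A single-ray Python sketch follows; the transfer function, sampling scheme and cutoff value are illustrative, and the paper's GPU version interpolates rather than snapping to the nearest voxel.

      import numpy as np

      def cast_ray(volume, origin, direction, step=0.5, max_steps=512,
                   opacity_cutoff=0.95):
          # front-to-back compositing along one ray; marching stops once the
          # accumulated opacity saturates (early ray termination)
          def transfer(v):                       # toy transfer function
              return np.array([v, v, v]), v * 0.1
          color = np.zeros(3)
          alpha = 0.0
          pos = np.asarray(origin, dtype=float)
          d = np.asarray(direction, dtype=float)
          d /= np.linalg.norm(d)
          for _ in range(max_steps):
              i, j, k = np.round(pos).astype(int)  # nearest-neighbour sample
              if not (0 <= i < volume.shape[0] and 0 <= j < volume.shape[1]
                      and 0 <= k < volume.shape[2]):
                  break
              c, a = transfer(volume[i, j, k])
              color += (1.0 - alpha) * a * c       # front-to-back accumulation
              alpha += (1.0 - alpha) * a
              if alpha >= opacity_cutoff:          # early ray termination
                  break
              pos += step * d
          return color, alpha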