Computer Engineering & Science

A bank selection instruction optimization
method in partitioned memory architecture

CHEN Yong1,YUAN Mengting2,LI Qingan2

2016, 38(02): 195-201. doi:

Abstract ( 195 )

PDF (872KB) ( 412 ) 　　

It is a research hotspot to minimize the number of bankselection instructions in the partitioned memory architecture. Analyzing the features of this problem, we construct a graph partition model for bank selection and propose a twostage heuristic search algorithm. This algorithm uses the size of nodes and partition capacity to get an initial solution quickly, and then uses the edge weights between nodes and the weighs between partitions as the heuristic parameters to search for the better solution. Experiments on MiBench benchmark and a real embedded system validate the effectiveness of the proposed model and algorithm. Compared to the VPAB algorithm, the proposal can achieve an average optimization rate of 37.99%, which is slightly better than the mature commercial compiler PICC, and can reduce the number of bank selection instructions largely.

Implementation and optimization of HYB based SpMV
on CPU+GPU heterogeneous computing systems

YANG Wangdong1,2,LI Kenli2

2016, 38(02): 202-209. doi:

Abstract ( 244 )

PDF (661KB) ( 506 ) 　　

Sparse matrix vector multiplication (SpMV) is an important issue in solving sparse linear systems. The sparse features and the low computing density lead to low computation efficiency. Regarding the irregularities of the sparse matrixes, some hybrid storage formats are used to compute SpMV to improve the compression efficiency and expand the range of adaptation. HYB is a hybrid compression format of ELL and COO formats, and is widely used on SpMV because of its stable performance. With the common application of parallel computing on GPUs and multicore CPUs, the heterogeneous computing system based on CPU+GPU is accepted. The ELL of HYB is assigned to the GPU for processing and the COO of HYB is assigned to the CPU, which can take full advantages of both CPU and GPU computing resources to improve the utilization efficiency of computing resources. In this paper, based on the analysis of the characteristics of the CPU + GPU heterogeneous computing model, we propose some optimization strategies to improve the performance of SpMV in the heterogeneous computing environment.

A vectorization method of QR
decomposition based on Matrix

LU Qingnan,LIU Zhong

2016, 38(02): 210-216. doi:

Abstract ( 164 )

PDF (545KB) ( 388 ) 　　

We propose a vectorization method of QR decomposition with Givens rotation on Matrix processors. According to the systematic characteristics of Matrix architecture, the computation tasks are evenly distributed to all vector processing elements by optimizing the memory access to vector data and calculation. We also design a double DMA buffering scheme to smooth the data transfers, which can fully overlap the kernel computation time and the DMA data transfer time so that the kernel computation is always at its peak speed and the best computation efficiency is achieved. Experimental results show that the proposal can achieve higher computation efficiency and performance speedup.

Design of a visual Deep Web crawler platform based on Hadoop

LIU Tong1,ZHANG Yang2,SUN Qi2,YUAN Chong2

2016, 38(02): 217-223. doi:

Abstract ( 199 )

PDF (822KB) ( 408 ) 　　

With the development of IT technology, internet information resources become much richer. We can obtain relevant knowledge from complicated internet information thanks to the rapid development of big data technology. The most essential part is the big data crawler technology which can crawl and save Internet data structurally. In this paper, we present and develop an efficient Deep Web information crawler based on Hadoop. This crawler employs the Webkit as the core engine which can implement the visual configuration and the deep data collection. To improve the efficiency, the data collection algorithm is also optimized by adjusting the strategy of task distribution in Hadoop. Experimental results demonstrate that the developed data collection platform can obtain better results.

Design and implementation of CUDA algorithm
for calculating effective resonance integral

REN Chenglei1,PU Peng2,HAN Dingding1

2016, 38(02): 224-230. doi:

Abstract ( 192 )

PDF (966KB) ( 409 ) 　　

Nucleon reactors need realtime accurate calculation of effective resonance integral or groupaveraged effective cross section to achieve the safety control of reactors. Because the whole calculation process involves a lot of integral operations and huge section data, the time consumption of conventional calculation methods is considerably huge . Based on the compute unified device architecture (CUDA) platform, the whole calculation process with multithreaded operations at the same time can greatly improve the computing speed and reduce the time consumption with the help of the computing power of the graphic processing unit (GPU). Experimental results show that the parallel computing results on the GPUs have no significant difference from the original data, and the proposal improves the speedup significantly.

Parallel implementation of LDA algorithm based on Hadoop

ZHANG Zhao1,2,3,ZHANG Xinfeng1,2,3,ZHENG Nan1,2,3,GUI Mingjun1,2,3

2016, 38(02): 231-239. doi:

Abstract ( 305 )

PDF (898KB) ( 441 ) 　　

With the rapid development of the Internet, the amount of data which needs to be dealt with is increasing constantly. The traditional standalone text clustering algorithm cannot meet the requirements of largescale data processing in the field of data mining. In order to solve the problem that standalone LDA algorithm is incapable of analyzing and dealing with largescale data, we propose a distributed parallel LDA program using Gibbs sampling based on the MapReduce framework. By utilizing the MapReduce distributed computing framework, we study the distributed implementation of LDA topic model, and test the performance of the distributed computing programs. Through the comparison tests between distributed computing based on Hadoop and standalone computing, we find out that the method can enhance the running speed of the algorithm when dealing with largescale data. As the number of clustering nodes is increasing, the proposal also has good speedup performance. The parallel implementation of the LDA algorithm is feasible, which can solve the problem that standalone LDA model is incapable of analyzing and dealing with the latent topic information of largescale data.

A novel congestion avoidance scheme based on reservation

ZHU Chengyang，CHAI Yantao，DONG Dezun，ZHANG Heying,PANG Zhengbin

2016, 38(02): 240-248. doi:

Abstract ( 184 )

PDF (1012KB) ( 319 ) 　　

Because of the unbalanced traffic in high speed connected networks, some nodes in the networks become hotspots, resulting in congestion in these nodes and related channels , which can harm the performance of high speed connected networks greatly. There is an existing congestion avoidance scheme named Speculative Reservation Protocol (SRP) which can avoid congestion actively, and the SRP eliminates the negative effect induced by hotspots problem tremendously. But in the hotspots model, the resources in most of routers which don't connect any hotspot are idle. In order to fully use the resources and enhance the performance of high speed connected networks, we put forward a novel congestion avoidance scheme called Intermediate Reservation Protocol (IRP) based on reservation. The IRP can effectively use the resources of the routers of neighbor nodes judged by different topologies, such as Fattree. The IRP first sends the packets to the idle routers via the multipath of Fattree. When the destination's router is available, the IRP packets can be sent to the destination nodes, and the latency of high speed connected networks is thus reduced.

A dynamic mechanism of critical data protection
based on hardwaresoftware cooperation

YUE Hong1,WANG Lei2,DENG Yu2,LIU Lei3

2016, 38(02): 249-254. doi:

Abstract ( 165 )

PDF (979KB) ( 321 ) 　　

In response to the physical attacks on the internal storage and offchip bus so as to ensure the safety of stored data, we propose a dynamic mechanism of critical data protection based on hardwaresoftware cooperation, which can extract userdefined key safety data, store them into the key safety area, and adopt dynamic integrity verification to examine whether the data has been tampered. Compared with the traditional way of protecting program memory data, the proposed method has the advantage of preventing attacks on the hardware and software, saving onchip and offchip memory, reducing the processing time and enhancing the safety performance.

Distributed swarm pattern mining algorithm
in big spatio-temporal trajectory data

YU Yanwei1,2,QI Jianpeng1,LU Yunhui1,2,ZHAO Jindong1,ZHANG Yonggang2

2016, 38(02): 255-261. doi:

Abstract ( 239 )

PDF (762KB) ( 426 ) 　　

We propose an efficient distributed mining algorithm based on MapReduce for mining swarm pattern from big spatiotemporal trajectory data. We first define the objectclosed swarm pattern based on the maximum moving object set, and optimize the serial mining algorithm using the strategy of minimum time support set to minimize the computation costs. We then propose a parallel swarm mining model based on the time independence, and the clustering and the objectclosed swarm mining on the time domain are parallelized. Finally, we propose a distributed mining algorithm based on MapReduce chained architecture, which quickly discovers swarm patterns in big trajectory data by a 4stage framework. Experimental evaluations on the Hadoop platform, using massivescale real world traffic trajectory datasets, demonstrate the effectiveness and efficiency of the proposed distributed algorithm.

A file storage scheme based on
cloud environment

ZHOU Lanfeng1,MENG Chi1,PENG Junjie2

2016, 38(02): 262-268. doi:

Abstract ( 145 )

PDF (940KB) ( 370 ) 　　

As a critical part of the cloud computing technology, cloud storage includes two aspects: selection of storing position and file transfer. File transfer consists of file uploading and downloading. As an important part of storage, file transfer has a profound impact on cloud storage efficiency. In recent years, much research focuses on the efficiency of data storage and data transfer. In order to solve the lowefficient transfer of a large number of streaming media files in the private cloud environment, we propose a file transfer scheme called threshold upload scheme (THU). We assume that both in different cloud environments and transfer client computers, there is a file size value fk. When a file size is smaller than fk, we package several of them into one file in a lossless way, whose size value is close to fk. Otherwise, if a file is larger than fk, it will be divided into several fk siz files. In this way the time consumption is reduced compared to the traditional methods. We carried out many experiments and the results prove that the existence of value fk in transferring streaming media files in the lab cloud environment via ftp protocol. When the package size is equal to fk, time for packing, unpacking and transferring can be optimized, which proves the efficiency and accuracy of the THU scheme.

Implementation of an algorithm for the
dining philosophers problem

GAO Sheng,CHEN Yuefeng

2016, 38(02): 269-276. doi:

Abstract ( 286 )

PDF (732KB) ( 475 ) 　　

Aiming at the dining philosophers problem, a wellknown classical example of the inter process of communication in the operating system field, this paper designs and presents a technical implementation scheme for a representative algorithm. The scheme takes Linux as its supporting platform, and the scheme reflects the characteristics of the concurrent behaviors of philosophers through processes rather than threads. The state of the philosophers and the switch between two states are emulated and controlled by the combination of automatic and random modes, which is a flexible and natural humancomputer interaction mode. Two forms of state monitoring programs are presented, which can express the state of philosophers in a vivid, direct and accurate way. The characterbased monitoring program can be used in both the character terminal and the graphic terminal, and is mainly used for those users who log in the multiuser Linux systems. The animationbased monitoring program is suitable for the desktop Linux users with graphic terminal.

An encryption algorithm of double chaos with
mutual feedback based on the features of social networks

YI Chengqi1,JIANG Jingchi2,XUE Yibo3,4

2016, 38(02): 277-283. doi:

Abstract ( 159 )

PDF (878KB) ( 428 ) 　　

Social network data acquisition has become a foundation of social network analysis. Although the majority of social media provides official APIs for the developers, there are always rigid limitations on frequency of calls, permissions and contents, which makes it difficult to acquire complete data. So data acquisition technologies based on user login simulation are particularly important. Nevertheless, the login process of the majority of social media has some inherent security risks. The login passwords are usually transmitted as clear text, and user privacy suffers serious threats. This paper gives a detailed analysis of the interactions between clients and servers during user login based on Twitter. When we parse POST requests on the network traffic level, we find that Twitter transmits login passwords as clear text. Hence, we propose an encryption algorithm of double chaos with mutual feedback based on the features of social networks. The algorithm utilizes users’ IDs, the creation time and the number of followings as the initial values of the encryption functions. By applying interactive operations between Logistic mapping and Tent mapping, the algorithm obtains a new sequence of secret keys. The particularity of input parameters leads to the unpredictability of the cipher text. Experimental results show that our algorithm obtains better encryption and decryption. Meanwhile, the speeds of encryption and decryption are on the time scale of milliseconds, thus users do not feel any waiting. Furthermore, the algorithm has the features of sensitivity to initial conditions, large key space and highstrength encryption. The algorithm can prevent attackers from breaking the codes via methods such as phase diagram, bruteforce computation and statistics, and has a wide application prospect.

A locationbased streaming mobile
application push system

JIA Lei,YANG Wang,WANG Zhaoyang,WANG Guojun

2016, 38(02): 284-289. doi:

Abstract ( 127 )

PDF (599KB) ( 295 ) 　　

Now more and more locationbased mobile applications emerge. The traditional application distribution mode requires users to search, download, install and uninstall applications manually. Besides, it is not conductive to promote the experience of using application services for users. We design and implement a locationbased streaming mobile application push system. In this system, servers analyze and install related applications by using mobile terminal location information and push them to the mobile terminal for display. The terminal loads the applications from servers according to users choice preference. When users' locations change, the system enables users to use the application services related to the current location without downloading and installing them. Experimental results show that the system can decrease the latency by 64.37% in the 3G network environment and 74.49% in the 4G network environment in comparison with the traditional application distribution mode.

Symbolic device driver environment for
detecting bugs in Linux device driver

XU Yongjian1,2,WANG Dan1,CHEN Yu2,FAN Wenliang2

2016, 38(02): 290-296. doi:

Abstract ( 205 )

PDF (663KB) ( 326 ) 　　

Studies have shown that driver vulnerability is one of the main causes of Linux security issues, which can lead to privilege escalation, denial of service and other highrisk situations. Considering the difficulty of driver vulnerability detection without real devices, this paper proposes symbolic execution of Linux drivers and implements the symbolic device driver environment (SDDE ), which can detect bugs in Linux device driver. The SDDE provides symbolic kernel services and symbolic devices, making symbolic execution of Linux driver and runtime driver vulnerability detection possible. The SDDE works without real hardware, and has high coverage, high performance and good scalability. The SDDE is applied to 6 Linux drivers, and we found six real bugs, three of which are confirmed by Linux developers.

Web service selection based on grey correlation analysis

DAI Xiaoling,TANG Mingdong,L Saixia

2016, 38(02): 297-304. doi:

Abstract ( 130 )

PDF (569KB) ( 369 ) 　　

To help users select the service that best satisfies their nonfunctionality requirements from a set of Web services, we propose a grey correlation analysis based QoSaware service selection method in this paper via analyzing service QoS attribute factors using the grey system theory. Based on that service QoS information is usually uncertain and incomplete, the proposed method uses intervals to model the QoS attribute values of Web services. To determine how well a service satisfies users’ concerned QoS requirements, the method firstly adopts a set of functions to normalize interval grey numbers of services’ QoS on various QoS attributes with different metrics and scales. Then it computes services’ grey incidence degree coefficients of interval grey numbers on each QoS aspects. Finally it combines each service’s grey incidence degrees on all QoS attributes to obtain an overall grey incidence degree. The service with the largest grey incidence degree is recommended to users. Compared with other Web service evaluation models, our approach is more suitable for real Web service systems where QoS information is uncertain and incomplete, and it can provide more effective and reasonable evaluation for Web service selection.

An efficient index for continuous uncertain XML data

ZHANG Xiaolin,GUO Dandan,HAO Kun

2016, 38(02): 305-311. doi:

Abstract ( 135 )

PDF (880KB) ( 290 ) 　　

At present, the uncertain XML index is not completely applicable to continuous uncertain XML data. We propose a continuous uncertain XML index (CUXI) algorithm to support probability threshold range query of continuous uncertain XML data. The algorithm refers to the idea of U Tree, which builds the spatial data index tree in a recursively topdown way. The CUXI index tree constructs a twodimensional data rectangle with the same father’s leaf nodes in XML documents, and the index tree is built accordingly based on the clustering. Leaf nodes calculate in advance and stores some related information of continuous uncertain data. In order to improve query efficiency, a filtering strategy of continuous uncertain data is introduced. When querying, it walks through the index tree to filter the subtrees that do not meet the query range. Experimental results show that the proposed index technique can improve query processing performance to a certain extent.

Optimization of compression strategy for
spatio-temporal data based on genetic algorithms

QIAN Jinghui,WANG Shanshan

2016, 38(02): 312-317. doi:

Abstract ( 129 )

PDF (760KB) ( 366 ) 　　

For the low accuracy of recovery caused by loss compression of spatiotemporal data, we propose a genetic algorithm to optimize the compression of spatiotemporal data. On the basis of DouglasPeucker algorithm, the data environment can adjust the compression parameters adoptively, encodes the chromosomes, and generates the initial population. Secondly, at the evolution phase, the elitist strategy is adopted to obtain the global optimal individuals. Finally, crossover and mutation operations are done. We adopt four different compression strategies for the experiment, and compare the details of the compression ratio and average deviation.Experimental results show that the proposal has a good effect on the optimization of spatiotemporal data compression, and can efficiently decrease the recovery error.

Research and application of image registration
technology in panoramic stitching

LAN Hong,HONG Yuhuan,GAO Xiaolin

2016, 38(02): 317-324. doi:

Abstract ( 167 )

PDF (1305KB) ( 543 ) 　　

Aiming at solving the insufficiencies of the SIFT registration algorithm such as large calculation, error matching when generating feature vectors and doing feature matching, we propose an optimized SIFT registration algorithm. The optimized registration algorithm firstly sharpens the edge of images by bringing in the Laplacian operator and extracts the characteristics of block images via the projection entropy of image unit information. Then feature points are matched according to the minimum Euclidean distance of entropy vector projection. Finally, the improved random sample consensus algorithm is adopted to eliminate error matching. We apply the optimized algorithm in panoramic mosaic and experimental results show that, the optimized registration algorithm exceeds the SIFT registration algorithm, which not only effectively improves the efficiency, but also reduces error matching, achieving good matching effect.

Frame skipping strategy optimization
in H.264 rate control algorithm

ZHOU Quan1,2,3,WANG Zhongyuan1,2,3

2016, 38(02): 325-330. doi:

Abstract ( 173 )

PDF (706KB) ( 361 ) 　　

Traditional frameskipping strategies in H.264 rate control algorithm use buffer occupancy and image complexity as a criterion. Although complexity can reflect the motion, it cannot reflect the temporal relationship between the current frame and previous frames, leading to low encoding quality. To address this problem, we improve it from two aspects. First, Complexity is replaced with relative complexity, so that the temporal dependency relationship can be better characterized. Second, bit factor is introduced to establish a comprehensive frameskipping measure along with buffer occupancy and relative complexity. Experimental results show that our optimization has higher PSNR and the video behaves better in continuity.

Point pattern matching based on polar coordinate transform

GAO Guandong1,2,WANG Jing1,LIU Fei1,DUAN Qing1,ZHU Jie1

2016, 38(02): 331-337. doi:

Abstract ( 193 )

PDF (807KB) ( 577 ) 　　

Point pattern matching (PPM) is an important issue in computer vision and pattern recognition, such as target recognition, medical and remote image registration, and pose estimation, and so on. We propose a novel PPM algorithm based on point features. We first construct feature attribute image of the points according to the distribution of point sets and points’ position. Then polar coordinate transform is applied to the feature attribute image, and we describe the feature attribute image with the moment invariants. We get the course matching results by comparing the feature vectors, and an iterative method is introduced for the final matching. We make two contributions in this paper. Firstly, we construct a polar coordinate transform based points features, and describe the features with the moment invariants. The features are invariant to rotation and transportation. Secondly, an iterative matching process by point features and geometry constraint is proposed. The method is insensitive to outliers and noises. Experimental results demonstrate the validity and robustness of the algorithm.

Current Issue

Author center

Review center

Online journal