Computer Engineering & Science

Modeling and evaluating Intel IMCI vgather instruction using stencilsJames

Lin1,2,WANG Yi chao1,QIN Qiang1,LI Shuo3,WEN Min hua1,Satoshi Matsuoka2

2016, 38(09): 1741-1747. doi:

Abstract ( 222 )

PDF (509KB) ( 455 ) 　　

Vgather is a hardwareimplemented vector instruction introduced by Intel Initial ManyCore Instructions (IMCI) for Xeon Phi. Its target is to help SIMD registers access data from noncontiguous memory locations. However, experimental results show that it can also be one of the key performance bottlenecks on Xeon Phi. We model the performance of Vgather by using the profiling tool PAPI to directly collect the results of hardware performance counters. Address Generation Interlock (AGI) events are profiled as the number of Vgather and the average latency of Vgather are estimated with VPU_DATA_READ events based on which we model the total latencies of Vgather instructions. 3D7P stencils are used to evaluate our model and the results show that Vgather spents nearly 40% of total kernel time. We implement a Vgatherfree version with intrinsic instruction to validate this model. Our contribution includes modeling Intel IMCI vgather instruction with hardware counters and validating it by stencils. The model can also be applicable on CPUs.

A matrix multiplication accelerator design for optimization blocking strategy

SHEN Jun zhong,XIAO Tao,QIAO Yu ran,YANG Qian ming,WEN Mei

2016, 38(09): 1748-1754. doi:

Abstract ( 226 )

PDF (829KB) ( 532 ) 　　

Large scale floating point matrix multiplication is one of the most time consuming computational kernels in many applications. There is a feature in emerging applications that matrices usually own at least one small dimension, which is called non uniform large scale matrix multiplication. Due to the limited amount of onchip memory for storing intermediate results on FPGA, partitioning largescale matrix multiplication into fine grained subblock computational tasks is needed. When accelerating non uniform matrix multiplications, most of the existing hardware matrix multipliers with a linear array architecture can suffer great performance reduction due to the fixed sub block size support. To solve this problem, we propose an efficient optimization blocking strategy. Based on it, we implement a novel matrix multiplier to support variable subblock operations on a Xilinx Zynq XC7Z045 FPGA. By integrating 224 processing elements (PEs), the multiplier achieves up to 48 GFLOPS for non uniform matrix multiplication in real application at 150 MHz with requirement of 4.8 GB/s of memory bandwidth. Results show that our proposed blocking strategy can improve up to 12% of performance in comparison with traditional blocking algorithms.

A twice-Hash based convergent encryption strategy for data deduplication

ZHOU Yu kun1,2,FENG Dan1,2,XIA Wen1,2,FU Min1,2

2016, 38(09): 1755-1762. doi:

Abstract ( 213 )

PDF (800KB) ( 541 ) 　　

With the explosive growth of digital data, data deduplication has been widely used in cloud storage to reduce storage space and network bandwidth. Although the existing solutions use the convergent encryption (CE) to improve data confidentiality, the CE faces two main challenges: 1) the CE is subject to offline bruteforce dictionary attacks because it is deterministic and keyless; 2) the CE has to encrypt all data and calculate the fingerprint based on its ciphertext, thus the computation cost increases as the data deduplication ratio increases. In order to solve these problems, we propose a twicehash based convergent encryption strategy (TCE). The TCE encrypts data after deduplication via computing the hash twice. And the trusted third party adds secret information to make random convergent keys. The TCE uses the second hash as the fingerprint and eliminates useless operations for duplicate data encryption. Experimental results show that the TCE can reduce the backup window by 30%~50% in comparison with the CE.

An adaptive dualcode parallel mechanism LDPC decoder for QKD

YIN Qing qing,ZHAO Guo hong,ZHAO Bao kang,LIU Bo

2016, 38(09): 1763-1768. doi:

Abstract ( 170 )

PDF (854KB) ( 401 ) 　　

Information reconciliation is a critical step in quantum key distribution (QKD), and current research focuses on LDPC based approaches. Constrained by the poor performance of the LDPC decoder, current LDPC based approaches are difficult to meet the performance and security requirements of high speed QKD systems with high quantum bit error rate. We therefore propose a novel adaptive dualcode parallel decoder mechanism, named ADCPM. In the ADCPM, a specially designed decision module makes the hard decision process and node computation checking process work in parallel. Experimental results on real platform show that the proposed ADCPM LDPC decoder is able to work with high QBER up to 10%, and its decoding rate is beyond 140Mbps. The ADCPM can meet the security and performance requirements of the high speed QKD systems with high quantum bit error rate.

Design and implementation of communication signal testing platform based on software defined radio

WEN Yan dong1,2,WEN Shuang chun1,LIU Yu2,WEI Bao yue2,ZHANG Yi tao2,CAO Jian fei1,2

2016, 38(09): 1769-1774. doi:

Abstract ( 176 )

PDF (1346KB) ( 536 ) 　　

Software defined radio technology is widely used in military and civilian radio communication because of its flexibility and openness. Using computer, MATLAB, FPGA and ADIs AD9364 radio frequency (RF) agile transceiver, we design a signal testing platform for different mobile communication systems with varied characteristics, including WCDMA, TD_SCDMA, GSM and LTE. The testing platform can generate arbitrary digital sine waves, analyze the information of communication signals in real time and adjust the parameters of RF chains. Experimental results show that the communication signal testing platform is in accordance with the principle of software defined radio, and it has good versatility, portability and reconfigurability.

Congestion control and balanced energy consumption in WSNs based on improved AOMDV

CHEN Wen-guang,NIU Yu-gang,ZOU Yuan-yuan

2016, 38(09): 1776-1783. doi:

Abstract ( 166 )

PDF (819KB) ( 381 ) 　　

In wireless sensor networks (WSNs), features such as the burst of the data flows, limited energy of the nodes, many-to-one transmission, etc., often result in network congestion and unbalanced energy consumption. Using multi-path data transmission cannot only relieve network congestion, but also achieve balanced energy consumption. The ad hoc on-demand multipath distance vector routing protocol (AOMDV) is a reactive multi-path protocol for Ad-hoc networks. We propose an improved AOMDV protocol, called I_AOMDV protocol, which does not use the congested or low-energy nodes in the route discovery phase, and exchanges “the residual energy” and “the queue length” by means of ‘HELLO’ messages in the route maintenance phase. Meanwhile, it also introduces “congestion recovery time” and “the flag of residual energy” to the list of paths in order to make the I_AOMDV more adaptable to the static WSNs. Furthermore, based on the I_AOMDV, we also propose a new congestion control and balanced energy consumption strategy. The proposed congestion control strategy uses a new congestion detection scheme and sets the "recovery time" for the shortest path falling into congestion. In order to cope with unbalanced energy consumption, the proposed balanced energy consumption strategy sets the “the flag of residual energy” for each path. Simulation results show that the proposed congestion and balanced energy consumption strategy can reduce the expense of routing protocol, packet loss rate, and the difference of residual energy of nodes.

Improving the reliability of the most forward mechanism within radius scheme in VANETs

LI Xue-song,YE Xue-mei,CAI Yan-ning,FAN Qing-gang

2016, 38(09): 1784-1789. doi:

Abstract ( 119 )

PDF (600KB) ( 341 ) 　　

VANETs is an application of traditional mobile ad hoc networks in traffic roads. In VANETs, most forward within radius scheme can effectively reduce hops of data dissemination and redundancy sending, however, its reliability can be adversely affected due to the farthest node failure. Given that the high mobility of VANETs may cause serious farthest node failure, which is analyzed and verified by experiments, we propose two improved methods, i.e. the safe distance method and the failure prediction method. The safe distance method chooses the neighbors whose distance are closest to the calculated safe distance as relay nodes, while the failure prediction method estimates neighbors’ position by state parameters, therefore avoiding choosing those neighbors who may be out of communication ranges as relay nodes. Simulation results show that the two methods can reduce the failure rate of relay nodes and improve the reliability of data dissemination to certain extent.

An improved DFT channel estimation algorithm based on wavelet denoising

XIE Bin，LE Hong-hao，CHEN Bo

2016, 38(09): 1790-1796. doi:

Abstract ( 177 )

PDF (626KB) ( 456 ) 　　

Given that the channel estimation algorithm which combines wavelet de-noising with interpolation in DFT cannot eliminate the noise in the cyclic prefix length, we propose a new channel estimation algorithm, a combination of wavelet de-noising and the improved DFT interpolation. This channel estimation algorithm first uses the discrete wavelet transform of least squares (LS) method to estimate the results of threshold de-noising processing, and sets the corresponding threshold in the process of DFT interpolation based on the noise variance of the mean inside and outside cyclic prefix. Then the noise of the cyclic prefix length is handled again to reduce further noise influence. Simulation results show that under the premise of unchanged complexity, the proposed algorithm cannot only remove the influence of additive white Gauss noise better, but also improve the accuracy of channel estimation effectively. In general, the channel estimation algorithm performance is better than the combination of wavelet de-noising and DFT interpolation.

An architecture for integrated spatial-terrestrial networks

LI Gang1,FENG Zhen-qian2,ZHAO Bao-kang2,HE Qian1,YANG Hui-ya3

2016, 38(09): 1797-1802. doi:

Abstract ( 170 )

PDF (870KB) ( 424 ) 　　

Spatial network, regarded as an extension of terrestrial network, is an important component of the integrated network. In the scenario that the spatial network is a mobile constellation, the traditional integrated network model takes the spatial network as an independent autonomous system, utilizing the border gateway protocol (BGP) to connect it with the terrestrial network and to maximize their compatibility between. However, the traditional integrated network model faces the problem of frequent session interruption and a large number of routing updates existing between border gateways. By re-examining the role of space network and border demarcation, we propose an integrated network oriented architecture，called space links for the integrated network (Slink). The core idea of the Slink is to utilize the spatial network to provide an interconnection channel for terrestrial networks and the routing updates between spatial networks and terrestrial networks are isolated by border gateways. Simulation results show that the Slink architecture can effectively reduce the need for storage space of the space routers and the bandwidth consumption between spatial networks and terrestrial networks.

A risk assessment framework based on attack events in dynamic networks

LI Yan，HUANG Guang-qiu，ZHANG Bin

2016, 38(09): 1803-1811. doi:

Abstract ( 144 )

PDF (7464KB) ( 382 ) 　　

By applying the evolution theory of dynamic network into the risk assessment of computer network, we propose a new risk assessment framework based on attack events in dynamic networks. We first construct the dynamic access relation network based on static physical links. Then the Timeline algorithm uses its time characteristic effectively to describe the attack evolution trend and find important attacks. The graph approximation algorithm is also adopted to simplify the analysis process as an analysis among approximate graphs and reduce the impact of noise behaviors effectively. In addition, the framework can track the network segment evolution and do correlation analysis. Case study shows that the proposal has good practicability, reveals attackers' attack strategies better, and finds the close ties between important attacks.

Event reconstruction for traffic flow simulation based on sequential Monte Carlo

FENG Xiang-wen，YAN Xue-feng

2016, 38(09): 1812-1817. doi:

Abstract ( 149 )

PDF (728KB) ( 396 ) 　　

Combining with the nonlinear and non-Gaussian characteristics of the traffic flow, we propose a traffic flow congestion event reconstruction framework based on the Sequential Monte Carlo (SMC) to deal with the congestion reconstruction problem. The simulation's states can get close to the real scene continuously when the data assimilation model assimilates the real-time sensor data constantly. The congestion event in real scene can be estimated based on the simulation data. Thus, the simulation model can simulate the congestion via different particle simulations and finally reconstruct the congestion event. Experimental results show that the framework can detect and reconstruct the congestion event on the real road network; the average error of the start position is 17m, while the average coverage rate of the congestion range is 82%.

A real-value negative selection algorithm in WSNs intrusion detections

ZHANG Feng-bin,YANG Qiu-jie,XI Liang

2016, 38(09): 1818-1822. doi:

Abstract ( 108 )

PDF (666KB) ( 381 ) 　　

The negative selection algorithm in intrusion detections (IDs) for wireless sensor networks (WSNs) generally adopts the r-continuous binary string matching mechanism, which leads to low detection rate and cannot reflect the dynamic features of WSNs in a period of time. Aiming at the problem, we present a real-value negative selection algorithm called RNS-WSNs, which uses the change rate of the attribute value during an interval of time as the antigen and antibody, and the Manhattan distance between the two vectors as the affinity. Simulation results on network simulator 3 show that the real-value negative selection algorithm has higher detection rate under the same energy consumption and false positive rate.

A depth image enhancement algorithm based on improved anisotropic diffusion

ZHENG Chuan-yuan，LI Liang-fu，XIAO Zhang-shu，LU Cheng

2016, 38(09): 1823-1829. doi:

Abstract ( 167 )

PDF (1157KB) ( 502 ) 　　

Depth images have some problems such as edge mismatching, invalid pixels and noise, which are induced by the measuring distance principle. We present a depth image enhancement method based on the improved anisotropic diffusion algorithm. Firstly, we correct the position relationship between the depth image and the color image, and choose more frames to preprocess the multi-frame mean filtering according to the time continuity. Secondly, we build a depth image model with 4-neighborhood by introducing the idea of weights to the color image. The anisotropic diffusion is conducted on the depth image under color image guidance and the holes are filled. Finally, we use the improved adaptive median filtering smooth algorithm to remove image noise. Experimental results show that the proposed method can effectively repair the black holes composed of invalid pixels in the original depth image and remain the detailed information of the object’s edge in the depth image while suppressing the noise.

Adaptive Retinex image enhancement based on domain transform filter

TU Qing-hua,DAI Sheng-kui

2016, 38(09): 1830-1835. doi:

Abstract ( 177 )

PDF (1193KB) ( 449 ) 　　

In order to improve the brightness and contrast of low illumination images, we propose a new color image enhancement method based on Retinex theory. Firstly, we evaluate the illumination image of the V component by domain transform filtering in the HSV space based on the Retinex theory. We divide the V component by illumination to get the reflection image. Then, we enhance the illumination image by the adaptive Gamma correction and increase the contrast of the illumination image by the CLAHE. Finally, we multiply the enhanced illumination image by the reflection image and then transform the enhanced image into RGB space. The proposed method can achieve more natural enhancement effect, suppress strong light enhancement and highlight image details. Furthermore, the proposed method overcomes the drawbacks of low contrast, color distortion, over enhancement and halo artifact, which exist in the former methods. It can also deal with many kinds of images, such as high dynamic range images (HDR), non-uniform illumination images and low exposure images. Experimental results demonstrate that the proposed method can achieve better visual quality than traditional methods.

Image segmentation based on fractional-order Darwinian particle swarm optimization

YU Sheng-wei，CAO Zhong-qing

2016, 38(09): 1836-1842. doi:

Abstract ( 166 )

PDF (759KB) ( 374 ) 　　

Image segmentation mainly extracts the objectives users are interested in, and it is the basis for image classification and pattern recognition. We present a novel image segmentation method based on fractional-order Darwinian particle swarm optimization, called FODPSO . The algorithm utilizes the fractional calculus strategy to control the convergence of particles and is able to determine the n-1 optimal for n-level threshold on a given image. Compared with the APSO and the CFPSO algorithms, testing results show that the FODPSO algorithm can enhance the performance in terms of convergence speed, stability, solution accuracy and global optimality, and greatly overcome the shortcomings of traditional methods, such as local optima and slow convergence speed. Hence, the FODPSO is applicable to practical projects.

An improved VIBE algorithm based on QPSO

WANG Ji-zhou1,2,LU Chang-hua1,JIANG Wei-wei1

2016, 38(09): 1843-1848. doi:

Abstract ( 127 )

PDF (779KB) ( 309 ) 　　

Review attachment

Compared with the traditional background subtraction model, the VIBE algorithm does not require an estimation of a probability function for background pixels or a sequence of training samples. So there is a good balance between computation complexity and accuracy performance for the algorithm. It can be usually used in real time video embedded sequence surveillance systems. The resolutions of a video sequence can be changed under some circumstances. However, the fixed parameters method adopted in the traditional VIBE leads to degraded detection accuracy. We present an improved VIBE algorithm which obtains optimal parameters using the QPOS algorithm during initialization in order to improve the generalization ability of the VIBE. Experimental results verify the proposed algorithm quantitatively and qualitatively, which can greatly improve the accuracy performance.

Traffic flow estimation and traffic state detection based on the space-time map

WU Zhi-fang1,LIU Xin2

2016, 38(09): 1849-1857. doi:

Abstract ( 140 )

PDF (878KB) ( 458 ) 　　

We propose a novel traffic flow estimation and traffic state detection approach. Firstly, the detection line used to compute the space-time map is set via man-machine interaction, and the processes such as edge detection, image segmentation and so on are then performed in the space-time map. The traffic flow for a period of time is estimated by using the information of edge, shape and the rate of road occupation. In addition, the current traffic state is divided into three levels: clear, crowd and jam by using the difference between edge information in the space-time map. Experimental results show that the proposed approach can effectively compute the traffic flow and detect the traffic state, and it is practical for commerce.

KPCA face recognition based on geodesic distance

LIN Ke-zheng,WEI Ying,ZHONG Yan,LI Hui

2016, 38(09): 1858-1862. doi:

Abstract ( 195 )

PDF (624KB) ( 349 ) 　　

Review attachment

Aiming at the problem that the information of the face detection data is high eigenvector and face recognition is easily affected by expression changes, we propose a kernel principle component analysis (KPCA) face recognition method based on geodesic distance. We extract principal components by the nonlinear method. First, we adopt the KPCA method to map face data to the high dimensional space, and then principal components of the face are extracted in the high dimensional space, where the kernel function is a polynomial kernel. Finally, geodesic distance is introduced to replace the original Euclidean distance as the similarity measure, which can more accurately measure the actual distance between two points, and face recognition rate is less affected by expression changes as well. The proposed method cannot only achieve dimension reduction, but also achieve effective data extraction. Experiments on the ORL face database show that the recognition rate of the proposed method is superior to that of the PCA and the KPCA.

An improved CAMShift tracking algorithm and a face detection framework

YANG Chao,CAI Xiao-dong,WANG Li-juan,ZHU Li-wei

2016, 38(09): 1863-1869. doi:

Abstract ( 107 )

PDF (997KB) ( 411 ) 　　

To make full use of human face videos in temporal and spatial domains and obtain accurate face image sequence for face alignment, we propose a face detection framework in combination with a face tracking algorithm. We use the simple and fast frontal face detection algorithm to detect face images from videos, and then employ the detection results to initialize, check and adjust the face tracking results. However, the CAMShift tracking algorithm is vulnerable to the negative impact of skin color areas, which leads to more redundant information. To solve this problem, we propose an improved CAMShift-KLT algorithm, which utilizes interest points to trace the edge of the face image and acquires accurate face alignment images. Experimental results show that compared with the CAMShift algorithm, the proposed algorithm can obtain more accurate face regions, and meanwhile, it has smaller tracking migration distance, higher hit rate of tracking and tracking effectiveness. Besides, compared with the contrast algorithm, the improved CAMShift-KLT algorithm can better track the face region.

An improved particle swarm optimization algorithm for portfolio based on mean-CVaR model

LI Feng-gang1,2，LUO Lin1,2，CHEN Ya-bo1,2，JIANG Xiang-fei1,2

2016, 38(09): 1870-1877. doi:

Abstract ( 127 )

PDF (499KB) ( 354 ) 　　

The particle swarm optimization (PSO) has a strong capability of global search, but it easily falls into global extremum. Besides, it can only solve the continuity problems. In order to improve these problems, we present a discrete complex method of local search, which can enhance the search capability when solving discrete problems. Since the PSO is easy to fall into local minimum, we introduce the adaptive particle migration operation to ensure the diversity of particles and avoid falling into local convergence effectively. Simulation experiments adopt the CVaR risk measurement method to measure portfolio risks, and establish an optimization mean-CVaR model which contains the transaction costs and the limitation proportion of the assets. Experimental results verify the effectiveness of the algorithm. Compared with other algorithms, the improved PSO algorithm has higher precision and stability.

Current Issue

Author center

Review center

Online journal