Computer Engineering & Science

A multilevel instruction Cache structure

for array-based many-core processor

CHEN Yifei,LI Hongliang,LIU Xiao,GAO Hongguang

2018, 40(04): 571-579. doi:

Abstract ( 162 )

PDF (1254KB) ( 336 ) 　　

Review attachment

Because of their high computational performance and energy efficiency ratio, arraybased manycore processors have been widely used in the high performance computing field. To build the future high performance computing systems, processors must solve the severe challenge of ‘memory wall’ and the core synergy problem. The kernel of the common arraybased manycore processor uses a singlethreaded structure to reduce the overhead, but the memory access demand is higher. In this paper, the hardware simultaneous multithreading technology is introduced into the single core structure. Aiming at the problem that the hit rate of the L1 instruction Cache is significantly reduced with the increase of the number of threads, this paper proposes an instruction Cache structure (Redundancy Instruction Cache) for the arraybased manycore processors. Based on this structure, a FIFO replacement strategy and an analogous LRU replacement strategy are proposed. Experimental results demonstrate that, based on the optimized Cache structure design, the instruction Cache miss rate of the dualthreaded structure is decreased by 25.2% and the CPI performance is increased by 30.2%.

Accelerating high throughput cryptography algorithm

on many-core computing platforms

FU He1，LI Chunjiang2，WANG Hao2，XIE Yongfang1

2018, 40(04): 580-586. doi:

Abstract ( 128 )

PDF (545KB) ( 290 ) 　　

Review attachment

Many-core processors are suitable for accelerating high-throughput computingintensive applications, while cryptography algorithms require large amounts of mathematical calculations and hence require high-throughput computing platforms. This paper proposes a coarse-grained parallel acceleration framework for many-core computing platforms. The framework does not take into account the internal computing process of the algorithm, and allocates the data to the many-core coprocessor for execution in units of computational functions. Based on the MIC many-core accelerator, the framework adopts the three-level parallel structure and task allocation mechanism, and develops the parallelism of the high-throughput cryptography algorithm. Experimental results for a variety of algorithms show that the framework can make full use of many-core computing platforms to achieve high-throughput encryption and decryption processing coarse-grained parallelism.

Memory optimization of Spark parallel computing framework

LIAO Wangjian1,2，HUANG Yongfeng1,2，BAO Congkai1,2

2018, 40(04): 587-593. doi:

Abstract ( 175 )

PDF (545KB) ( 349 ) 　　

The cluster parallel computing framework represented by Spark is widely used in the big data and cloud computing, and its performance optimization is the key in applications.The paper analyzes the framework of the execution process and memory management mechanism of Spark framework. Combining the characteristics of Spark and JVM memory management,three strategies are proposed:(1) Serialization and compression are used to reduce the cache data size and reduce the occupied memory space, then reduce the GC consumption, thus improving the performance.(2) The running memory size is reduced within a certain range, and recalculation replaces the cache, thus improving the performance. (3)By adjusting the proportion of the old generation and new generation of the JVM,the ratio of Spark computing and cache space,and other memory allocation parameters, the performance can be improved greatly.Experiments show that the serialization and compression can reduce the cache space by 42%,the performance is increased by 21% when the submitting memory is reduced from 1 000 MB to 800 MB, and optimizing the memory ratio can improve the performance by 10% to 30%.

A real-time compression engine

for network communication

REN Xiujiang1,WANG Lei1,ZHOU Jianyi1,XIE Xianghui2

2018, 40(04): 594-601. doi:

Abstract ( 128 )

PDF (1412KB) ( 340 ) 　　

Review attachment

With the development of integrated circuit technology, the performance of the interconnection network in highperformance computing systems is increasing much slower than the performance of the processor. The bandwidth of interconnection network has become one of the bottlenecks restricting the performance improvement of highperformance computing. From the point of view of reducing the amount of data injected into the network, the realtime compression technology for network communication is studied. A hardware realtime compression engine RTCE is designed, which can automatically compress the packets during their transmission. Experiments on SPEC2006, Graph500 and CFD show that the compressed network packet data are reduced by 32.8%~48.7% on average.

A dynamic binary translation optimization

method based on frequency statistics

LI Nan，PANG Jianmin，SHAN Zheng

2018, 40(04): 602-608. doi:

Abstract ( 145 )

PDF (660KB) ( 301 ) 　　

In a dynamic binary translation process, putting the code with high executing frequency into the translating cache for long time and extending the amount of code executed by the translator at one time is an effective way to reduce the context switching overhead and improve the system efficiency. Therefore, we propose the optimization clue of “identifying the hot code→constructing the super block→improving the T-Cache management method”, and design a hot code identification algorithm based on frequency statistics, which regard the basic block whose executing frequency is more than the preset threshold value as well as its subsequent basic blocks as the hot codes. Besides, we propose an idea of constructing super blocks generated by physically chaining the basic blocks in the hot code and hence owns a larger capacity in order to support the T-Cache system. Based on these, the original search method and replacement strategy of the T-Cache system are improved. Experimental results verify the correctness and the performance of the optimization method. On the domestic ShenWei processor platform, the method makes the standard test set SPEC 2006 achieve an average performance improvement of 9.34%.

A high-efficient parallel neuronal network simulation

algorithm based on synaptic ion channel kinetic

PENG Xia，WANG Zhijie,HAN Fang，GU Xiaochun

2018, 40(04): 609-615. doi:

Abstract ( 147 )

PDF (671KB) ( 301 ) 　　

Review attachment

In the field of computational neuroscience, the parallel simulation of largescale neuronal networks plays a very important role in exploring and revealing the information processing mechanism of the brain. In order to accelerate the largescale neuronal network simulation,an efficient parallel neuronal network algorithm based on synaptic neurotransmitterreceptor ion channel kinetic characteristics is proposed. By analyzing the information transmission mechanisms and the neurotransmitterreceptor ion channel kinetic characteristics of the chemical synapse, we propose an idea of the separation of presynaptic neurotransmitter computing from postsynaptic receptor computing. The new idea enhances the independence of the two concentration calculation: neurotransmitter emitted by presynaptic neurons, and the bound postsynaptic receptors.During each synaptic current computing, the new idea reduces the coupling degree of the presynaptic neuron and the postsynaptic neuron.Based on the new idea, we design a parallel algorithm for parallel neuronal network simulation based on synaptic ion channel kinetic.Simulation results show that the algorithm is efficient.

Context-aware personalized metric

embedding for next POI recommendation

XIAN Xuefeng1,CHEN Xiaojie2,ZHAO Pengpeng2,YANG Yuanfeng1,Victor S.Sheng3

2018, 40(04): 616-625. doi:

Abstract ( 231 )

PDF (1275KB) ( 425 ) 　　

Review attachment

With the rapid development of LocationBased Social Networks (LBSN) recommender system, PointofInterest (POI) recommendation has become a hot topic. The research of POI recommendation aims to recommend POIs for users and to provide services such as advertising and potential customer discovery. Due to the high data sparseness of users’ checkins, POI recommendation faces a great challenge. Many researches combine geographical influence, time awareness, social relevance and other factors to improve the performance of POI recommendation. However, in most POI recommendation researches, the periodicity of mobility and the user preference varying with the change of contextual scenario have not been excavated in depth. Moreover, there exists high data sparseness in Next POI recommendation. Based on the above considerations, this paper proposes a Contextaware Personalized Metric Embedding (CPME) algorithm, which is based on the user's periodic behavior pattern. It takes into account the contextual information of users’ checkins, which can enrich the valid data, alleviate the data sparseness, improve the recommendation accuracy, and further optimize the algorithm to reduce the time complexity. The experimental analysis on two real LBSN datasets show that the proposed algorithm has better recommendation performance.

A weed monkey algorithm for optimal sensor placement

YIN Hong1,DU Guozhang1,PENG Zhenrui1,MA Li2

2018, 40(04): 626-635. doi:

Abstract ( 139 )

PDF (1206KB) ( 312 ) 　　

Review attachment

The simple monkey algorithm has the shortcomings of initial distribution randomization, fixed climb step length and incapability of inheriting the characteristics of excellent monkeys, which limits the solution performance of the algorithm. In order to solve the above problems, a weed monkey algorithm for optimal sensor placement is proposed. Normal distribution is used to enhance the diversity of initial monkey populations. Selfadaptive climb step is introduced to improve the solution accuracy and convergence rate. Both weed reproduction evolution and competitive exclusion mechanism are used to enlarge the influence of excellent monkey on the monkey offspring population. The commonly used Modal Assurance Criterion (MAC) is used as the objective function of optimal sensor placement. The commonly used 8 test functions and 3 algorithms are used to verify the feasibility and effectiveness of the algorithm. Finally, the optimal sensor placement is carried out on the gelatinize mechanism of bag bottompasting machine. The results show that, compared with the simple monkey algorithm, the solution accuracy of the weed monkey algorithm precision is greatly improved.

A LBS location hiding method based

on virtual users in real environment

CHEN Si,ZHAO Ming,ZHANG Jinjian

2018, 40(04): 636-641. doi:

Abstract ( 130 )

PDF (659KB) ( 242 ) 　　

Review attachment

At present,Location-Based Service (LBS) has been widely developed and greatly facilitated people’s lives. However, location exposure may reveal users’ privacy,which makes the protection of location privacy increasingly important.Although current technologies can solve the privacy protection concern to a large extent, people still expect better protection of location privacy. In this regards, a privacy protection method based on real environment, named DummyEx, is proposed. Through generating a series of virtual users and building realistic movements, DummyEx can protect the user privacy well. Experiments show that this method can reach very good results in terms of effectiveness and safety.

Interconnection gateway between underwater

acoustic sensor networks and VHF communication

networks based on embedded technology

YANG Qiuling1,2,XIE Bingshan2,3,JIN Zhigang2,SU Yishan2,HUANG Xiangdang1

2018, 40(04): 642-648. doi:

Abstract ( 152 )

PDF (898KB) ( 349 ) 　　

Review attachment

Aiming at the long delay and high cost when water ship users monitor the underwater acoustic sensor networks, based on the characteristics of underwater acoustic sensor networks and VHF communication equipments commonly used in water ship users, we use embedded technology to design a water ship usersoriented gateway between underwater acoustic sensor networks and VHF communication networks, including its hardware and software systems. Simulation results show that the embedded gateway can realize the interconnection and communication between underwater acoustic sensor networks and VHF communication networks, which has a high theoretical and practical value.

Outdoor positioning technology based on telecom data

LIAO Shanhe,ZHAO Qinpei,LI Jiangfeng,RAO Weixiong

2018, 40(04): 649-659. doi:

Abstract ( 160 )

PDF (1001KB) ( 306 ) 　　

Review attachment

More and more mobile computing applications rely on location information to provide locationbased services, and outdoor positioning technology for mobile devices is critical. The widely adopted method is currently GPS, but the GPS position information of the mobile device is dependent on the GPS sensor of the mobile device such as a mobile phone. Although telecom operators provide users with calling and data services, they cannot obtain the users’ precise GPS positions.In view of this situation, we propose to use the connection signal data (telecommunication data for short) between the mobile terminal and the telecom base station to achieve the positioning service of the mobile device. Considering that telecom operators have accumulated vast amounts of telecom data, it is possible to allow operators to acquire user locations by studying outdoor positioning technologies based on telecom data. This paper extracts the telecom feature data, takes the GPS location of the mobile phone as the tag data, and studies five d outdoor positioning algorithm based on machine learning model. The GPS coordinate points are successfully predicted from the base station signal data. A large number of experiments are carried out to compare positioning accuracy,computation time, positioning accuracy of different data collection modes, positioning accuracy of different features among these algorithms. Besides, the effect of postprocessing on positioning accuracy is explored. Finally, the experiments show that the random forest classification model based on gridding is the best model and can achieve an average error of 15 to 20 meters and a median error of 10 meters. Compared with the previous regression algorithms, the proposal improves the accuracy by 39.46% and 54.28% on the 2G and 4G data respectively, thus achieving the positioning accuracy close to GPS positioning.

An identity-based signature scheme

with message recovery over lattice

KAN Yuanping

2018, 40(04): 660-664. doi:

Abstract ( 126 )

PDF (409KB) ( 271 ) 　　

Review attachment

With the development of quantum computers, the latticebased cryptography becomes one of the cryptographic systems which can resist its attacks. This paper proposes an identitybased signature scheme with message recovery over lattice, which has the characteristics of high efficiency, counterfeit attack and short signature length and can be widely used in the field of lightweight authentication.

A binocular stereo-view 3D reconstruction algorithm

based on contour extraction and depth screening

MA Jianshe,WEI Yunfeng,SU Ping

2018, 40(04): 665-672. doi:

Abstract ( 187 )

PDF (1100KB) ( 427 ) 　　

Review attachment

Aimming at the phenomeon that the 3D model, which is obtained by the 3D reconstruction algorithm based on depthmapfusion, is easily influenced by depth information error, a binocular stereoview 3D reconstruction algorithm based on contour extraction and depth screening is proposed. A standard checker board is adopted to calibrate the binocular 3D reconstruction system. Canny operator is used to detect the boundary of the target object, and morphological corrosion andexpansion methods are comprehensively used to extract the continuous boundary in a specified direction.On this basis,the depth information of the target object is filtered and interpolated in order to calculate the continuous depth value.The results show that, compared with the conventional 3D reconstruction algorithm, the target object of this algorithm has higher surface integrity, and the background hot pixels surrounding the target object are removed.

An improved visual background extraction

algorithm for foreground detection

CHEN Shu,DING Baokuo

2018, 40(04): 673-680. doi:

Abstract ( 130 )

PDF (1174KB) ( 368 ) 　　

Review attachment

Visual background extraction algorithm (ViBe) for foreground detection has the disadvantage that there is ghost and it is difficult to eliminate it for a long time, so an improved visual background extraction algorithm is proposed. Firstly, being introduced in the first sequence with n video frames, the OTSU algorithm can calculate an adaptive threshold for the frame difference method so as to get a more accurate foreground region. Secondly, a sample image is synthesized by using the first n frame images without foreground regions. Finally, the model is initialized with the extended neighborhood range in the synthesized sample images, and the ViBe background model can be updated by the extended domain. The proposed algorithm is compared with various classical algorithms in a large number of video databases. Simulation results show that the improved ViBe algorithm can quickly eliminate the effect of ghost on foreground detection, so the foreground detection is more accurate.

RGB-D saliency detection based

on multiple perspectives fusion

ZHANG Zhihua，LIU Zhengyi

2018, 40(04): 681-689. doi:

Abstract ( 118 )

PDF (1464KB) ( 308 ) 　　

Review attachment

Saliency detection is an important part of computer vision and many creative studies have been proposed. However, most existing methods focus on detecting salient objects in 2D images and they cannot be used to detect salient objects in RGBD images. In the paper, a new RGBD saliency detection method based on multiple perspective fusion is proposed. The method considers multiple complementary relationships including color and depth, global and local as well as foreground and background to detect salient objects. Firstly, it combines color feature and depth feature to construct the graph model. Secondly, it computes the global and local compactness feature based on the graph model to get the initial saliency map. Thirdly, it uses the boundary connectivity based on color and depth to calculate the background probability as the weight on initial saliency map and then produces the background optimization saliency map. Finally, it extracts the foreground inquiry node from background optimization saliency map and then uses the manifold ranking algorithm to get the final saliency map. Experiments show that the fusion policy of multiple perspectives can effectively improve the accuracy of saliency detection and outperforms other state-of-the-art algorithms in RGBD1000 benchmark.

A tracking and registration method based on improved KCF

YONG Jiu1,2，WANG Yangping1,2，LEI Xiaomei3

2018, 40(04): 690-698. doi:

Abstract ( 193 )

PDF (1393KB) ( 344 ) 　　

Review attachment

Since 3D registration is easily affected by the environment and target tracking and detection algorithms are timeconsuming with low precision, we propose a tracking and registration method based on an improved kernerlized correlation filter (IKCF). The method includes four steps: (1) utilizing the regularized least squares classifier for sample training to obtain kernel correlation filter and position information; (2) searching scale kernel correlation filter and the maximum of position output to achieve the detection of the scale and position; (3) updating the model by referring to the MOSSE tracker; (4) adopting the oriented FAST and rotated BRIEF (ORB) to do feature extraction and matching, and then calculate the registration matrix. We utilize 6 sets of data in the Visual Tracker Benchmark datasets and video sequence to simulate. The results show that the IKCF generally outperforms the KCF, trackinglearningdetection (TLD), structured output tracking with kernel (Struck) and compressive tracking (CT) in precision, success rate and efficiency when rotation, scale variation, partial occlusion, illumination or motion blur occurs. Besides, the target position is highly aligned with OpenGL cube registration, and the augmented reality (AR) system based on IKCF is more realtime, stable and robust.

An adaptive fractional-order partial

differential image enhancement model

LI Weikai1,WANG Zhengxia1,JIANG Wei2

2018, 40(04): 699-706. doi:

Abstract ( 147 )

PDF (1045KB) ( 371 ) 　　

Review attachment

Aiming at the shortcoming that the conventional adaptive image enhancement algorithms based on fractionalorder partial differential equations have not good enhance performance on the dark texture region, taking into account the different sensitivity of the human eye to the light sensation, the impact of brightness on vision is combined with the conventional adaptive image enhancement algorithms based on fractionalorder partial differential equations. A new adaptive fractionalorder partial differential image enhancement model is established by using the gradient and gray value as parameters, which improves the shortcoming of the previous model in enhancing the dark region image. The average gradient of the enhanced image is increased significantly and the image’s visual effect is improved superiorly, which shows that the proposed algorithm has some validity.

A video quality assessment model based

QIU Liang1，XIA Huiming1，CHU Jiuliang2

2018, 40(04): 707-711. doi:

Abstract ( 172 )

PDF (482KB) ( 280 ) 　　

Review attachment

Video quality estimation algorithms can be used in multimedia network system optimization and the improvement of video encoding and decoding algorithms, thus they have become a popular orientation of image quality estimation in the past few years. Based on the feature similarity index for image quality assessment (FSIM), using the temporal correlation among video local multiframes, and adopting a new 3D gradient operator to calculate the gradient similarity matrix between the original video sequence and the distorted video sequence, we present a new video quality assessment model based on gradient similarity of temporal domain (TGSMFSIM). The test results on LIVE video data show that the proposed model conforms well to subjective video assessment, and the indexes of SROCC and PLCC are better than those of the VSSIM and VQM, two widely used video assessment algorithms.

An improved genetic algorithm

for site selection problem

ZOU Guixiang,ZHANG Feizhou

2018, 40(04): 712-722. doi:

Abstract ( 236 )

PDF (905KB) ( 709 ) 　　

Review attachment

Site selection problem is one of the most important research fields of modern geographic information resource distribution. Genetic algorithm with strong universality and robustness can solve this problem. A common approach is to use a binarycoded genetic algorithm to locate sites on a grid map. To deal with the early maturity of the standard binarycoded genetic algorithm, this paper studies two methods that use different operators and observation concepts to solve early maturity problem of the standard binarycoded genetic algorithm, chooses to introduce the diversity measure and niche technology to improve the genetic algorithm, further explores the improvement of accuracy, online performance function and offline performance function of genetic algorithm in solving the site selection problem, and proposes a method to improve the niche technique in order to make every individual in the genetic group involved in genetic operation and avoid the situation where two identical individuals are involved in crossover operation. Finally, in the site selection experiment, the improved niche genetic algorithm is combined with the diversity measure to improve the performance of genetic algorithm successfully.

A dual population based molecular

kinetic theory optimization algorithm

FAN Chaodong1,REN Ke1,YI Lingzhi1,XIAO Leyi2,ZHU Biaoming1,LI Jie1

2018, 40(04): 723-730. doi:

Abstract ( 129 )

PDF (744KB) ( 291 ) 　　

Review attachment

The molecular kinetic theory optimization algorithm has the disadvantages of poor optimization accuracy and ease of being stuck into local extremum. A new molecular kinetic theory optimization algorithm with two populations is proposed. In this algorithm, the whole population is divided into two subgroups: the elite subgroup and the ordinary subgroup. The ordinary subgroup uses the search strategy of traditional molecular kinetic theory optimization algorithm to do a wide range search. Through cooperation, the elite subgroup does a refinement search to improve the convergence accuracy of the algorithm. Information exchange among subgroups is completed through individual migration. Two subgroups complete the search process by cooperation. Test results show that the improved algorithm significantly improves the convergence speed, accuracy and stability.

Current Issue

Author center

Review center

Online journal