Computer Engineering & Science

Dynamic task scheduling and optimization of data flow program

YANG Sheng-zhe,YU Jun-qing,TANG Jiu-fei

2017, 39(07): 1201-1210. doi:

Abstract ( 156 )

PDF (1392KB) ( 389 ) 　　

Review attachment

In order to solve the availability problem of data flow programming models, which can be used for flow applications of dynamic data exchange rate under the premise of taking into account the parallelism of programs, we design and implement a dataflow compiling system which combines dynamic scheduling and static scheduling. The compiling system takes COStream programs as its input. It analyzes the program and uses communications with dynamic rate to divide stream applications into coarse-grained sub graphs. Then in sub graphs, static optimization is used. The system assigns the computing units of sub graphs to CPU cores based on the workload of each computing unit and the utilization of computing resources. After stage division, the system allocates each computing unit to its corresponding pipeline stage. At runtime, each sub graph starts a thread on each processor core. The optimization of communications between threads avoids the expenses of multiple threads’ reading or writing memory at the same time. We use semaphore to control the synchronization between threads and dynamically schedule each sub graph based on the dynamic rates of communications and the states of threads. Then we construct the software pipeline scheduling and create multi-thread codes. We choose common X86-64 as the experiment platform to test programs and analyze the results. Experimental results show that the compiling system can realize dynamic stream applications. It expands the usability of the system and increases the speedup.

Large-scale kinetic Monte Carlo simulation

of metal material under irradiation

SONG Meng-zhao,FENG Yang-de

2017, 39(07): 1211-1218. doi:

Abstract ( 267 )

PDF (825KB) ( 419 ) 　　

Review attachment

Kinetic Monte Carlo (KMC) simulation can be used to simulate the radiation effects and defect diffusion of reactor materials in nuclear power plants, which is helpful to understand and predict how microstructures and mechanical properties evolve under irradiation. We present a parallel simulation method of vacancy diffusion based on synchronous sub-lattice (SL) algorithm. We adopt a dynamic communication data update method and an adaptive time synchronization method to decrease communication frequency and load so as to achieve both accuracy and parallel efficiency. Simulation results show that the serial implementation by the SL algorithm saves the total execution time by 60.31% in comparison with the original algorithm, and the parallel implementation achieves 39X speedup with 80 cores. The algorithm also performs well on large scale problems.

Impact of Linux kernel parameters on Spark workloads

WANG Li1,2,WANG Jing1,2,ZHANG Wei-gong2,3,QIU Ke-ni2,3,LU Ke-zhong4

2017, 39(07): 1219-1226. doi:

Abstract ( 172 )

PDF (541KB) ( 351 ) 　　

Research on the performance of Spark becomes a hot topic, however, optimization strategies are mostly used on the application level instead of system level. As the first software above hardware, the operating system plays a fundamental role in the performance of hardware. The Linux kernel provides abundant parameters as the interface to optimize the performance of the system. However, in practice, kernel parameters have not fully played their roles. Most people use their default values rather than change them to fit the specific environment. However, our experiments prove that the default values are not always the best choice, and sometimes it is even the worst. We define the concept of "influence ratio", and put forward a method based on the concept to understand the influence of parameters on Spark applications by analyzing the kernel functions. According to the features of the memory computing of Spark, we analyze the influence of Linux kernel parameters on several typical Spark workloads from the aspects of Transparent Huge Page and NUMA, which closely relates to the use of memory, and then give some conclusions. We hope that the analysis and conclusions can provide some experience of tuning kernel parameters reasonably for the Spark platform.

Measuring performance isolation of cloud file systems

ZHOU Li1,ZHANG Tian-ming1,REN Zu-jie1,SHI Wei-song2,WAN Jian1,3,ZHANG Ji-lin1,LI You-hui-zi1,YE Zheng1

2017, 39(07): 1227-1233. doi:

Abstract ( 136 )

PDF (1036KB) ( 344 ) 　　

Review attachment

With the development of cloud computing, cloud file systems, such as GFS and HDFS, play an increasingly significant role within the backend of cloud infrastructures. Various benchmarks for evaluating the performance of cloud file systems have been proposed. However, most of these benchmarks focus on the traditional performance metrics, like input/output operations per seconds (IOPS) and throughput, while ignoring the measurement of the performance isolation in multi-tenant environments. Because the I/O workloads in clouds are heterogeneous and dynamic, it is a challenge to measure performance isolation of cloud file systems. We propose a novel model for measuring the isolation, and implement this model in a benchmark suite, called Porcupine. The Porcupine schedules request streams following a workload-specific arrival pattern. These request sequences conform to the characteristics of workloads in cloud file systems. Porcupine can be used to deploy benchmark test on cloud file systems conveniently. Experimental results demonstrate that isolation measurement model is effective.

Application of OpenMP multicore technology

in granule hydrodynamics method

WEI Chao-lei1,YAN Min2,ZHAO Fang1

2017, 39(07): 1234-1240. doi:

Abstract ( 144 )

PDF (739KB) ( 322 ) 　　

Review attachment

In order to improve the computational efficiency of the granule hydrodynamics method (GHM) model, we propose a multi-thread parallel computing model based on OpenMP for multi-core computers. We analyze the primary computational modules and extract the ones that can be paralleled. We then use the OpenMP compiler directive statements and runtime library to realize the parallel computation of the solving process in the GHM model to solve the granules motion equations by using the numerical integration method. Experimental results on Windows platform with 4 cores and 8 threads CPU show that the parallel program's speedup can reach 2.5, which proves that the OpenMP technology can improve the computational efficiency of the GHM significantly, and that it has advantages of easy programming and good portability as well.

A 3D FDTD parallel algorithm based on heterogeneous

computing and its application in electromagnetic simulation

ZHOU Lan-hua1，FU Bin1,2，Li Ren-fa1，LIU Xin-zhong1，HUANG Jing1

2017, 39(07): 1241-1248. doi:

Abstract ( 186 )

PDF (806KB) ( 392 ) 　　

Review attachment

The finite difference time domain (FDTD) method is one of the important methods for solving Maxwell’s equations in electromagnetics, and it is widely used. But it is time consuming when applied to the simulation of electrically large targets. In order to solve this problem, we take advantage of the parallel processing capacity of the graphics processor unit (GPU) together with the compute unified device architecture (CUDA). Taking a low pass filter as an example and using five million targeting grids, we realize three-dimensional FDTD high performance speed calculation with time-domain convolution perfectly matched layer (CPML) absorbing boundary. Experiments are carried out on the Quadro 4000 and Tesla M2050 GPUs with the Fermi architecture, whose error rate is within the range of 10-4, and can obtain 36 and 55 times faster speed than the CPU of the same period. The results show that the method has the characteristics of high precision, high efficiency, versatility and strong practicability.

A novel big data order-preserving matching

algorithm based on similarity filtration

JIANG Wen-chao1,LIN De-xi1,SUN Ao-bing2,WU Xiao-qiang2

2017, 39(07): 1249-1256. doi:

Abstract ( 129 )

PDF (1478KB) ( 325 ) 　　

Review attachment

Data order-preserving matching is a key problem in big data applications. Data matching can be transformed into character or number matching through abstraction or reduction. We present a novel data order-preserving matching algorithm based on similarity filtration which includes three steps: data transformation, data reduction and similarity computation. Firstly, to reflect the relation of convex growth (descent) or concave growth (descent), the data is transformed into a binary string according to the relationship among the three neighbor numbers. Secondly, to compute the similarity more accurately, the data array and pattern array are both reduced into stable interval ［0,1］. Finally, according to the variety range of the relevant nodes between data array and pattern array, the similarity can be computed and sorted. Theory analysis shows that the time complex is O(n), which is lower than the algorithm presented by Cho et al. Furthermore, our algorithm can overcome the deficiencies of the algorithm presented by Cho et al. including the incontrollable min-max values and the subsection inconsistency. Based on the similarity computation, all the sub-strings can be sorted for data retrieval or searching in big data applications.

FPGA implementation of high precision digital BDPSK

communication system based on improved Costas loop

XING Fang-cheng,WANG Su-zhen,WANG Tao,ZONG Wei-hua

2017, 39(07): 1257-1263. doi:

Abstract ( 118 )

PDF (1031KB) ( 317 ) 　　

Review attachment

With the development of software-defined radio digital communication systems, taking the field-programmable gate array (FPGA) as the control core has become a hotspot of the study on digital communication systems. In a high precision BDPSK communication system, the traditional way of acquiring a carrier based on the Costas loop can consume a large number of resources of FPGA devices. We introduce an improved Costas loop structure which can reduce the number of multipliers and adders and improve operation speed. In this 14-bit high precision intermediate frequency transmitting-receiving system, all basic units are integrated on a single FPGA chip, which can improve the integration degree of the system and its reliability. Meanwhile, all parameters and output data width of the system are programmable, which has good application prospects.

A civil aviation meteorological information receiving model

MA Jun1,2,LI Xiang1,GUO Hong1

2017, 39(07): 1264-1268. doi:

Abstract ( 138 )

PDF (1750KB) ( 332 ) 　　

Review attachment

Based on civil aviation business processes, we systematically study the original civil aviation meteorological information processing and transmission system. The research indicates that with the development of the civil aviation business, the original system is unable to meet the real-time performance demand, so we propose a new civil aviation meteorological information receiving and transmission scheme. Multi-node parallel processing technology, load balancing and main-memory database are adopted in this new model to facilitate the process of decoding, quality inspection and the storage and distribution of meteorological information. The new model has already been successfully applied in the civil aviation meteorological information processing and transmission proving system.

Design and implementation of a single precision inverse

based on the first order Taylor series look-up table method

YAN Min1,HE Xin1,LI Sha1,ZHU Long1,ZHAO Li2

2017, 39(07): 1269-1272. doi:

Abstract ( 239 )

PDF (656KB) ( 457 ) 　　

Review attachment

Based on the analysis on the existing problems in the single precision inverse algorithm, we design and implement a single precision inverse based on the first order Taylor series. Compared with the traditional algorithm, the resource consumption, operation cycle and efficiency are improved. The main logic module of this floating point algorithm is composed of a 24 bit integer adder, a ROM and a 24 bit multiplier. The mantissa in range of ［1, 2) is divided to 4096 intervals on average, and the reciprocal square of the starting point of each interval is stored in a lookup table. Then the first order Taylor series is applied to compute the inverse value of each interval. Simulation results are consistent with the theoretical results, which meets the accuracy requirement of the single precision. This algorithm has been successfully applied to the third generation of GPU JM7200.

A community detection algorithm based on node

dependence and similar community fusion

NIE Xiang-lin1,2,ZHANG Yu-mei1,2,WU Xiao-jun1,2,WU Xia1

2017, 39(07): 1273-1280. doi:

Abstract ( 129 )

PDF (1076KB) ( 323 ) 　　

Review attachment

As one of the topological properties of complex networks, the community structure has important theoretical and practical significance. We propose a community detection algorithm based on node dependence and the fusion of similar communities. The algorithm firstly divides the whole network into several local networks with large average clustering coefficients, thus constructing a skeleton of the complex networks. Then according to the definition of connectivity, the algorithm continuously absorbs the edge nodes of the community and small communities into the backbone network until all the nodes are accurately allocated to the community. This algorithm is applied to Zachary Karate Club network and the dolphin social network, and compared with the Girvan-Newman algorithm (GN) and Newman fast algorithm (NFA). The results show that our algorithm can effectively classify fuzzy edge nodes and the result of the community division has high accuracy.

Data center network service planning

based on backbone networks

LI Yao-fang1,2,WU Bin2,XIAO Jie2,LI Wei1,LIU Qi1,SUN Ying-guang1

2017, 39(07): 1281-1287. doi:

Abstract ( 139 )

PDF (1034KB) ( 301 ) 　　

Review attachment

Data centers are the key of cloud computing. Current data center architectures, which are based on electronic switches, conventional multi-stage switch network, and centralized deployment and management, cannot meet the requirements of survivability, high-availability and design flexibility of high-performance data centers for future cloud services. We optimize data center placement with service routing and protection to achieve survivability and cost minimization. An integer linear program (ILP) is first formulated to achieve optimal design. It integrates preconfigured protection cycle (p-cycle) for fast protection against a single link failure, and a data center replicas and fast service rerouting against a service failure. We then propose a two-step heuristic algorithm for large-size network scenarios. The first step solves data center placement and service routing problem in the failure-free scenario, and the second step takes fast service protection into account. The proposed design is validated by extensive simulations.

A context-adaptive segmentation heterogeneous

RSSI fitting positioning method

CHEN Yong-xi1,LIU Ren-ren1,CHEN Yi-qiang2,WANG Shuang-quan2,JIANG Xin-long2

2017, 39(07): 1288-1294. doi:

Abstract ( 161 )

PDF (774KB) ( 396 ) 　　

Review attachment

Indoor positioning plays an important role in many applications, such as public safety, healthcare, location-based services and so on. How to improve positioning accuracy and the model' adaptivity to the environment becomes a key issue of indoor positioning, where most existing techniques generally use the value of the received signal strength indication (RSSI) to obtain the distance. Given the fact that the traditional logarithmic distance path loss model cannot adapt to the complex indoor environment very well, we propose a context-adaptive segmentation heterogeneous RSSI fitting positioning method. The proposed method firstly utilizes the difference in signal transmission under different application scenarios to divide the RSSI data into several different segments. Then it finds the optimal piece-wise fitting point by RSSI's differentiated characteristics, and selects the optimum function for each seg-ment, enabling the number of segments, segment position, and piecewise functions of all segments to a-dapt to corresponding application scenarios. Finally high accurate RSSI signal fitting is achieved. Exper-imental results show that the proposed method in RSSI fitting can achieve higher accuracy than the traditional single-fitting function, and improve the accuracy of position algorithms significantly.

A cookie-based cross-domain single sign-on scheme

GUO Hao，WANG Guo-cai，LUO Pin

2017, 39(07): 1295-1299. doi:

Abstract ( 149 )

PDF (582KB) ( 310 ) 　　

Review attachment

Aiming at the problem of low efficiency and poor system security due to the multiple authentication of users under multiple application systems, we propose a cookie-based cross-domain single sign-on scheme. Users can login once but access multiple systems in different domains. We provide the overall model of the scheme, analyze the login process and explain the implementation of the cross-domain. The mutual authentication is explained in details, which ensures the legitimate identity of both sides of communication. The management of role-identity is added to reduce the coupling between single sign-on systems and web application systems.

Anomaly detection based on hidden Markov model in videos

LI Juan1，ZHANG Bing-yi1，FENG Zhi-yong1，XU Chao2，ZHANG Zheng3

2017, 39(07): 1300-1308. doi:

Abstract ( 171 )

PDF (867KB) ( 411 ) 　　

Review attachment

Widely used video technology brings a large amount of video data, so it is impossible to detect abnormalities in surveillance video relying on human operators only. Automated abnormal events detection is extremely important for public safety. We propose an abnormal events detection approach with comprehensive consideration of target features and temporal-spatial context. The method exploits the texture of optical flows to describe the rigidity of moving objects. Then, we establish an abnormal events detection model of temporal context based on the hidden Markov model (HMM). Afterwards, radon features of abnormal events are extracted. We also establish the classification model of spatial context by using the HMM based on pre-classification results obtained by the support vector machine (SVM). Experimental results on public dataset—UCSD PED2 show that the performance of our method outperforms the existing algorithms in abnormal events detection and localization. Furthermore, our approach can classify abnormal events.

A fast reading algorithm for pointer

instrument based on scan line processing

#br#

KONG Rui1,2，JIE Ying-da1，CHENG Lin1

2017, 39(07): 1309-1316. doi:

Abstract ( 176 )

PDF (751KB) ( 326 ) 　　

Review attachment

Concerning the illumination changes and the low speed of pointer instrument detection which make it difficult for the existing algorithms to achieve readings quickly and accurately, we propose a new algorithm to solve these problems based on scan line processing. The new algorithm firstly extracts the illumination invariant image by single scale Retinex. In order to reduce the pointer detection time, we adopt the scan line processing to extract the characteristic pixels for Hough transform. Finally, we use the dual threshold Hough transform to detect straight lines. Experimental results show that the proposed algorithm cannot only solve the problem of illumination changes and the low speed of readings, but also has high precision and adjustable speed.

Adaptive color image enhancement

based on multi-scale top-hat transform

AN Jing,ZHANG Gui-cang,LIU Yan-ni

2017, 39(07): 1317-1321. doi:

Abstract ( 190 )

PDF (564KB) ( 386 ) 　　

Review attachment

Given the problems of excessive enhancement and information loss caused by space change in the processing of color image enhancement, we propose a method to enhance color images in RGB space based on mathematical morphology top-hat algorithm, which utilizes standard deviation weight ratio of each component as the regulatory factor. Firstly, we extract the bright and dark features under multi-scale of R、G、B channels. Secondly, we use control factors to enhance valuable detail features. Finally, we combine the three components to get the enhanced target image. Experimental results indicate that the method can effectively enhance image contrast, avoid excessive enhancement, keep the brightness, and has good visual effect.

User-defined space situation visualization

expression based on Mashup

LU Wan-jie，LAN Chao-zhen，SHI Qun-shan，L Liang

2017, 39(07): 1322-1327. doi:

Abstract ( 173 )

PDF (696KB) ( 352 ) 　　

Review attachment

Current space situation visualization program cannot achieve user-defined expression according to fast changing missions and data. In the field of Web application, Mashup can integrate a number of different sources of applications supporting Web API by certain mode, which can generate new Web applications, and make use of the contents of the data retrieved from outside to create new services. Based on Mashup, we design a user-defined space situation visualization expression. We can achieve internal Widgets data interaction through the common Widget language and the visual widgets of 3D and 2D situational information, and display situational information synchronously. We manage the situation information display through the network layer management, thus realizing subscription and the release of events. Experimental results show that the user-defined space situation visualization expression based on Mashup can quickly adapt to rapid changing missions and data.

Path planning of mobile robots based on

improved artificial potential field method

SONG Jian-hui,DAI Tao,LIU Yan-ju

2017, 39(07): 1328-1332. doi:

Abstract ( 253 )

PDF (710KB) ( 412 ) 　　

Review attachment

Path planning of mobile robots based on the traditional artificial potential field method has the problems of non-reachable targets with nearby obstacles and local minimum points. We analyze the cause of this problem, and propose a path planning algorithm for mobile robots based on the improved artificial potential field. In this algorithm, the relative distance between the robot and the target is intro-duced into the repulsion function, and the robot can walk out of local minimum points and reach the tar-get point by establishing a virtual target point and isolating the original target point. Simulations verify the effectiveness of the improved algorithm.

A global stereo matching algorithm based on mean-shift

WANG Zhao-yue,CHEN Li-fang

2017, 39(07): 1333-1337. doi:

Abstract ( 127 )

PDF (498KB) ( 307 ) 　　

Review attachment

We propose a global stereo matching algorithm based on mean shift image segmentation to improve image global stereo matching for its high accuracy but large calculation. Firstly, we use the mean shift algorithm to segment the original image to get the number of homogeneous regions and their labels. When calculating matching cost, we choose proper pixels according to the segmentation region of pixels, which can improve the computation speed of matching cost. Secondly, before calculating the cost aggregation, we use the
K-means algorithm to cluster the pixels according to the number of homogeneous regions K which is obtained by the mean shift algorithm before. This can improve the accuracy and speed of stereo matching. Finally, we utilize the TRW-S belief propagation algorithm to solve the energy minimization problem. Experimental results show that compared with pure global stereo matching, the proposed algorithm can improve the stereo matching accuracy and speed obviously.

Current Issue

Author center

Review center

Online journal