Computer Engineering & Science

Behavior-aware memory scheduling for GPGPU applications

LIU Zi-jun1，HE Yan-xiang1,2，ZHANG Jun1,3，LI Qing-an1,2，SHEN Fan-fan1

2017, 39(6): 1011-1021. doi:

Abstract ( 421 )

PDF (1614KB) ( 728 ) 　　

Review attachment

As general purpose computing graphic units are widely used in high-performance computing, a new concurrent execution model is proposed, under which the current memory scheduling policy is unable to achieve maximum memory throughput. We characterize different memory access behaviors of applications in the concurrent kernel execution on a single GPU platform, analyze the unbalanced performance loss across them, and propose a behavior-aware memory scheduling policy for GPGPU applications. Different priority scheduling methods are employed to exploit the advantages of application types. Experimental results show a significant improvement on the unbalanced performance loss among different types of applications. Averaged memory system throughput and fairness across all benchmarks are improved by 9.7% and 15.0% respectively over the baseline architecture.

Porting and optimizing of NAMD

on SunwayTaihuLight System

YAO Wen-jun,CHEN Jun-shi,SU Zhi-chao,YU Yang,LIAO Chen-zhi,AN Hong

2017, 39(6): 1022-1030. doi:

Abstract ( 691 )

PDF (908KB) ( 715 ) 　　

Review attachment

Nanoscale molecular dynamics (NAMD) is an open source free molecular dynamics simulation software based on the Charm++ parallel programming model, which can quickly simulate millions of atomic-level macromolecules in large-scale parallel computer system. SunwayTaihuLight System is a supercomputer developed independently by China, which has a theoretical peak performance of 125.4 Pflop/s with 10 million cores，and the LINPACK efficiency is no less than 70%. The NAMD partitions the atoms in the space, divides the force in the calculation, fully exposes the parallelism of the single-step simulation, and controls the load balance through the CHARM++. In response to the characteristics of NAMD simulation calculation, we port and optimize NAMD calculation code, allowing it to run better on the supercomputer SunwayTaihuLight System. After optimization, the performance is improved by nearly 20 times, and the performance of the single core group is 3 times higher than Intel XeonE5-2650V2. Scalability at present parallel degree reaches about 3.25 millions of cores, which makes a breakthrough of a million of cores.

Improving the feasibility of task-sets based on

modified generalized scheduling algorithm

LI Rui,LI Zhi-ze

2017, 39(6): 1031-1041. doi:

Abstract ( 439 )

PDF (730KB) ( 530 ) 　　

Review attachment

Real-time systems require the task to get the correct results before its deadline in the worst-case. If the deadline is exceeded, the task is considered a wrong behavior. Improvement in task scheduling analysis and the task-sets schedulability is particularly important. Generalized scheduling can integrate the advantages of fixed priority scheduling, prevent tasks from unnecessary preemption, reduce additional memory usage, and improve the schedulability of task-sets. But the method of task schedulability analysis is too rough, which can impact worst-case response time analysis of the task and reduce the task-sets' schedulibility. For these problems, we re-establish a task model based on generalized scheduling by increasing the number of task's running stage and propose a method for improving the schedulability by assigning task's preemption threshold, adjusting preemption thresholdand length of task running stage and optimizing the block tolerance of the task. Finally, experimental results show that the proposed scheduling algorithm can effectively improve the feasibility of the task-sets.

Distributed SVM parameter optimization based on Hadoop

WU Yun-wei,NING Qian

2017, 39(6): 1042-1047. doi:

Abstract ( 415 )

PDF (694KB) ( 555 ) 　　

Review attachment

The classification and prediction accuracy of an algorithm are directly influenced by the choice of parameters, and among the methods of parameter selection, global grid search has obvious advantages, such as reliable and simple calculation, and obvious optimization effect, which are suitable for engineering operations that have high reliability requirement, for instance, parameter optimization of the fault pattern recognition algorithm in fault diagnosis of system. However, the global grid search is time-consuming in the search process, therefore there is still a constraint on use, especially for the system which has high real-time requirement. Using the global parameter optimization of support vector machine as a case, Hadoop platform is used for distributed parameter optimization in order to overcome the disadvantage of grid search. With HDFS, the parameters can be automatically divided into calculation nodes. We establish the distributed parameter optimization model by using the MapReduce computing framework, then conduct model training and prediction as well as parameter optimization. Experimental results show that the optimization efficiency is improved without reducing algorithm performance.

Integrity check of cloud storage threat model based

on pseudo random bilinear mapping

ZHANG Zhi-xin1,WANG Chun-dong2,JIANG Shu-hao1

2017, 39(6): 1048-1055. doi:

Abstract ( 338 )

PDF (1634KB) ( 582 ) 　　

Review attachment

User files are not stored locally in the application of cloud storage, so the key problems include file security, data confidentiality and data robustness. Firstly, according to the security problems of the existing literature, which does not consider data robustness for data recovery, we construct a threat model by using pseudo random bilinear mapping. Secondly, the interface file block structure is prepared, and an integrity check scheme is designed according to the relevant literature algorithm, which can realize the implementation of multi key server security erase code storage system algorithm, and the calculation complexity of the algorithm is given. Experimental results show that the proposed integrity check scheme can achieve a large probability of successful data retrieval.

Real time similarity of streaming time sequence

QU Zhen-xin，WANG Hong-yu

2017, 39(6): 1056-1062. doi:

Abstract ( 380 )

PDF (755KB) ( 571 ) 　　

Review attachment

Although the dynamic time warping algorithm is suitable for measuring time sequence similarity, streaming time sequence has a large quantity of sequences, potential infinite length, and requirement for high real-time performance in the big data background, thus facing problems of simple algorithm and complex computation. We propose a new streaming dynamic time sequence algorithm according to the features of streaming time sequence based on the Spark calculation platform, which can calculate the approximate value of dynamic time sequence in real-time, and has controllable error, good stability, and ability of processing big data. Experimental results verify its feasibility and stability.

A fault-tolerant and verifiable data search

scheme in the e-health environment

AO Zhang-heng，ZHANG Ying-hui，ZHENG Dong

2017, 39(6): 1063-1069. doi:

Abstract ( 327 )

PDF (597KB) ( 488 ) 　　

Review attachment

With the rapid development of e-health, medical institutions need to spend a lot of resources managing large electronic data, meanwhile it is difficult to share data among these institutions. We present a cipher-text search scheme used in the e-health environment. The scheme supports multiple fault-tolerant keywords and verifiable cipher-text search and achieves a unified and effective data management. The fault-tolerant multi-keyword search scheme is based on the fuzzy extractor, which enhances the effectiveness of search and practical application. Based on the bilinear cumulative tree data structure, the verifiable mechanism can provide reliability verification for the search results. Security analysis shows that the scheme meets the requirements of confidentiality and privacy of queries, and the search performance analysis shows the effectiveness of the multi-keyword search scheme.

Effect of distribution of plaintext on DPA in WSN nodes

LIU Yong-chang1,2,LI Xiang-yu1,2

2017, 39(6): 1070-1078. doi:

Abstract ( 308 )

PDF (751KB) ( 487 ) 　　

Review attachment

Cryptographic modules in wireless sensor network (WSN) nodes often face side channel attack threat. However, since the data encrypted in WSN nodes comes from natural physical signals, the distribution of plaintext will follow their natural properties, in all probability instead of being uniformly random. In order to explore the success rate of differential power analysis (DPA) when applied to nonuniform-distributed data from real sensors, we theoretically and experimentally analyze the relationship between the distribution of plaintext and the success rate of DPA on block ciphers. It turns out that the success rate of DPA targeting the first round is negatively correlated to the Cramer-von-Mises distance between the plaintext's distribution and the uniform distribution. The conclusion infers that if the attacker can construct plaintext, the uniform data makes the biggest success rate, and if the attacker only has access to real data, attacking the last round is better than attacking the first round.

A multi-rate digital intermediate frequency system

CAO Jian-fei1,WEN Shuang-chun1,LIU Yu2,WEN Yan-dong1

2017, 39(6): 1079-1086. doi:

Abstract ( 316 )

PDF (1790KB) ( 542 ) 　　

Review attachment

As the mobile communication technology develops rapidly, traditional communication devices based on proprietary hardware have limited function and are hard to upgrade, which cause a compatibility issue for a variety of communication protocols and software versions. Software defined radio technology has the advantage of flexibility and openness, and can solve these problems easily. We present a multi-rate digital intermediate frequency (IF) system implemented by FPGA. The system is designed in a full duplex mode and supports multi-rate transceiver; it can also be configured according to specific applications in order to achieve a variety of data conversion rates. The system includes a mixer module, a sample rate conversion (interpolation and decimation) module, and a low-pass filter module. It realizes 32, 96, 480 times down/up sampling and 240kHz down conversion process. The spurious signals are effectively suppressed while the signal stopband attenuation can reach 60dB. With flexible system configuration, the module is optimized and reused, which can save hardware cost and improve the performance.

A high dimensional global optimization method

for antenna design based on Kriging model

CHEN Xiao-hui，PEI Jin-ming，GUO Xin-xin

2017, 39(6): 1087-1091. doi:

Abstract ( 354 )

PDF (543KB) ( 542 ) 　　

Review attachment

Traditional antenna optimization designs need numerous simulation trials of different parameter combinations to reach the optimum, which leads to low efficiency in solving high dimensional antenna design and optimization problems. To address this issue, we design an initial Kriging model by using a few uniformly distributed sampling points and their simulation data. During the optimization iterations, the population of each generation is comprised of individuals with high fitness as well as individuals with high diversity. The optimal individual is selected according to its responses and uncertainty predicted by the Kriging model. Electromagnetic simulations are conducted for this individual, and the results are used to update the Kriging model. This algorithm is applied to optimize the resonant frequencies of an E-shaped antenna with 6 variables. Compared with other optimization methods, the number of EM simulation is reduced by about 80%.

A high precision TDOA and FDOA estimation

method for CDMA signals

HAN Yu

2017, 39(6): 1092-1096. doi:

Abstract ( 371 )

PDF (523KB) ( 512 ) 　　

Review attachment

Given the overlap in time-frequency domain of code division multiple access (CDMA) multi-user signals, we propose a novel estimation method for time difference of arrival (TDOA) and frequency difference of arrival (FDOA). Based on the acquiring and dispreading of spread spectrum signals, with short-term signal samples and limited computation, the method adopts only two iterations of time-frequency searching to separate signals and estimate time frequency difference, and then uses the time domain and frequency domain interpolation to further improve estimation accuracy. Simulation results show that the presented algorithm can effectively improve the estimation accuracy of time-frequency difference of CDMA signals and lower the computational complexity in comparison with the cross ambiguity function correlation method.

Alert classification based on cost

sensitive neural networks

PAN Zhi-hui,YANG Dan,ZHANG Xiao-hong,XU Ling

2017, 39(6): 1097-1103. doi:

Abstract ( 354 )

PDF (695KB) ( 482 ) 　　

Review attachment

Static analysis tools can help developers locate potential code errors in the early phase of development. However, studies show that such tools always report a large number of alerts, and most of them are meaningless false ones. To enhance the availability of static analysis tools, researchers divide alerts to actionable and unactionable alerts using statistics and machine learning techniques. These classification techniques do not consider the class imbalance problem caused by false positives and the unequal cost of different misclassifications. Aiming at these problems, we apply the BP neural networks and cost sensitive neural networks based on over sampling, threshold moving and under sampling techniques to classify alerts respectively. Experimental results show that, compared with BP neural networks, the cost sensitive neural networks techniques can on average increase actionable alert recall rate by 44.07%. And when the cost of misclassification of an actionable alert is above a certain value, cost sensitive techniques can have a lower classification cost.

PPIBF: A privacy preservation invertible Bloom filter

XIE Kun，SHI Wen

2017, 39(6): 1104-1111. doi:

Abstract ( 472 )

PDF (1086KB) ( 542 ) 　　

Review attachment

The Bloom filter is used in wireless sensor networks due to its feature of space efficiency. In order to support list operation in sink nodes, all elements must be listed by the Bloom filter. In the existing work, only the invertible Bloom filter can list all elements. In order to protect the privacy of sensing information transmission, based on the homomorphic encryption function, we propose a privacy preservation invertible Bloom filter (PPIBF) and design its operations such as insert, aggregation and list. The PPIBF's aggregation operation can implement aggregation of multiple encrypted PPIBFs without decrypting the cipher texts, thus ensuring that the message transmitted in the network cannot be leaked when intermediate nodes are under attacks. Detailed security analysis and calculation analysis show that the proposed PPIBF is an efficient algorithm to protect information.

A projection-based estimation approach to software usability

YUE Chuan

2017, 39(6): 1112-1117. doi:

Abstract ( 361 )

PDF (486KB) ( 502 ) 　　

With the continuous development of IT industry, users and experts have paid more and more attention to software usability which determines the competitiveness and user satisfaction of software. So how to quantify and evaluate comprehensively the software usability is an important job. We propose a software usability evaluation method based on projection and the intuitionistic fuzzy theory. Firstly, we derive evaluation matrices based on user survey, and establish the ideal decision of evaluation matrices. Secondly, the projections of evaluation matrices on ideal decision are given based on group decision-making. Then, we make a ranking of evaluated software based on their projections. Finally, a practical evaluation example demonstrates the feasibility and practicability of the proposed method. The results show that the model is effective for evaluating software quality.

An FPGA-based parallel RICE decoder

TAO Wen-ze1,2，WEI Hong-wei1，ZHANG Hong-qun1

2017, 39(6): 1118-1125. doi:

Abstract ( 442 )

PDF (1259KB) ( 531 ) 　　

Review attachment

The RICE algorithm is widely used in the lossless compression system. Since it adopts the variable-length adaptive entropy coding, it’s necessary to make bit-wise judgment and analysis in the compressed stream when decoding. However, this makes it difficult to achieve high speed decompression. Existing RICE decoding implementation methods have unsatisfactory performance in decoding speed and versatility. Given the characteristics of the adaptive entropy coding in the RICE algorithm, we propose a parallel RICE decoding structure based on finite state machine (FSM) and look up table (LUT), which can perform 8-bit width parallel decoding on the FPGA at the highest speed of 176 MB/s. And meanwhile, the decoding structure is suitable for the case where the coding parameter k is changeable, and hence it has strong versatility.

Tongue spots and petechiae recognition and

extraction in tongue diagnosis images

WANG Sheng1，LIU Kai-hua1，WANG Li-ting2

2017, 39(6): 1126-1132. doi:

Abstract ( 463 )

PDF (867KB) ( 698 ) 　　

Review attachment

Tongues spots and petechiae are important tongue patterns in the computer tongue diagnosis system. We propose a method to recognize and extract spots and petechiae in tongue images based on blob detection, support vector machine (SVM) and k-means clustering. Firstly, we apply the SimpleBlobDetector algorithm to detect blobs in tongue images. Secondly, we obtain the characteristic values of blob number, size and distribution to generate the feature vector. Thirdly, we utilize the SVM classifier to recognize tongues with spots or petechiae. The detection of spots or petechiae also bases on blob detection. Blob detection result is clustered into several groups by using k-means clustering after extracting color features. To extract the spots or petechiae, we define a discriminant function based on weighted color space distance, compare the clustering results with the former blob detection results, and achieve a binary classification of clustering groups. The positive class is the extraction results. Experimental results show that the recognition accuracy can reach 97.4%, the false alarm rate is 6.0% and the missing alarm rate is 10.1%. The results also verify the availability and application value of our method.

A nonlocal means filter for images with salt-and-pepper noise

XU Guang-yu,JIANG She-xiang

2017, 39(6): 1133-1140. doi:

Abstract ( 404 )

PDF (865KB) ( 546 ) 　　

Review attachment

According to the fact that the nonlocal means (NLM) method cannot adequately remove noise from the images corrupted by salt-and-pepper noise, we extend the NLM to remove salt-and-pepper noise by introducing the noise detection results. At the noise detection stage, we divide pixels into two categories: noisy and noise-free pixels, depending on two extreme values Lmin and Lmax. At the filtering stage, noise-free pixels remain unchanged, while for each noise pixel, if the adaptive filtering window does not contain any noise-free pixel, we regard the current noise pixel located in image uniform regions composed of noise-free pixels with the same gray value Lmin or Lmax. And then the calculated statistics is used as the restored value. Otherwise, we employ the improved NLM filter for noise removal. The joint noise detection mask in the proposed method can avoid the influence of noise pixels on calculating similar weights in the presence of noise pixels, and only noise-free pixels are used for the weighted average. In addition, the iterative filtering scheme is used to remove noise of high-density. Experimental results demonstrate the effectiveness of the proposed filter even though its computational complexity is still high.

Neighbor propagation and shape context based

MEAP for moire images automatic classification

JIANG Ming1,CHEN Lei-lei2,GE Hong-wei2,SU Shu-zhi2

2017, 39(6): 1141-1148. doi:

Abstract ( 333 )

PDF (832KB) ( 544 ) 　　

Review attachment

Moire is a unique treasure of Chinese ancient decorative patterns. Curved moire images are an important branch which not only has very high artistic value, but also plays a far-reaching role in the enlightenment on contemporary art design practice. Therefore classifying and analyzing curved moire images to find the artistic connotations and modeling techniques has great significance in art design and clustering research. Moire patterns are changeful and complicated, so the manual classification of moire patterns is very inefficient. In order to solve this problem, we propose an adaptive threshold neighbor propagation based multi-exemplar affinity propagation algorithm (ANP-MEAP), which combines shape context features to classify moire patterns automatically. Experimental results verify the feasibility and superiority of the automatic clustering algorithm. Moreover, it also has certain significance for the clustering and analysis of other traditional art patterns.

Compound-filled path generation and optimization for FDM

FENG Guang-lei,LIU Bin，CHEN Hui-hui

2017, 39(6): 1149-1154. doi:

Abstract ( 491 )

PDF (673KB) ( 571 ) 　　

Review attachment

Fused deposition modeling (FDM), a technique of rapid prototyping, is based on filling to process the cross-section of each part. Choosing a correct algorithm for filling path planning is very important because the filling path directly influences the quality and processing efficiency of each part. Complex scanning, an algorithm recently used for path planning, attracts more and more scholars’ attention due to its advantages. Based on the advantages of fractal and offset fillings, we present a complex filling optimization method. In order to reduce the switching frequency and stringing nozzles to ensure the accuracy and strength of molded parts, we use a divide and conquer algorithm to achieve optimal path generation, and it has been successfully used in actual processing.

A hardware resource allocation method for multi-antenna

ground station based on improved genetic algorithm

#br#

ZHANG Peng1,2，FENG Xu-xiang1，GE Xiao-qing1

2017, 39(6): 1155-1163. doi:

Abstract ( 352 )

PDF (874KB) ( 580 ) 　　

Review attachment

The hardware resource allocation of the multi-antenna satellite ground station is an issue of combinative optimization based on constraint satisfaction. According to the analysis of task execution time, the time-window of the ground station, and the receiving capacity and the link constraints of the equipment, we establish an allocation model for the hardware resource allocation issue of the multi-antenna ground station. To maximize weighted task execution time, the scheduling algorithm improves related operators based on the classical genetic algorithm. We use the depth-first search algorithm in the progress of genetic variation to identify the optimized resource allocation method for the individual chromosome, and meanwhile the heuristic information retrieving is implemented to optimize the search process. Simulation results validate that the proposal is feasible and effective.

Current Issue

Author center

Review center

Online journal