Computer Engineering & Science

Parallel implementation and optimization of a

large scale ocean data assimilation algorithm

WAN Weiqiang,XIAO Junmin,HONG Xuehai,TAN Guangming

2019, 41(05): 765-772. doi:

Abstract ( 313 )

PDF (792KB) ( 535 ) 　　

Review attachment

Ocean data assimilation is an effective method to integrate ocean observation data into the ocean numerical model. Assimilated ocean data is closer to the real situation of the ocean, so it is of great significance for human to understand and study the ocean. We design a general parallel implementation method for ocean data assimilation based on the domain decomposition strategy. We further propose a new parallel algorithm based on IO proxy. Firstly, IO proxy processes are in charge of parallel reading of data. Then, they split data into many blocks, and send different blocks to corresponding computation processes. After completion of local data assimilation, IO proxy processes collect local assimilation results from computation processes, and write them into the disk. The main advantage of this parallel method is that IO proxy processes takes charge of IO, rather than allowing all processes to participate in IO (direct parallel IO). This can prevent a large number of processes from accessing the disk simultaneously, thus effectively avoiding the waiting caused by processes queuing. Test results based on Tianhe2 clusters show that, for the assimilation of data with 1degree resolution, when there are 425 cores, the total running time of the proposed parallel implementation is 9.1s, which is nearly 38 times faster than that of traditional serial programs. In addition, for the assimilation of data with 0.1 degree resolution, the parallel assimilation algorithm using IO proxy still has a good scalability on 10,000 cores, and its IO time can be limited to at most 1/9 of the direct parallel IO time.

A thread-level SLO-aware I/O framework for virtualization

LIU Ximing，LI Yuxuan，GONG Xiaoli，ZHANG Jin

2019, 41(05): 773-779. doi:

Abstract ( 181 )

PDF (808KB) ( 262 ) 　　

Review attachment

Virtualization technology takes an important place for the cloud service provider (CSP) because of the rapid development of cloud computing. In order to obtain more profits, the CSP need to maximize hardware performance as much as possible while ensuring good user experience. By leveraging information such as the priority and importance of I/O requests, researchers have proposed many methods to improve program performance in Linux kernel. However, the information in virtual machines can get lost during the transfer to the host. We therefore propose an I/O assurance framework based on servicelevel objective (SLO). We firstly explore the reasons for the loss of information such as I/O request priority, and then present the key issues to be addressed for information delivering. Our framework transmits the SLO information of the threads in the virtual machine to the host machine successfully by expanding Linux kernel, the virtio protocol and KVM I/O virtualization program QEMU. What's more, we implement a scheduler based on SLO. Experimental results verify the feasibility of the proposed framework. The thread with highest priority has a throughput up to 260 KB/s while the throughput of the thread with lowest priority is 10 KB/s. we prove that the SLO information delivered by the proposed framework plays a positive role in the scheduling of the scheduler in the host machine.

A survey of frequent itemset mining

algorithms for sparse dataset

XIAO Wen，HU Juan

2019, 41(05): 780-787. doi:

Abstract ( 217 )

PDF (831KB) ( 441 ) 　　

Review attachment

Frequent itemset mining (FIM) is one of the most important data mining tasks. The characteristics of datasets have a significant impact on the performance of FIM algorithms. In the era of big data, sparseness, a typical feature of big data, brings severe challenges to the performance of traditional FIM algorithms. Aiming at the problem of how to perform FIM in sparse datasets efficiently, based on the characteristics of sparse datasets, we analyze the main effects of sparse datasets on the performance of three FIM algorithms, summarize current sparse datasets FIM algorithms, discuss the optimization strategies used in these algorithms, and analyse the performance of the typical sparse datasets FIM algorithms through experiments. Experimental results show that the pattern growth algorithm with pseudo-structural strategy is most suitable for FIM in sparse datasets and outperforms the other two algorithms in both operation time and storage space.

Cache management and memory scheduling

based on inter-warp heterogeneity

FANG Juan,WEI Zelin,YU Tingwen

2019, 41(05): 788-795. doi:

Abstract ( 17 )

PDF (1263KB) ( 36 ) 　　

Review attachment

All threads within a warp execute the same instruction in the lockstep in a GPU. Memory requests from some threads are served early while requests from some other threads have to experience long time latency. Warp cannot execute the next instruction before the last request is served, which can cause memory divergence. We study the inter-warp heterogeneity in GPU, implement and optimize a cache management mechanism and a memory scheduling policy based on interwarp heterogeneity, which can reduce the negative impact of memory divergence and cache queuing latency. Warps are classified according to the hit rate of L2 cache to drive the following three components: (1) A warp-type based cache bypassing mechanism to bypass the L2 cache for warps with low cache utilization; (2) A warptype based cache insert/improvement policy to prevent the data from warps with high cache utilization being cleared prematurely; and (3) A warp-type based memory scheduler to prioritize requests received from warps with high cache utilization and the requests from the same warp. Compared with the baseline GPU, the cache management mechanism and the memory scheduling policy based on inter-warp heterogeneity can speed up 8 different GPGPU applications by 18.0% on average.

Towards convolutional neural network acceleration

and compression via K-means algotrithm

CHEN Guilin,MA Sheng,GUO Yang,LI Yihuang,XU Rui

2019, 41(05): 797-803. doi:

Abstract ( 232 )

PDF (640KB) ( 277 ) 　　

Review attachment

In recent years, the field of machine learning develops rapidly. As a typical representative, neural networks are widely used in various industrial fields, such as speech recognition and image recognition. As the environment of application becomes more complex, the accuracy requirements become higher, and the network scale becomes larger. Large-scale neural networks are both computationintensive and storage-intensive. The convolutional layer is computationintensive and the fully connected layer is storage-intensive. The processing speed of the former cannot keep up with its memory access speed, while the access speed of the later cannot keep up with its processing speed. Based on the confidence interval of the prediction accuracy of neural network training, we propose a neural network acceleration and compression method using the K-means algorithm. We reduce the amount of calculation by compressing the input feature map during the convolution process; and reduce the amount of storage by compressing the weight of the fully connected layer. The proposed method can greatly reduce the calculation amount of a single convolution layer of AlexNet network by up to 100 times. By adding appropriate K-means layer, the speedup of the processing time of the whole network can reach 2.077, and the network compression can reach 8.7%.

A clustering-based annular k-nearest neighbor algorithm

KUANG Zhenxi，WU Jigang，LI Jiaxing

2019, 41(05): 804-812. doi:

Abstract ( 249 )

PDF (722KB) ( 280 ) 　　

Review attachment

The traditional knearest neighbor (kNN) algorithm has been widely used in data classification, but its redundant distance computation consumes much time for highdimensional data processing. Meanwhile, for the classification algorithms based on landmarkbased spectral clustering (LCkNN and RCkNN), the nearest neighbor points of the current testing point are partially missing, which results in low accuracy. Aiming at the above problems, we propose a clusteringbased annular knearest neighbor algorithm (AkNN). Based on the traditional clustering algorithm, the proposed algorithm collects data points with high similarity in the training set into a cluster. Then an annular filter centered on the testing point is set. Finally, the points in the filter are classified by the kNN algorithm. We can choose a clustering algorithm freely according to practical needs. The performance of the proposed algorithm is tested on 6 open data sets in the UCI database. Experimental results show that the AkNNE and AkNNH algorithms reduce calculation by 51% on average compared with the kNN algorithm, and enhance the accuracy by 3% on average compared with the LC-kNN and RC-kNN algorithms. In addition, the proposed algorithm is still valid in 10000 dimensional datasets.

A provable secure from CLPKC to IDPKC

online/offline heterogeneous signcryption scheme

ZHANG Yulei1，LIU Xiangzhen1，ZHANG Yongjie2，LUO Guangping1，CHEN Wenjuan1，WANG Caifen1

2019, 41(05): 813-820. doi:

Abstract ( 163 )

PDF (513KB) ( 293 ) 　　

Online/offline signcryption can not only enhance the computation efficiency of mobile devices, but also ensure the confidentiality and unforgeability of data simultaneously. Under the heterogeneous cryptography environment, the online/offline heterogeneous signcryption between different public key cryptographies should be taken into consideration. We define the online/offline heterogeneous signcryption security model from certificateless public key cryptography (CLPKC) to identitybased public key cryptography (IDPKC), and propose a concrete online/offline heterogeneous signcryption scheme from CLPKC to IDPKC. When performing signcryption operation, the scheme does not require any billinear pairing operation. Besides, it only needs two billinear pairing operations when performing unsigncryption. Compared with existing online/offline heterogeneous signcryption schemes, the proposed scheme has no certificate management problem while having equivalent efficiency, and thus it is suitable for mobile devices with limited computing power. The security proof shows that the scheme can meet the need of confidentiality and unforgeability. We analyze the efficiency of of online/offube signcryption and unsigncryption of the proposed scheme in the simulation step. The scheme adopts independent system parameters, which makes it more suitable for practical application environments.

Research and realization of time synchronization

in high-speed frequency hopping networks

SHI Liping1，ZHAO Jianhong2

2019, 41(05): 821-827. doi:

Abstract ( 171 )

PDF (1069KB) ( 298 ) 　　

Aiming at the large error of keeping time between communication nodes in wireless communication networks, we propose a twoway time synchronization algorithm based on timingsync protocol for sensor network (TPSN). We design a miniaturized and modular highprecision time synchronization system which can be used in frequency hopping synchronization and time slot synchronization under the TDMA network architecture. Firstly, we set the node which has not received any call within 10 seconds as the host. Secondly, the host responds to only one slave within 1 second. Finally, the synchronization error between nodes is calculated by recording the sending time of masterslave node, deducting or compensating the logic processing delay, RF transmission delay, and the frequency difference between the two ends of the calibration. Simulations under different conditions show that the synchronization error between nodes can be as accurate as about 30 nanoseconds. This indicates that the solution's time synchronization accuracy is high, and the error is small, thus it can meet the needs of tactical objectives such as collaborative detection, collaborative attacks and collaborative jamming between nodes in the wireless communication network.

A trajectory (k，e)anonymous algorithm

against trajectory similarity attacks

JIA Junjie，HUANG He

2019, 41(05): 828-834. doi:

Abstract ( 158 )

PDF (568KB) ( 230 ) 　　

Aiming at the problem of trajectory privacy leakage caused by the high similarity between the anonymous centralized trajectories, we propose a trajectory (k，e)anonymous algorithm to resist trajectory similarity attacks. In the preprocessing process, the algorithm adopts trajectory synchronization to reduce information loss. In clustering process, the trajectory slope is regarded as the sensitive value of trajectory data, and at least k trajectories with different trajectory slopes are selected to satisfy the trajectory k-anonymity. To prevent the privacy leakage caused by the high slope similarity of trajectories in the set, the trajectory slope difference value in each class should be at least e. Experimental results show that the proposal can effectively resist trajectory similarity attacks, reduce information loss while enhancing the availability of trajectory data, and achieve better trajectory privacy protection.

Automatic generation of SystemC code from Mediator

ZHANG Qi1,2,LI Yi1,SUN Meng1

2019, 41(05): 835-842. doi:

Abstract ( 211 )

PDF (521KB) ( 494 ) 　　

Mediator is a hierarchical componentbased modeling language that provides proper description of models by using automata and systems. By transforming the models described in Mediator to executable code, we can avoid errors caused by human negligence in the encoding process and enhance encoding reliability, and shorten model development cycle. We present a new automatic code generation tool, which can convert Mediator model to SystemC code. The tool is designed to generate models on specific platforms (SystemC in this case) into directly executable code without any manual adaption.We first analyze the syntax and semantics of Mediator, and select a proper SystemC organization. Then, we design generation rules for each component of the Mediator model, in which we mainly focus on analysis of the generation of types, the generation of transition statement，and the generation of synchronize statement. Finally, we build a power failure detection model by Mediator and convert it to SystemC executable code through our code generator, and it has been executed successfully in C++ environment.

Qsimulation: A tool for simulating quantum computation

DENG Xi，DENG Yuxin

2019, 41(05): 843-850. doi:

Abstract ( 343 )

PDF (659KB) ( 376 ) 　　

We explicitly introduce Qsimulation—a tool for simulating quantum computation on classical computers. The tool is composed of four main components: an imperative quantum programming language, a quantum computation interpreter, a graphic user interface for simulating the execution of quantum programs, and an error handling module. This design allows instructors and novices to design and test simple quantum programs and circuits.

Non-rigid medical image registration with bending

energy and corresponding constraints of landmarks

ZHU Tianyu，QUAN Huimin，LIU Guocai

2019, 41(05): 851-857. doi:

Abstract ( 175 )

PDF (930KB) ( 352 ) 　　

Non-rigid medical image registration based on mutual information often leads to unreasonable deformation due to the lack of geometric space constraints in images. We therefore propose a non-rigid medical image registration method combining bending energy and corresponding constraints of landmarks. In order to restrain the unreasonable deformation of soft tissues in medical images, two regular terms including bending energy penalty and Euclidean distance between corresponding landmarks are added in the mutual information registration target function. The image registration results of brain MRI, head and neck CT and chest CBCT show that this method can improve the quality of registration effectively.

A sea-surface target image segmentation algorithm

based on improved fuzzy C-means

REN Jia1,2，ZHANG Shengnan1，DONG Chao2，ZHAO Minjun1

2019, 41(05): 858-864. doi:

Abstract ( 187 )

PDF (1968KB) ( 318 ) 　　

We propose an image segmentation algorithm to solve the problem of fast image segmentation when surface unmanned vehicles perform target tracking and recognition tasks. Firstly, the algorithm uses the mean filtering algorithm to filter the RGB ocean background image, and uses its nonparametric property to obtain the clustering center of the image and the number of categories, which are used as the initialization parameters to perform fuzzy C-means clustering on the image. On this basis, the image is binarized by the Otsu method to realize target extraction. The BSDS500 standard dataset and marine background images are used to verify itse segmentation effect and efficiency. Comparison with the traditional fuzzy C-means algorithm, pulse coupled neural network algorithm, adaptive genetic algorithm and Markov random field algorithm, proves the effectiveness of the proposed algorithm.

An image foreground and background segmentation model

based on improved second order total generalized variation

KONG Xiaoran,ZHU Huaping

2019, 41(05): 865-872. doi:

Abstract ( 191 )

PDF (1115KB) ( 288 ) 　　

In order to segment an image into foreground and background and overcome the staircase effect often caused by the total variation (TV) segmentation model, we propose a new image foreground and background segmentation model based on the second order total generalized variation (TGV). To improve the quality of image segmentation, an edge indicator function is introduced to the regularization term of the second order TGV-based image foreground and background segmentation model, which can reduce the diffusion to preserve image edge features in image edge regions, and enhance the diffusion to remove impulse noise and overcome the staircase effect in image smooth regions. Then, in order to highlight the foreground information of the image, we mark the foreground information of the image with a rectangular frame and perform distance mapping on the pixels inside the frame, outside of the frame, and on the edges. In addition, according to the energy minimization principle, the distance mapping function is introduced into the data item of the second-order TGV model so that the total energy of the model can be smaller. Finally, we propose a primal-dual segmentation method to solve the proposed model effectively. Experimental results show that the proposed model can not only avoid staircase effect, but also keep the edge information of the image and reduce the total energy of the model. The segmented image has a better visual effect.

An improved UAV image stitching

algorithm based on AKAZE feature

SONG Wei1，WANG Yongbo1，ZHANG Peipei1,2

2019, 41(05): 873-878. doi:

Abstract ( 180 )

PDF (1402KB) ( 285 ) 　　

Aiming at the low time efficiency and poor effect of UAV image stitching, we propose an improved UAV image stitching algorithm based on AKAZE feature. In image matching phase, the MLDB descriptor is replaced by the BRISK descriptor. The scale information of feature points and root mean square error (RMSE) are taken as the constraint conditions to eliminate false matching points. In addition, the idea of parallel computation is applied to feature point extraction and feature descriptor calculation to improve the time efficiency of the algorithm. In image fusion stage, the RANSAC algorithm combined with the LM algorithm is used to calculate the monotone matrix so as to improve its precision. Finally, the multiband fusion algorithm is used to achieve image stitching. Experimental results show that the proposed algorithm can achieve more accurate and more precise matching results while improving the time efficiency, thus realizing fast and seamless stitching of UAV images.

An image reconstruction algorithm for

electrical capacitance tomography images

based on improved particle swarm optimization

YAN Chunman,LU Genyuan,ZHANG Daoliang,DONG Junsong

2019, 41(05): 879-884. doi:

Abstract ( 177 )

PDF (703KB) ( 372 ) 　　

The Landweber image reconstruction algorithm for electrical capacitance tomography (ECT) images combining with the particle swarm optimization (PSO) can improve image quality. However, the PSO is easy to fall into local optimum when it is used for image reconstruction. Aiming at the problem, we propose an image reconstruction algorithm based on the improved PSO and Landweber algorithm. The new algorithm adds the PSO with exponentially decayed inertial weights，and adds the exponentially decayed restraint factors in particle speed updating formula, so that the algorithm has strong global search ability in prophase and strong local search ability in later stage, which further optimizes the initial reconstruction result of the Landweber algorithm to improve the quality of reconstructed images. We compare the ECT image reconstruction results of our algorithm with those of the typical LBP, the improved Tikhonov iterative algorithm and Landweber image reconstruction algorithm, and the comparison results verify the proposal’s effectiveness. Experimental results also show that the new algorithm outperforms the other algorithms in terms of subjective and objective qualities of reconstructed images.

An outlier detection algorithm based on

improved OPTICS clustering and LOPW

XIAO Xue,XUE Shanliang

2019, 41(05): 885-892. doi:

Abstract ( 181 )

PDF (3555KB) ( 273 ) 　　

Aiming at the problems of the high time complexity and poor detection quality of current outlier detection algorithms, we propose a new outlier detection algorithm based on the improved OPTICS clustering and LOPW. Firstly, the original data set is preprocessed by the improved OPTICS clustering algorithm and the preliminary outlier dataset is obtained by filtering the reachability graph of clustering results. Then, we use the newly defined local outlier factor based on Pweight (LOPW) to calculate the degree of outliers of the objects in the primary outlier dataset. When distances calculated, the leaveone partition information entropy gain is introduced to determine the weight of features, thus improving the precision of outlier detection. Experimental results show that the improved algorithm can improve the computational efficiency and the precision of outlier detection.

A MOOCs dropout rate prediction

method based on deep learning

SUN Xia，WU Nannan,ZHANG Lei,CHEN Jing，FENG Jun

2019, 41(05): 893-899. doi:

Abstract ( 350 )

PDF (780KB) ( 541 ) 　　

In recent years, massive open online courses (MOOCs) have received extensive attention. Due to the unreasonable learning styles of learners, their interest in learning is declining and some learning effect is not good, so the dropout rate of MOOCs is very high. In order to solve this problem, we automatically extract continuous features over a period of time from learners’ learning activity logs, and establish a MOOCs dropout prediction model by taking learners’ behavior features as independent variables. Experiments on the KDD Cup 2015 dataset show that the dropout rate prediction model in the long shortterm memory in the convolutional neural network (CNN_LSTM) can help MOOCs curriculum teachers and designers track the learning states of course learners at different phases, and dynamically monitor the dropout behavior of different stages. The prediction accuracy of the model is high, so it can provide teachers with more reasonable guidance and advice on improving their teaching methods.

Popcorn algorithm: A self-adaptive algorithm for

adjusting population of offspring and optimizing step size

ZHAO Zhigang,MO Haimiao,WEN Tai,LI Zhimei,GUO Yang

2019, 41(05): 900-909. doi:

Abstract ( 185 )

PDF (829KB) ( 297 ) 　　

We present a new swarm intelligent optimization algorithm called popcorn algorithm. The popcorn algorithm learns from the advantage of the explosion mechanism of the fireworks algorithm and takes advantage of the individual particle’s fitness value in the optimization process to adjust the number of offspring dynamically. The better the individual particle’s fitness value is, the larger of the offspring population, and the more of the offspring searching in the vicinity of the individual particle. The algorithm adjusts the number of offspring dynamically to control the balance between local search and global search. In addition, it uses the memory mechanism of the particle swarm optimization algorithm as reference, and introduces the best individual particle and the best global particle to construct a new explosion radius, so that it can adjust the step size dynamically in the optimization process. The Gaussian perturbation is performed on the best global particle to increase the diversity of the population. Experimental results show that compared with other optimization algorithms such as the bat algorithm, standard particle swarm optimization algorithm and fireworks algorithm, the overall performance of the proposed popcorn algorithm is better.

A scene management method based on

task relevance in virtual maintenance system

CHEN Jingjie，LI Hao

2019, 41(05): 910-916. doi:

Abstract ( 121 )

PDF (905KB) ( 259 ) 　　

In order to solve the problem of slow loading speed and high memory uage when loading the scene and model at one time in the virtual maintenance training system of machine service, we propose a scene management method based on task relevance. The method uses the TFIDF algorithm to get the similarity of each task card in the virtual maintenance system and then classify them. The high similarity of work cards indicates that the correlation between virtual resources such as virtual maintenance scene, maintenance tools and maintenance objects is stronger. When loading scene resources and allocating memory, the virtual resources described by the task cards with a relevance greater than 68% are loaded and distributed by the buddy system. The task scenarios with relevance less than 42% apply for a memory in the buddy system and then the memory is divided into memory pools. The sub scenes with a task reference between 42% and 68% are managed by the double dynamic double linked list. The proposed method solves the problem that the traditional virtual maintenance training system has to face. That is there lacks maintenance resource related allocation management when loading resources. The allocation method has no taskspecific limitation, and avoids system allocation time for separate memory blocks. Experimental results show that the improved allocation method reduces memory usage by 17% and improves the frame rate by 17.57.

Current Issue

Author center

Review center

Online journal