Computer Engineering & Science

Performance modeling of multi-stream programming

mechanism on heterogeneous platforms

PENG Lin，ZHANG Peng，FANG Jianbin，HUANG Chun，TANG Tao

2019, 41(07): 1145-1154. doi:

Abstract ( 149 )

PDF (926KB) ( 310 ) 　　

Review attachment

Multi-stream programming mechanism can fully provide a variety of resource utilization methods such as pipelining and resource partitioning for heterogeneous many-core accelerators, but there is currently no effective guidance on how to choose effective resource utilization methods. Based on hStreams on heterogeneous manycore processor Intel MIC, we design a performance model for multi-stream program’s multihardware partitioning execution. Based on our performance model, we can identify the reasons for the performance difference of multi-stream programs under varied configurations, find out key factors that affect the performance, and provide a partitioning optimization strategy for multi-stream programs. In addition, it can also judge the effect of algorithm implementation. Our evaluation results show that the error between the estimated results of the performance model and the actual test results of multi-stream configuration is within 1%. Compared to the single-streamed version, our model also realizes a 5.83% performance improvement when guiding multi-stream programs of the dense matrix multiplication.

HPC cluster and low cost auto

scaling model based on public cloud

TIAN Yongjun,HE Wanqing,SUN Xiangzheng,YU Yang

2019, 41(07): 1155-1160. doi:

Abstract ( 200 )

PDF (674KB) ( 309 ) 　　

Review attachment

For many HPC users, computing cost is one important factor for whether moving workloads to the public cloud. Alibaba cloud provides “preemptible instance”. It is an on-demand instance to reduce the cost of using public cloud computing resources. The market price of “preemptible instance” fluctuates and it can be as low as 10% of “pay as you go instance”. And “preemptible instance” cannot be kept as long as users’ requirement, and be released due to datacenter scheduler or some other reasons, so it can be used in some stateless scenarios. On the public cloud, based on users’ application types, job submission patterns, performance requirements, timing and cost, we propose an auto scaling strategy on the public cloud for general HPC cluster schedulers, which can automatically deploy computing resources and control cost. HPC users only pay for what they want and what they use. Due to abundant resource types and resource rent models, and taking advantages of auto scaling service, “preemptible instance” and application checkpoint/restart, we can supply a low cost auto scaling model. When users submit jobs, they can set their expectation cost, and the auto scaling service will find the “preemptible instance” under this cost setting, and use checkpoint/restart technique to keep job running during computing resource exchanging. Finally, we verify the feasibility and effectiveness of our solution through LAMMPS and GROMACS applications.

Implementation and optimization of fast multipole

method on Sunway manycore processors

WANG Wu1，WANG Shuyang1,2，JIANG Jinrong1,MENG Hongsong3

2019, 41(07): 1161-1167. doi:

Abstract ( 272 )

PDF (779KB) ( 346 ) 　　

The fast multipole method (FMM) is a fast and efficient numerical algorithm for solving the Nbody problem and has various applications in cosmology and molecular dynamics. Sunway SW26010 is a heterogeneous manycore processor developed independently by China with 260 cores (4 core groups). We design and implement an FMM on SW26010 manycore architecture. We also systematically optimize the performance of kernel functions (especially for the most timeconsuming particle pair interaction), including asynchronous direct memory access (DMA), SIMD vectorization, loop unrolling and inline assembly tuning. Taking the particle pair interaction kernel as an example, the computational speed after optimization is about 400 times higher than the raw code running on the host core, and the floating-point performance on each core group is 250 GFLOPS, which is 32.5% of the theoretical peak performance.

Efficient and transparent CPU-GPU data

communication through partial page migration

ZHANG Shiqing，YANG Yaohua，SHEN Li，WANG Zhiying

2019, 41(07): 1168-1175. doi:

Abstract ( 211 )

PDF (1010KB) ( 278 ) 　　

Review attachment

Despite the increasing investment in integrated GPUs and nextgeneration interconnect research, discrete GPUs connected by PCI Express still dominate the market, and the management of data communication between CPUs and GPUs continues to evolve. Initially, the programmers control the data transfer between CPUs and GPUs explicitly. To simplify programming, GPU vendors have developed a programming model to provide a single virtual address space for “CPU + GPU” heterogeneous systems. The page migration engine in this model transfers pages between CPUs and GPUs on demand automatically. To meet the needs of high-performance workloads, the page size tends to be larger. Limited by low bandwidth and high latency interconnections, larger page migration has longer delay, which can reduce the overlap of computation and transmission and cause severe performance degradation. We propose a partial page migration mechanism that only transfers the requested part of a page to shorten the migration latency and avoid performance degradation of the whole page migration when the page becomes larger. Experiments show that the proposed partial page migration can well hide the performance overheads of the whole page migration when the page size is 2MB and the PCI Express bandwidth is 16GB/sec. Compared with data transmission controlled by the programmers, the whole page migration degrades the performance by 98.62 on average, while the partial page migration upgrades the performance by 1.29 on average. Additionally, we examine the impact of page size on TLB miss rate and the impact of migration unit size on execution time, enabling designers to make informed decisions based on this information.

FAE: An action extension mechanism for

FlowMod messages in OpenFlow protocol

XIE Ying,QIU Weihao,SUN Zhigang,QUAN Wei

2019, 41(07): 1176-1183. doi:

Abstract ( 207 )

PDF (956KB) ( 278 ) 　　

In SDN networks, the abstraction of the data forwarding plane is the precondition of the formulation of the southern interface protocols, such as OpenFlow. Currently, the standard OpenFlow protocol assumes that each flow table item of the data plane is composed of defined rules and corresponding action sets. FlowMod messages of the data plane carry these rules and action sets. However, with the continuous development of SDN application scenarios, the limited and predefined actions of the OpenFlow protocol make it difficult to meet some new configuration requirements, such as centrally controlled segment routing management configuration, SDPA configuration and so on. To improve the flexibility of action definition in SDNs, and remove the restrictions on actions, we firstly propose a new action extension mechanism, called flexible action extension (FAE). The FAE bases on FlowMod messages in OpenFlow, and can satisfy the needs for userdefined actions which can extend in a flexible and simple way. And then we implement the FAE useing open source Floodlight controller and FAST switch, and experiments show that the FAE can effectively support the extended actions in centrally controlled segment routing management configuration.

A dynamic resource allocation strategy

in mobile edge computing environment

#br#

ZHU Xinfeng1,ZHANG Zhihao1，WANG Yanling2

2019, 41(07): 1184-1190. doi:

Abstract ( 294 )

PDF (804KB) ( 421 ) 　　

In the era of explosive growth of communication equipment, mobile edge computing is a core 5G communication technique, and it is important to allocate resources reasonably. The idea of mobile edge computing is to sink cloud computing centers to the base station deployment (edge cloud) and to bring cloud computing centers closer to users to quickly solve the problem of computing resource allocation. However, compared with large cloud computing centers, the computing resources of the edge cloud are limited, and the traditional virtual machine allocation method cannot flexibly deal with the problem of computing resource allocation of the edge cloud. To solve this problem, we propose a dynamic computing resource and spectrum allocation algorithm (DRFAA) based on users’ comprehensive needs. It adopts the “divide and conquer” strategy, and simulates resources into "fluid" resources for allocation so as to seek for a larger throughput and a lower transmission delay. Simulation results show that the dynamic computing resource and spectrum allocation algorithm can effectively reduce the transmission delay between the user and the edge cloud, and also improve the throughput of the edge cloud.

A cycle diagnosis strategy for hypercube networks

CHEN Fang,ZHANG Qian

2019, 41(07): 1191-1196. doi:

Abstract ( 187 )

PDF (658KB) ( 264 ) 　　

Traditional fault diagnosis strategies and conditional diagnosis strategies for multiprocessor systems have been widely studied，however, they have not solved the problem of a large number of fault nodes in the system. Therefore, we propose a new cycle diagnosis strategy, which uses the cycle-partitioning method to diagnose Hamilton cycles and find out all the fault nodes in the system. We also give out the cycle diagnosis strategy and some important properties of hypercube networks. Meanwhile, we also propose a quick cycle diagnosis algorithm of hypercube networks to quickly locate all fault nodes in the system. Based on the above strategy and under the PMC model, the cycle diagnosis degree of an ndimensional hypercube network is (n2+n)/2 and the time complexity is

O(n), where n represents the number of processors in multiprocessor systems. Compared with traditional fault diagnosis strategies and condition diagnosis strategies for hypercube networks, the proposed cycle diagnosis strategy has the advantages of large diagnosis degree and low time complexity.

An energy consumption balanced clustering algorithm

for wireless sensor networks based on ant colony strategy

YU Xiaohui1，ZHANG Jing1,2,TAO Tao3,GONG Libo4,HUANG Yunming1,FU Tiewei1

2019, 41(07): 1197-1202. doi:

Abstract ( 275 )

PDF (523KB) ( 387 ) 　　

The problem of single control factor selection of cluster head in multihop routing can shorten the entire lifecycle of the wireless sensor network. To solve this problem, we construct the fitness function based on the residual energy, node degree and connection distance, and guarantee the optimal selection of the cluster head according to the value of cluster head evaluation functions. At the same time, the fitness factor and residual energy tradeoff factor are added to optimize the ant colony algorithm, which effectively controls the increase and decrease of pheromone in the complete path. We apply the algorithm to the multihop transmission of data between cluster heads to protect the low energy cluster head. It is beneficial to make each node's energy consumption close to the mean, and the network can monitor and transmit data more persistently. Experimental results show that compared with the LEACH and HEED algorithms, the proposed scheme is more effective in balancing energy consumption and prolonging life cycle.

A virtual hand grasping rule fusion strategy

ZOU Yu1,2，CHAO Jiangang1，LIN Wanhong1

2019, 41(07): 1203-1211. doi:

Abstract ( 128 )

PDF (887KB) ( 266 ) 　　

Virtual hand grasping rules are interaction rules for a virtual hand and the virtual object to perform grasping operations. It’s used to determine whether the virtual hand can successfully grasp the object. We study geometrybased virtual hand grasping rules and physicsbased virtual hand grasping rules respectively. The former rules are simple but the simulation effect is poor. The latter rules are complicated to calculate and difficult to implement real-time simulations. In view of the above problems, (1) we improve geometry-based virtual hand grasping rules, and formulate the gripping rules by the contact point position, the normal vector and the gripping’s surface normal vector to make the effect approach to force closure virtual hand grasping rules; (2) based on the invariability characteristic of the closed contact point and the normal vector in the force closure calculation, we avoid the nonlinear programming solution in the grasping operation by the internal force ratio, and realize real-time simulations in the grasping operation stage; (3) through the strategy of “initial grasping judgment based on geometric constraintforce closure calculation correction-internal force ratio for force closure calculation”, we realize real-time simulations of complete grasping operation. We design interaction experiments to verify that the proposed grasping rule can achieve high immersion and real-time grasping simulations, and it can be applied to simulation platforms such as virtual training and virtual assembly.

Optical remote sensing image classification

based on manifold learning

WANG Yunyan1,2，LUO Lengkun1，WANG Chongyang1

2019, 41(07): 1212-1219. doi:

Abstract ( 169 )

PDF (822KB) ( 326 ) 　　

With the rapid development and wide application of optical remote sensing image technology, accurate classification of optical remote sensing images has farreaching research significance. The high-dimensional features extracted by traditional feature extraction methods are mixed with much redundant information, and the classification process can lead to over-fitting. The traditional linear dimension reduction algorithm cannot maintain the internal structure of the original data, and is easy to cause data distortion. We propose an optical remote sensing image classification algorithm based on manifold learning. Firstly, the SIFT features of the image are extracted, and the manifold learning is applied to feature dimension reduction. Finally, the support vector machine is used for training and recognition. Experimental results show that the classification accuracy of glaciers, buildings and beaches is effectively improved on the experimental data of Satellite, NWPU and UC Merced, reaching about 85%. For remote sensing images of desert, rock and water, the classification accuracy is improved by about 10%. In summary, the data based on manifold learning can maintain the topological structure in the original high-dimensional space through the dimension reduction algorithm. Similar feature points can effectively aggregate, which prevents the "dimensional disaster", reduces the calculation amount and guarantees classification accuracy.

TC-Coons patch with shape parameters

ZOU Qian1,2，HAN Xuli1,2

2019, 41(07): 1220-1226. doi:

Abstract ( 196 )

PDF (953KB) ( 238 ) 　　

In order to flexibly control the shape of the bi-cubic Coons patch and improve the bicubic Coons patch, we construct a triangular mixing function group consisting of four functions with shape parameter λ, named TC-Hermite basis. The TC-Hermite basis has properties of endpoints, monotonicity of parameter λ, adjustability of shape etc.. Based on the TC-Hermite basis, we also construct a new Coons patch with shape parameters, called TC-Coons patch. The TC-Coons patch not only adjusts its shape through shape parameters, but also accurately represents quadratic surfaces, such as sphere and torus. Furthermore, the application of the TC-Coons patch in geometric modeling is also discussed.

A full coverage path planning algorithm

based on backtracking method

LI Kai,CHEN Yongfu,JIN Zhiyong,LIU Tian,WANG Zhenting,ZHENG Jiongzhi

2019, 41(07): 1227-1235. doi:

Abstract ( 581 )

PDF (1027KB) ( 731 ) 　　

With the rapid development of sweeping robots, as the core technology, full coverage path planning technology has become increasingly important. Many algorithms have been proposed so far, such as the artificial potential field method, template method and unit decomposition method, however, they have problems such as low coverage, high repetition rate and low operating efficiency. In order to solve the problems existing in the proposed algorithms, we propose a full coverage path planning algorithm based on backtracking method. The algorithm firstly uses the westmove first algorithm to achieve local area coverage. Then, in order to solve the problem that the missing area is not covered in the process of local area coverage, we establish a perfect backtracking mechanism. And the improved A* algorithm is used to plan a smooth unobstructed path from dead points to backtracking points. Simulation results show that compared with the BA* algorithm, the proposed algorithm has higher operating efficiency and lower overlap rate.

Dynamic rule induction for hybrid information

systems based on neighborhood granulation

CHENG Yi1,2,LIU Yong3

2019, 41(07): 1236-1243. doi:

Abstract ( 132 )

PDF (592KB) ( 306 ) 　　

The data types of existing knowledge discovery models of hybrid information systems are mostly symbolic, numerical conditional and symbolic decision attributes. Most of the models focus on attribute reduction or feature selection, but research on rule extraction is relatively few. We construct a dynamic rule induction model for hybrid information systems covering more data types. Firstly, the existing formulas for calculating value differences of different types of attributes are modified, and a definition of the distance of cross-level symbolic values is given, thus a new mixed distance is defined. Secondly, we propose three methods to induce the decision class for numerical decision attributes. Then, we propose a generalized neighborhood rough set model based on neighborhood granulation, and the lower and upper approximations of an arbitrary subset under dynamic granulation are presented, which underlies a foundation for the construction of a dynamic rule induction algorithm. The model can be used to extract rules from the information systems with the following features, namely: (1) condition attribute set includes singlelevel symbolic, crosslevel symbolic, numeric, intervalvalued, setvalued and missing data; (2) decision attribute set can include symbolic and numeric data. The rule induction algorithm is evaluated on several data sets from the UC Irvine Machine Learning Repository. Experimental results show that the algorithm can achieve good classification performance.

Service recommendation based on

heterogeneous user network embedding

WU Hao1，WANG Xiaochen1，ZENG Cheng1,2，HE Peng1,2

2019, 41(07): 1244-1250. doi:

Abstract ( 166 )

PDF (659KB) ( 257 ) 　　

In order to make full use of user’s tagging and social relationship information and improve the accuracy of recommendation results in service recommendation, we propose a new recommendation method based on heterogeneous user network embedding. It maps a user node to a lowdimensional vector, and then utilizes the obtained user vector to carry out collaborative recommendation. We verify the method on Delicious, a public dataset. Experimental results show that our method outperforms the two existing methods, and the recommended accuracy is increased by 18.1% and 16.6% respectively. Furthermore, the results also suggest that the direct relationship between nodes is as important as the "friends of friends" relationship in representing user node structural information when learning the user representation vector. At the same time, it is most suitable to return top 25 similar users for the target user in the recommendation process.

A grassland classification algorithm using convolutional

neural network based on feature integration

ZHANG Meng，QIAN Yurong，DU Jiao，FAN Yingying

2019, 41(07): 1251-1256. doi:

Abstract ( 220 )

PDF (537KB) ( 281 ) 　　

In order to improve the precision of grassland classification from remote sensing images, we analyze the characteristics of image features extracted from convolutional neural networks (CNNs), and propose a remote sensing image feature extraction method based on feature-integrated depth neural networks. Firstly, PCA whitening is performed on the remote sensing image to reduce the correlation between data and accelerate the learning rate of neural networks. Secondly, both low-level features and high-level features are bilinearly integrated to enhance and optimize the integrated features. Finally, the remote sensing data is trained. As the introduction of effective information in new features, both feature expression ability and the grassland classification accuracy are improved. Experimental results show that the proposed algorithm can effectively improve the accuracy of grassland classification. The classification accuracy reaches up to 94.65%. Compared with the traditional convolutional neural network, BP neural network and SVM algorithm, our accuracy is increased by 4.3%, 10.39% and 15.33% respectively.

English-Chinese translation based on

an improved seq2seq model

XIAO Xinfeng1,2,LI Shijun2,YU Wei2,LIU Jie2,LIU Beixiong1

2019, 41(07): 1257-1265. doi:

Abstract ( 180 )

PDF (949KB) ( 344 ) 　　

Current machine translation systems optimize and evaluate the translation process in Indo-European languages to enhance translation accuracy. But researches about Chinese language are few. At present the seq2seq model is the best method in the field of machine translation, which is a neural machine translation model based on the attention mechanism. However, it does not take into account the grammar transformation between different languages. We propose a new optimized English-Chinese translation model. It uses different methods to preprocess texts and initialize embedding layer parameters. Additionally, to improve the seq2seq model structure, a transform layer between the encoder and the decoder is added to deal with grammar transformation problems. Preprocessing can reduce the parameter size and training time of the translation model by 20%, and the translation performance is increased by 0.4 BLEU. The translation performance of the seq2seq model with a transform layer is improved by 0.7 to 1.0 BLEU. Experiments show that compared to the existing seq2seq mainstream model based on the attention mechanism, the training time for English-Chinese translation tasks is the same for corpus of different sizes, but the translation performance of the proposal is improved by 1 to 2 BLEU.

Monaural voiced speech separation based

on computational auditory scene analysis

ZHANG Lina，ZHANG Erhua,JIANG Junliang

2019, 41(07): 1266-1272. doi:

Abstract ( 165 )

PDF (912KB) ( 329 ) 　　

Aiming at the problem of voiced speech separation in monaural speech separation, we propose an accurate pitch period estimation method. Firstly, using the shortterm stability of speech and the continuity of the pitch period as clues, we use the cepstrum peak of speech signals to form the pitch spectrum, and the pitch period track is automatically extracted. Then, the spectrum of each harmonic is picked up by using the property that the harmonic frequency is an integer multiple of the fundamental frequency. Finally, the voiced speech is reconstructed by the inverse Fourier transform. Experimental results show that this method can accurately extract the pitch period track and effectively separate voiced signals.

A hybrid culture algorithm optimization strategy

for traveling salesman problem

MA Han，CHANG Anding，CHEN Tong，LI Jiangjie

2019, 41(07): 1273-1278. doi:

Abstract ( 141 )

PDF (630KB) ( 311 ) 　　

Combining the genetic algorithm and simulated annealing algorithm with the culture algorithm, we design a hybrid culture optimization algorithm to solve the traveling salesman problem (TSP). The strategy contains two parts: the population space and the reliability space. The population space evolves according to the hybrid genetic annealing algorithm and sends optimal individuals to the reliability space. The reliability space extracts the information contained by the optimal individuals to guide population evolution. Experiments on TSP benchmark show that compared with other optimization algorithms, the hybrid cultural optimization strategy can reduce the deviation rate of the result to be 0.6% to 13.01% when obtaining the optimal path. Experiments verify the effectiveness and superiority of the hybrid cultural optimization strategy for solving the TSP.

Design of virtual hand operation for

virtual engineering maintenance training

GENG Hong，WEN Fei

2019, 41(07): 1279-1284. doi:

Abstract ( 142 )

PDF (796KB) ( 355 ) 　　

null

Optimization and simulation of

multivariable dynamic matrix control

XING Yichao，ZHANG Guangming，YU Hui，YANG Jiadong

2019, 41(07): 1285-1290. doi:

Abstract ( 183 )

PDF (757KB) ( 370 ) 　　

Based on the multivariable dynamic matrix control algorithm, we optimize the feedback correction part. When the prediction vector is corrected, a new function is introduced to make a dynamic feedback correction module. Furthermore, we optimize the parameters in the displacement matrix, compensate the system and improve the control effect. Taking a distillation column as the research subject, we analyze the process, select reasonable variables, and conduct simulations of the optimized multivariable dynamic matrix control algorithm. Simulations show that this algorithm can effectively improve system response speed, reduce system overshoot, improve the overall performance with strong robustness, good control effect and good application prospect.

Current Issue

Author center

Review center

Online journal