Computer Engineering & Science

State of the art analysis of China HPC 2020

YUAN Guo-xing, ZHANG Yun-quan, YUAN Liang

2020, 42(12): 2103-2108. doi:

Abstract ( 544 )

PDF (933KB) ( 410 ) 　　

Based on the latest China HPC TOP100 list released by CCF TCHPC in late November, this paper presents the overall performance trends of the China HPC TOP100 and TOP10 for 2020. It then analyzes the characteristics of performance, manufacturer, and application area separately and in detail.

A virtual machine memory access feature extraction method under Sunway architecture

SHA Sai, WANG Chao, DU Han-lin, LUO Ying-wei, WANG Xiao-lin, WANG Zhen-lin

2020, 42(12): 2109-2116. doi:

Abstract ( 642 )

PDF (946KB) ( 386 ) 　　

Virtualization technology is one of the pillars of cloud services: it extends the flexibility of physical resources and improves their utilization. As national informatization advances, the core technology of cloud servers is increasingly expected to be independent, controllable, safe, and efficient. In recent years, Sunway, a typical representative of domestic servers, has developed substantially. To further improve the performance of virtual machines on the Sunway architecture, this paper proposes a method for extracting their memory access features. Taking advantage of the Sunway architecture, the method computes the memory page miss ratio curve of a virtual machine online using the least recently used (LRU) stack method, and reduces the performance cost of feature extraction with a hot set mechanism. Experiments show that the method accurately calculates the working set size of a virtual machine, with an average error below 3% and an average performance cost of no more than 8.3%. This work lays the groundwork for dynamic memory allocation for Sunway virtual machines, and ultimately improves the overall performance and memory utilization of Sunway cloud servers.
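
As an illustration of the LRU stack method mentioned in the abstract, the following minimal sketch computes stack distances and a miss ratio curve for a page trace. It is a textbook version, not the paper's implementation: the function names are hypothetical, the trace is assumed to be a list of page numbers, and the hot set mechanism is not modeled.

```python
from collections import OrderedDict


def lru_stack_distances(trace):
    """Return the LRU stack distance of each access (None for a first-time access)."""
    stack = OrderedDict()  # keys are pages; the most recently used page is last
    distances = []
    for page in trace:
        if page in stack:
            keys = list(stack.keys())
            # Depth counted from the most recently used end, starting at 1.
            distances.append(len(keys) - keys.index(page))
            stack.move_to_end(page)
        else:
            distances.append(None)  # cold miss: page never seen before
            stack[page] = True
    return distances


def miss_ratio_curve(trace, max_pages):
    """Miss ratio of an LRU-managed memory holding 1..max_pages pages."""
    distances = lru_stack_distances(trace)
    total = len(distances)
    curve = []
    for size in range(1, max_pages + 1):
        # An access hits if its stack distance fits within the memory size.
        hits = sum(1 for d in distances if d is not None and d <= size)
        curve.append((total - hits) / total)
    return curve
```

The working set size can then be read off the curve as the smallest memory size at which the miss ratio stops improving appreciably. The linear scan per access is chosen for clarity, not speed.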

A highly scalable general purpose CFD software architecture and its prototype implementation

GUO Xiao-wei, LI Chao, LIU Jie, XU Chuan-fu, GONG Chun-ye, CHEN Li-juan

2020, 42(12): 2117-2124. doi:

Abstract ( 846 )

PDF (1020KB) ( 715 ) 　　

Based on the current status of the domestic general-purpose CFD software development, this paper analyzes the goals of the architectural design of highly scalable general-purpose CFD software, and proposes an object-oriented and highly decoupled general-purpose CFD software architecture. Based on this architecture, a general-purpose CFD software prototype system is developed. Finally, through a complete case, the CFD prototype system is tested and analyzed, which verifies the feasibility of the software architecture.

Research on hardware implementation technology of LT code encoder based on heterogeneous multicore SoC

JIANG Zhong-ming, YANG Quan-sheng

2020, 42(12): 2125-2132. doi:

Abstract ( 727 )

PDF (669KB) ( 364 ) 　　

In this paper, a heterogeneous multi-core SoC is modeled and analyzed, and the problem of implementing LT code (Luby Transform code) encoding on such an SoC is transformed into the problem of mapping LT encoding subtasks onto it. A task mapping method based on a genetic algorithm is then given. Finally, the encoder is implemented on a heterogeneous multi-core SoC based on the Xilinx ZYNQ-7000. Experimental results show that the designed LT code encoder can adapt to different performance and resource requirements, increasing the practicability of the hardware platform and the flexibility of application system design.
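
For readers unfamiliar with LT encoding, the sketch below shows how a single encoded symbol is generated in software: draw a degree, pick that many source blocks, and XOR them. It assumes the ideal soliton degree distribution and integer-valued blocks; the abstract specifies neither the distribution nor the subtask split used on the SoC, and the function names are hypothetical.

```python
import random


def ideal_soliton(k):
    """Ideal soliton probabilities for degrees 1..k."""
    probs = [1.0 / k]
    probs += [1.0 / (d * (d - 1)) for d in range(2, k + 1)]
    return probs


def lt_encode_symbol(blocks, rng):
    """Return (sorted indices of the chosen source blocks, XOR of those blocks)."""
    k = len(blocks)
    # Draw the degree, i.e. how many source blocks are combined.
    degree = rng.choices(range(1, k + 1), weights=ideal_soliton(k))[0]
    # Choose that many distinct source blocks uniformly at random.
    chosen = sorted(rng.sample(range(k), degree))
    value = 0
    for i in chosen:
        value ^= blocks[i]
    return chosen, value
```

Practical encoders usually prefer the robust soliton distribution; the ideal form is used here only because it is the shortest to state.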

An efficient parallel computing method for structural analysis based on heterogeneous supercomputer

DING Jun-hong, MIAO Xin-qiang, LI Gen-guo

2020, 42(12): 2133-2140. doi:

Abstract ( 892 )

PDF (820KB) ( 490 ) 　　

To exploit the computing power of many integrated cores (MIC) on heterogeneous clusters, a multi-level, multi-granularity collaborative parallel computing method is proposed for finite element structural mechanical analysis. Computing tasks are divided into three levels: inter-node parallelism, inter-device parallelism, and inter-core parallelism. By mapping decomposable computing jobs to different hardware layers of the heterogeneous MIC system, the proposed method not only effectively resolves the load balancing problem between CPU and MIC devices, but also significantly reduces the communication overhead of the system. Large-scale parallel computing experiments on several engineering simulation cases were conducted on the "Tianhe 2" supercomputer. Up to 39000 CPU+MIC cores were employed, and the finite element models analyzed contained more than 100 million elements. Test results show that the proposed method achieves good speedup and parallel efficiency in large-scale finite element structural analysis. The work adapts and optimizes finite element structural analysis for the heterogeneous MIC computing platform, and can serve as a reference for the parallel porting and performance optimization of similar applications.

Performance optimization of secure application based on TrustZone

YANG Bao-xuan, DONG Pan, ZHANG Li-jun, DING Yan

2020, 42(12): 2141-2150. doi:

Abstract ( 535 )

PDF (928KB) ( 347 ) 　　

TrustZone technology has been widely used in the security protection of various smart systems, for example in data encryption, fingerprint login, DRM protection, and electronic payment. TrustZone provides programs with a trusted execution environment (TEE) that is isolated from the host environment, giving runtime protection to important code and data. As a result, the calling process of a TrustZone-based security application changes: the application adds steps such as data sharing and messaging between the secure and non-secure worlds, which cause additional performance overhead. This paper identifies four key factors that affect the performance of security applications: world switch, interrupt, shared memory management, and data copy. On this basis, four corresponding performance optimization methods are proposed. The proposed methods were compared and tested on an AES encryption service based on TrustZone technology to verify their effectiveness. Experimental results show that: 1. Setting parameters reasonably can improve the performance by 31% at most. 2. Masking external interrupts can improve the performance by 4.5% at most. 3. Memory reusing can improve the performance by 37% at most. 4. Reducing memory copy can improve the performance by 39% at most.

Optimization and application of atomic Kinetic Monte Carlo program OpenKMC in defect damage of reactor pressure vessel steel

SHANG Zi-hao, SHANG Hong-hui, WANG Dong-jie, ZHANG Yun-quan, HE Xin-fu, CHEN Ze-hua, WANG Dong, ZHANG Guang-ting

2020, 42(12): 2151-2162. doi:

Abstract ( 611 )

PDF (2682KB) ( 403 ) 　　

The Fe-Cu binary alloy with BCC structure is used as the RPV simulation steel. The kinetic Monte Carlo method based on the pair potential and the EAM potential is adopted to simulate the precipitation of Cu-rich clusters in the system under thermal aging by introducing vacancy point defects. The program is also optimized, and the correctness and validity of the algorithm are verified. The performance of the optimized program is analyzed using high performance computing resources. The numerical results show that, when a certain number of vacancy point defects are introduced, Cu-rich clusters and Cu-vacancy complex clusters precipitate simultaneously in the system, and the complex clusters are more likely to become the largest clusters in the system. The precipitation process can be accelerated by increasing the number of vacancies in the system. In addition, increasing the number of vacancies does not significantly affect the total number and density of clusters in the system, but it promotes the coarsening of clusters and makes them grow into larger precipitates.
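
The core of a kinetic Monte Carlo simulation is the event-selection step. The sketch below shows the standard residence-time form of that step. It is generic and not taken from OpenKMC: `rates` is assumed to hold the rates of the currently possible events (for example vacancy jumps), with at least one positive entry, and how those rates are derived from the pair or EAM potential is outside the sketch.

```python
import math
import random


def kmc_step(rates, rng):
    """One residence-time step: return (index of the chosen event, time increment)."""
    total = sum(rates)
    # Pick an event with probability proportional to its rate.
    target = rng.random() * total
    cumulative = 0.0
    chosen = len(rates) - 1  # fallback guards against floating-point round-off
    for i, rate in enumerate(rates):
        cumulative += rate
        if target < cumulative:
            chosen = i
            break
    # Advance the clock by an exponentially distributed waiting time.
    dt = -math.log(1.0 - rng.random()) / total
    return chosen, dt
```

Because the time increment scales inversely with the total rate, a system with more vacancies (and therefore more possible jumps) takes more events per unit of simulated time.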

Research on the parallel computing method of submarine’s torpedo defense model with self-propelled acoustic decoy

LI Wen, CHI Li-hua, ZHANG Hui, ZHANG Zhe, LIU Jie

2020, 42(12): 2163-2168. doi:

Abstract ( 777 )

PDF (599KB) ( 508 ) 　　

One of the main means of underwater defense is to use a self-propelled acoustic decoy against acoustic homing torpedoes. The computational cost of the traditional exhaustive statistical method grows rapidly with the number of decision parameters, so it cannot satisfy real-time requirements. Building on a torpedo defense model based on a multi-entity finite state machine, this paper proposes a two-level parallel strategy, which divides the simulation cycles among processes and threads and makes the optimal decision through data exchange. The experimental results demonstrate that the parallel model can make decisions consistent with the actual combat situation in a short time. For a workload of 404 simulated schemes, the running time of the model is shortened from 144.65 s to 1.2 s, about 120 times faster than the original program, which effectively resolves the difficulty of making scheme decisions in real time.

Design and implementation of heterogeneous architecture for database query acceleration

LI Ren-gang, REN Zhi-xin, HUANG Guang-kui, SUN Jie, WANG Feng, ZHANG Chuang

2020, 42(12): 2169-2178. doi:

Abstract ( 868 )

PDF (1106KB) ( 577 ) 　　

Databases are a key workload in data analysis, artificial intelligence, cloud computing, big data, and other fields, and they are key to improving overall system performance. The query execution efficiency of traditional database systems is low, and the CPU usually needs to process transaction loads first, so data query becomes a bottleneck that restricts the performance and efficiency of the entire system. To improve the system's ability to support large-scale data and high-intensity instantaneous concurrent access, a CPU+FPGA heterogeneous architecture is proposed to accelerate database queries. The accelerator is integrated with the CPU through coherent acceleration processor interfaces. Database queries are accelerated by customizing multi-engine, configurable query logic in the FPGA. The work focuses on the commonly used SQL query statement SELECT, and analyzes in detail the advantages of the system in terms of latency and a simplified software stack. Finally, the Inspur F37X acceleration card and an Inspur server were used to verify the function and performance of the acceleration model. The test results show that, compared with the latest CPU-based query method, the proposal increases the overall processing speed by 3 to 9 times. The proposed acceleration architecture can be applied to the design of specialized database hardware in the future.

Design and implementation of RISC-V assembler supporting vector instructions

DENG Ping, ZHU Xiao-long, SUN Hai-yan, REN Yi

2020, 42(12): 2179-2185. doi:

Abstract ( 566 )

PDF (543KB) ( 488 ) 　　

Vector computing can effectively improve the computing efficiency of computers and reduce unnecessary hardware overhead. With the improvement of CPU computing capability, the growth in register count, and other hardware development trends, vector computing has become one of the widely used technologies for improving CPU performance. The RISC-V architecture, which has attracted wide attention, also needs vector technology to improve its performance. The open source RISC-V assembler so far supports only standard instructions and does not support vector instructions. To support RISC-V vector instructions, this paper details the design and implementation of a RISC-V assembler that supports them.

Multiple moving targets detection under complex background by integrating spatio-temporal context

ZHANG Yin, CAI Xu-yang, XU Qian-qian, YAN Jun-hua, SU Kai, ZHANG Kun

2020, 42(12): 2186-2192. doi:

Abstract ( 551 )

PDF (734KB) ( 417 ) 　　

To address the low detection rate when targets enter or leave the field of view or are partially occluded, a multiple moving targets detection algorithm based on spatio-temporal context is proposed. Firstly, temporal context information is used to extract the candidate target regions, based on the forward and backward motion history maps. Secondly, spatial context information and target appearance information are used to calculate the target confidence map, based on a sparsely encoded CRF model. Finally, the target confidence of each candidate target region is calculated to detect the multiple moving targets. The experimental results show that the proposed algorithm has good detection performance, achieving higher recall, precision, and F-measure than other multi-target detection algorithms while maintaining high positioning accuracy.

Macaque brain extraction based on level set of fusion partition and Canny functional

GUO Jin-xiu, ZHANG Yue-fang, DENG Hong-xia, LI Hai-fang

2020, 42(12): 2193-2198. doi:

Abstract ( 425 )

PDF (594KB) ( 304 ) 　　

The traditional level set method selects the location of the initial contour randomly and does not process edge information, so it cannot accurately extract the edges of brain tissue. To address this, a level set algorithm that fuses partitioning with a Canny functional is proposed. Firstly, the algorithm incorporates the idea of partitioning and combines the morphological information of each region to select the initial contour position, so that the initial contour contains more brain tissue and the efficiency of brain tissue extraction improves. Secondly, the Canny operator is integrated into the energy functional, which improves the accuracy of detecting the edge of the macaque brain tissue while retaining the advantages of the traditional level set on images with uneven grayscale. Results show that the algorithm accurately extracts macaque brain tissue, with an accuracy of up to 86%.

Occlusion face recognition based on discriminant low-rank matrix recovery and collaborative representation

SUN Yu-hao, TAO Yang, HU Hao

2020, 42(12): 2199-2207. doi:

Abstract ( 374 )

PDF (845KB) ( 387 ) 　　

In the field of face recognition, when the training and test samples are subject to severe noise pollution, the performance of traditional subspace learning and of classical sparse representation classification (SRC) drops sharply. In addition, methods based on sparse representation face the problem of high computational complexity. To alleviate these problems, an occlusion face recognition method based on discriminant low-rank matrix recovery and collaborative representation is proposed. Firstly, low-rank matrix recovery recovers clean training samples with low-rank structure from the contaminated training samples, and structural non-correlation constraints effectively improve the discriminating ability of the recovered data. Secondly, by learning the low-rank projection matrix between the original contaminated data and the recovered low-rank data, the contaminated test samples are projected into the corresponding low-dimensional subspace to correct them. Finally, the corrected test samples are classified by the collaborative representation classification (CRC) method to obtain the final recognition result. Experimental results on the Extended Yale B and AR databases show that the proposed method has better recognition performance in occlusion face recognition.

UAV image mosaicking and verification algorithm with geographic information processing

LIANG Zhong-yan, QI Hong-yu, WANG Wei-liang, HU Jie

2020, 42(12): 2208-2216. doi:

Abstract ( 587 )

PDF (859KB) ( 339 ) 　　

In traditional video and image processing, researchers often focus on processing the image content, especially on accuracy and speed. However, the geographic information carried by UAV video and images is often ignored, so that after processing the image contains only scene information and has lost its geographic data, and the user cannot quickly obtain the geographic information of a target of interest from the processing results. To process geographic information efficiently, an image mosaicking and verification algorithm for UAV images with geographic information is proposed. It treats geographic information data as multi-channel double-precision floating-point matrix data, which can be computed synchronously using matrix processing algorithms. Meanwhile, the accuracy and speed of large numbers of image mosaicking tasks can be improved by using an image-splicing algorithm based on grouping control with geographic information. The experimental results show that the proposed algorithm can effectively process UAV images with geographic information, especially in image mosaicking.

Target tracking by deep fusion of fast multi-domain convolutional neural network and optical flow method

ZHANG Xiao-li, ZHANG Long-xin, XIAO Man-sheng, ZUO Guo-cai

2020, 42(12): 2217-2222. doi:

Abstract ( 489 )

PDF (727KB) ( 554 ) 　　

To address the slow speed of convolutional neural network target tracking algorithms, a target tracking algorithm combining a fast multi-domain convolutional neural network (Faster MDNet) with the optical flow method is proposed. The optical flow method is used to obtain the moving state of the target, and a preliminary selection box is taken as the tracking target position. Then, the preliminary selection box is used as the input of Faster MDNet, which serves as the detector to obtain the exact position and bounding box of the tracking target. Experiments on the target tracking benchmark data set VOT2014 show that the algorithm's online tracking speed is increased by 8 times and the accuracy is improved by 10%.

Prediction of crowd massing abnormity based on multi-scale convolutional neural network

LUO Fan-bo, WANG Ping, XU Gui-fei, LEI Yong-jun, FAN Yang

2020, 42(12): 2223-2232. doi:

Abstract ( 699 )

PDF (1038KB) ( 636 ) 　　

There are few methods for detecting abnormal crowd massing in public places, and they share the following problems: most detection is performed after the crowd has already gathered abnormally, the detection accuracy is not high, and the timeliness is insufficient. Therefore, a crowd massing anomaly prediction model based on a multi-scale convolutional neural network (MCNN) is proposed. Firstly, a crowd counting model is built with MCNN and applied to videos of crowd massing anomalies to obtain the number of people and the coordinates of their heads. Secondly, the crowd density, crowd distance potential energy, and crowd distribution entropy are calculated. Finally, the prediction model is built by PSO-ELM from these three crowd motion state features, and the prediction is made from the changes in the feature data. The experimental results show that, compared with existing algorithms, the proposed algorithm can effectively achieve early warning and detection of abnormal crowd massing. With a prediction accuracy of 97.17%, it is more timely and leaves more time for taking emergency measures.

A mutually beneficial adaptive satin bowerbird optimization algorithm based on non-uniform mutation

WANG Yi-rou, ZHANG Da-min, FAN Ying

2020, 42(12): 2233-2241. doi:

Abstract ( 573 )

PDF (1020KB) ( 434 ) 　　

To solve the problem that the satin bowerbird optimizer (SBO) is prone to low accuracy and slow convergence, this paper proposes an improved satin bowerbird optimizer (ISBO). Firstly, a non-uniform mutation operator is introduced to dynamically adjust the search step size of the bowerbirds in each iteration, so that the algorithm can find the global optimum quickly and efficiently. Secondly, a mutually beneficial factor is used to introduce more combination modes into the social part of the algorithm, so that the search is no longer confined to the neighborhood of the previous bowerbird and a better solution is obtained. Finally, to better balance the local and global search abilities of the algorithm, an inertia weight factor that varies as a cosine is introduced into the bowerbird position update formula. Convergence rate analysis, the Wilcoxon test, and 8 benchmark functions are used to evaluate the efficiency of the improved algorithm. The results show that the improved algorithm has better global search capability and solution robustness, and that its optimization precision and convergence speed are also better than those of the original algorithm.
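
To make the idea of a non-uniform mutation operator concrete, here is a minimal sketch of the classical form of that operator, in which the perturbation shrinks as the iteration count approaches its maximum. The abstract does not give the exact formula used in ISBO or how it is coupled to the step size, so the shape parameter `b`, the bounds, and the function names are all assumptions.

```python
import random


def nonuniform_delta(t, y, t_max, b, rng):
    """Perturbation in [0, y] that tends to 0 as t approaches t_max."""
    r = rng.random()
    return y * (1.0 - r ** ((1.0 - t / t_max) ** b))


def nonuniform_mutate(x, lower, upper, t, t_max, rng, b=2.0):
    """Mutate one coordinate x within [lower, upper] at iteration t."""
    if rng.random() < 0.5:
        # Move towards the upper bound.
        return x + nonuniform_delta(t, upper - x, t_max, b, rng)
    # Move towards the lower bound.
    return x - nonuniform_delta(t, x - lower, t_max, b, rng)
```

Early iterations can jump anywhere in the range, which favors global exploration, while late iterations make only small moves, which favors local refinement.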

A scheduling algorithm for online customer service system

JI You-lang, ZHU Jun, ZOU Yun-feng, ZHOU Zi-xin, CHEN Xing

2020, 42(12): 2242-2251. doi:

Abstract ( 570 )

PDF (1046KB) ( 408 ) 　　

Unlike traditional customer service systems, online customer service systems serve multiple customers simultaneously, which makes matching and scheduling between service providers and customers a major challenge. Based on the characteristics of online customer service, this paper proposes a scheduling model for online customer service systems. The model has three components: a multi-priority customer queue, the states of the scheduling system and the transitions between them, and the correspondence between scheduling strategies and system states. A scheduling algorithm is designed for the model. Experiments verify the rationality of the scheduling model and the effectiveness of the scheduling algorithm. Compared with the customer service system currently in operation, the algorithm not only considerably reduces the average waiting time of customers, but also achieves load balancing among service providers while guaranteeing high quality of service.
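
The first component of the model, a multi-priority customer queue, can be sketched as follows, paired with a simple least-loaded assignment rule. This is only an illustration: the abstract does not describe the paper's state-dependent strategies, so the convention that a lower number means higher priority, the least-loaded rule, and all names here are assumptions.

```python
import heapq
import itertools


class MultiPriorityQueue:
    """Customers leave in priority order; ties are broken by arrival order."""

    def __init__(self):
        self._heap = []
        self._arrival = itertools.count()

    def add(self, customer, priority):
        # Lower priority number means the customer is served earlier.
        heapq.heappush(self._heap, (priority, next(self._arrival), customer))

    def pop(self):
        _, _, customer = heapq.heappop(self._heap)
        return customer


def assign(queue, loads):
    """Hand the next customer to the provider with the fewest active sessions."""
    customer = queue.pop()
    provider = min(loads, key=lambda p: (loads[p], p))
    loads[provider] += 1
    return customer, provider
```

For example, with customers `a` (priority 2), `b` and `c` (priority 1) and providers `p1` (one session) and `p2` (none), the first call to `assign` pairs `b` with `p2`.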

Semi-online algorithms for hierarchical scheduling on three parallel machines

XIAO Man, DING Lu, ZHANG Yi

2020, 42(12): 2252-2258. doi:

Abstract ( 472 )

PDF (395KB) ( 270 ) 　　

This paper studies a semi-online hierarchical scheduling problem on three identical machines. In the problem, there is one machine with hierarchy 1 and two machines with hierarchy 2, and the goal is to minimize the makespan. When the total size of the low-hierarchy jobs is known, an online algorithm with a competitive ratio of 5/3 is given, together with a lower bound of 3/2. When the total size of the high-hierarchy jobs is known, an online algorithm with a competitive ratio of 9/5 is given, together with a lower bound of 3/2. When the total size of the jobs of each hierarchy is known, an online algorithm with a competitive ratio of 3/2 is given, together with a lower bound of 4/3. When the total size of all jobs is known, a best possible online algorithm with a competitive ratio of 3/2 is given.

Frequent itemsets mining for data stream based on AO algorithm

WEN Kai, GENG Xiao-hai, ZHU Lu-wei, XU Meng-meng

2020, 42(12): 2259-2264. doi:

Abstract ( 363 )

PDF (639KB) ( 320 ) 　　

Traditional frequent itemset mining algorithms for data streams have a series of problems in support updating, window updating, and frequent k-itemset mining, which result in low space and time efficiency. To address them, an efficient AO algorithm for mining frequent itemsets in data streams is proposed. The algorithm uses a sliding window to mine the data stream in blocks. When new data flows into a full window, residual insertion is used to update the data. The AND operation is used to compute the support of frequent k-itemsets, and superset detection is applied during mining, which greatly improves the mining efficiency. The experimental results show that the algorithm performs well in both time and space efficiency.
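
One common way to compute itemset support with an AND operation is to keep a bit vector per item over the transactions in the window. The sketch below illustrates that idea under the assumption that this is what the abstract's AND operation refers to. The window update by residual insertion and the superset detection are not modeled, and the function names are hypothetical.

```python
def build_bitmaps(window):
    """Map each item to an integer whose bit i is set if transaction i contains it."""
    bitmaps = {}
    for i, transaction in enumerate(window):
        for item in transaction:
            bitmaps[item] = bitmaps.get(item, 0) | (1 << i)
    return bitmaps


def support(itemset, bitmaps, window_size):
    """Support count: set bits left after ANDing the bitmaps of all items."""
    mask = (1 << window_size) - 1  # start with every transaction selected
    for item in itemset:
        mask &= bitmaps.get(item, 0)
    return bin(mask).count("1")
```

With this layout, the support of a k-itemset costs k AND operations and one bit count, with no rescan of the transactions in the window.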
