Computer Engineering & Science

Running optimization of deep learning accelerators under different pruning strategies

YI Xiao, MA Sheng, XIAO Nong

2023, 45(7): 1141-1148. doi:

Convolutional neural networks have achieved great success in the field of image analysis. As deep learning develops, models are becoming more complex and the amount of computation they require is increasing rapidly. Sparsification algorithms can effectively reduce the amount of computation without reducing accuracy. This paper applies three different pruning strategies to the ResNet18 and GoogleNet models to reduce their computation. The results show that, for ResNet18 and GoogleNet respectively, the global unstructured pruning strategy reaches a sparsity of 94% and 90% without reducing accuracy, the level unstructured pruning strategy reaches an average sparsity of 83% and 56% with essentially no loss of accuracy, and the level structured pruning strategy reaches an average sparsity of 34% and 22% with essentially no loss of accuracy. The sparse models obtained under the three pruning strategies are then run on the Eyeriss deep learning accelerator to measure latency and power consumption. Compared with the unpruned model, under the ResNet18 model, the global unstructured pruning strategy reduces latency by 66.0% and power consumption by 60.7%, the level unstructured pruning strategy reduces latency by 66.0% and power consumption by 80.6%, and the level structured pruning strategy reduces latency by 65.6% and power consumption by 33.5%. Under the GoogleNet model, the global unstructured pruning strategy reduces latency by 74.5% and power consumption by 63.2%, the level unstructured pruning strategy reduces latency by 73.6% and power consumption by 55.0%, and the level structured pruning strategy reduces latency by 26.8% and power consumption by 5.8%.
Therefore, this paper concludes that the global unstructured pruning strategy can greatly reduce latency and energy consumption without reducing accuracy, and that the level unstructured pruning strategy can greatly reduce latency and energy consumption at the cost of a slight reduction in accuracy.
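
The difference between pruning globally and pruning level by level can be sketched in a few lines of Python. This is a minimal illustration and not the authors' implementation: it assumes that "level" means per-layer, that pruning is by weight magnitude, and that each layer is a flat list of weights. The function names are hypothetical.

```python
def global_magnitude_prune(layers, sparsity):
    """Zero the smallest-magnitude weights, ranked across ALL layers together.

    layers: dict mapping a layer name to a flat list of weights (assumed layout).
    sparsity: fraction of all weights to remove, between 0 and 1.
    Weights tied with the threshold are also zeroed, so slightly more than
    the requested fraction may be removed.
    """
    magnitudes = sorted(abs(w) for ws in layers.values() for w in ws)
    n_prune = int(len(magnitudes) * sparsity)
    if n_prune == 0:
        return {name: list(ws) for name, ws in layers.items()}
    threshold = magnitudes[n_prune - 1]
    return {
        name: [0.0 if abs(w) <= threshold else w for w in ws]
        for name, ws in layers.items()
    }


def layerwise_magnitude_prune(layers, sparsity):
    """Apply the same magnitude rule, but rank weights inside each layer only."""
    return {
        name: global_magnitude_prune({name: ws}, sparsity)[name]
        for name, ws in layers.items()
    }


def sparsity_of(weights):
    """Fraction of weights that are exactly zero."""
    return sum(1 for w in weights if w == 0.0) / len(weights)
```

With a global threshold, a layer whose weights are all small can be pruned almost entirely while another layer is left untouched; the per-layer variant forces every layer to the same sparsity.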

A deep learning programming framework for FT-Matrix DSP+MatrixZone heterogeneous systems

KANG Yu-han, SHI Yang, CHEN Zhao-yun, WEN Mei

2023, 45(7): 1149-1158. doi:

To meet the fast iteration speed and high computing power requirements of deep learning models, mainstream hardware vendors are increasingly inclined towards heterogeneous systems consisting of general-purpose processors and AI-specific accelerator cores. However, AI-specific accelerator cores only support certain core operators and do not have general programming capabilities. Therefore, how to efficiently deploy deep learning tasks on such heterogeneous architectures is worth further research. Based on the domestically developed FT-Matrix DSP+MatrixZone heterogeneous system platform, this paper designs and implements a deep learning programming framework, called KaiSa. KaiSa analyzes the input parameters of the deep learning model, identifies the operator type, and assigns it to the corresponding computing core. For complex operators, KaiSa automatically completes the optimal search for the block size based on a performance model, improving the performance of dual-core parallel computing. At the same time, KaiSa shields all low-level hardware details to provide users with a friendly programming environment for efficient program development. Experimental results show that KaiSa can achieve performance improvements of up to 39.0%.

SymQC: A symbolic quantum computing simulator

FU Xiang, LI Zi-hao, HUANG Zi-xiao, YANG Yao-jia, LIU Ding-dong, ZHANG Chun-hui, LI Xiao-fang

2023, 45(7): 1159-1169. doi:

Currently, mainstream quantum computing simulators are generally based on numerical calculations, which suffer from precision loss, lack of intuitive representation of quantum states, and difficulty in supporting parameterized quantum circuits. Although symbolic quantum computing simulators have been proposed, they are limited in describing parameterized quantum states, constructing custom quantum gates, and integrating with quantum programming environments. To address these issues, this paper proposes a symbolic quantum computing simulator, SymQC. SymQC can use either full amplitude vectors or Dirac notation to represent quantum states, calculate the equivalent matrix of symbolically parameterized quantum circuits, simulate the evolution of quantum states under quantum circuits, and output the execution results of quantum algorithms in different forms. This paper describes the software structure of SymQC and provides a mathematical proof of a commonly used quantum state update algorithm. Algorithm instances, including the variational quantum eigensolver (VQE), are used to verify the capabilities of SymQC.

A parallel balanced cascade support vector machine

LIU Yi-cheng, LIU Xiao-yan, YAN Xin

2023, 45(7): 1170-1177. doi:

Cascade support vector machine (CSVM) divides the dataset into groups and trains them in parallel, greatly reducing training time and memory usage. However, the model obtained in this way is less accurate than one obtained by direct training. To reduce this error, the causes of the error introduced by grouped training are analyzed, and the ideal error-free grouping is characterized. A balanced cascade support vector machine (BCSVM) algorithm is then proposed. The algorithm balances the sample proportions in the sub-datasets after grouping, ensuring that they are the same as those in the original dataset. It also adjusts the parameter values during grouped training to obtain more support vectors, thereby reducing the chance of losing global support vectors. The effectiveness of the BCSVM algorithm is discussed, and it is shown that models obtained with it have better prediction accuracy than those obtained with random-grouping CSVM. Finally, experiments on multiple common datasets show that the accuracy error of training with the BCSVM algorithm is reduced from 1% to about 0.1%, i.e., by one order of magnitude.
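
The balancing step described above amounts to a stratified split: each sub-dataset receives every class in roughly the same proportion as the full dataset. The sketch below shows one simple way to do this by dealing each class's samples round-robin into the groups. It is an illustration under that assumption, not the BCSVM code, and `balanced_groups` is a hypothetical name.

```python
from collections import defaultdict


def balanced_groups(labels, k):
    """Split sample indices into k groups that keep the class proportions.

    labels: list of class labels, one per sample.
    k: number of groups (sub-datasets) to train in parallel.
    Returns a list of k lists of sample indices.
    """
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    groups = [[] for _ in range(k)]
    for label in sorted(by_class):
        # Deal this class's samples one at a time to successive groups, so
        # each group ends up with about len(class) / k samples of the class.
        for position, idx in enumerate(by_class[label]):
            groups[position % k].append(idx)
    return groups
```

For example, with six samples of class 0 and three of class 1 split into three groups, every group holds two samples of class 0 and one of class 1, matching the 2:1 ratio of the original data.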

Formal verification of Greens theorem and its applications

LIU Yong-mei, WANG Guo-hui, GUAN Yong, ZHANG Jing-zhi, SHI Zhi-ping, DONG Lu

2023, 45(7): 1178-1187. doi:

Greens theorem is widely used in physics, hydrodynamics, chemistry and other fields. Traditional computer simulation and numerical calculation methods are usually used to build the system model based on Greens theorem. However, the possible system defects in the tool software lead to the deviation of the model, which makes the task fail. In order to solve the above problems, this paper adopts the formalization method based on higher-order logic to realize the higher-order logic modeling and verification of Greens theorem related content in the theorem prover HOL Light. Firstly, the basic concepts and properties of gradient and divergence are formally described. Secondly, formal modeling and verification of Greens theorem and its properties are carried out. Finally, the high-level logical derivation of groundwater control model is completed based on the formal model of Greens theorem, so as to ensure the safety of the system model.

An equipment fault detection method based on cloud-edge collaboration variational autoencoder neural network

LIU Yang, SU Hang, HE Qian, SHEN Pu, LIU Peng

2023, 45(7): 1188-1196. doi:

In response to the overall trend and practical application of multi-threshold points in electromechanical equipment fault data detection, this paper proposes a cloud-edge collaborative electromechanical equipment fault detection method based on a variational autoencoder with gated recurrent unit (VAE-GRU). A cloud-edge collaborative fault detection architecture is constructed, comprising a terminal equipment layer, an edge node layer, and a cloud center layer, in which faults in electromechanical equipment are detected through collaboration between the cloud center and the edge nodes. The VAE-GRU model is designed so that the input data is sampled by the VAE and the GRU captures the long-term correlation of the time-series data. A dynamic threshold selection algorithm is used to calculate the fault detection threshold; it automatically selects the optimal threshold for different datasets to improve fault detection accuracy. Experimental results show that the proposed method improves the accuracy of electromechanical equipment fault detection while reducing latency, ensuring the normal and stable operation of electromechanical equipment.

Association analysis of alarm information based on power network situation awareness platform

LEI Xuan, CHENG Guang, ZHANG Yu-jian, GUO Liang, ZHANG Fu-cun

2023, 45(7): 1197-1208. doi:

The safety and stability of power networks have become increasingly important in the field of industrial control. Traditional information analysis for power networks relies too heavily on expert knowledge, and existing analysis models suffer from problems such as high algorithm complexity and rule redundancy. To address these issues, this paper proposes an alarm information correlation analysis method that takes into account the characteristics of power networks. The method first eliminates noise in the original alarm logs through a pre-processing module, then generates alarm transaction sets using a proposed method based on a dynamic sliding time window, and subsequently applies the FP-Growth algorithm to mine alarm association rules for power networks. Finally, a time-based alarm rule filtering algorithm is proposed to eliminate invalid rules. Experiments conducted on alarm data collected from a situation awareness platform deployed in a power grid company show that this method reduces the redundancy of alarm rules by an average of about 30% compared to other similar association analysis methods, and can effectively extract key alarm rules in power networks to guide fault warning.
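
The step that turns a time-ordered alarm log into transaction sets for FP-Growth can be sketched as follows. The abstract does not say how the window is made dynamic, so this sketch assumes one plausible rule: a window closes when the gap to the next alarm exceeds `max_gap` or when the window has lasted longer than `max_span`. Both parameters and the function name are assumptions for illustration, not the paper's method.

```python
def alarms_to_transactions(alarms, max_gap, max_span):
    """Group time-ordered alarms into transactions (sets of alarm types).

    alarms: list of (timestamp, alarm_type) tuples sorted by timestamp.
    max_gap: largest allowed gap between consecutive alarms in one window.
    max_span: largest allowed distance from the window's first alarm.
    Returns a list of sorted, de-duplicated alarm-type lists.
    """
    transactions = []
    current = []
    window_start = None
    last_time = None
    for timestamp, alarm_type in alarms:
        too_far = current and (
            timestamp - last_time > max_gap
            or timestamp - window_start > max_span
        )
        if too_far:
            # Close the current window and start a new one at this alarm.
            transactions.append(sorted(set(current)))
            current = []
        if not current:
            window_start = timestamp
        current.append(alarm_type)
        last_time = timestamp
    if current:
        transactions.append(sorted(set(current)))
    return transactions
```

Each returned list can then be passed as one transaction to an association rule miner such as FP-Growth.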

A low-frequency log noise filtering method in business process based on string matching algorithm

HE Zi-xian, FANG Xian-wen

2023, 45(7): 1209-1215. doi:

The process mining field focuses on the analysis of data generated by business process execution, aiming to extract operational process knowledge from the data. However, low-frequency logs may contain noise that negatively affects the analysis. Therefore, a method based on frequency change rules and string matching is proposed to identify and filter noise from low-frequency event logs. Firstly, based on the directly-follows graph and the eventually-follows graph, invalid direct activity pairs are identified from the event log sequence set according to frequency change rules. Then, using an improved KMP string matching algorithm, the invalid activity sequences are matched against the low-frequency log traces based on the correspondence between the direct relations in the directly-follows graph and the sequence fragments of the event log, thus filtering the noise in the log and optimizing the mined model. Finally, the effectiveness of the method is verified through a case analysis and simulation experiments.
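
Matching an invalid activity sequence against a log trace is a substring search over lists of activity names. The sketch below uses the standard Knuth-Morris-Pratt (KMP) algorithm; the paper's improved variant is not described in the abstract and is not reproduced here. The function names are illustrative.

```python
def prefix_function(pattern):
    """For each position, length of the longest proper prefix that is also a suffix."""
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    return fail


def kmp_find_all(trace, pattern):
    """Return the start index of every occurrence of pattern in trace.

    trace and pattern are sequences of activity names (or characters).
    Overlapping occurrences are reported.
    """
    if not pattern:
        return []
    fail = prefix_function(pattern)
    hits = []
    k = 0  # number of pattern items currently matched
    for i, activity in enumerate(trace):
        while k > 0 and activity != pattern[k]:
            k = fail[k - 1]
        if activity == pattern[k]:
            k += 1
        if k == len(pattern):
            hits.append(i - k + 1)
            k = fail[k - 1]
    return hits
```

A trace in which `kmp_find_all` reports at least one hit for an invalid activity sequence would be a candidate for filtering.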

An online multi-pedestrian tracking method with Mask R-CNN

CAO Yu-dong, CHEN Dong-hao, CAO Rui, ZHAO Lang

2023, 45(7): 1216-1225. doi:

Pedestrian object detection and tracking have attracted much attention in the computer vision field. An improved multi-pedestrian tracking model is proposed, which improves the basic framework of Deep SORT and integrates Mask R-CNN to realize the detection, tracking and pose estimation of pedestrians. Anchor boxes with an aspect ratio better suited to pedestrian targets replace the anchor boxes of the RPN, which speeds up the model and improves performance without complex calculation. In addition, an attention mechanism is introduced into the deep residual network: the lightweight SKNet is used to choose the best convolution kernel adaptively, improving the feature representation for target detection. The histogram of gradient feature combined with color information is adopted instead of the convolution feature, which improves appearance feature association matching in the Deep SORT model so as to track pedestrian targets effectively under occlusion. The impact of each improvement on the model is verified through ablation studies, and the proposed model is compared with current mainstream models. Experimental results show that the improvements are effective and improve the MOTA of NSH by 6% on the MOT16 tracking dataset. The test performance of the proposed model on the public datasets is superior to that of the compared models. The proposed model can still track pedestrian targets effectively when the background moves or pedestrian targets are occluded.

Metal surface defect detection based on improved YOLOv3

LIU Hao-han, SUN Cheng, HE Huai-qing, HUI Kang-hua

2023, 45(7): 1226-1235. doi:

To improve the efficiency of detecting surface defects on industrial parts, a target detection method based on improved YOLOv3 is proposed. The attention mechanism SA (Shuffle Attention), which uses channel shuffling, is introduced and combined with the residual unit of the Darknet-53 backbone of the YOLOv3 model to form an SA residual block, which makes full use of the feature channel information and yields the YOLOv3-SA model. For different datasets, the input images are scaled to different sizes, and the K-means method is used to cluster the ground-truth bounding boxes to improve detection efficiency. The experimental results show that the recall rate of the YOLOv3-SA model reaches 95.4%, and the mAP can be increased by up to 7% compared to YOLOv3.

A circuit breaker moving contact tracking method based on convolution and Transformer

CUI Ke-bin, CUI Ye-wei

2023, 45(7): 1236-1244. doi:

Measuring the motion characteristics of circuit breaker moving contacts can help diagnose the operating status of the circuit breaker. Currently, most measurement methods are "contact" testing methods, which are generally inconvenient to install and have low measurement accuracy. Therefore, a new model that achieves non-contact measurement is proposed. Firstly, a multi-scale feature fusion structure is used to fuse the extracted multi-layer deep features. Secondly, an improved Transformer structure that incorporates convolution operations is used for feature enhancement. Finally, a prediction head is used to predict the tracking results. Experimental analysis shows that, compared with the original algorithm, the tracking success rate of the proposed algorithm increases by 2.6% and the precision increases by 13.9%. The model can achieve accurate tracking and obtain the circuit breaker stroke-time curve, which reasonably reflects the action characteristics of the circuit breaker operating mechanism.

Underwater image edge detection based on multi-scale wavelet and Tsallis entropy

WANG Xiao-qi, ZHAO Xuan-zhi, LIU Zeng-li

2023, 45(7): 1245-1252. doi:

To address the problems of low contrast and edge blurring in underwater images, an underwater image edge detection algorithm based on multi-scale wavelets and Tsallis entropy is proposed. Firstly, exploiting the characteristics of multi-scale wavelet decomposition, the open dark channel model is used to remove low-frequency haze and a soft threshold operation is used to reduce high-frequency noise. Secondly, a two-dimensional Gaussian function is used to construct a Gaussian scale space for background estimation, in order to distinguish background from target information. Finally, the optimal threshold is obtained by combining information entropy and Tsallis entropy, and the edge detection image is obtained. Experimental results show that the proposed algorithm can effectively detect the edge contours of degraded underwater images, remove false edges, and accurately extract the feature edges of the image. Tests also show that the algorithm performs well in edge detection of atmospheric haze images.
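
Two of the building blocks named above have standard textbook definitions, shown here as a minimal sketch. The soft threshold shrinks a wavelet coefficient toward zero by `t`, and the Tsallis entropy of a probability distribution with index `q` is (1 - sum of p_i^q) / (q - 1), which tends to the Shannon entropy as `q` approaches 1. How the paper combines the two entropies to select the threshold is not stated in the abstract and is not shown here.

```python
import math


def soft_threshold(x, t):
    """Shrink coefficient x toward zero by t; values with |x| <= t become zero."""
    return math.copysign(max(abs(x) - t, 0.0), x)


def tsallis_entropy(probs, q):
    """Tsallis entropy of a discrete distribution (natural-log units at q = 1).

    probs: probabilities that sum to 1, e.g. a normalized gray-level histogram.
    q: entropic index; q = 1 is handled as the Shannon limit.
    """
    if q == 1:
        return -sum(p * math.log(p) for p in probs if p > 0)
    return (1.0 - sum(p ** q for p in probs)) / (q - 1.0)
```

For a fair two-outcome distribution, `tsallis_entropy([0.5, 0.5], 2)` evaluates to 0.5, while a distribution concentrated on one outcome gives 0.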

A multi-stage adaptive hat detection algorithm in complex scenes

LUO Xiao-xia, DENG Yong, YE Ou

2023, 45(7): 1253-1262. doi:

Existing object detection algorithms produce false positives and false negatives when detecting small hats in complex scenes. In this paper, a multi-stage adaptive hat detection algorithm (MAHD) is proposed. Firstly, a region proposal network based on adaptive convolution (MA RPN) is constructed, and the features of anchors are refined through multiple stages to improve the algorithm's ability to recognize targets in complex backgrounds. Then, an adaptive sampling strategy is used to dynamically allocate positive and negative samples, and the focal loss function is used to guide the training of MA RPN and improve the detection accuracy of small targets. Finally, experiments are conducted on a self-built HAT4.5k dataset. The results show that, compared with the Grid R-CNN algorithm, the proposed algorithm improves AP by 2.6% and APS by 5.1%. The detection performance on small targets is further verified on the open-source VisDrone-DET 2019 dataset, demonstrating the feasibility and effectiveness of the proposed algorithm.

Skeleton behavior recognition based on attention-enhanced central difference adaptive graph convolution

BAI Shan, FENG Xiu-fang

2023, 45(7): 1263-1273. doi:

In recent years, graph convolution networks have attracted the attention of many researchers due to their excellent performance in skeleton-based action recognition. However, most graph convolutions only aggregate node information, ignoring the difference between the features of the central node and those of adjacent nodes. Therefore, a central difference adaptive graph convolution network based on a multiple receptive fields attention mechanism, MRFAM-CDAGC, is proposed. It not only adaptively aggregates the information of the nodes associated with the central node in the graph topology, but also merges the local motion information between adjacent nodes and aggregates the gradient characteristics of the central node. An attention module with multiple receptive fields is added to make the model focus on the more discriminative joints and frames, so as to improve recognition accuracy. Under the two baselines of the NTU-RGB-D dataset, the accuracy of the model reaches 89.1% and 96.0% respectively. The universality of the model is reflected in the dynamics of large-scale data set, which verifies the superiority of the algorithm in extracting spatiotemporal features and capturing global context information.

Identification of dynamic parameters of tandem robot based on WLS-MBO algorithm

ZHANG Yi-nan, DING Jian-wan

2023, 45(7): 1274-1281. doi:

To address the parameter identification of six-degree-of-freedom tandem robots, a dynamic parameter identification method based on weighted least squares and migrating birds optimization (WLS-MBO) is proposed. Firstly, based on the dynamics of the robot, a linearized dynamic equation that includes the Coulomb and viscous friction parameters is obtained. A fifth-order Fourier series is used as the excitation trajectory for identification. The robot controller drives the joints to track the excitation trajectory and collects the joint position and torque data during the movement, and the initial solution of the dynamic parameters is obtained by the weighted least squares (WLS) method. Starting from the WLS results, migrating birds optimization (MBO) is then used for a second optimization pass to improve the identification accuracy of the parameters. The analysis of the results shows that the method identifies the parameters well and further improves identification accuracy, and it verifies that MBO has good global search ability.
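
For reference, the weighted least squares step has a standard closed form. The symbols below are assumptions introduced for illustration and are not taken from the paper: $\Phi$ is the stacked regressor matrix built from the sampled joint positions, velocities and accelerations, $\tau$ is the stacked vector of measured joint torques, $\theta$ is the vector of dynamic parameters (including the friction parameters), $\varepsilon$ is the measurement noise, and $W$ is a positive-definite weight matrix, commonly chosen from the inverse variances of the torque noise.

```latex
\tau = \Phi(q, \dot{q}, \ddot{q})\,\theta + \varepsilon,
\qquad
\hat{\theta}_{\mathrm{WLS}}
  = \arg\min_{\theta}\,(\tau - \Phi\theta)^{\top} W\,(\tau - \Phi\theta)
  = \left(\Phi^{\top} W \Phi\right)^{-1} \Phi^{\top} W\,\tau .
```

The closed form requires $\Phi^{\top} W \Phi$ to be invertible, which is one reason an exciting trajectory such as a Fourier series is used. The estimate $\hat{\theta}_{\mathrm{WLS}}$ would then serve as the starting point for the second optimization stage.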

Unsupervised feature selection based on autoencoder and local embedding

ZHAO Rui-ping, JIANG Ai-lian

2023, 45(7): 1282-1291. doi:

To maintain the local geometric structure of features while learning the deep nonlinear relationships between them, this paper proposes a single-layer autoencoder as a joint framework for feature selection and manifold learning. Firstly, the reconstruction capability of the single-layer autoencoder is used to eliminate individual features that contribute little to the reconstructed sample and to learn the deep nonlinear relationships among features, with sparse regularization applied to the feature weight matrix. Secondly, an optimal feature subset is obtained by improving the locally linear embedding algorithm to preserve the local structure among features. Finally, a new objective loss function is designed and the L-BFGS algorithm is used for iterative optimization. In comparisons with six other unsupervised feature selection algorithms on six datasets, the experimental results show that the proposed algorithm is superior in both clustering performance and classification performance.

A low-resource Lao text regularization task based on BiLSTM

WANG Jian, JIANG Lin, WANG Lin-qin, YU Zheng-tao, ZHANG Song, GAO Sheng-xiang

2023, 45(7): 1292-1299. doi:

Text normalization (TN) is an indispensable part of the text front-end analysis in speech synthesis. Lao text normalization converts non-standard words (NSW) in Lao text into spoken-form words (SFW). At present, text normalization has not yet been studied for Lao; the main difficulties are the scarcity of training data, the diversity of language expression, and ambiguity in normalization. This paper carries out a text normalization task for Lao. The task is treated as a sequence tagging task, and neural networks are used to predict ambiguous NSW in combination with their context. A corpus for the Lao text normalization task is constructed, the results are predicted by the neural network, a self-attention mechanism is added to strengthen the relationships between the characters of the sequence, and different strategies for introducing a pre-trained language model are explored. An accuracy of 67.59% is achieved on the test set.

Graph attention network with enhanced preference influence for recommendation

GAO Wei-wei, LIU Yang, MA Hui-fang, TANG Yue-chen

2023, 45(7): 1300-1307. doi:

Graph neural networks can effectively capture the complex interaction behavior between users and items in recommendation scenarios. By capturing the higher-order information of nodes in the graph, recommendation performance can be improved. A Graph Attention Network with Enhanced preference influence for recommendation (GEPR) is proposed. The algorithm uses a graph attention network to fuse preference influence and then captures the latent information in user-item interactions. Specifically, the user-item bipartite graph is first constructed from user-item interactions, and an attention neighborhood aggregation strategy is designed to adaptively learn the embedded representations of users and items on the graph structure. Secondly, a preference influence enhancement layer is designed to strengthen the influence of the preferences of similar users (items) on the preferences of target users (items). Finally, a multi-layer perceptron is used to obtain the probability score of a user-item interaction by coupling the preference influence of similar users (items) on target users (items) with the embeddings of the target users (items). Experimental results on two real datasets verify the rationality and validity of the attention neighborhood aggregation strategy and of preference influence in the proposed method.

An improved whale optimization algorithm based on multi-strategy coordination

CHAI Yan, ZHU Yu, REN Sheng

2023, 45(7): 1308-1319. doi:

To solve the problems of the whale optimization algorithm, such as low precision, slow convergence and a tendency to fall into local optima, a multi-strategy collaborative improved whale optimization algorithm (MSWOA) is proposed. Firstly, a population information guidance mechanism is used to improve the efficiency of exploiting the global optimal position, so that the algorithm avoids falling into a local optimum in the late iterations. Secondly, an improved golden sine algorithm is combined with the whale's prey-encircling process to enlarge the search range of the population in the solution space. Finally, an inertia weight and a nonlinear parameter adjustment strategy are used to improve the global exploration and local exploitation abilities of the algorithm. An effectiveness analysis of the individual improvement strategies, a comparison with other intelligent algorithms, an optimization performance analysis in high-dimensional cases, and the Wilcoxon rank sum test show that MSWOA has better optimization accuracy and stability.

An arithmetic optimization algorithm integrating sine-cosine strategy

HUANG Xue-yu, LUO Hua

2023, 45(7): 1320-1330. doi:

This paper proposes an arithmetic optimization algorithm that integrates the sine-cosine strategy to address the low solution accuracy, slow convergence, and tendency to fall into local optima of arithmetic optimization algorithms. The algorithm adaptively adjusts the math optimizer accelerated (MOA) function based on changes in individual fitness, balancing the global exploration and local exploitation abilities of the algorithm. An improved sine-cosine algorithm is introduced into the local exploitation stage, increasing population diversity in the later iterations, keeping the algorithm from falling into local optima, and effectively improving solution accuracy and convergence speed. Simulation experiments on 14 benchmark test functions show that the improved algorithm achieves significant improvements in solution accuracy, convergence speed, and robustness. Finally, the improved algorithm is applied to the optimization of support vector machine (SVM) parameters, and a student knowledge level prediction model is established, which further verifies the practicality and superiority of the algorithm.
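
The position update of the basic sine-cosine algorithm can be sketched as below. This is the standard, unimproved update and not the variant used in the paper: each coordinate moves toward or around the best-known position by a step scaled with a sine or cosine term, and the amplitude `r1` decays linearly over the iterations. For brevity the random numbers are drawn once per individual rather than once per dimension, which is a simplification, and all function names are illustrative.

```python
import math
import random


def sca_step(x, best, r1, r2, r3, use_sine):
    """One sine-cosine update of position x relative to the best position.

    x, best: lists of coordinates with the same length.
    r1: step amplitude; r2: angle; r3: random weight on the best position.
    use_sine: choose the sine branch (True) or the cosine branch (False).
    """
    trig = math.sin(r2) if use_sine else math.cos(r2)
    return [xi + r1 * trig * abs(r3 * bi - xi) for xi, bi in zip(x, best)]


def r1_schedule(t, max_iter, a=2.0):
    """Linearly decreasing amplitude: a at iteration 0, zero at max_iter."""
    return a - t * a / max_iter


def random_sca_step(x, best, t, max_iter, rng=random):
    """Draw the random coefficients and apply one update at iteration t."""
    r1 = r1_schedule(t, max_iter)
    r2 = rng.uniform(0.0, 2.0 * math.pi)
    r3 = rng.uniform(0.0, 2.0)
    return sca_step(x, best, r1, r2, r3, rng.random() < 0.5)
```

Large values of `r1` early in the run favor exploration, and small values late in the run keep candidates close to the best position, which is the balance the abstract refers to.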
