Computer Engineering & Science

A survey of core computing architecture of high performance processors for exascale computing

WU Tie-bin, GUO Feng, WANG Di

2023, 45(05): 761-771. doi:

High performance computing (HPC) has entered the post-exascale era. As the core components of supercomputers, high performance processors deliver computing power to HPC systems through their core computing architecture, and progress in that architecture indicates the direction in which high performance processor architecture is developing. Focusing on advanced high performance processors for exascale computing, this paper analyzes and discusses research progress in core computing architecture from four aspects: the organization of computing resources, data-level and instruction-level parallelism, domain-specific acceleration structures, and data types and computing power. It then predicts the development trend of the core computing architecture for high performance processors. The main research and development directions for future high performance processors are ultra-wide vector SIMD and SIMT, domain-specific architectures for accelerating matrix computation, and support for various low-precision computations for the fusion of HPC and AI.

Inference rule learning in autonomous fault management systems

ZHANG Li-li, WANG Rui-bo, WANG Xiao-dong, ZHANG Wen-zhe

2023, 45(05): 772-781. doi:

With the rapid growth in the scale of high-performance computer systems, the inherent reliability of the whole system gradually decreases, resulting in the “reliability wall” problem. To address this challenge, an autonomous fault management system has been designed for the Tianhe high performance computer system. It monitors, analyzes, and manages alarms, faults, and errors across the whole system in real time. The fault messages it collects vertically cover all logical layers of the system and horizontally cover all of its functional modules. Consequently, there are logical causal relationships between fault messages: a fault source causes a series of subsequent fault events. This paper proposes EMRL, a learning algorithm for fault information inference rules. The inference rules are modeled as a probabilistic model, through which fault inference rules are automatically mined from fault information, and the minimum fault inference graph is automatically generated from the mining results. The validity of EMRL is verified on partial operation data from the Tianhe system. The results show that EMRL can effectively mine the inference relations among fault messages.

Automated task allocation of sparse matrix computation based on supervised learning

LI Xiao-ling, FANG Jian-bin, MA Jun, TAN Shuang, TAN Yu-song

2023, 45(05): 782-789. doi:

This paper discusses the effects of different task allocation strategies on the performance of sparse matrix-dense vector multiplication. It is observed that the choice of task allocation strategy can significantly affect the performance of sparse matrix computation, and that no single fixed strategy achieves the best performance for all sparse matrices. Therefore, this paper proposes a machine-learning-based method for selecting the best task allocation strategy. Its training process uses only sparse matrix features to characterize the input data set, and it can automatically train the model for a given data set and target platform. Experiments show that, compared with the default block allocation method, the task allocation method selected by this model achieves an average performance improvement of about 35%.
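To illustrate why the allocation strategy matters, the following minimal sketch compares two ways of splitting the rows of a CSR matrix among workers. It is not the paper's method: the function names, the nonzero-balanced alternative, and the sample `row_ptr` are assumptions made for illustration, and only the even row-block split corresponds to the "default block allocation" mentioned in the abstract.

```python
from bisect import bisect_left


def split_rows_evenly(row_ptr, num_workers):
    """Block allocation: each worker gets about the same number of rows."""
    n_rows = len(row_ptr) - 1
    return [(w * n_rows // num_workers, (w + 1) * n_rows // num_workers)
            for w in range(num_workers)]


def split_by_nonzeros(row_ptr, num_workers):
    """Hypothetical alternative: cut rows so workers get similar nonzero counts."""
    n_rows = len(row_ptr) - 1
    total = row_ptr[-1]
    cuts = [0]
    for w in range(1, num_workers):
        target = w * total / num_workers
        # First row boundary whose prefix nonzero count reaches the target.
        cut = bisect_left(row_ptr, target)
        cuts.append(min(max(cut, cuts[-1]), n_rows))
    cuts.append(n_rows)
    return list(zip(cuts[:-1], cuts[1:]))


def nonzeros_per_worker(row_ptr, parts):
    """Number of nonzeros each (start_row, end_row) range has to process."""
    return [row_ptr[end] - row_ptr[start] for start, end in parts]


# CSR row pointer of a 6-row matrix whose first row is much denser than the rest.
row_ptr = [0, 10, 11, 12, 13, 14, 15]
even = nonzeros_per_worker(row_ptr, split_rows_evenly(row_ptr, 2))      # [12, 3]
balanced = nonzeros_per_worker(row_ptr, split_by_nonzeros(row_ptr, 2))  # [10, 5]
```

Neither split is best for every matrix, which is the observation that motivates learning the choice from matrix features.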

A survey of data center load scheduling based on renewable energy consumption

CHEN Dong-lin, MA Yi-fan, ZOU An-qi

2023, 45(05): 790-801. doi:

In the social environment of “carbon peak and carbon neutrality” and the economic environment of “carbon cost”, renewable energy, with its zero-carbon and low-cost attributes, has become an important decision-making factor for data center operators who carry out multi-objective management through load scheduling under various internal and external constraints. This paper analyzes the potential for, and modes of, cooperation between renewable energy and data center operation and management, and introduces data center load scheduling optimization methods based on renewable energy consumption. Finally, it summarizes and compares these methods from three aspects: space-based scheduling optimization, time-based scheduling optimization, and space-time scheduling optimization.

A fused-layer attention model accelerator based on systolic array

LIU Xiao-hang, JIANG Jing-fei, XU Jin-wei

2023, 45(05): 802-809. doi:

The attention mechanism has recently shown superior performance in deep neural networks, but its computation generates complex data flows and incurs high computation and memory overheads. Customized accelerators are therefore required to optimize inference. This paper proposes an accelerator architecture for attention mechanism computation. A flexible, hardware-controlled partitioning method divides the huge matrices in the attention model into hardware-friendly computing blocks, so that the block computation matches the systolic array in the accelerator. A layer fusion computing structure based on a two-step decomposition of the softmax function is proposed, which effectively reduces the memory accesses of attention computation. A fused-layer attention model accelerator based on fine-grained computational scheduling is designed and implemented in HDL. Its performance was evaluated on a Xilinx FPGA device with an HLS tool. Compared with CPU and GPU implementations under the same settings, the accelerator improves latency by 4.91 times and efficiency by 1.24 times.
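The abstract does not spell out its two-step softmax decomposition. As a rough sketch of one common way to split softmax so that it can be computed block by block, the code below first collects a local maximum and a local sum of exponentials per block, then merges those statistics and normalizes. The function names and the block size are assumptions for illustration, and the paper's actual decomposition may differ.

```python
import math


def softmax(xs):
    """Reference softmax over the whole vector, stabilized by the global max."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def blockwise_softmax(xs, block_size):
    """Softmax computed from per-block statistics in two steps."""
    # Step 1: each block yields its local max and local sum of exponentials.
    stats = []
    for i in range(0, len(xs), block_size):
        block = xs[i:i + block_size]
        m = max(block)
        stats.append((m, sum(math.exp(x - m) for x in block)))
    # Step 2: rescale the local sums to the global max, then normalize.
    g_max = max(m for m, _ in stats)
    g_sum = sum(s * math.exp(m - g_max) for m, s in stats)
    return [math.exp(x - g_max) / g_sum for x in xs]


scores = [1.0, 2.0, 3.0, 4.0, 5.0]
reference = softmax(scores)
blocked = blockwise_softmax(scores, block_size=2)
```

Because step 1 needs only one block at a time, the full score row never has to be held in memory before the statistics are known, which is the kind of property a fused-layer design can exploit.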

An AADL end-to-end flow specification verification method based on timed automata

BAI Xian-ping, YAO Xi-xin, CHEN Xiang-lan, LIU Chong, LI Xi

2023, 45(05): 810-819. doi:

Architecture analysis and design language (AADL) is a standard and intuitive design tool for real-time systems, providing uniform standards for key steps such as model design, analysis, verification, and automatic code generation. However, simulation-based verification of AADL models cannot obtain accurate results for end-to-end flows, especially for real-time systems with dynamic resource allocation. To solve this problem, AADL is combined with model checking to traverse the system's infinite state space. Firstly, the AADL model of the real-time system is converted into a timed automata (TA) model, and TA serves as the theoretical basis for model checking. Secondly, patterns of end-to-end delay requirements are defined, based on a classification of the requirements on the response chain. Finally, the corresponding observer model is implemented according to the pattern and composed in parallel with the system model, reducing the time and space consumed by the verification algorithm.

Lasso boundary condition: A divergence description guiding goal-conflict resolution

LUO Wei-lin, WAN Hai, YANG Bin-hao, LI Xiao-da, CAO Jian-en, SONG Xiao-tong

2023, 45(05): 820-829. doi:

Goal-conflict analysis in requirements engineering aims to identify, assess, and resolve divergences. A divergence arises when a mismatch between domain properties and goals makes the system unable to satisfy all goals at the same time under certain boundary conditions. Boundary conditions describe divergences in the form of linear temporal logic formulas. Because arbitrary boundary conditions lack interpretability, and because assessing and resolving divergences requires extensive manual evaluation and design, the current definition of boundary conditions is not conducive to efficient, automated assessment and resolution of divergences. This paper therefore proposes an explainable boundary condition, called the lasso boundary condition, which intuitively describes the situation in which the system diverges under some specific preconditions. A lasso boundary condition identification algorithm based on gradual weakening (LBC identifier, LBCI) is then designed. LBCI gradually satisfies the boundary conditions by weakening the linear temporal logic formula. The effectiveness of the lasso boundary condition and LBCI is evaluated on a benchmark data set. The experimental results show that the lasso boundary condition enhances the interpretability of divergences and helps guide their resolution.

A software defect prediction algorithm based on optimized random forest

TANG Yu, DAI Qi, YANG Zhi-wei, YANG Ai-min, CHEN Li-fang

2023, 45(05): 830-839. doi:

Traditional random forests applied to software defect prediction suffer from low prediction accuracy and difficult parameter optimization. To address these deficiencies, we propose a new software defect prediction algorithm, FMSSA-RF, which optimizes random forest parameters with the fractional mutation sparrow search algorithm (FMSSA). Firstly, fractional mutation is used to improve the global search capability of the conventional sparrow search algorithm; on four benchmark functions, FMSSA converges faster and optimizes more accurately. Secondly, FMSSA is used to optimize the random forest parameters. Finally, FMSSA-RF is applied to software defect prediction. The experimental results show that, on ten public software defect data sets in four groups, the evaluation metrics of FMSSA-RF are significantly better than those of the three comparison algorithms, which shows that FMSSA-RF has higher prediction accuracy and better stability. The results of Friedman ranking and Holm’s post-hoc test also show that the advantage of FMSSA-RF is statistically significant.

A traversal multi-target path planning algorithm for unmanned cruise ship

YU Jia-bin, CHEN Zhi-hao, DENG Wei, XU Ji-ping, ZHAO Zhi-yao, WANG Xiao-yi

2023, 45(05): 840-848. doi:

To solve the traversal multi-goal path planning problem for unmanned cruise ships, a hybrid path planning method is proposed. The method has two parts. Firstly, the multi-goal path planning problem is transformed into the traveling salesman problem, and an improved grey wolf optimizer (GWO) algorithm is used to compute the multi-goal cruise sequence. Because the traditional GWO algorithm does not consider environmental factors, environmental impact factors are introduced into the fitness function to reflect the impact of obstacles and unknown areas on multi-goal sequence planning. Secondly, based on the planned goal sequence, the A* algorithm is combined with an improved artificial potential field (APF) method to complete the single-goal path planning between consecutive goals. The goal-unreachable problem of the traditional APF method is solved by an optimized repulsive potential function. Finally, comparative simulation experiments against two other algorithms are carried out in ordinary and complex environments. Statistical analysis of the experimental results shows that the proposed hybrid algorithm outperforms the other two methods in terms of distance and time costs, which verifies its effectiveness.

Facial expression recognition fusing local dynamic features

LIU Nan-yan, WEI Hong-fei, MA Sheng-xiang

2023, 45(05): 849-858. doi:

Facial expressions are one of the most important ways for humans to express emotions. Because changes in facial expression are driven by the movement of multiple facial organs and muscles, a simple and effective deep learning network that integrates local dynamic features is proposed to extract local dynamic features effectively and to handle partial occlusion of facial expressions. An attention network is introduced and, using the monitored key points of the face, guides the network to focus on unoccluded facial areas. From time-ordered key frames, the dynamic and spatiotemporal information of key areas such as the eyes and mouth is extracted to strengthen the connection between different expression features and thus obtain effective local dynamic features. Finally, the local dynamic features are added as a supplement to the overall network. The accuracies of the fusion network on the CK+, Oulu-CASIA, RAF-DB, and AffectNet datasets are 98.08%, 90.59%, 86.02%, and 61.28%, respectively, which are higher than those of other methods.

A dynamic graph transformer model for solving CVRP

WANG Yang, CHEN Zhi-bin

2023, 45(05): 859-868. doi:

The capacitated vehicle routing problem (CVRP) is one of the classic combinatorial optimization problems and has been studied for many years. Recently, the Transformer has become the dominant deep learning architecture for solving vehicle routing problems. However, the traditional positional encoding method is not suitable for extracting location information in dynamic optimization problems: the state of an instance changes at each construction step of the model, and the node features should be updated accordingly. As a result, current methods do little to improve learning efficiency. With the goal of minimizing the route length, a dynamic graph transformer model (DGTM) and a dynamic positional encoding (DPE) method are proposed, and a double-loss REINFORCE algorithm is used to train DGTM. In addition, reinforcement learning, graph neural networks, and the Transformer architecture are combined to improve the training efficiency of the model and to enhance the neural network's representation of information for routing problems with constraints. The experimental results show that the model outperforms current deep reinforcement learning methods and some traditional algorithms on this problem. DGTM has better overall performance than the professional solver and generalizes well, providing an effective method for solving combinatorial optimization problems on graphs.

A small object detection algorithm based on improved Faster R-CNN

DENG Shan-shan, HUANG Hui, MA Yan

2023, 45(05): 869-877. doi:

Feature extraction with convolutional neural network models loses high-frequency features such as image detail and texture, which leads to poor detection of small objects. To solve this problem, an object detection algorithm based on multi-layer frequency-domain feature fusion is proposed. The algorithm uses Faster R-CNN as its basic framework and takes high-frequency-enhanced images and contrast-enhanced images as input samples to improve the quality of the images to be detected. For objects with a small area, the anchor scales in the region proposal network (RPN) are changed. A multi-scale convolutional feature fusion method integrates features from different feature layers, solving the problem that the feature information of small objects is lost in deep feature maps. The experimental results show that the algorithm performs well on the DAGM 2007 data set, where the mAP reaches 97.9%, and that its mAP for small objects on the PASCAL VOC 2007 data set is significantly better than that of the original Faster R-CNN.

An improved YOLOv3 target detection optimization algorithm in dense traffic scenarios

HUO Ai-qing, ZHANG Shu-han, YANG Yu-yan, XU Jing-rong, WANG Ze-wen

2023, 45(05): 878-884. doi:

To address missed and false detections caused by the high overlap between detection targets in congested traffic scenes, a combined optimization algorithm (named L-YOLOv3+CIoU Loss+SD-NMS) is proposed, comprising an improved YOLOv3, CIoU loss function optimization, and SD-NMS optimization. Depthwise separable convolution, the SE module, and the Ghost module are used to improve the residual unit structure of YOLOv3, which strengthens feature extraction for dense targets and reduces the number of network parameters. CIoU Loss is adopted to speed up the convergence of the network model. Meanwhile, the idea of multi-target set prediction is combined with DIoU-NMS to form the SD-NMS optimization algorithm, which reduces missed and false detections. Experimental results on the BDD100K data set show that the improved algorithm achieves a recall rate of 91.58% and an accuracy rate of 93.04%. Compared with YOLOv3, the proposed algorithm improves the recall rate and accuracy by 12.09% and 9.52%, respectively, showing better detection performance.
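For readers unfamiliar with DIoU-NMS, on which SD-NMS builds, the sketch below shows the standard DIoU score (IoU minus the squared center distance divided by the squared diagonal of the smallest enclosing box) used as the suppression criterion in a greedy NMS loop. It does not implement SD-NMS or the set prediction idea; the function names, the box format `(x1, y1, x2, y2)`, and the 0.5 threshold are assumptions for illustration.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def diou(a, b):
    """Distance-IoU: IoU penalized by the normalized distance between centers."""
    dx = (a[0] + a[2]) / 2 - (b[0] + b[2]) / 2
    dy = (a[1] + a[3]) / 2 - (b[1] + b[3]) / 2
    # Squared diagonal of the smallest box enclosing both a and b.
    cw = max(a[2], b[2]) - min(a[0], b[0])
    ch = max(a[3], b[3]) - min(a[1], b[1])
    c2 = cw * cw + ch * ch
    penalty = (dx * dx + dy * dy) / c2 if c2 > 0 else 0.0
    return iou(a, b) - penalty


def diou_nms(boxes, scores, threshold=0.5):
    """Greedy NMS that suppresses a box when its DIoU with a kept box exceeds threshold."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if diou(boxes[best], boxes[i]) <= threshold]
    return keep


boxes = [(0, 0, 10, 10), (1, 0, 11, 10), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = diou_nms(boxes, scores)  # [0, 2]: the second box nearly coincides with the first
```

Because the center-distance penalty lowers the score of overlapping boxes whose centers are far apart, such boxes are less likely to be suppressed than under plain IoU, which matters when distinct vehicles overlap heavily.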

Curvilinear trajectory detection for photosphere bright points based on multi-scale and multi-modal learning

FANG Xue-shan, YANG Yun-fei, FENG Song

2023, 45(05): 885-894. doi:

The curvilinear motion of solar photosphere bright points, which approximates rotation, is of great significance for studying how energy from the solar convection zone is transmitted to the corona. Existing algorithms detect only the global curvilinear motion of photosphere bright points. This paper proposes a multi-scale and multi-modal deep learning method to detect both the global and the local curvilinear motion of photosphere bright points. The method constructs a multi-scale network model based on the bidirectional long short-term memory network (Bi-LSTM) to extract multi-scale temporal features of the trajectories of photosphere bright points. EfficientNet-B0 is adopted to extract the spatial features of the trajectories. The temporal and spatial features are fused into multi-modal features to detect various curvilinear motions of photosphere bright points. The experimental results show that the accuracy of this method is 85.08%, which is 6.12% higher than that of the single-scale method and 3.1% higher than that of the multi-scale, single-modal method. The method can also be applied to motion type detection in other fields.

An entity relation extraction method based on deep learning

Peride Abdurehim, Turdi Tohti, Askar Hamdulla

2023, 45(05): 895-902. doi:

Commonly used neural networks such as the convolutional neural network (CNN) and the recurrent neural network (RNN) have shown very good results in relation extraction tasks. However, CNN is good at capturing local features but is not suitable for processing sequence features, while the traditional RNN can effectively extract features between distant words but is prone to vanishing or exploding gradients. To solve these problems, a hybrid neural network model called BiLSTM-CNN-Attention is proposed. Combining BiLSTM and CNN lets the two complement each other, and introducing attention highlights the importance of the words that express the relation between entities within the whole sentence. In addition, a mosaic word vector is used in the word embedding layer to overcome the limitation of a single word vector representation. The experimental results show that, compared with the word2vec word vector, the mosaic word vector captures more semantics and makes the word vector more robust. Compared with the BiLSTM-CNN, CNN-Attention, and BiLSTM-Attention models, BiLSTM-CNN-Attention improves accuracy and F1 score.

A multi-scale semantic collaborative patent text classification model based on RoBERTa

MEI Xia-feng, WU Xiao-ling, HUANG Ze-min, LING Jie

2023, 45(05): 903-910. doi:

For patent text classification, existing static word vector tools such as word2vec cannot express the context of words, and most models cannot extract features completely. To address this problem, a multi-scale semantic collaborative patent text classification model based on RoBERTa, named RoBERTa-MCNN-BiSRU++-AT, is proposed. RoBERTa learns a dynamic semantic representation of the current word that fits its context, which solves the problem that static word vectors cannot represent polysemous words. The multi-scale semantic collaboration model uses convolution layers to capture multi-scale local semantic features of the text, and then uses bidirectional simple recurrent units with built-in attention (BiSRU++) to model contextual semantics at different levels. The multi-scale output features are concatenated, and the attention mechanism assigns higher weights to the key features that contribute more to the classification result. Experiments were carried out on the patent text data set published by the National Information Center. The results show that, compared with ALBERT-BiGRU and BiLSTM-ATT-CNN, RoBERTa-MCNN-BiSRU++-AT increases accuracy by 2.7% and 5.1%, respectively, in patent text classification at the department level, and by 6.7% and 8.4%, respectively, at the major class level. RoBERTa-MCNN-BiSRU++-AT can effectively improve the classification of patent texts at different levels.

Sentiment analysis of Chinese product reviews based on dual-channel gated composite network

DONG Peng-shan, ZHANG Jing, JIN Ri-ze

2023, 45(05): 911-919. doi:

The sentiment analysis task aims to understand and classify the polarity of the emotions that people express towards entities and their attributes. In Chinese text classification, most existing methods use a single input feature representation, which prevents the models from fully learning semantic information. To solve this problem, a dual-channel gated composite network model, named DGCN, is proposed. It uses word vectors and character vectors as the inputs of the two channels, which compensates for the defects that unavoidable word segmentation errors introduce into word vectors and enriches the semantic information. A gating mechanism improves how the channels are combined, so that the character vector helps the word vector learn the features of the text better. On each channel, a composite network consisting of a bidirectional gated recurrent unit network and a convolutional neural network is used, so that the advantages of the two channels are complementary. An attention mechanism is added to focus on the more effective features. The experimental results show that DGCN achieves better accuracy and F1 score than its counterparts in sentiment analysis of Chinese product reviews and is well suited to practical application.

A beetle antennae search algorithm based on differential evolution strategy and its application

YE Kun-tao, SHU Lei-lei, LI Wen, HOU Chun-ju

2023, 45(05): 920-930. doi:

The beetle antennae search (BAS) algorithm depends heavily on a single individual for convergence, has poor exploration ability, and easily falls into local optima. To address these problems, a beetle antennae search algorithm based on a differential evolution strategy (BASD) is proposed. The algorithm uses the good point set method to initialize the beetle population and enhance population diversity, and it introduces the concept of dynamic differential evolution into an elite evolutionary competition guidance strategy, which better balances the exploitation and exploration capabilities of the algorithm. BASD is tested on 14 benchmark functions and compared with the optimization results of several advanced algorithms. The results show that the overall optimization performance of BASD is better. Finally, BASD is applied to image enhancement, and the results show that the gray-level distribution of the images it enhances is more uniform and covers a wider range.

Conv-WGAIN: Convolutional generative adversarial imputation net for multivariate time series missing data

LIU Zi-jian, DING Wei-long, XING Meng-da, LI Han, HUANG Ye

2023, 45(05): 931-939. doi:

Gas chromatography data from oil-immersed transformers is a kind of multivariate time series, but such data is often missing because of equipment or network failures. Imputation is usually required to form a complete dataset for further business analysis and research. However, existing imputation models cannot conveniently handle multivariate time series while guaranteeing efficiency and quality, because of the inherent temporal irregularity and temporal bidirectionality of such data. In this paper, a model named Conv-WGAIN is proposed based on Generative Adversarial Imputation Nets (GAIN). Through a constructed imputation feature map, 2D convolution can be used to learn temporally bidirectional features and to deal with irregular time intervals at the same time. The Wasserstein distance is introduced into the discriminator to improve the stability of the model. Experiments on gas chromatography datasets from a real project and on 3 public datasets show that the approach is broadly applicable to imputing missing data in multivariate time series, and that Conv-WGAIN outperforms the baselines by 20.75% to 73.37% in RMSE.

A whale optimization algorithm based on teaching and learning and dimensional Cauchy mutation

FU Jie-di, LI Zhen-dong, GUO Hui

2023, 45(05): 940-950. doi:

When facing complex optimization problems, the basic whale optimization algorithm still tends to fall into local extrema, converges slowly, and has low calculation accuracy. Therefore, a whale optimization algorithm based on teaching and learning and dimensional Cauchy mutation, named TCWOA, is proposed. Firstly, a Sobol sequence is used to initialize the whale population, which makes the population distribution more uniform. Secondly, the teaching strategy of the teaching and learning algorithm is introduced to replace the random search strategy, avoiding blind search and improving convergence speed. Thirdly, dimensional Cauchy mutation with an inertia weight is used to perturb the best whale individual, which helps it jump out of local optima and enhances the global search ability of the algorithm. Finally, a comparative analysis of various optimization algorithms on 10 standard test functions is carried out. In the application study, TCWOA first optimizes the parameters of a BP network, which is then used to predict Boston housing prices. The results verify the effectiveness and accuracy of the optimization algorithm.
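As a rough illustration of dimension-wise Cauchy mutation applied to the best individual, the sketch below perturbs one coordinate at a time with a standard Cauchy sample and keeps a change only if the fitness improves. It is not TCWOA: the function names, the constant `weight` standing in for the inertia weight, the additive perturbation, and the greedy acceptance rule are all assumptions, since the abstract does not give the update formula.

```python
import math
import random


def cauchy_sample(rng):
    """Standard Cauchy sample via the inverse CDF, tan(pi * (u - 0.5))."""
    return math.tan(math.pi * (rng.random() - 0.5))


def dimensional_cauchy_mutation(best, fitness, weight=1.0, rng=None):
    """Perturb the best solution one dimension at a time, keeping improvements only."""
    rng = rng or random.Random()
    current = list(best)  # copy so the caller's list is not modified
    current_fit = fitness(current)
    for d in range(len(current)):
        candidate = list(current)
        # Heavy-tailed step in dimension d only; `weight` is a hypothetical scale factor.
        candidate[d] = current[d] + weight * cauchy_sample(rng)
        candidate_fit = fitness(candidate)
        if candidate_fit < current_fit:  # minimization: accept only if better
            current, current_fit = candidate, candidate_fit
    return current


def sphere(x):
    """Simple test function with its minimum at the origin."""
    return sum(v * v for v in x)


start = [1.0, -2.0, 3.0]
mutated = dimensional_cauchy_mutation(start, sphere, rng=random.Random(0))
```

The heavy tails of the Cauchy distribution occasionally produce large jumps, which is what lets a stagnating best individual leave a local optimum, while the greedy check ensures the best-known solution never gets worse.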
