Computer Engineering & Science

State of the art analysis of China HPC 2023

ZHANG Yun-quan, DENG Li, YUAN Liang, YUAN Guo-xing

2023, 45(12): 2091-2098. doi:

Based on the latest China HPC TOP100 list released by CCF TCHPC in late November, this paper presents the overall performance trends of the China HPC TOP100 and TOP10 for 2023. It then analyzes the characteristics of performance, manufacturers, and application areas separately in detail.

A Clos network based high-radix router structure

SHI De-jun, LI Hong-liang, HU Shu-kai

2023, 45(12): 2099-2112. doi:

The router is a key component of high-performance interconnection networks and can be used to flexibly build topologies with low network diameter, rich routing paths, and high fault tolerance. The hierarchical structure divides the entire router into multiple small sub-crossbars. The number of sub-crossbar switches typically equals the number of router ports, and each sub-crossbar switch corresponds to an input port and an output port. Both the inputs and outputs of each sub-crossbar are equipped with buffers, so a hierarchical router contains a large number of buffers, which limits its scalability. The network structure, by contrast, applies system-level network topologies inside a chip: smaller switches are connected through a mesh, a full interconnect, or a fat tree implemented within the router using integrated circuit technology, and the result externally appears as a high-radix router. Network structures are low in cost, but they require consideration of not only the performance of the system network topology built from the routers but also the routing issues inside the router itself. This paper proposes a hierarchical router structure based on the Clos network, which combines the high performance of the traditional hierarchical structure with the low cost of the network structure, and proposes two scheduling algorithms for the Clos network. Under uniform traffic, both approach 100% bandwidth utilization, and RTL synthesis evaluation shows a maximum area saving of 25.9%.

RISC-V based design of graph convolutional neural network accelerator

ZHOU Li, ZHAO Zhi-qiao, PAN Guo-teng, TIE Jun-bo, ZHAO Wang

2023, 45(12): 2113-2120. doi:

Graph convolutional networks (GCNs), a class of algorithms for processing non-Euclidean data, are currently implemented mainly on deep learning frameworks such as PyTorch with GPU acceleration. GCN computation involves nested matrix multiplications and data access operations. GPUs can meet the real-time requirements of this workload, but they have high deployment costs and low energy efficiency. To improve the computational performance of GCNs while maintaining software flexibility, this paper proposes a custom GCN accelerator based on a RISC-V SoC, which is extended with a dot product operation and a hardware accelerator through hardware-software co-design on the Hummingbird E203 SoC platform. The hardware quantization scheme, from floating point to 32-bit fixed point, is determined by analyzing the neural network parameters. Experimental results show that the proposed accelerator incurs no accuracy loss and achieves a maximum speedup of 6.88 times when running the GCN algorithm on the Cora dataset.
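The floating-point to 32-bit fixed-point quantization mentioned above can be illustrated with a small sketch. The abstract does not state how the 32 bits are split, so the Q16.16 format (`FRAC_BITS = 16`) and the helper names `to_fixed`, `to_float`, and `fixed_dot` are assumptions made for illustration, not the paper's design.

```python
FRAC_BITS = 16  # assumed Q16.16 split; the abstract only says "32-bit fixed point"
INT32_MIN = -(1 << 31)
INT32_MAX = (1 << 31) - 1


def to_fixed(x: float) -> int:
    """Quantize a float to a signed 32-bit fixed-point value, saturating on overflow."""
    scaled = int(round(x * (1 << FRAC_BITS)))
    return max(INT32_MIN, min(INT32_MAX, scaled))


def to_float(q: int) -> float:
    """Convert a fixed-point value back to a float."""
    return q / (1 << FRAC_BITS)


def fixed_dot(a: list[int], b: list[int]) -> int:
    """Dot product of two fixed-point vectors using a wide accumulator."""
    acc = 0
    for x, y in zip(a, b):
        acc += x * y          # each product carries 2 * FRAC_BITS fractional bits
    return acc >> FRAC_BITS   # rescale to FRAC_BITS fractional bits


weights = [0.5, -1.25, 2.0]
features = [4.0, 2.0, 0.75]
result = to_float(fixed_dot([to_fixed(w) for w in weights],
                            [to_fixed(f) for f in features]))
```

Accumulating the products before rescaling keeps the rounding error to a single shift, which is one common reason a dot product is attractive as a single extended operation.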

Loop permutation and auto-tuning under polyhedral model

PENG Chang, LIU Qing-zhi, CHEN Chang-bo

2023, 45(12): 2121-2134. doi:

To improve on the default loop schedule and tile size of Pluto, a commonly used polyhedral compiler, this paper proposes a method that computes a variety of legal permutations of Pluto's default schedule and auto-tunes performance over the configuration space formed by permutations and tile sizes. By processing the scalar dimensions that define loop fusion, both intra- and inter-nest permutations are realized for imperfect loop nests. Four machine-learning-driven auto-tuning strategies are proposed to find the optimized combination of permutation order and tile size for a loop with a given problem size. Under the default tile size, the optimal permutation generated by the extended Pluto compiler in a parallel environment achieves a maximum speedup of 4.02 and a geometric mean speedup of 2.12 over Pluto's default schedule. By further searching for a better combination of permutation order and tile size, the best auto-tuning strategy achieves a maximum speedup of 5.48 and a geometric mean speedup of 2.86 over Pluto's default optimization in a parallel environment. In addition, the best configuration and the learned model obtained by auto-tuning for a particular problem size also outperform Pluto's default optimization, to varying degrees, when applied to similar problem sizes.
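As a hand-written illustration of what a permutation plus tile-size configuration means, the sketch below compares a matrix multiplication in the original `i, j, k` order with a tiled version in `i, k, j` order. It is not output from Pluto, and the function names and the `tile` parameter are chosen here for illustration only.

```python
def matmul_ijk(A, B, n):
    """Reference loop nest in the original i, j, k order."""
    C = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                C[i][j] += A[i][k] * B[k][j]
    return C


def matmul_ikj_tiled(A, B, n, tile):
    """Same computation with loops permuted to i, k, j and tiled by `tile`."""
    C = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, tile):
        for kk in range(0, n, tile):
            for jj in range(0, n, tile):
                for i in range(ii, min(ii + tile, n)):
                    for k in range(kk, min(kk + tile, n)):
                        a = A[i][k]  # reused across the innermost j loop
                        for j in range(jj, min(jj + tile, n)):
                            C[i][j] += a * B[k][j]
    return C


n = 5
A = [[float(i + j) for j in range(n)] for i in range(n)]
B = [[float(i * j + 1) for j in range(n)] for i in range(n)]
reference = matmul_ijk(A, B, n)
permuted = matmul_ikj_tiled(A, B, n, tile=2)
```

The permutation is legal here because every iteration only accumulates into `C[i][j]`. An auto-tuner in this setting would time variants like these for different orders and `tile` values and keep the fastest.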

Key techniques and practice on managing multi-site HPC clusters for university campus

ZHANG Tian-yang, CHI Cheng-yue, GUO Wu, GAO Yi-qin, WEN Min-hua, WEI Jian-wen

2023, 45(12): 2135-2145. doi:

With the growth of high-performance computing (HPC) services, external factors such as data center space and power supply capacity often constrain cluster expansion and upgrading, creating the need to build multi-site HPC clusters. Multi-site HPC clusters can break through the geographical limitations of a single cluster and provide more computing resources. Based on the practice of the SJTU-computing platform, this paper summarizes unified management methods for infrastructure and system software, as well as the high-availability design for remote disaster tolerance of clusters, including: adapting the Slurm job scheduling system and the Open OnDemand visual portal, extending high-availability capabilities for LDAP and other basic services, and building a hierarchical aggregation monitoring system. Finally, this paper demonstrates the effectiveness of the remote supercomputing cluster solution in three dimensions: data transmission, user experience, and platform high availability.

A Verilog code verification method based on C program analysis and verification techniques

DENG Xi, FAN Guang-sheng, CHEN Li-qian, LI Tun, WANG Ji

2023, 45(12): 2146-2154. doi:

Traditional hardware verification methods synthesize RTL designs into gate-level netlists and use SAT solvers for verification, without effectively leveraging their word-level structure, resulting in the inability to verify some properties. In recent years, software analysis and verification techniques and SMT solving technology have made significant progress. To migrate the latest software analysis and verification techniques to hardware verification, a Verilog code verification method based on C program analysis and verification technology is proposed. First, a Verilog-to-C translation system based on integrated semantics is designed, and then typical techniques and tools in the current software analysis and verification field are used to analyze and verify the converted C program, in order to determine whether the original Verilog code satisfies the property assertion. Experimental results demonstrate the feasibility and effectiveness of migrating C program analysis and verification techniques to Verilog code verification.

Research on robust speech recognition technology based on domain knowledge

WANG Fei-fei, BEN Ke-rong, ZHANG Xian

2023, 45(12): 2155-2164. doi:

Because the accuracy of speech recognition software decreases in noisy environments, a robustness enhancement method based on domain knowledge is proposed to ensure the safety of speech-controlled operations. Taking ship control as the application background, a domain knowledge graph for ship control is established. Ship control commands are extracted from nautical books and classic naval warfare film and television materials, and a Chinese speech dataset of ship control commands is constructed. A domain-knowledge-embedded decoding method is proposed that corrects the output control commands by calculating the matching degree between the recognition result and the domain knowledge graph. Experimental results show that, compared with the currently popular connectionist temporal classification (CTC) decoding method and the attention-based decoding method, the proposed method reduces the word error rate by 4.0% and 1.5% when recognizing noisy speech at signal-to-noise ratios of 10 dB and 20 dB, respectively, and improves command recognition accuracy by 10.3% and 6.3%, respectively, which improves the robustness of the speech recognition model in recognizing Chinese commands.
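The word error rate reported above is conventionally the token-level edit distance between the reference and the recognized text, divided by the reference length. The sketch below computes that metric only; it is not the paper's decoding method, and the example commands are invented for illustration.

```python
def word_error_rate(reference: list[str], hypothesis: list[str]) -> float:
    """(substitutions + deletions + insertions) / number of reference tokens."""
    if not reference:
        raise ValueError("reference must contain at least one token")
    # prev[j] is the edit distance between the reference prefix processed
    # so far and the first j hypothesis tokens.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_tok in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_tok in enumerate(hypothesis, start=1):
            cost = 0 if ref_tok == hyp_tok else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(reference)


# Hypothetical command pair: one substitution and one deletion over 4 tokens.
wer = word_error_rate("turn left ten degrees".split(), "turn right ten".split())
```

For Chinese commands the tokens would typically be characters or segmented words; the computation itself is unchanged.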

ShadowDB: A SQL engine based traffic-split system for full-link stress test

JIANG Jun, LI Wen-hui, ZHANG Liang, WANG Shan-min, LI Rui-yuan

2023, 45(12): 2165-2174. doi:

The full-link stress test, an emerging software testing technique, runs stress tests directly in the production system in order to evaluate the performance of online systems accurately. Traffic-split techniques based on shadow databases ensure that production data is not polluted during a full-link stress test. This paper designs and implements ShadowDB, a complete open-source traffic-split system for full-link stress tests, built on a SQL engine. The main idea of ShadowDB is to split the traffic of user requests through a SQL parser and a SQL router. Currently, ShadowDB can correctly distribute all SQL statements of six kinds of relational database management systems. It supports two traffic-split algorithms: a column-based shadow algorithm and a hint-based shadow algorithm. ShadowDB implements all JDBC interfaces, so online systems can incorporate it without any change. Because ShadowDB can be embedded in application programs, it does not forward requests over the network, which minimizes the impact on request efficiency and guarantees the reliability of the full-link stress test. Extensive experiments were conducted with two widely used benchmarking tools. The experimental results show that ShadowDB performs much better than the compared systems.
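The idea behind a hint-based shadow algorithm can be sketched as a router that inspects each SQL statement for a marker comment and picks the data source accordingly. The hint syntax `/* shadow:true */` and the data source names below are hypothetical and are not taken from ShadowDB, which is a JDBC-level system with its own parser.

```python
import re

# Hypothetical hint format; a real system defines its own syntax.
SHADOW_HINT = re.compile(r"/\*\s*shadow\s*:\s*true\s*\*/", re.IGNORECASE)


def route(sql: str) -> str:
    """Return the name of the data source that should execute `sql`."""
    if SHADOW_HINT.search(sql):
        return "shadow_db"      # stress-test traffic, isolated from real data
    return "production_db"      # ordinary user traffic


test_target = route("/* shadow:true */ INSERT INTO orders (id) VALUES (1)")
user_target = route("INSERT INTO orders (id) VALUES (2)")
```

A column-based variant would instead inspect the parsed statement for a designated column value, which requires a real SQL parser rather than a regular expression.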

Retinal vessel segmentation based on multi-scale attention feature fusion network with dual-decoder structure

ZHANG Wen-hao, QU Shao-jun

2023, 45(12): 2175-2185. doi:

To address the irregular shape of blood vessels in fundus retinal images and the difficulty of segmenting them, a multi-scale attention feature fusion network model based on a dual-decoder structure is proposed to achieve accurate segmentation of retinal blood vessels. The dual-decoder branch structure reduces information loss. In the encoder, a multi-scale attention feature fusion module is designed to extract rich multi-scale features, and a spatial attention module is combined with it to enhance the extraction of spatial context information and improve vessel recognition. A squeeze-and-excitation module is used to optimize the aggregated features, suppress irrelevant feature channels, and improve the overall segmentation ability of the model. Experimental results on the DRIVE and CHASEDB1 data sets show recall rates of 0.8411 and 0.8551, respectively, a substantial improvement over several current advanced networks, with maximum increases of 6.6% and 8.25%, respectively.

Image adversarial cascade generation via coupling word and sentence-level text features

BAI Zhi-yuan, YANG Zhi-xiang, LUAN Hong-kang, SUN Yu-bao

2023, 45(12): 2186-2196. doi:

Text-to-image generation aims to generate realistic images from natural language descriptions and is a cross-modal analysis task involving text and images. Because generative adversarial networks (GANs) generate realistic images with high efficiency, they have become the mainstream model for text-to-image tasks. However, current methods often split text features into word-level and sentence-level features that are trained separately, so the text information is not fully utilized, which easily leads to generated images that do not match the text. To address this problem, this paper proposes an image adversarial cascade generation model (Union-GAN) that couples word-level and sentence-level text features and introduces a text-image joint perception module (Union-Block) in each image generation stage. By combining channel affine transformation and cross-modal attention, the model fully utilizes both the word-level semantics and the overall semantics of the text to generate images that match the semantic description of the text while maintaining clear structures. Meanwhile, the discriminator is jointly optimized and spatial attention is added to the corresponding discriminator, so that the supervisory signal from the text prompts the generator to produce more relevant images. In a comparison with multiple representative networks such as AttnGAN on the CUB-200-2011 dataset, experimental results show that the FID score of Union-GAN is 13.67, an improvement of 42.9% over AttnGAN (lower FID is better), and the IS score is 4.52, an increase of 0.16.

LPD-YOLO: Lightweight obscured pedestrian detection model

LIANG Xiu-man, ZHOU Jia-run, YANG Ruo-lan

2023, 45(12): 2197-2205. doi:

In driving scenarios, occlusion between pedestrians and variations in their scale reduce detection accuracy, while existing detection models have large parameter counts and are difficult to deploy on mobile terminals. This paper proposes a lightweight real-time pedestrian detection model, LPD-YOLO, based on the YOLOv5s model. Firstly, in the feature extraction part, the original backbone network is replaced with MES Net (Mish-Enhanced Shuffle Net), and the SA (Shuffle Attention) module is embedded in the backbone network to enhance feature extraction. Secondly, in the feature fusion part, the original PANet is improved with the DS-ASFF structure to fully fuse feature maps of different sizes. Then, standard convolution is replaced with GS convolution in the feature-convergent network part without affecting accuracy, further reducing model parameters and computation. Finally, in the prediction part, the original loss function is improved by using the OTA label assignment strategy combined with α-IOU to accelerate model convergence. Experimental data show that, compared with YOLOv5s, LPD-YOLO has 81.2% fewer parameters, 46.3% fewer floating-point operations, a 75.8% smaller model size, and 3.3% higher detection accuracy. The detection time for a single image is 13.2 ms, which better meets the real-time detection requirements for dense pedestrians in driving scenarios.

A lightweight white blood cells image recognition model based on improved EfficientNet

LIU Huan, WU Liang-hong, CHEN Liang, ZHOU Bo-wen

2023, 45(12): 2206-2215. doi:

Most white blood cell (WBC) recognition models suffer from limited deployability due to large parameter counts and heavy computation, low recognition accuracy, and poor generalization ability. Therefore, a lightweight and efficient WBC recognition model based on an improved EfficientNet is proposed. Firstly, the main modules are streamlined to reduce the parameter count, while skip connections between feature layers are added to ensure complete information flow. Secondly, the main module is optimized by adding improved efficient channel attention and selecting a more suitable DropBlock2D. The improved module lets the model capture more channel and detail features, thus improving recognition accuracy and generalization ability. Finally, the model is trained with a cross-entropy loss function with label smoothing to accelerate convergence and further enhance generalization ability. The experimental results show that the improved model has 2.49M parameters, 1.11M fewer than before the improvement, which reduces the complexity of the model. The improved model achieves 99.67% accuracy in the classification task on the mixed dataset, 0.37% higher than before the improvement. In addition, the model achieves 100.00% accuracy in classification on the public dataset BCCD2, higher than existing WBC recognition models, verifying that the model has high accuracy and good generalization ability while remaining lightweight.
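Cross-entropy with label smoothing replaces the one-hot target with a mixture of the one-hot vector and a uniform distribution. The sketch below shows the standard formulation in plain Python; the smoothing factor `epsilon=0.1` is a common default assumed here, not a value reported in the abstract.

```python
import math


def smoothed_cross_entropy(logits: list[float], target: int,
                           epsilon: float = 0.1) -> float:
    """Cross-entropy against the target (1 - epsilon) * one_hot + epsilon / K."""
    # Numerically stable log-softmax.
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    log_probs = [z - log_sum for z in logits]

    k = len(logits)
    loss = 0.0
    for idx, lp in enumerate(log_probs):
        q = epsilon / k + ((1.0 - epsilon) if idx == target else 0.0)
        loss -= q * lp
    return loss


plain = smoothed_cross_entropy([2.0, 0.5, -1.0], target=0, epsilon=0.0)
smoothed = smoothed_cross_entropy([2.0, 0.5, -1.0], target=0, epsilon=0.1)
```

With `epsilon=0.0` this reduces to ordinary cross-entropy. For a confident correct prediction the smoothed loss is larger, which discourages overconfident outputs.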

A multi-strategy fusion artificial hummingbird algorithm

LIU Yan, ZHANG Jiao, JIANG Sheng-teng, PAN Xiao-qian, ZHAO Hai-tao, WEI Ji-bo

2023, 45(12): 2216-2225. doi:

To improve the convergence speed and solution accuracy of the basic artificial hummingbird algorithm, a multi-strategy fusion artificial hummingbird algorithm is proposed. Firstly, the hummingbird positions are initialized by the chaotic reverse communication strategy to improve the diversity of the initial population. Secondly, to coordinate global exploration and local search, a dynamic probability adjustment function is designed to control the guided foraging and regional foraging behaviors of the hummingbirds, and an adaptive spiral is introduced to improve the migration foraging behavior. Finally, the position of the optimal hummingbird is perturbed by the Cauchy-Gaussian mutation strategy to improve the algorithm's ability to escape local optima. In simulation experiments, 9 benchmark functions are used to evaluate the proposed algorithm against five recent optimization algorithms. Simulation results show that the proposed algorithm has faster convergence, higher accuracy, and stronger stability.

Bi-modal music genre classification model MGTN based on convolutional attention mechanism

JIAO Jia-hui, MA Si-yuan, SONG Yu, SONG Wei

2023, 45(12): 2226-2236. doi:

In the field of music information retrieval (MIR), classification by music genre is a challenging task. Traditional audio feature engineering methods require manually selecting and extracting music signal features for processing, resulting in a complex feature extraction process, unstable model performance, and poor generalization. Methods that combine deep learning with spectrograms also have problems, such as models that are unsuited to some data and difficulty in extracting global features. This paper proposes a music genre classification model based on a convolutional attention mechanism, called MGTN. MGTN combines two music genre classification approaches, spectrogram input and audio signal feature extraction, to construct audio time series data, which greatly improves the model's feature extraction and generalization abilities and provides a new idea for music genre classification. Experimental results on the GTZAN and Ballroom datasets show that the MGTN model can effectively fuse input data from two different modalities. Compared with dozens of benchmark models, the MGTN model has strong advantages.

Robot path planning of goal-directed Bi-RRT based on information inspiration

LI Zhong-hua, YUAN Jie, GUO Zhen-yu

2023, 45(12): 2237-2245. doi:

To address the low efficiency and rough paths caused by the randomness and blindness of node expansion in the bidirectional rapidly-exploring random tree (Bi-RRT) algorithm, this paper proposes a goal-directed Bi-RRT algorithm based on information inspiration. To reduce the randomness and blindness of node expansion, the tree node expansion method is optimized. The node information generated by regression analysis is used to optimize the evaluation function for extended nodes, strengthening the goal orientation of node growth, and the expansion direction is constrained by node and environmental costs. The redundant nodes in the initial path are eliminated by the branch and bound method, and a path satisfying the maximum steering angle constraint is obtained. A cubic B-spline curve is used to smooth the path and improve its continuity. Finally, the proposed algorithm is compared with other classical algorithms in different environments on the MATLAB simulation platform, and the experimental results verify the effectiveness and practicality of the proposed algorithm.
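Cubic B-spline smoothing of a waypoint path can be sketched with the standard uniform cubic B-spline basis, where each group of four consecutive control points defines one curve segment. The waypoints and the `samples_per_segment` parameter below are illustrative assumptions; the paper's own implementation is in MATLAB and is not reproduced here.

```python
def bspline_path(control: list[tuple[float, float]],
                 samples_per_segment: int = 10) -> list[tuple[float, float]]:
    """Sample a uniform cubic B-spline defined by 2D control points."""
    path = []
    for s in range(len(control) - 3):
        p0, p1, p2, p3 = control[s:s + 4]
        for step in range(samples_per_segment):
            t = step / samples_per_segment
            # Uniform cubic B-spline basis functions; they sum to 1 for any t.
            b0 = (1 - t) ** 3 / 6
            b1 = (3 * t ** 3 - 6 * t ** 2 + 4) / 6
            b2 = (-3 * t ** 3 + 3 * t ** 2 + 3 * t + 1) / 6
            b3 = t ** 3 / 6
            x = b0 * p0[0] + b1 * p1[0] + b2 * p2[0] + b3 * p3[0]
            y = b0 * p0[1] + b1 * p1[1] + b2 * p2[1] + b3 * p3[1]
            path.append((x, y))
    return path


# Hypothetical waypoints left after pruning redundant tree nodes.
waypoints = [(0.0, 0.0), (1.0, 2.0), (3.0, 3.0), (4.0, 1.0), (6.0, 0.0)]
smooth = bspline_path(waypoints)
```

The curve has continuous first and second derivatives across segments, but it approximates the control points rather than passing through them, so a planner normally re-checks the smoothed path for collisions.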

A multi-aspect oriented dual channel knowledge-enhanced graph convolutional network model

CHEN Jing-jing, HAN Hu, XU Xue-feng

2023, 45(12): 2246-2255. doi:

Aspect-based sentiment analysis is a fine-grained sentiment analysis task that aims to align aspects with their corresponding sentiment words in order to infer aspect-specific sentiment polarity. In recent years, graph neural network sentiment classification methods based on syntactic dependency information have become a research hotspot in this field. However, because comment sentences are flexible in content expression and syntactic structure, modeling methods that use only syntactic dependency information still have shortcomings. To enhance comment sentences with affective knowledge and structural semantic information, a dual-channel knowledge-enhanced graph convolutional network model (DualSyn-GCN) is proposed. On the one hand, the syntactic dependency adjacency matrix is enhanced according to the implicit relationships between aspects and between aspects and context. On the other hand, the sentiment dependency of each aspect is learned from external sentiment knowledge. The two enhanced representations are then fused so that they share and complement each other. The experimental results show that, compared with the classical aspect-specific graph convolutional network model (ASGCN), this model improves accuracy and MF1 on the LAP14 data set by 2.34% and 3.26%, respectively.

A multi-clause dynamic deduction algorithm based on clause activity and complexity and its application

LIN Ling-yu, CAO Feng, YI Jian-bing, FANG Wang-sheng, LI Jun, WU Guan-feng

2023, 45(12): 2256-2264. doi:

First-order logic automated theorem proving is an important research topic in the fields of knowledge representation and automated reasoning. How to effectively select the clauses that participate in deduction is a research hotspot for improving the capability and efficiency of automated reasoning. Building on the good deduction characteristics of multi-clause dynamic deduction, this paper proposes a method for measuring and calculating clause activity and complexity by analyzing the properties of the variable terms and the structure of the function terms of clauses, which can effectively evaluate clauses with different term structures. Based on this clause evaluation method, a multi-clause dynamic deduction algorithm with fully synergized deduction of clauses is proposed, which can effectively optimize the multi-clause deduction search path. The algorithm is applied to the leading international prover Eprover 2.6, and the problems of the 2021 international automated reasoning competition (FOF group) are used as the test set. Within the standard 300 seconds, Eprover 2.6 with the proposed algorithm proves four more theorems than the original Eprover 2.6, and the average proof time decreases by 1.12 seconds when proving the same number of theorems as Eprover 2.6. In addition, it proves 16 theorems that Eprover 2.6 cannot prove, accounting for 15.1% of the total unproven theorems. The experimental results show that the proposed multi-clause dynamic deduction algorithm is an effective inference method that can improve the capability and time efficiency of automated reasoning to a certain extent.

Personal credit risk prediction with multi-scale deep feature fusion

CHEN Gong, LI Zhan-li, ZHU Li

2023, 45(12): 2265-2273. doi:

With the development of credit business in China, assessing the default risk of each loan has become a crucial task. Because of the complex internal relationships among features in financial credit data, traditional machine learning and ensemble learning methods depend on feature selection for their effectiveness while ignoring the internal relationships of the data, and feature selection may also cause data loss. To solve these problems, a feature extractor based on multi-scale deep feature fusion is proposed. Firstly, multi-scale convolution is applied to the one-dimensional data to fully extract the internal relationships between features, and attention fusion is performed to obtain the more critical features. Then, an XGBoost ensemble learning classifier is used to classify the deep abstract features and obtain the prediction results. Experimental results show that the multi-scale deep feature fusion approach predicts personal credit risk better on a real data set. Both the AUC and KS values increase compared with the XGBoost model and traditional machine learning methods.
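The two evaluation metrics named above can be computed directly from predicted scores and true labels. The sketch below uses the pairwise definition of AUC and takes KS as the largest gap between the true-positive and false-positive rates over all thresholds; the scores and labels are made-up illustration data, not results from the paper.

```python
def auc_and_ks(scores: list[float], labels: list[int]) -> tuple[float, float]:
    """Return (AUC, KS) for binary labels, where 1 marks a default."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")

    # AUC: chance that a random positive scores above a random negative,
    # counting ties as half a win.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    auc = wins / (len(pos) * len(neg))

    # KS: maximum separation between the cumulative rates of the two classes.
    ks = 0.0
    for threshold in sorted(set(scores)):
        tpr = sum(p >= threshold for p in pos) / len(pos)
        fpr = sum(n >= threshold for n in neg) / len(neg)
        ks = max(ks, abs(tpr - fpr))
    return auc, ks


scores = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]
labels = [1, 1, 0, 1, 0, 0]
auc, ks = auc_and_ks(scores, labels)
```

This quadratic-time version is meant for clarity; on large credit datasets a sorted, single-pass implementation from a statistics library would be the usual choice.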

Node classification research based on meta-learning and graph filter

WANG Ying, CHEN Wen-qi, HAN Yao-chen

2023, 45(12): 2274-2280. doi:

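Deep learning has achieved great success in extracting data features. In particular, for graph data rich in relational information between nodes, a variety of graph neural networks have been designed by performing graph convolution with graph filters in the frequency domain. These graph neural networks mainly focus on designing fixed filters or learning simple filters, but such simplification may leave the filters unsuitable for some graph data. To address this problem, a node classification model based on meta-learning and graph filters, MGCN, is proposed to improve the generality of graph filters. The model uses meta-learning to learn a set of initial weights for the filters of a graph convolutional network (GCN); after the filter weights are fine-tuned, the model can quickly adapt to new tasks. To verify the effectiveness of MGCN, extensive experiments are conducted on six benchmark datasets. Experimental results show that, compared with traditional graph neural network models, the proposed model is applicable to a wider range of graph data.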
