Computer Engineering & Science

GNNSched: A GNN inference task scheduling framework on GPU

SUN Qing-xiao, LIU Yi, YANG Hai-long, WANG Yi-qing, JIA Jie, LUAN Zhong-zhi, QIAN De-pei

2024, 46(01): 1-11.

Due to frequent memory accesses, graph neural networks (GNNs) often have low resource utilization when running on GPUs. Existing inference frameworks do not consider the irregularity of GNN inputs and may exceed GPU memory capacity when applied directly to GNN inference tasks. For GNN inference, the memory occupation of concurrent tasks must therefore be analyzed in advance from their input characteristics to ensure that the tasks can be successfully co-located on the GPU. In addition, inference tasks submitted in multi-tenant scenarios urgently need flexible scheduling strategies to meet the quality-of-service (QoS) requirements of concurrent inference tasks. To solve these problems, this paper proposes GNNSched, which efficiently manages the co-location of GNN inference tasks on the GPU. Specifically, GNNSched organizes concurrent inference tasks into a queue and estimates the memory occupation of each task with an operator-level cost function. GNNSched implements multiple scheduling strategies to generate task groups, which are iteratively submitted to the GPU for concurrent execution. Experimental results show that GNNSched meets the QoS requirements of concurrent GNN inference tasks and reduces their response time.
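The memory check and grouping step can be pictured with a minimal Python sketch. The cost model here is a hypothetical stand-in (fp32 node features and one per-edge message tensor per layer, plus an int64 edge index), not GNNSched's actual operator-level cost function. `Task`, `estimate_memory`, and `form_groups` are illustrative names, and first-in-first-out packing is only one possible scheduling strategy.

```python
from dataclasses import dataclass
from typing import List

BYTES_PER_FLOAT = 4   # assumption: fp32 features
BYTES_PER_INDEX = 8   # assumption: int64 edge indices


@dataclass
class Task:
    name: str
    num_nodes: int
    num_edges: int
    feat_dims: List[int]  # feature width at each layer boundary, e.g. [in, hidden, out]


def estimate_memory(task: Task) -> int:
    """Hypothetical operator-level cost: peak bytes over layers plus the edge index."""
    edge_index = 2 * task.num_edges * BYTES_PER_INDEX
    peak = 0
    for d_in, d_out in zip(task.feat_dims, task.feat_dims[1:]):
        node_feats = task.num_nodes * (d_in + d_out)  # input and output node features
        edge_msgs = task.num_edges * d_in             # per-edge messages before aggregation
        peak = max(peak, (node_feats + edge_msgs) * BYTES_PER_FLOAT)
    return edge_index + peak


def form_groups(queue: List[Task], capacity: int) -> List[List[Task]]:
    """FIFO packing: add queued tasks to a group until the estimate would exceed capacity."""
    groups: List[List[Task]] = []
    current: List[Task] = []
    used = 0
    for task in queue:
        need = estimate_memory(task)
        if need > capacity:
            raise ValueError(f"{task.name} does not fit on the GPU even when run alone")
        if used + need > capacity:
            groups.append(current)  # close the current group; it is submitted as one batch
            current, used = [], 0
        current.append(task)
        used += need
    if current:
        groups.append(current)
    return groups
```

Each returned group would be launched concurrently; a QoS-aware strategy could reorder the queue (for example by deadline) before packing.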

Design of convolutional neural network acceleration system based on heterogeneous platform

QIN Wen-qiang, WU Zhong-cheng, ZHANG Jun, LI Fang,

2024, 46(01): 12-20.

Deploying convolutional neural networks (CNNs) on embedded devices with limited computing and storage resources faces challenges such as slow execution, low computational efficiency, and high power consumption. This paper proposes a novel CNN acceleration architecture based on a heterogeneous platform, and designs and implements a lightweight CNN acceleration system based on MobileNet. Firstly, to reduce hardware resource consumption and data transmission costs, a design method combining dynamic fixed-point quantization and batch normalization fusion is employed to optimize the network model and reduce the hardware design complexity of the acceleration system. Secondly, convolutional block partitioning, parallel convolution computation, and data flow optimization effectively improve the efficiency of convolution operations and the system throughput. Experimental results on the PYNQ-Z2 platform show that the MobileNet inference acceleration scheme implemented on this system achieves a recognition time of 0.18 seconds per image and a system power consumption of 2.62 watts, a 128-fold speedup over a single-core ARM processor.
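The two model-side optimizations can be sketched in plain Python. Batch normalization fusion folds BN(y) = gamma * (y - mean) / sqrt(var + eps) + beta into the preceding convolution's weights and bias. For dynamic fixed point, one common approach (assumed here, not necessarily the paper's exact rule) picks, per layer, the largest fractional length that still fits the layer's maximum magnitude into a signed word; the 8-bit width, round-to-nearest, and saturation choices are assumptions.

```python
import math
from typing import List, Tuple


def fuse_bn(weights: List[List[float]], bias: List[float],
            gamma: List[float], beta: List[float],
            mean: List[float], var: List[float],
            eps: float = 1e-5) -> Tuple[List[List[float]], List[float]]:
    """Fold a BatchNorm layer into the preceding conv.

    weights[c] is the flattened kernel of output channel c and bias[c] its bias.
    """
    fused_w, fused_b = [], []
    for c, kernel in enumerate(weights):
        scale = gamma[c] / math.sqrt(var[c] + eps)
        fused_w.append([w * scale for w in kernel])
        fused_b.append((bias[c] - mean[c]) * scale + beta[c])
    return fused_w, fused_b


def choose_frac_bits(values: List[float], total_bits: int = 8) -> int:
    """Largest fractional length that keeps max |v| inside the signed integer range."""
    max_abs = max(abs(v) for v in values)
    if max_abs == 0:
        return total_bits - 1
    q_max = 2 ** (total_bits - 1) - 1
    return math.floor(math.log2(q_max / max_abs))


def quantize(values: List[float], frac_bits: int, total_bits: int = 8) -> List[int]:
    """Round to the nearest multiple of 2**-frac_bits and saturate to the signed range."""
    lo, hi = -(2 ** (total_bits - 1)), 2 ** (total_bits - 1) - 1
    return [min(max(round(v * 2 ** frac_bits), lo), hi) for v in values]
```

After fusion the accelerator only needs a convolution with bias, and each layer's fractional length can be stored as a single shift amount.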

Fine-grained memory access monitoring based on memory protection keys

WANG Rui-bo, WU Zhen-wei, ZHANG Wen-zhe, WU Hui-jun, ZHANG-YU Shu-qing, LU Kai

2024, 46(01): 21-27.

Based on memory protection key (MPK) hardware extensions, a lightweight, fine-grained page protection mechanism is proposed. The mechanism overcomes the limitation of traditional page protection methods, which only support page-granularity memory access monitoring, and achieves fine-grained page protection that can intercept every memory access. By fully exploiting the user-level, thread-local page access permission control provided by memory protection keys, it reduces performance overhead by more than 30% compared with traditional page protection. By integrating fine-grained page protection with compiler instrumentation, it also covers the parts of a program that cannot be recompiled, which traditional compiler instrumentation methods cannot handle.
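Why memory protection keys allow cheap, thread-local permission changes can be seen from the x86 PKRU register: for each key k, bit 2k is access-disable (AD) and bit 2k+1 is write-disable (WD), and a thread updates its own PKRU from user space with the unprivileged WRPKRU instruction, with no system call or page-table change. The sketch below only simulates that permission check in Python; it does not touch real hardware, and `pkru_allows` and `set_key_rights` are hypothetical helpers (on Linux, pages are tagged with a key via `pkey_mprotect`).

```python
def pkru_allows(pkru: int, pkey: int, write: bool) -> bool:
    """True if a thread whose PKRU equals `pkru` may access a page tagged with `pkey`."""
    access_disabled = (pkru >> (2 * pkey)) & 1
    write_disabled = (pkru >> (2 * pkey + 1)) & 1
    if access_disabled:
        return False
    return not (write and write_disabled)


def set_key_rights(pkru: int, pkey: int, access_disable: bool, write_disable: bool) -> int:
    """Return a new PKRU value with the AD/WD bits of `pkey` replaced."""
    mask = 0b11 << (2 * pkey)
    bits = (int(access_disable) | (int(write_disable) << 1)) << (2 * pkey)
    return (pkru & ~mask) | bits
```

A monitor in this style would set AD for the monitored key so that the next access faults, record the access in the fault handler, and then grant access for that thread only, which is how per-access interception can coexist with low overhead.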

Gloo+: Accelerating distributed training of deep learning using in-network computing

HUANG Ze-biao, DONG De-zun, QI Xing-yun

2024, 46(01): 28-36.

In distributed deep learning training, collective communication is the main communication method, and it can be optimized at either the software level or the hardware level. SHARP, a collective communication network offload protocol proposed by Mellanox, optimizes collective communication in hardware: it offloads collective operations to switches in the network, thereby shortening collective communication time. We integrated SHARP into Gloo and designed and implemented Gloo+, a collective communication library that accelerates distributed deep learning training through in-network computing. Our experimental evaluation shows that in benchmark tests with small messages, Gloo+ achieves a speedup of 100 or more over Gloo and 50 or more over MPI in Ethernet mode, while its speedup over MPI in InfiniBand (IB) mode is within 10. In practical distributed deep learning training, Gloo+ achieves maximum speedup ratios of 1.1 over Gloo, 1.3 over MPI in Ethernet mode, and 0.5 over MPI in IB mode.
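A toy model helps show why in-network aggregation helps most for small, latency-bound messages. In the sketch below, each level stands for a layer of switches that sums the vectors of up to `fan_in` children before the root's result is sent back down, so latency grows with tree depth, whereas a host-based ring allreduce needs 2(p - 1) communication steps for p hosts. This ignores bandwidth, switch resource limits, and SHARP's actual protocol; `tree_allreduce` and `ring_allreduce_steps` are hypothetical names, not Gloo+ APIs.

```python
from typing import List, Tuple


def tree_allreduce(host_data: List[List[float]], fan_in: int = 4) -> Tuple[List[List[float]], int]:
    """Sum-allreduce through a tree of aggregating switches.

    Returns (the result delivered to every host, number of aggregation levels).
    """
    level = [list(v) for v in host_data]
    depth = 0
    while len(level) > 1:
        # One switch layer: each switch adds the vectors of up to fan_in children.
        level = [
            [sum(xs) for xs in zip(*level[i:i + fan_in])]
            for i in range(0, len(level), fan_in)
        ]
        depth += 1
    total = level[0]
    return [list(total) for _ in host_data], depth


def ring_allreduce_steps(num_hosts: int) -> int:
    """Steps of a host-based ring allreduce (reduce-scatter followed by allgather)."""
    return 2 * (num_hosts - 1)
```

For 8 hosts and 4-port aggregation, the tree needs 2 levels up (and the same down), while the ring needs 14 steps.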

A triple-node-upset self-recovery latch using C-element

XU Hui, ZHU Shuo, SUN Hao-jie, MA Rui-jun, LIANG Hua-guo, HUANG Zheng-feng

2024, 46(01): 37-45.

As the feature size of integrated circuits keeps shrinking, latches become increasingly vulnerable to triple-node upsets (TNUs) caused by particle radiation. To address this problem, MKEEP, a C-element-based latch that tolerates and self-recovers from TNUs with low power consumption, low delay, and high robustness, is proposed. Simulation experiments and process, voltage, and temperature (PVT) variation experiments show that, compared with other latches offering TNU tolerance or self-recovery, the proposed latch has lower power consumption as well as low delay and area overhead. In addition, the latch is less sensitive to process, voltage, and temperature variations, giving it clear advantages over the reference latches.
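The property that makes C-elements useful for upset tolerance is that the output changes only when all inputs agree and otherwise holds its previous value, so a flip on a single redundant input node is blocked. The behavioral model below shows just that filtering; it does not model the MKEEP topology or how full TNU tolerance and self-recovery are achieved.

```python
class CElement:
    """Behavioral model of a Muller C-element: the output follows its inputs only
    when they all agree; otherwise it keeps its previous value."""

    def __init__(self, initial: int = 0):
        self.out = initial

    def update(self, *inputs: int) -> int:
        if all(x == inputs[0] for x in inputs):
            self.out = inputs[0]
        return self.out
```

For example, with a stored 1 on two redundant nodes, an upset that flips one node to 0 leaves the output at 1, and the output only changes when a genuine write drives both nodes.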

Review on security issues of blockchains

SHEN Chuan-nian

2024, 46(01): 46-62.

Blockchain, as a disruptive technological innovation, is continuously changing the operational rules and application scenarios of industries such as digital finance, digital government, the Internet of Things, and intelligent manufacturing, and it is an indispensable key technology for building a new trust and value system in future society. However, because of defects in its own technology and the complexity and diversity of its application scenarios, the security issues of blockchain are becoming increasingly serious. Security has become a major bottleneck restricting the future development of blockchain, and the road to blockchain regulation is arduous. This paper introduces the background knowledge, basic concepts, and architecture of blockchain. Starting from this architecture, it analyzes the security issues of blockchain and corresponding prevention strategies from seven aspects: the data layer, network layer, consensus layer, incentive layer, contract layer, application layer, and cross-chain interaction. On this basis, it discusses blockchain security supervision in terms of the current situation and difficulties of policy supervision, the establishment of technical supervision standards, innovative methods, and development trends.

Adversarial visible watermark attack based on intelligent evolutionary algorithm

JI Jun-hao, ZHANG Yu-shu, ZHAO Ruo-yu, WEN Wen-ying, DONG Li

2024, 46(01): 63-71.

With growing public awareness of copyright, more and more images containing watermarks appear in daily life. However, existing research shows that watermarked images can cause neural networks to misclassify, posing a significant threat to the popularization and application of neural networks. Adversarial training is one defense against this problem, but it requires a large number of watermark adversarial samples as training data. To address this issue, this paper proposes a visible watermark adversarial attack method based on an intelligent evolutionary algorithm that generates strong watermark adversarial samples. The method not only generates watermark adversarial samples quickly but also maximizes the attack effect on the neural network. In addition, it incorporates image quality evaluation metrics to constrain the visual loss of the image, making the watermark adversarial samples more visually acceptable. Comprehensive experimental results show that the proposed method has lower time complexity than the benchmark watermark attack method and a higher attack success rate on neural networks than the benchmark black-box attack.
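The general shape of such a black-box search can be sketched as an elitist genetic algorithm over watermark parameters (here position x, y and opacity alpha), with a fitness that rewards attack strength and penalizes visual loss. This is a generic stand-in, not the paper's evolutionary algorithm: `toy_fitness` replaces the real classifier query and image-quality metric with a hypothetical analytic function, and the population size, mutation scale, and penalty weight `lam` are assumptions.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Individual = List[float]


def evolve_watermark(fitness: Callable[[Individual], float],
                     bounds: Sequence[Tuple[float, float]],
                     pop_size: int = 20, generations: int = 30,
                     mutation: float = 0.1, seed: int = 0) -> Tuple[Individual, List[float]]:
    """Elitist genetic search; returns the best individual and the per-generation best fitness."""
    rng = random.Random(seed)

    def clip(ind: Individual) -> Individual:
        return [min(max(x, lo), hi) for x, (lo, hi) in zip(ind, bounds)]

    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    history: List[float] = []
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        history.append(fitness(pop[0]))      # best fitness in this generation
        elite = pop[: pop_size // 2]         # survivors are kept unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            child = [(x + y) / 2 for x, y in zip(a, b)]          # arithmetic crossover
            child = [x + rng.gauss(0, mutation * (hi - lo))     # Gaussian mutation
                     for x, (lo, hi) in zip(child, bounds)]
            children.append(clip(child))
        pop = elite + children
    return max(pop, key=fitness), history


def toy_fitness(ind: Individual, lam: float = 0.5) -> float:
    """Hypothetical black-box score: attack strength minus a visual-quality penalty."""
    x, y, alpha = ind
    attack = alpha * math.exp(-((x - 0.7) ** 2 + (y - 0.3) ** 2) / 0.02)
    visual_loss = alpha ** 2  # stand-in for an image-quality metric
    return attack - lam * visual_loss
```

Because the elite half survives unchanged, the best fitness never decreases from one generation to the next, and only fitness evaluations (model queries) are needed, which fits the black-box setting.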

A large and mini fountain code model in DNA storage

CUI Jing-song, JIANG Chang-yue, GUO Chi

2024, 46(01): 72-82.

In application scenarios such as DNA storage, the traditional fountain code algorithm must transmit the number K of source file packets to the decoder through an additional channel. In practice, K can be embedded in every encoded packet to transmit this key parameter, but doing so seriously wastes channel bandwidth. To address this problem, a large and mini fountain code model is proposed, which optimizes the transmission of the key parameter by adding a mini fountain code as an out-of-band channel. The mini fountain code reduces the space occupied by information about K in each coding group to 1 bit, effectively reducing bandwidth consumption. In addition, the mini fountain code can adapt to the variable length of coding sequences caused by the inhomogeneity of the DNA storage medium, and under certain conditions it occupies no additional channel bandwidth at all.
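Why the decoder needs K can be seen in a minimal LT-style fountain code: each encoded packet carries only a seed and an XOR value, and the decoder regenerates which source blocks were combined from the seed, which requires K. The sketch uses the ideal soliton degree distribution and a peeling decoder for brevity (practical systems usually prefer the robust soliton distribution); it does not implement the paper's large and mini fountain code model.

```python
import random
from typing import Dict, List, Optional, Tuple

Packet = Tuple[int, int]  # (seed, XOR of the selected source blocks)


def ideal_soliton_degree(rng: random.Random, k: int) -> int:
    """Sample d with P(1) = 1/k and P(d) = 1/(d(d-1)) for d = 2..k."""
    u = rng.random()
    cum = 1.0 / k
    if u < cum:
        return 1
    for d in range(2, k + 1):
        cum += 1.0 / (d * (d - 1))
        if u < cum:
            return d
    return k  # guards against floating-point round-off


def neighbors(seed: int, k: int) -> List[int]:
    """Source-block indices of a packet; encoder and decoder must both know k."""
    rng = random.Random(seed)
    return rng.sample(range(k), ideal_soliton_degree(rng, k))


def encode(blocks: List[int], seed: int) -> Packet:
    value = 0
    for i in neighbors(seed, len(blocks)):
        value ^= blocks[i]
    return seed, value


def decode(packets: List[Packet], k: int) -> Optional[List[int]]:
    """Peeling decoder: repeatedly resolve packets that cover exactly one unknown block."""
    eqs = [[set(neighbors(seed, k)), value] for seed, value in packets]
    recovered: Dict[int, int] = {}
    changed = True
    while changed and len(recovered) < k:
        changed = False
        for eq in eqs:
            for j in [j for j in eq[0] if j in recovered]:
                eq[0].discard(j)          # substitute blocks that are already known
                eq[1] ^= recovered[j]
            if len(eq[0]) == 1:
                recovered[eq[0].pop()] = eq[1]
                changed = True
    if len(recovered) < k:
        return None                       # not enough packets yet
    return [recovered[i] for i in range(k)]
```

With a wrong K, `neighbors` produces different index sets, so the decoder's equations no longer match what the encoder XORed; this is the dependency the out-of-band mini fountain code is meant to serve.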

Anonymous authentication and key exchange protocol in intelligent vehicle networks

ZHANG Xiao-jun, TANG Hao-yu, FU Hong, WANG Wen-chen

2024, 46(01): 83-90.

Intelligent vehicular ad hoc networks (VANETs) are the core of intelligent transportation systems and have received increasing attention from the academic community in recent years. However, owing to their openness and fragility, VANETs face many security problems. To solve problems such as mutual authentication between intelligent vehicles and nearby road side units (RSUs), session key exchange, and the anonymity of intelligent vehicles, this paper proposes an anonymous authentication and key exchange protocol for intelligent vehicle networks. In the protocol, an identity-based digital signature algorithm is designed that enables an intelligent vehicle to send authentication information to a nearby RSU in a completely anonymous manner. After validating the authentication information, the RSU computes a message authentication code and sends it to the vehicle as the response, achieving mutual authentication. In addition, a session key for subsequent secure communication is negotiated during the anonymous authentication process. Because the protocol is based on an identity-based cryptosystem, it does not require complex certificate management. The performance evaluation shows that the protocol can be effectively deployed in intelligent vehicle application scenarios involving highly sensitive information.
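The RSU's MAC response and the session-key step can be illustrated with Python's standard `hmac`, `hashlib`, and `secrets` modules. This is a hypothetical sketch, not the paper's protocol: the identity-based signature is omitted, and the shared secret comes from an ephemeral Diffie-Hellman exchange in a deliberately tiny toy group (modulus 2^127 - 1) that is NOT secure and serves only to make the MAC step runnable. The pseudonym string and function names are illustrative.

```python
import hashlib
import hmac
import secrets
from typing import Tuple

P = 2 ** 127 - 1  # toy prime modulus for illustration only; NOT a secure group
G = 3


def ephemeral_keypair() -> Tuple[int, int]:
    secret = secrets.randbelow(P - 2) + 1
    return secret, pow(G, secret, P)


def derive_session_key(own_secret: int, peer_public: int, transcript: bytes) -> bytes:
    """Both sides compute the same DH value and bind it to the handshake transcript."""
    shared = pow(peer_public, own_secret, P)
    return hmac.new(shared.to_bytes(16, "big"), b"session|" + transcript,
                    hashlib.sha256).digest()


def rsu_confirmation(session_key: bytes, transcript: bytes) -> bytes:
    """MAC the RSU returns after validating the vehicle's (anonymous) request."""
    return hmac.new(session_key, b"rsu-confirm|" + transcript, hashlib.sha256).digest()


def vehicle_accepts(session_key: bytes, transcript: bytes, tag: bytes) -> bool:
    """Vehicle side: only a party holding the session key can produce a valid tag."""
    return hmac.compare_digest(rsu_confirmation(session_key, transcript), tag)
```

A run would have the vehicle send its pseudonym and public value, the RSU reply with its public value and `rsu_confirmation(...)`, and the vehicle check the tag with `vehicle_accepts(...)`; the constant-time `hmac.compare_digest` avoids leaking the tag through timing.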

A vehicle object detection algorithm in UAV video stream based on improved Deformable DETR

JIANG Zhi-peng, WANG Zi-quan, ZHANG Yong-sheng, YU Ying, CHENG Bin-bin, ZHAO Long-hai, ZHANG Meng-wei

2024, 46(01): 91-101.

To address the large number of small targets in UAV video stream detection, insufficient contextual semantic information caused by low image transmission quality, the slow inference of feature fusion in traditional algorithms, and poor training caused by class-imbalanced datasets, this paper proposes a vehicle object detection algorithm for UAV video streams based on an improved Deformable DETR. In terms of model structure, the method designs a cross-scale feature fusion module that enlarges the receptive field and improves the detection of small objects, and applies a squeeze-and-excitation module to object_query to raise the response of key objects and reduce missed and false detections of important objects. In terms of data processing, online hard example mining is used to mitigate the uneven distribution of class samples in the dataset. Experimental results show that, compared with the baseline algorithm, the improved algorithm increases average detection accuracy by 1.5% and small-object detection accuracy by 1.2% without reducing detection speed.
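The two training-side components can be sketched in plain Python. For the squeeze-and-excitation step we assume that the squeeze averages each channel over the N object queries and that the excitation gates channels with sigmoid(w2 @ relu(w1 @ s)); the paper's exact placement may differ, and in practice `w1` and `w2` are learned. Online hard example mining is shown as keeping the highest-loss samples, with `keep_ratio` a hypothetical parameter.

```python
import math
from typing import List

Matrix = List[List[float]]


def se_gate(queries: Matrix, w1: Matrix, w2: Matrix) -> Matrix:
    """Squeeze-and-excitation over the channels of N object queries (N x C).

    w1 has shape (C/r) x C and w2 has shape C x (C/r) for a reduction ratio r.
    """
    n, c = len(queries), len(queries[0])
    squeeze = [sum(q[j] for q in queries) / n for j in range(c)]                  # squeeze
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeeze))) for row in w1]   # ReLU
    gate = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))      # sigmoid
            for row in w2]
    return [[q[j] * gate[j] for j in range(c)] for q in queries]                  # rescale


def ohem_select(losses: List[float], keep_ratio: float = 0.25) -> List[int]:
    """Online hard example mining: indices of the highest-loss samples to back-propagate."""
    k = max(1, int(len(losses) * keep_ratio))
    return sorted(range(len(losses)), key=lambda i: losses[i], reverse=True)[:k]
```

Restricting the loss to hard examples lets under-represented classes, which tend to have higher loss, contribute a larger share of the gradient.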

A time series image semantic segmentation model modified by optical flow

QIU Xiao-meng, WANG Lin, GU Wen-jun, SONG Wei, TIAN Hao-lai, HU Yu

2024, 46(01): 102-110.

The development of medical imaging technology has generated massive amounts of medical image data reflecting the internal structure of the human body. Medical image segmentation can improve the efficiency of medical diagnosis, making it an important assistive tool in modern medicine. However, the noise and artifacts inevitably introduced during imaging make segmentation challenging. Among existing segmentation models, single-frame medical image semantic segmentation models ignore the relationship between image frames, while video semantic segmentation models exploit temporal information but have limitations in edge extraction. To address these issues, this paper proposes a U-Net-based temporal semantic segmentation model modified by optical flow. The model extracts optical flow information between consecutive frames and performs feature extraction and weight allocation on the current frame and the optical flow to correct the segmentation. Experiments on different types of datasets, namely Drosophila electron micrographs, Combined Healthy Abdominal Organ Segmentation (CHAOS), and coronary angiograms, show that the model achieves the best results on three evaluation metrics (Dice similarity coefficient, pixel accuracy, and intersection over union), validating its effectiveness and generalization.
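The three evaluation metrics have simple definitions on binary masks: Dice = 2|A ∩ B| / (|A| + |B|), pixel accuracy = correctly labeled pixels / all pixels, and IoU = |A ∩ B| / |A ∪ B|. The sketch below computes them on flattened masks; returning 1.0 when both masks are empty is our convention, and the paper may handle that case differently.

```python
from typing import Dict, Sequence


def segmentation_metrics(pred: Sequence[int], target: Sequence[int]) -> Dict[str, float]:
    """Dice, pixel accuracy, and IoU for flattened binary masks (1 = foreground)."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    inter = sum(1 for p, t in zip(pred, target) if p and t)
    pred_fg = sum(1 for p in pred if p)
    target_fg = sum(1 for t in target if t)
    union = pred_fg + target_fg - inter
    correct = sum(1 for p, t in zip(pred, target) if bool(p) == bool(t))
    return {
        "dice": 2 * inter / (pred_fg + target_fg) if pred_fg + target_fg else 1.0,
        "pixel_accuracy": correct / len(pred),
        "iou": inter / union if union else 1.0,
    }
```

For small foreground structures such as vessels in angiograms, pixel accuracy can stay high even when the foreground is missed, which is why Dice and IoU are reported alongside it.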

Human activity recognition based on LoRa devices

CUI Hao, WAN Ya-ping, ZHONG Hua, NIE Ming-xing, XIAO Yang

2024, 46(01): 111-121.

In recent years, many LoRa-based sensing models have verified the long-distance sensing potential of LoRa devices, but recognizing human activities from LoRa wireless signals, whose features are blurred, still requires further research. This paper analyzes how human activities affect LoRa signal propagation and proposes a LoRa signal processing method to extract signal change features. Data are then collected to build two LoRa datasets recording human activities, and the proposed method is tested with advanced deep learning models. The recognition accuracy exceeds 90% for activity types and activity roles in one room, and for activity roles and activity rooms across four rooms. Compared with directly training a convolutional recurrent neural network, the method also saves time and storage space.
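A generic way to expose "signal change features" is to measure how much the received signal fluctuates within a short sliding window, since a moving body perturbs the propagation paths. The sketch below is only a hypothetical stand-in for the paper's processing method: it computes a sliding-window standard deviation and flags windows above a threshold, and the window length and threshold are assumptions.

```python
import statistics
from typing import List, Sequence


def sliding_std(signal: Sequence[float], window: int) -> List[float]:
    """Population standard deviation of each length-`window` slice of the signal."""
    return [statistics.pstdev(signal[i:i + window]) for i in range(len(signal) - window + 1)]


def active_windows(signal: Sequence[float], window: int = 5, threshold: float = 1.0) -> List[int]:
    """Start indices of windows whose fluctuation exceeds `threshold`."""
    return [i for i, s in enumerate(sliding_std(signal, window)) if s > threshold]
```

Segments flagged this way could be cropped and passed to a classifier, which is one reason such preprocessing can be cheaper than training a network on the raw signal.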

Combining coordinate attention and generative adversarial network for image super-resolution reconstruction

PENG Yan-fei, MENG Xin, LI Yong-xin, LIU Lan-xi

2024, 46(01): 122-131.

An image super-resolution reconstruction model combining coordinate attention and generative adversarial networks (GANs) is proposed to address three problems of existing GAN-based image super-resolution models: inadequate use of feature information, weak judgment of local details by VGG-style discriminators, and unstable training. Firstly, the generator is built from residual blocks embedded with coordinate attention, which aggregate features along both the channel and spatial dimensions to extract features more fully. Dropout is also incorporated into the generator in an adjusted form to improve the generalization ability of the model. Secondly, the discriminator adopts a U-Net structure that outputs detailed pixel-by-pixel feedback, capturing local differences between real and generated images. Finally, spectral normalization is introduced into the discriminator to stabilize GAN training. Experimental results show that with an upscaling factor of 4, the peak signal-to-noise ratio (PSNR) on the benchmark test sets Set5 and Set14 increases by 1.75 dB on average and the structural similarity (SSIM) by 0.038 on average, and the model reconstructs clearer, more realistic images with good visual quality.
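Spectral normalization divides each weight matrix W by its largest singular value sigma(W), estimated with power iteration, which bounds the discriminator layer's Lipschitz constant. The plain-Python sketch below iterates to convergence for clarity; PyTorch's `torch.nn.utils.spectral_norm`, by contrast, performs one power-iteration step per forward pass by default (`n_power_iterations=1`) and carries the vector u across steps.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def _matvec(w: Matrix, v: List[float]) -> List[float]:
    return [sum(a * b for a, b in zip(row, v)) for row in w]


def _matvec_t(w: Matrix, u: List[float]) -> List[float]:
    return [sum(w[i][j] * u[i] for i in range(len(w))) for j in range(len(w[0]))]


def _unit(v: List[float]) -> List[float]:
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def spectral_normalize(w: Matrix, iters: int = 50) -> Tuple[Matrix, float]:
    """Return (W / sigma, sigma), with sigma estimated by power iteration."""
    if iters < 1:
        raise ValueError("need at least one power-iteration step")
    v = _unit([1.0] * len(w[0]))
    for _ in range(iters):
        u = _unit(_matvec(w, v))     # u ~ W v / ||W v||
        v = _unit(_matvec_t(w, u))   # v ~ W^T u / ||W^T u||
    sigma = sum(a * b for a, b in zip(u, _matvec(w, v)))  # sigma ~ u^T W v
    return [[x / sigma for x in row] for row in w], sigma
```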

A focally discriminative loss for unsupervised domain adaptation method

WANG Shan-shan, WANG Meng-zhu, LUO Zhi-gang

2024, 46(01): 132-141.

Maximum mean discrepancy (MMD), a representative distribution metric between the source and target domains, has been widely applied in unsupervised domain adaptation (UDA), where the two domains follow different distributions and only source-domain labels are available. However, MMD and its class-wise variants may ignore intra-class compactness and inter-class separability, reducing the discriminability of the feature representation. This paper proposes a focally discriminative loss for unsupervised domain adaptation that improves the discriminative ability of MMD in two ways: (1) the weights of MMD are redesigned to align the distributions of relatively hard classes across domains; (2) a focal contrastive loss is explored to balance positive and negative sample pairs for better discrimination. Combining the two losses pulls intra-class features together and pushes inter-class features apart. Moreover, the improved loss is simple yet effective and can be extended to network architectures with attention mechanisms. Experiments on several domain adaptation datasets verify the effectiveness of the proposed method.
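For reference, the biased estimate of squared MMD with a Gaussian kernel is mean k(x, x') + mean k(y, y') - 2 mean k(x, y). The sketch below also shows a class-wise variant in which each class term is scaled by a weight; the weights stand in for the paper's redesigned hardness weights, whose exact form is not given here, and target samples would be grouped by pseudo-labels because target labels are unavailable. The focal contrastive loss is not shown.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def rbf(x: Vector, y: Vector, gamma: float = 1.0) -> float:
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def mmd2(xs: List[Vector], ys: List[Vector], gamma: float = 1.0) -> float:
    """Biased estimate of squared MMD with a Gaussian (RBF) kernel."""
    kxx = sum(rbf(a, b, gamma) for a in xs for b in xs) / len(xs) ** 2
    kyy = sum(rbf(a, b, gamma) for a in ys for b in ys) / len(ys) ** 2
    kxy = sum(rbf(a, b, gamma) for a in xs for b in ys) / (len(xs) * len(ys))
    return kxx + kyy - 2 * kxy


def weighted_classwise_mmd(src: Dict[int, List[Vector]], tgt: Dict[int, List[Vector]],
                           weights: Dict[int, float], gamma: float = 1.0) -> float:
    """Sum of per-class MMD terms, each scaled by a (hypothetical) hardness weight."""
    shared = [c for c in src if c in tgt and src[c] and tgt[c]]
    return sum(weights.get(c, 1.0) * mmd2(src[c], tgt[c], gamma) for c in shared)
```

Raising the weight of a class whose source and target features are still far apart makes that class dominate the alignment term, which is the intuition behind focusing on hard classes.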

A multi-scene adaptive A* algorithm based on fitting-first search

SHEN Ke-yu, YOU Zhi-yu, LIU Yong-xin

2024, 46(01): 142-149.

To address the large number of traversed nodes, large turning angles, and slow search speed of the traditional A* algorithm, an improved multi-scene adaptive A* algorithm based on fitting-first search is proposed. Firstly, the heuristic distance of the parent node is introduced to reduce the number of traversed nodes and speed up the search; the scene map information is quantified, and the adaptive control principle is used to adjust the heuristic weight in time, enhancing the robustness of the algorithm. Secondly, a fitting-first search heuristic strategy further strengthens the heuristic of the algorithm. Thirdly, the path is smoothed through local pruning and redundant node deletion to reduce the number of traversed nodes and the turning angle. Finally, simulation tests in MATLAB show that the proposed algorithm traverses fewer nodes, produces smaller turning angles, and searches faster.
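The core mechanism being tuned is weighted A*, which ranks nodes by f = g + w * h: a weight above 1 expands fewer nodes at the risk of a longer path. The sketch below pairs an 8-connected weighted A* with a hypothetical `adaptive_weight` rule derived from obstacle density (greedier on sparse maps). The density rule is our assumption, and the parent-node heuristic, fitting-first search, and path smoothing from the paper are not implemented.

```python
import heapq
import math
from typing import Dict, List, Optional, Tuple

Cell = Tuple[int, int]
Grid = List[List[int]]  # 1 = obstacle, 0 = free


def adaptive_weight(grid: Grid, w_min: float = 1.0, w_max: float = 2.0) -> float:
    """Hypothetical rule: larger heuristic weight on sparse maps, near 1 on dense ones."""
    cells = sum(len(row) for row in grid)
    density = sum(sum(row) for row in grid) / cells
    return w_max - (w_max - w_min) * density


def weighted_astar(grid: Grid, start: Cell, goal: Cell,
                   w: float) -> Tuple[Optional[List[Cell]], int]:
    """8-connected weighted A* with Euclidean h. Returns (path or None, expanded nodes)."""
    rows, cols = len(grid), len(grid[0])

    def h(p: Cell) -> float:
        return math.hypot(p[0] - goal[0], p[1] - goal[1])

    moves = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1) if (dr, dc) != (0, 0)]
    g: Dict[Cell, float] = {start: 0.0}
    parent: Dict[Cell, Optional[Cell]] = {start: None}
    open_heap = [(w * h(start), 0.0, start)]
    closed = set()
    while open_heap:
        _, g_cur, cur = heapq.heappop(open_heap)
        if cur in closed:
            continue  # stale heap entry
        if cur == goal:
            path: List[Cell] = []
            node: Optional[Cell] = cur
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1], len(closed)
        closed.add(cur)
        for dr, dc in moves:
            nxt = (cur[0] + dr, cur[1] + dc)
            if not (0 <= nxt[0] < rows and 0 <= nxt[1] < cols) or grid[nxt[0]][nxt[1]]:
                continue
            g_new = g_cur + math.hypot(dr, dc)  # 1 for straight moves, sqrt(2) for diagonals
            if g_new < g.get(nxt, math.inf):
                g[nxt] = g_new
                parent[nxt] = cur
                heapq.heappush(open_heap, (g_new + w * h(nxt), g_new, nxt))
    return None, len(closed)
```

With w = 1 this is ordinary A* and returns a shortest path; a typical call would be `weighted_astar(grid, start, goal, adaptive_weight(grid))`.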

A clustering method based on algebraic granularity

XIAO Zhen-guo, CHEN Lin-shu, SUN Shao-jie, MEI Ben-xia, LIU Yuan-hui, ZHAO Lei

2024, 46(01): 150-158.

Clustering is a main task of machine learning and also the core work of granular computing, namely information granulation. At present, most granular-computing-based clustering algorithms use only granule features without taking granule structure into account, especially in the information field, where algebraic structures are widely used. From the perspective of granular computing, this paper proposes a clustering method based on algebraic granularity (CMAG). Firstly, algebraic granularity is newly formulated using the granule structure of an algebraic binary operator. Secondly, CMAG is constructed with granules given by a congruence partition and a granule structure given by the homomorphic projection. Finally, CMAG is experimentally compared with the tolerance domain model and the quotient space model, and the results show that CMAG has better structural completeness and practical robustness. CMAG enriches and extends granular computing theory from the perspective of granule structure and can provide a theoretical basis for combining granular computing methods with machine learning theory.
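A congruence partition is one that is compatible with the binary operator: whenever a ~ a' and b ~ b', it follows that a * b ~ a' * b'. Exactly then the operator induces a well-defined operation on the blocks (granules), and the projection of each element onto its block is a homomorphism. The brute-force check below illustrates this standard definition on the integers modulo 6; it is not the CMAG clustering algorithm itself, and the helper names are hypothetical.

```python
from typing import Callable, Dict, Hashable, List

Partition = List[List[Hashable]]


def block_index(partition: Partition) -> Dict[Hashable, int]:
    return {x: i for i, block in enumerate(partition) for x in block}


def is_congruence(partition: Partition, op: Callable[[Hashable, Hashable], Hashable]) -> bool:
    """True if a ~ a' and b ~ b' always imply op(a, b) ~ op(a', b')."""
    idx = block_index(partition)
    elems = list(idx)
    for a in elems:
        for a2 in elems:
            if idx[a] != idx[a2]:
                continue
            for b in elems:
                for b2 in elems:
                    if idx[b] == idx[b2] and idx[op(a, b)] != idx[op(a2, b2)]:
                        return False
    return True


def quotient_op(partition: Partition,
                op: Callable[[Hashable, Hashable], Hashable]) -> Callable[[int, int], int]:
    """Operation on block indices; well defined only when the partition is a congruence."""
    idx = block_index(partition)
    return lambda i, j: idx[op(partition[i][0], partition[j][0])]
```

Under addition modulo 6, the parity partition {0, 2, 4}, {1, 3, 5} is a congruence, whereas {0, 1}, {2, 3}, {4, 5} is not (0 ~ 1, yet 0 + 1 = 1 and 1 + 1 = 2 fall in different blocks).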

Research on path optimization of express terminal location based on hybrid heuristic algorithm

SUN Rui-nan, CHU Xiang, CHEN Yu, YAN Ming-ning

2024, 46(01): 159-169.

The traditional express terminal distribution mode suffers from redundant construction of express outlets and overlapping delivery paths, problems that the joint distribution mode can effectively solve. Therefore, this paper studies the location-routing problem of express terminal outlets under the joint distribution mode, with simultaneous pickup and delivery and uncertain pickup demand. Firstly, a two-stage mathematical optimization model is established, in which random chance constraints are introduced to handle the uncertain pickup volume. Secondly, a hybrid heuristic algorithm combining a genetic algorithm with adaptive large neighborhood search is designed. Finally, numerical experiments show that the hybrid algorithm converges faster and yields better solutions than the traditional genetic algorithm. In a random demand environment, setting the risk acceptance level of the optimization scheme too high or too low increases cost. As the ratio of customers' pickup volume to delivery volume increases, the express terminal distribution cost first decreases and then increases. The nearest-outlet return strategy can effectively reduce enterprises' distribution costs.
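A chance constraint of the form P(total pickup demand on a route <= vehicle capacity) >= 1 - alpha becomes a deterministic check once a demand distribution is assumed. Assuming independent, normally distributed demands (our assumption, not necessarily the paper's), the total is normal and the constraint reduces to mu + z_(1-alpha) * sigma <= capacity, as sketched below with Python's `statistics.NormalDist`. Here alpha plays the role of the risk acceptance level: a larger alpha admits fuller routes but more frequent overflows.

```python
import math
from statistics import NormalDist
from typing import Sequence, Tuple


def route_feasible(demands: Sequence[Tuple[float, float]], capacity: float, alpha: float) -> bool:
    """Check P(sum of demands <= capacity) >= 1 - alpha for independent normal demands.

    `demands` holds (mean, standard deviation) pairs for the customers on one route.
    """
    mu = sum(m for m, _ in demands)
    sigma = math.sqrt(sum(s ** 2 for _, s in demands))
    z = NormalDist().inv_cdf(1 - alpha)  # standard normal quantile
    return mu + z * sigma <= capacity
```

For two customers with demands (10, 3) and (20, 4), the total has mean 30 and standard deviation 5, so at alpha = 0.05 the route needs a capacity of about 38.2.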

Cross-lingual AMR parsing based on unsupervised pre-training

FAN Lin-yu, LI Jun-hui, KONG Fang

2024, 46(01): 170-178.

AMR (Abstract Meaning Representation) abstracts the semantic features of a given text into a single-rooted directed acyclic graph. Because non-English AMR datasets are lacking, cross-lingual AMR parsing aims to parse non-English text into the AMR graph of its English translation. Current cross-lingual AMR parsing methods rely on large-scale English-target language parallel corpora or high-performance English-target language translation models to build (English, target language, AMR) triplet parallel corpora for target language AMR parsing. Departing from this assumption, this paper explores whether cross-lingual AMR parsing can be achieved with only large-scale monolingual English and target language corpora. To this end, we propose cross-lingual AMR parsing based on unsupervised pretraining. Specifically, pretraining integrates unsupervised neural machine translation, English AMR parsing, and target language AMR parsing tasks. During fine-tuning, we perform single-task fine-tuning on a target language AMR dataset derived from English AMR 2.0. Experimental results on AMR 2.0 and a multilingual AMR test set show that our method achieves Smatch F1 scores of 67.89, 68.04, and 67.99 on German, Spanish, and Italian, respectively.

Multi-domain sentiment analysis of Chinese text based on prompt tuning

ZHAO Wen-hui, WU Xiao-ling, LING Jie, HOON Heo

2024, 46(01): 179-190.

Sentiment is expressed differently in texts from different domains, so a separate sentiment analysis model usually has to be trained for each domain. To solve the problem that a single model cannot be used for multi-domain sentiment analysis, this paper proposes MSAPT, a multi-domain text sentiment analysis method based on prompt tuning. Hard prompts indicating the domain of the text and the selected sentiment labels prompt the model to draw on its knowledge of sentiment analysis in different domains. A unified "generalized model" is then pretrained for sentiment analysis. In downstream learning on texts from each domain, the model is frozen and prompt tuning is used to let it learn the characteristics of sentiment text in that domain. For multi-domain sentiment analysis, MSAPT only needs to store one model plus a set of prompts with far fewer parameters than the model. Experiments on multiple sentiment datasets from different domains show that MSAPT, using prompt tuning alone, outperforms model fine-tuning. Finally, ablation studies on the soft prompt length, the domain-specific hard prompts, the soft prompts, and the size of the intermediate training dataset demonstrate their respective impact on sentiment analysis performance.
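The storage argument for prompt tuning is simple arithmetic: fine-tuning keeps one model copy per domain, whereas prompt tuning keeps one frozen model plus a soft prompt of prompt_len x hidden_size parameters per domain. The sketch below shows that accounting and the assumed input layout [hard prompt | soft prompt | text] fed to the frozen model; the ordering and all numbers are hypothetical, not MSAPT's configuration.

```python
from typing import List

Embedding = List[float]


def build_input(hard_prompt: List[Embedding], soft_prompt: List[Embedding],
                text: List[Embedding]) -> List[Embedding]:
    """Concatenate [hard prompt | trainable soft prompt | text] embeddings for a frozen model."""
    return hard_prompt + soft_prompt + text


def stored_parameters(model_params: int, num_domains: int, prompt_len: int,
                      hidden_size: int, prompt_tuning: bool) -> int:
    """Parameters to store for all domains under prompt tuning versus per-domain fine-tuning."""
    if prompt_tuning:
        return model_params + num_domains * prompt_len * hidden_size
    return num_domains * model_params
```

For a hypothetical 100M-parameter model, 5 domains, and 20-token prompts with hidden size 768, prompt tuning stores 100,076,800 parameters versus 500,000,000 for five fine-tuned copies.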