High Performance Computing
Research on multi-protocol support technology and standardization in Chiplet interconnection interface
HE Xingyang1,2, ZHOU Hongwei1,2, ZHOU Yuxuan1,2, SUN Yubo3, LI Mengjin1,2
2025, 47(9): 1521-1534.
Chiplet integration has emerged as an effective approach to overcoming the limitations of chip fabrication, memory bandwidth, power consumption, and scalability in the post-Moore era. Standardized Chiplet interconnection interfaces are a prerequisite for heterogeneous Chiplet integration. They significantly simplify Chiplet adaptation, improve the reusability of Chiplets and interconnect interfaces, and accelerate multi-Chiplet SoC design. Different types of Chiplets adopt different protocol standards at the protocol layer, so a Chiplet interconnection interface must support multiple protocols. To this end, we propose a multi-protocol support technology based on broad categorization. It divides protocols into two classes according to their packet characteristics: fixed-pattern protocols and stream-pattern protocols. The technology directly supports protocols that conform to either class and indirectly supports non-conforming packet types through a “native mode”. It also allows any two protocols to run concurrently and provides enhanced support for CXL and UCIe. Micro-packet-level compatibility improves the efficiency of indirect protocol support. Delegating the embedding of link management information in data payloads to the protocol layer fully decouples the protocol layer from the adapter layer. Based on this technology, we designed a Chiplet interconnection interface that supports concurrent PCIe and CXL.mc protocols and built a simulation verification environment. Experimental results confirm the feasibility and correctness of the proposed technology for multi-protocol support and concurrent protocol execution.
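The broad-categorization idea can be made concrete with a minimal Python sketch. The abstract does not give the classification rules, so everything below is an assumption:

- the `Category` labels and the hypothetical `ProtocolProfile` record;
- the rule that a single packet size means fixed-pattern, while a per-packet length field means stream-pattern;
- the round-robin `interleave` used to carry two protocols concurrently over one link.

```python
from collections import deque
from dataclasses import dataclass
from enum import Enum
from typing import List, Tuple


class Category(Enum):
    FIXED = "fixed-pattern"    # every packet has the same, known size
    STREAM = "stream-pattern"  # variable-length packets that carry their own length
    NATIVE = "native"          # anything else: forwarded as opaque payload


@dataclass(frozen=True)
class ProtocolProfile:
    """Hypothetical description of a protocol's packet characteristics."""
    name: str
    packet_sizes: frozenset   # packet sizes in bytes that the protocol can emit
    has_length_field: bool    # whether each packet header states its own length


def classify(p: ProtocolProfile) -> Category:
    """Assumed broad-categorization rule based only on packet characteristics."""
    if len(p.packet_sizes) == 1:
        return Category.FIXED
    if p.has_length_field:
        return Category.STREAM
    return Category.NATIVE


def interleave(q_a: deque, q_b: deque, tag_a: str, tag_b: str) -> List[Tuple[str, str]]:
    """Round-robin two protocol queues onto one link, tagging each unit with its protocol."""
    out = []
    while q_a or q_b:
        for tag, q in ((tag_a, q_a), (tag_b, q_b)):
            if q:
                out.append((tag, q.popleft()))
    return out
```

The protocol tag on each unit is what would let a receiver demultiplex the two streams. A real adapter layer would also need flow control, which this sketch leaves out.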
A job failure identification method of heterogeneous supercomputing platforms based on semantic analysis of multi-source logs
HU He1, ZHAO Yi1, GU Beibei1,2, ZHAO Yunqing1
2025, 47(9): 1535-1543.
This paper presents a method for detecting job anomalies on large-scale distributed heterogeneous HPC platforms. Analyzing job runtime logs is vital for detecting anomalies, but the sheer volume of logs makes them hard for humans to comprehend. To address this, we introduce a multi-source log semantic analysis approach that uses latent Dirichlet allocation (LDA) to analyze logs from various sources. The approach models how topics evolve over time and matches this evolution against the patterns of historical faulty jobs to predict anomalies. Experiments on a domestic HPC platform show a precision of 95.2%. The method enhances predictive capability and helps users and administrators diagnose issues quickly, which improves the availability and efficiency of the HPC environment.
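The matching step can be sketched in Python, assuming an LDA model (not shown) has already turned each time window of a job's logs into a topic distribution.

The following choices are illustrative assumptions, not taken from the paper:

- Hellinger distance as the comparison measure;
- window-by-window averaging over the shared prefix of two trajectories;
- the 0.2 threshold in `predict_failure`.

```python
import math
from typing import List, Sequence, Tuple

Dist = Sequence[float]  # one topic distribution (probabilities summing to 1)


def hellinger(p: Dist, q: Dist) -> float:
    """Hellinger distance between two discrete distributions; 0 = identical, 1 = disjoint."""
    s = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(s / 2.0)


def trajectory_distance(job: Sequence[Dist], pattern: Sequence[Dist]) -> float:
    """Mean window-by-window distance over the overlapping prefix of two topic trajectories."""
    n = min(len(job), len(pattern))
    if n == 0:
        raise ValueError("empty trajectory")
    return sum(hellinger(job[i], pattern[i]) for i in range(n)) / n


def predict_failure(job: Sequence[Dist], failure_patterns: List[Sequence[Dist]],
                    threshold: float = 0.2) -> Tuple[bool, int]:
    """Flag the job if its topic evolution is close to any historical failure pattern."""
    dists = [trajectory_distance(job, p) for p in failure_patterns]
    best = min(range(len(dists)), key=dists.__getitem__)
    return dists[best] <= threshold, best
```

Returning the index of the closest pattern, not just a flag, hints at which past failure the job resembles. That is useful for the diagnosis step the abstract mentions.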
Optimization of ILU decomposition parallel algorithm on MIMD many-core architecture
SHI Yongzhen1,2, MO Haotian1,2, HU Xingyu1,2, LIU Jie1,2, WANG Qinglin1,2
2025, 47(9): 1544-1554.
ILU (incomplete LU) factorization is widely used in solving large-scale sparse linear systems, where it effectively reduces the number of iterations and improves solving efficiency. However, the data dependences inherent in the linear system and the irregular computation and memory access during decomposition make efficient parallelization difficult. In a multiple instruction multiple data (MIMD) many-core architecture, many parallel threads can execute different instructions, which makes the architecture naturally suited to algorithms with irregular control flow. This paper studies the parallel optimization of ILU factorization on the MIMD many-core PEZY-SC3s processor and proposes an ILU parallel algorithm for the MIMD architecture. Its performance is improved through four optimizations: graph coloring-based parallelism, vector unit-based memory access, thread grouping-based load balancing, and on-chip local storage-based data locality. Experimental results show that the proposed parallel ILU factorization algorithm achieves average speedups of 16.70 and 1.39 over the MKL implementation on an Intel Xeon 4314 CPU and the cuSPARSE implementation on an NVIDIA A30 GPU, respectively.
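Graph coloring-based parallelism can be illustrated with a small Python sketch. It builds the row-coupling graph of a sparse matrix stored in CSR form, colors it greedily (largest degree first), and groups rows by color.

In the usual multicolor ILU(0) scheme, rows that share a color have no direct coupling. After the matrix is reordered by color, each color level can therefore be factored in parallel.

The greedy ordering and the helper names are assumptions; the abstract does not describe the paper's coloring algorithm.

```python
from typing import Dict, List, Set


def adjacency_from_csr(indptr: List[int], indices: List[int]) -> Dict[int, Set[int]]:
    """Undirected row-coupling graph of a sparse matrix stored in CSR form."""
    n = len(indptr) - 1
    adj: Dict[int, Set[int]] = {i: set() for i in range(n)}
    for i in range(n):
        for k in range(indptr[i], indptr[i + 1]):
            j = indices[k]
            if j != i:  # ignore the diagonal
                adj[i].add(j)
                adj[j].add(i)
    return adj


def greedy_coloring(adj: Dict[int, Set[int]]) -> Dict[int, int]:
    """Give each row the smallest color not used by an already-colored neighbor."""
    color: Dict[int, int] = {}
    for v in sorted(adj, key=lambda v: -len(adj[v])):  # largest degree first (stable for ties)
        used = {color[u] for u in adj[v] if u in color}
        c = 0
        while c in used:
            c += 1
        color[v] = c
    return color


def color_levels(color: Dict[int, int]) -> List[List[int]]:
    """Group rows by color; rows inside one level can be processed concurrently."""
    levels: Dict[int, List[int]] = {}
    for v, c in color.items():
        levels.setdefault(c, []).append(v)
    return [sorted(levels[c]) for c in sorted(levels)]
```

For a tridiagonal matrix this yields the familiar red-black split. Fewer colors mean fewer sequential levels and more rows per parallel step.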
Computing-in-memory circuit and macro design for solving partial differential equations
WANG Jingke, XIE Aisen, CHANG Liang
2025, 47(9): 1555-1562.
To address the computational challenges of high-precision partial differential equation (PDE) solving in the natural sciences and engineering, this paper proposes a novel PDE solver based on the computing-in-memory (CIM) architecture. By embedding computational logic directly into memory, the solver significantly reduces data transfer between the processor and memory. We analyze the computational process of PDE solving in detail, extract the key computational flows, and transform them into matrix multiply-accumulate operations suited to CIM. We then design a parallel computing scheme and a corresponding behavior-level model for the CIM architecture, and develop and test the hardware implementation. Its correctness and efficiency are verified by comparison with conventional CPU computation. Experimental results on 2D Poisson equations, wave equations, and other PDEs show that the solver reaches a solution accuracy of over 98% for 2D equations and 99.8% for 1D equations, while solving 76 times faster than a CPU.
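The key step of turning PDE solving into multiply-accumulate operations can be illustrated with a Python sketch. It assumes a 1D Poisson problem, a 3-point finite-difference stencil, and Jacobi iteration; the abstract does not say which discretization or iterative scheme the paper uses.

Under these assumptions every Jacobi sweep is a single matrix-vector product, the operation a CIM array performs in place. The dense `matvec` merely stands in for that hardware step.

```python
from typing import List

Matrix = List[List[float]]


def matvec(m: Matrix, x: List[float]) -> List[float]:
    """Dense multiply-accumulate; in a CIM macro this would happen inside the memory array."""
    return [sum(a * b for a, b in zip(row, x)) for row in m]


def jacobi_poisson_1d(f: List[float], h: float, iters: int) -> List[float]:
    """Solve -u'' = f on interior grid points with u = 0 at both boundaries.

    The stencil (-u[i-1] + 2u[i] - u[i+1]) / h^2 = f[i] gives the Jacobi update
    u_new = M u + c, with M holding 1/2 on both off-diagonals and c = h^2 f / 2.
    """
    n = len(f)
    m = [[0.5 if abs(i - j) == 1 else 0.0 for j in range(n)] for i in range(n)]
    c = [h * h * fi / 2.0 for fi in f]
    u = [0.0] * n
    for _ in range(iters):
        u = [mu + ci for mu, ci in zip(matvec(m, u), c)]  # one MAC pass per sweep
    return u
```

With f = 2 the exact solution is u(x) = x(1 - x). Because the central difference is exact for quadratics, the iterates converge to the exact grid values.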
CPWS: A checkpoint-based multi-level warp scheduler for GPGPU
JIANG Zekun, YUAN Bo, CUI Jianfeng, HUANG Libo, CHANG Junsheng, LIU Sheng
2025, 47(9): 1563-1570.
General-purpose graphics processing units (GPGPUs) adopt the single instruction multiple thread (SIMT) model, which allows many threads to execute the same instruction simultaneously and thereby significantly improves computing efficiency. Under the SIMT model, a GPGPU organizes groups of threads into logical execution units called warps. Because the hardware must time-multiplex among multiple warps, warp scheduling is crucial for efficient parallel computing. This paper introduces a checkpoint-based multi-level warp scheduler (CPWS) built on newly added checkpoint instructions. CPWS tracks the execution progress of each warp and dynamically adjusts its scheduling strategy based on that progress, at relatively low overall hardware cost. Experimental results show that CPWS improves performance by 11% over the greedy-then-oldest (GTO) scheduler, 16.7% over the loose round robin (LRR) scheduler, and 10.6% over the two-level round robin scheduler. In addition, FPGA synthesis results indicate that CPWS adds only 0.8% logic-unit overhead compared with GTO.
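The abstract does not specify the CPWS policy, so the Python sketch below is a hypothetical progress-aware scheduler. It shows how checkpoint counts could steer warp selection.

The `Warp` record, the `checkpoint` counter and the selection rule are all assumptions. The rule stays greedy on the last warp while it is among the slowest ready warps. Otherwise it picks the ready warp that has passed the fewest checkpoints, breaking ties by age as GTO does.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Warp:
    wid: int
    age: int              # lower = launched earlier (older), as in GTO
    checkpoint: int       # checkpoint instructions this warp has passed (hypothetical counter)
    ready: bool = True    # whether the warp can issue this cycle


def pick_warp(warps: List[Warp], last: Optional[int]) -> Optional[Warp]:
    """Hypothetical checkpoint-aware selection; returns None when no warp is ready."""
    ready = [w for w in warps if w.ready]
    if not ready:
        return None
    slowest = min(w.checkpoint for w in ready)
    for w in ready:
        # Level 1: stay greedy on the last warp while it is not ahead of the pack.
        if w.wid == last and w.checkpoint == slowest:
            return w
    # Level 2: let the least-advanced warp catch up, oldest first among equals.
    return min(ready, key=lambda w: (w.checkpoint, w.age))
```

Preferring lagging warps keeps warps close in progress. That is one plausible way progress tracking could beat a purely age-based order.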
Computer Network and Information Security
A variable-weight multi-attribute decision-making algorithm for wireless network
YUAN Xin1, LIU Yunyan2, MA Liang3, SONG Ye4, LI Ning1, GUO Linxu5, ZHANG Zhaoxin1, YU Changli1
2025, 47(9): 1571-1585.
In wireless network applications such as routing decisions, cloud computing, data center networking, network selection, and edge computing, multi-attribute decision-making (MADM) algorithms are widely adopted because they solve multi-objective decision problems effectively. However, in modern wireless networks, traditional MADM algorithms cannot adequately meet the demands of scenarios involving rapid, continuous, and large-scale service flows. To address this, this paper proposes two enhanced algorithms: iMADM and variable-weight MADM (vw-MADM). Compared with traditional algorithms, vw-MADM is simpler and more efficient. When one parameter changes, only that parameter's utility needs to be recalculated, while the utilities of the other candidate parameters are unaffected. Its innovation lies in improving accuracy while reducing computational complexity. The paper also evaluates the properties of vw-MADM and iMADM, including rationality, effectiveness, computational complexity, and thresholds for parameter and utility variations. Simulation results show that vw-MADM outperforms traditional MADM and iMADM in accuracy, computational complexity, and rationality, and can significantly enhance MADM performance.
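The property that a single parameter change triggers only one utility recomputation can be sketched in Python. The sketch uses simple additive weighting (SAW) with cached per-attribute utilities. SAW and the `IncrementalSAW` class are illustrative stand-ins, not the paper's vw-MADM formulas.

`set_weight` shows that a weight change can be absorbed the same way. Weights are not renormalized after such a change.

```python
from typing import Callable, Dict


class IncrementalSAW:
    """Additive weighting with cached per-attribute utilities: total = sum_k w_k * u_k(x_k)."""

    def __init__(self, weights: Dict[str, float],
                 utility_fns: Dict[str, Callable[[float], float]],
                 values: Dict[str, float]):
        self.w = dict(weights)
        self.fns = dict(utility_fns)
        self.u = {k: self.fns[k](v) for k, v in values.items()}
        self.total = sum(self.w[k] * self.u[k] for k in self.u)

    def update_value(self, attr: str, new_value: float) -> float:
        """One attribute changed: recompute only its utility and patch the running total."""
        new_u = self.fns[attr](new_value)
        self.total += self.w[attr] * (new_u - self.u[attr])
        self.u[attr] = new_u
        return self.total

    def set_weight(self, attr: str, new_w: float) -> float:
        """One weight changed: patch the total without recomputing any utility."""
        self.total += (new_w - self.w[attr]) * self.u[attr]
        self.w[attr] = new_w
        return self.total
```

Each update costs O(1) instead of O(number of attributes). This is the kind of saving that matters under rapid, continuous service flows.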
Is adversarial-arbiter physical unclonable function really secure?
JIANG Haolin1,2,3, DENG Ding1,2,3, NI Shaojie1,2, LOU Shengqiang1,2,3, SUN Pengyue1,2,3, ZHANG Shuzheng1,2,3
2025, 47(9): 1586-1597.
Physically unclonable functions (PUFs) are promising security primitives for authentication. Proposing attack models tailored to specific PUFs can drive improvements in their design and security. The adversarial-arbiter PUF (A-APUF), proposed in 2021, is claimed to resist modeling attacks effectively. This paper targets the A-APUF and proposes a bidirectional-challenge-based attack model. First, bidirectional challenge sequences are applied to the A-APUF to obtain bidirectional response sequences, from which the control sequence is calculated. Second, the XOR tap coefficients are calculated from the control sequence. Finally, the XOR tap coefficients are used to reduce the A-APUF to an arbiter PUF (APUF). Experiments show that this model lowers the attack difficulty of both the standard and upgraded defense mechanisms of the A-APUF to that of an ordinary APUF. A conventional feedforward neural network achieves over 90% accuracy against the A-APUF with only 1,000 challenge-response pairs.
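Why reducing the A-APUF to an ordinary APUF is damaging can be seen from the standard linear additive delay model of an APUF. Its response is the sign of a weighted sum of parity features of the challenge, so a linear learner can clone it.

The Python sketch below simulates such an APUF and learns it with a perceptron. The perceptron deliberately replaces the paper's feedforward neural network. The stage count, CRP counts and seed are arbitrary choices. The A-APUF-specific steps (bidirectional challenges, control sequence, XOR taps) are not modeled.

```python
import random
from typing import List, Sequence, Tuple

CRP = Tuple[List[int], int]  # (challenge bits, response bit)


def parity_features(challenge: Sequence[int]) -> List[int]:
    """Standard APUF feature map: phi_i = prod_{j >= i} (1 - 2 c_j), plus a constant bias term."""
    n = len(challenge)
    phi = [0] * (n + 1)
    phi[n] = 1
    acc = 1
    for i in range(n - 1, -1, -1):
        acc *= 1 - 2 * challenge[i]  # bit 0 -> +1, bit 1 -> -1
        phi[i] = acc
    return phi


def linear_response(weights: Sequence[float], challenge: Sequence[int]) -> int:
    """Linear additive delay model: response = 1 if w . phi(c) > 0 else 0."""
    s = sum(w * p for w, p in zip(weights, parity_features(challenge)))
    return 1 if s > 0 else 0


def train_perceptron(crps: List[CRP], dim: int, epochs: int = 20) -> List[float]:
    """Fit a linear model in parity-feature space from challenge-response pairs."""
    w = [0.0] * dim
    for _ in range(epochs):
        for c, r in crps:
            phi = parity_features(c)
            pred = 1 if sum(a * b for a, b in zip(w, phi)) > 0 else 0
            if pred != r:
                step = 1 if r == 1 else -1
                w = [a + step * b for a, b in zip(w, phi)]
    return w


def simulate_attack(n_stages: int = 16, n_train: int = 2000, n_test: int = 500,
                    seed: int = 0) -> float:
    """Simulate an APUF with random stage delays, learn it, and return held-out accuracy."""
    rng = random.Random(seed)
    true_w = [rng.gauss(0.0, 1.0) for _ in range(n_stages + 1)]

    def crp() -> CRP:
        c = [rng.randint(0, 1) for _ in range(n_stages)]
        return c, linear_response(true_w, c)

    train = [crp() for _ in range(n_train)]
    held_out = [crp() for _ in range(n_test)]
    w = train_perceptron(train, n_stages + 1)
    return sum(linear_response(w, c) == r for c, r in held_out) / n_test
```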
Android malware detection based on classifier-oriented feature weighting
XIONG Zhi1,2, LIU Fang1, WANG Yixuan1
2025, 47(9): 1598-1608.
Feature weighting can provide more comprehensive information to enhance a model's learning ability and decision accuracy, but the relationship between features and classifiers is often ignored in practice. To address this problem, a classifier-oriented feature weighting method called COFW is proposed and applied to Android malware detection. First, features from seven categories are extracted from the Android application package, and the most important feature subset is selected. Second, COFW computes the optimal weight of each feature for the classifier used to detect malware. Finally, the classifier is trained on the weighted features. COFW adopts a remove-one strategy to calculate an initial weight for each feature and then maps it to a final weight through a mapping function. A differential evolution algorithm optimizes the parameters of both the mapping function and the classifier. Experimental results show that COFW feature weighting improves classifier performance, and that COFW outperforms four other feature weighting methods designed for Android malware detection.
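The first two COFW steps (remove-one initial weights, then a mapping to final weights) can be sketched in Python. Several parts are assumptions:

- The `score` callable is hypothetical and stands for, for example, cross-validated accuracy of the chosen classifier under given feature weights.
- The sigmoid form of the mapping function is our choice.
- The differential evolution that would tune `a` and `b` is omitted.

```python
import math
from typing import Callable, List, Sequence

# Hypothetical scorer: quality of the classifier trained with the given feature weights.
Scorer = Callable[[Sequence[float]], float]


def remove_one_weights(n_features: int, score: Scorer) -> List[float]:
    """Initial weight of feature i = drop in score when feature i is removed (weight set to 0)."""
    full = score([1.0] * n_features)
    drops = []
    for i in range(n_features):
        w = [1.0] * n_features
        w[i] = 0.0
        drops.append(full - score(w))
    return drops


def sigmoid_map(raw: Sequence[float], a: float, b: float) -> List[float]:
    """Assumed mapping function w = 1 / (1 + exp(-a (r - b))); a and b would be tuned by DE."""
    return [1.0 / (1.0 + math.exp(-a * (r - b))) for r in raw]
```

Because the scorer wraps a specific classifier, the resulting weights are tailored to that classifier. That is the classifier-oriented aspect the method's name refers to.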
A user classification and hierarchical access control model based on blockchain and attribute-based encryption
YIN Xiaofan1, LI Xiaohui1, ZHANG Siqi2
2025, 47(9): 1609-1617.
To protect privacy when users publish financial data, the method exploits the multi-party maintenance of a consortium blockchain and combines on-chain and off-chain storage to reduce storage costs. Users customize the access policy, and attribute-based encryption is used to implement access control based on user classification and hierarchy. This paper proposes a user classification and hierarchical access control model based on attribute-based encryption. The model combines on-chain and off-chain storage, follows the attribute-based access control model, and adopts ciphertext-policy attribute-based encryption. A reputation incentive mechanism embedded in smart contracts automatically adjusts trust scores according to users' upload and query operations. Data requesters are then classified into hierarchical levels by trust score, so users at different levels have different data access permissions. This ensures both data security and access flexibility.
Key words: attribute-based encryption; blockchain; access control; classification and hierarchical
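The reputation-to-permission logic can be outlined in Python. This is only the access decision: the cryptographic enforcement of CP-ABE is replaced by a plain attribute check.

The reward values, the doubled penalty for failed operations, the starting score of 60 and the level cutoffs in `LEVELS` are hypothetical. None of them come from the paper.

```python
from dataclasses import dataclass, field
from typing import Set

# Hypothetical trust-score cutoffs, highest level first: (minimum score, level)
LEVELS = [(80.0, 3), (50.0, 2), (0.0, 1)]
REWARD = {"upload": 5.0, "query": 1.0}  # hypothetical per-operation reward


@dataclass
class Requester:
    name: str
    attributes: Set[str] = field(default_factory=set)
    trust: float = 60.0


def update_trust(r: Requester, op: str, ok: bool) -> float:
    """Reward a successful operation; penalize a failed or malicious one twice as hard."""
    delta = REWARD[op] if ok else -2.0 * REWARD[op]
    r.trust = min(100.0, max(0.0, r.trust + delta))
    return r.trust


def level_of(r: Requester) -> int:
    """Map the current trust score to a hierarchical level."""
    return next(level for cutoff, level in LEVELS if r.trust >= cutoff)


def policy_allows(r: Requester, required_attrs: Set[str], min_level: int) -> bool:
    """Plain boolean check standing in for CP-ABE: all required attributes AND a high enough level."""
    return required_attrs <= r.attributes and level_of(r) >= min_level
```

In the proposed model, the trust update would run inside a smart contract. The attribute condition would be enforced by encryption rather than by an `if` statement.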
LwFEN: A lightweight feature extraction network for unsupervised pedestrian re-identification
GAO Shunqiang1, WANG Zhiwen1, BAI Yun2
2025, 47(9): 1619-1627.
To address the high computational cost and large parameter counts of unsupervised person re-identification models, a lightweight feature extraction network for unsupervised person re-identification (LwFEN) is proposed. First, the Ghost Bottleneck is redesigned to compress the number of model parameters. The ECA attention module is embedded into the lightweight backbone to improve performance, strengthen feature extraction, and mitigate the feature loss caused by the lightweight design. Second, a cluster-level dynamic memory dictionary and a momentum update strategy are introduced to handle the embedding of unsupervised clustering features, which helps alleviate feature inconsistency. Finally, pre-training is performed on the LUPerson dataset. Extensive experiments were run on common public datasets such as Market-1501, MSMT17, and PersonX, with comparisons against models such as PPLR, Cluster Contrast, and RTMem. The results show that LwFEN reduces model parameters by 24.3% and computation (measured in floating-point operations) by 28.12%, while improving mAP to 83.4%.
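The ECA module mentioned above can be sketched in plain Python, following the published ECA-Net design. It has three steps:

1. Global average pooling per channel.
2. A 1D convolution across channels whose kernel size adapts to the channel count.
3. A sigmoid gate that rescales each channel.

The kernel weights are passed in rather than learned. How LwFEN places ECA inside its redesigned Ghost Bottleneck is not shown.

```python
import math
from typing import List

FeatureMap = List[List[List[float]]]  # C x H x W


def eca_kernel_size(channels: int, gamma: int = 2, b: int = 1) -> int:
    """ECA-Net adaptive kernel size: nearest odd value of |(log2 C + b) / gamma|."""
    t = int(abs((math.log2(channels) + b) / gamma))
    return t if t % 2 else t + 1


def eca(x: FeatureMap, kernel: List[float]) -> FeatureMap:
    """Apply ECA to a C x H x W map with a given odd-length 1-D kernel (zero padding)."""
    c = len(x)
    pooled = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in x]  # global average pool
    k = len(kernel)
    pad = k // 2
    gates = []
    for i in range(c):
        # 1-D convolution over neighbouring channels; out-of-range channels count as 0
        s = sum(kernel[j] * pooled[i + j - pad] for j in range(k) if 0 <= i + j - pad < c)
        gates.append(1.0 / (1.0 + math.exp(-s)))  # sigmoid gate per channel
    return [[[v * gates[i] for v in row] for row in x[i]] for i in range(c)]
```

ECA adds only k weights per module, which is why it suits a network whose goal is cutting parameters.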
A dual-stream network based on feature decoupling for pan-sharpening
ZHANG Shengyu1,2,3, SONG Huihui2,3,4
2025, 47(9): 1628-1637.
Pan-sharpening fuses a high-resolution panchromatic image and a low-resolution multispectral image captured by the same satellite to generate a high-resolution multispectral image. Existing models do not fully explore the correlation and complementarity between modalities when fusing multimodal information. As a result, they cannot fully exploit that information, which degrades pan-sharpening quality. To address insufficient cross-modal feature extraction and fusion, this paper proposes a dual-stream network based on feature decoupling that captures richer feature information from panchromatic and multispectral images. Specifically, an encoder decomposes the features of both images into global and local features, improving the model's ability to capture long-range dependencies and local details. A cross-modal feature fusion module then integrates these features within and across domains, allowing the model to learn more comprehensive feature representations at different levels. Next, a progressive fusion module gradually merges the global and local features into more accurate representations. Finally, the fused features are fed into the decoder to generate a high-resolution multispectral image. Experimental results on GaoFen-2 and WorldView-3 data demonstrate the superiority of the proposed model over many existing models.
A multiple restoration network for large broken images
LI Zhipeng1, CHEN Danyang1,2, ZHONG Cheng1,2
2025, 47(9): 1638-1646.
To restore images with large damaged areas, this paper proposes a new multi-stage inpainting network. First, the model reduces the accumulation of inpainting errors during upsampling by extending the feature inpainting procedure. Second, a multi-scale restoration module (MSRM) is proposed that combines information from different receptive fields to complete the feature map. Third, an attention mechanism optimizes the inpainting process and resolves the color inconsistency that non-uniform restoration across regions causes in the output image. Finally, the loss function is improved so that the model focuses more on repairing the damaged regions. Experimental results on the Places2 and CelebA datasets show that the model improves restored image quality to varying degrees. The improvement becomes more pronounced as the proportion of missing pixels increases.
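One common way to make a loss focus on damaged regions is to weight reconstruction errors inside the hole more heavily than those in valid pixels. The Python sketch below does this for a flattened image.

The default 6:1 ratio mirrors the hole/valid weighting of the partial-convolution inpainting loss. It is an assumption here, since the abstract does not give the paper's improved loss.

```python
from typing import Sequence


def hole_weighted_l1(pred: Sequence[float], target: Sequence[float], mask: Sequence[int],
                     hole_weight: float = 6.0, valid_weight: float = 1.0) -> float:
    """Mean L1 loss in which damaged pixels (mask == 1) count hole_weight times more."""
    total = 0.0
    for p, t, m in zip(pred, target, mask):
        w = hole_weight if m else valid_weight
        total += w * abs(p - t)
    return total / len(pred)
```

Raising `hole_weight` pushes the optimizer toward the missing region, which matters most when that region dominates the image.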
A Transformer-based pixel-by-pixel detail compensation dehazing network
WANG Yan, LIU Jingjing, HU Jinyuan, CHEN Yanyan
2025, 47(9): 1647-1657.
Current deep learning-based image dehazing algorithms struggle to extract global and local image features simultaneously, which causes loss of detail and color distortion in the restored images. To address this issue, a Transformer-based pixel-wise detail compensation dehazing network is proposed. It consists mainly of a Transformer-based encoder-decoder and a CNN branch. Given a hazy input image, the encoder performs global feature extraction. The Transformer in the encoder is composed of a channel attention block (CAB), a compression attention neural block (CANB), and a dual-branch adaptive neural block (DANB). The CANB captures the global dependencies of image superpixels through feature aggregation, attention calculation, and feature restoration. The DANB uses a dual-branch structure to encapsulate the global dependencies of superpixels into individual pixels, thereby obtaining global feature information. Meanwhile, spatial attention in the CNN branch enhances the model's ability to perceive different haze densities and extracts local features. Finally, the decoder fuses the features extracted by the encoder and the CNN branch to output a clear image. Experimental results show that the proposed model achieves excellent performance on both a synthetic dataset (RESIDE) and real datasets (O-HAZE and NH-HAZE), and effectively addresses detail loss and color distortion.
Artificial Intelligence and Data Mining
A safe and energy-efficient obstacle avoidance method for UAVs
WAN Zhong1, CHEN Renzhi1, ZHANG Xiangyu2, XU Shi1, ZHAO Jingyue1, AI Yongbao1, YANG Zhijie1, WANG Lei1
2025, 47(9): 1658-1668.
High-speed, agile, and autonomous flight requires extending UAV endurance, reducing command transmission delay, and enhancing the UAV's quick-response capability. Meanwhile, in complex scenarios UAVs depend heavily on obstacle detection information, and various errors reduce flight safety. To address these problems, an obstacle avoidance strategy is formulated using a local-planning obstacle avoidance method with predefined rules. The method is then optimized in two variants, one using Kalman filtering and one using a Bayesian linear regression model. Experimental results show that the Bayesian linear regression-based method predicts 2.8 times faster than the Kalman filtering-based method, improving prediction efficiency while maintaining high prediction accuracy and stability. In addition, to obtain an obstacle avoidance speed that is both low-power and safe, the speed is taken as the decision variable, with endurance time and confidence as the objectives. The optimal speed is found at the knee point, which minimizes the trade-off loss between endurance time and confidence. Finally, the improved local planning-based obstacle avoidance method is verified in a UAV obstacle avoidance environment. The results show that the system promptly avoids dynamic obstacles. Under the same experimental conditions, it reduces total time delay by approximately 7% on average compared with the baseline obstacle avoidance method.
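The knee-point step can be illustrated in Python. Each candidate speed gives one (endurance time, confidence) pair. After min-max normalization, the knee is taken as the point farthest from the chord joining the two extreme points.

This chord-distance definition is a common choice but an assumption here; the abstract does not say how the paper locates the knee.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]  # (endurance_time, confidence) for one candidate speed


def knee_index(front: List[Point]) -> int:
    """Index of the point farthest from the line through the two extreme points.

    Both objectives are min-max normalized first so neither dominates the distance.
    Assumes the two extremes differ; a single-point front would divide by zero.
    """
    xs, ys = [p[0] for p in front], [p[1] for p in front]

    def norm(v: float, lo: float, hi: float) -> float:
        return 0.0 if hi == lo else (v - lo) / (hi - lo)

    pts = [(norm(x, min(xs), max(xs)), norm(y, min(ys), max(ys))) for x, y in front]
    a = min(range(len(pts)), key=lambda i: pts[i][0])  # extreme at one end of objective 1
    b = max(range(len(pts)), key=lambda i: pts[i][0])  # extreme at the other end
    (x1, y1), (x2, y2) = pts[a], pts[b]
    length = math.hypot(x2 - x1, y2 - y1)

    def dist(p: Point) -> float:
        # perpendicular distance from p to the chord
        return abs((y2 - y1) * p[0] - (x2 - x1) * p[1] + x2 * y1 - y2 * x1) / length

    return max(range(len(pts)), key=lambda i: dist(pts[i]))
```

Given a list `speeds` aligned with `front`, the selected speed would be `speeds[knee_index(front)]`. This is the point where giving up more endurance buys little extra confidence, and vice versa.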
A hierarchical decoder model with attention collaboration mechanism for solving the heterogeneous capacitated vehicle routing problem
ZHENG Mingjie, CAO Zhanmao
2025, 47(9): 1669-1678.
Existing deep reinforcement learning (DRL) methods for the capacitated vehicle routing problem (CVRP) are mainly designed for homogeneous fleets, in which all vehicles have the same capacity. They perform poorly on more realistic heterogeneous fleets. With the goal of minimizing route length, a novel DRL model is proposed to solve the heterogeneous capacitated vehicle routing problem (HCVRP), in which vehicles have different capacity constraints. Specifically, a hierarchical decoder model (HDM) with two types of decoders is proposed: a routing allocation decoder (RAD) and a sequence construction decoder (SCD). The RAD assigns nodes to appropriate vehicles to form groups, while the SCD orders the nodes within each group to minimize total route length. In addition, an attention collaboration mechanism (ACM) promotes information sharing among the SCDs, optimizing the node order of each group and thus improving the quality of the overall solution. Experimental results show that HDM outperforms existing state-of-the-art deep learning methods and provides solutions comparable to those of traditional optimization solvers within a reasonable time.
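The two-stage decomposition (allocate nodes to vehicles, then order each group) can be shown with a non-learned Python stand-in. Simple greedy heuristics replace the neural RAD and SCD, and the attention collaboration mechanism is not modeled. The sketch only illustrates how the problem splits into two stages; the heuristics are assumptions.

```python
import math
from typing import Dict, List, Tuple

Coord = Tuple[float, float]


def allocate(demands: Dict[int, float], capacities: List[float]) -> List[List[int]]:
    """Stage 1 stand-in for the RAD: assign customers, largest demand first,
    to the vehicle with the most remaining capacity."""
    remaining = list(capacities)
    groups: List[List[int]] = [[] for _ in capacities]
    for node in sorted(demands, key=lambda n: -demands[n]):
        v = max(range(len(remaining)), key=lambda k: remaining[k])
        if remaining[v] < demands[node]:
            raise ValueError(f"no vehicle can serve node {node}")
        remaining[v] -= demands[node]
        groups[v].append(node)
    return groups


def order_route(group: List[int], coords: Dict[int, Coord], depot: Coord) -> List[int]:
    """Stage 2 stand-in for the SCD: nearest-neighbour ordering starting at the depot."""
    route: List[int] = []
    pos, left = depot, set(group)
    while left:
        nxt = min(left, key=lambda n: (math.dist(pos, coords[n]), n))  # ties broken by node id
        route.append(nxt)
        pos = coords[nxt]
        left.remove(nxt)
    return route
```

Separating "which vehicle" from "in what order" is what lets each decoder specialize. In HDM both stages are learned and coordinated rather than greedy.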
A multi-strategy improved mountain gazelle optimization algorithm and its application
LI Xiang1, LIU Jie2, QIN Tao1, LI Wei3, LIU Ying4, YANG Jing1,5
2025, 47(9): 1679-1690.
The mountain gazelle optimizer (MGO) suffers from slow convergence, low convergence accuracy, and a tendency to fall into local optima. To address these issues, a multi-strategy improved mountain gazelle optimizer (MSIMGO) is proposed. First, a good point set is used to initialize the population, improving the quality of the initial population. Second, the golden sine strategy is integrated to accelerate convergence. Third, a vortex effect is introduced to alleviate the loss of population diversity in later iterations. Finally, Cauchy mutation perturbs the position of the best gazelle, enhancing the algorithm's ability to escape local optima. Comparisons with eight other algorithms on eight benchmark functions and the CEC2019 benchmark functions show that MSIMGO has stronger optimization capability, and the Wilcoxon rank-sum test confirms its effectiveness. Applying MSIMGO to the engineering problem of pressure vessel design demonstrates its feasibility and effectiveness on practical engineering problems.
A fault tolerance scheme for memristive neural network under stuck-at faults
CHENG Qihong1, LIU Peng1, YAO Lian1, YOU Zhiqiang2, WU Jigang1
2025, 47(9): 1691-1699.
Resistive random access memory (RRAM) shows great potential for accelerating neural network computation because of its non-volatility and low latency. It can efficiently implement vector-matrix multiplication while avoiding massive data transfers. However, stuck-at faults (SAFs) can significantly degrade the inference accuracy of RRAM-based neural networks. This paper proposes a fault-tolerant scheme for SAFs that combines weight mapping adjustment, weight range modification, and loss function regularization to minimize the weight deviations introduced by SAFs. Comprehensive evaluations on image recognition tasks across different neural networks show that the proposed scheme effectively recovers the accuracy loss caused by SAFs. Even at an SAF rate of 10%, the average accuracy loss does not exceed 1.5%.
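How an SAF distorts a weight, and how remapping can compensate, can be sketched in Python with a differential conductance pair (G+, G-).

The conductance range, the weight scaling and the compensation rule in `fault_aware_map` are hypothetical illustrations of weight mapping adjustment, not the paper's scheme. The rule programs the healthy partner of a stuck cell so that the pair still realizes the target weight.

```python
from typing import Optional, Tuple

G_MIN, G_MAX = 1.0, 10.0          # hypothetical conductance range (arbitrary units)
W_MAX = 1.0                       # weights assumed pre-scaled to [-W_MAX, W_MAX]
SCALE = (G_MAX - G_MIN) / W_MAX   # conductance per unit weight


def clamp(g: float) -> float:
    """Keep a programmed conductance inside the device range."""
    return min(G_MAX, max(G_MIN, g))


def naive_map(w: float) -> Tuple[float, float]:
    """Differential pair: the positive part of w goes on G+, the negative part on G-."""
    return G_MIN + SCALE * max(w, 0.0), G_MIN + SCALE * max(-w, 0.0)


def effective_weight(g_pos: float, g_neg: float) -> float:
    """Weight actually realized by the pair: (G+ - G-) / SCALE."""
    return (g_pos - g_neg) / SCALE


def fault_aware_map(w: float, stuck_pos: Optional[float] = None,
                    stuck_neg: Optional[float] = None) -> Tuple[float, float]:
    """If one cell of the pair is stuck, program its healthy partner to realize w as closely as possible."""
    if stuck_pos is not None and stuck_neg is not None:
        return stuck_pos, stuck_neg                      # both stuck: nothing to adjust
    if stuck_pos is not None:
        return stuck_pos, clamp(stuck_pos - w * SCALE)   # solve G- = G+ - w * SCALE
    if stuck_neg is not None:
        return clamp(stuck_neg + w * SCALE), stuck_neg   # solve G+ = G- + w * SCALE
    return naive_map(w)
```

The remapping cannot always succeed. For example, if G- is stuck at high conductance, no G+ can realize a large positive weight. Cases like this plausibly motivate the complementary weight range modification and regularization steps.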
Optimization and reduction for deep learning test set based on MMD-GA
WANG Fengying1,2, SONG Zikai2, ZHANG Yan1, DU Liming1
2025, 47(9): 1700-1710.
In image recognition, test sets contain redundant test cases, and labeling still requires manual effort. Optimizing test cases is an effective way to reduce high testing costs and improve low testing efficiency. To this end, a test case optimization and reduction method based on an evolutionary algorithm, named ERIR, is proposed. It uses a deep neural network model to extract image features, which are then fed into the HDBSCAN clustering algorithm to analyze the data distribution of the original test set. Based on the clustering results, an evolutionary algorithm is designed to minimize the difference between the distributions of the test subset and the original test set. A test case selection method that combines maximum mean discrepancy and a genetic algorithm, named MMD-GA, is proposed to select the most representative prototypes from each cluster to form a test subset. Extensive experiments with this algorithm were run on CNN-based and Transformer-based models. The selected test inputs improve time efficiency while keeping accuracy close to that of the original test set. The average accuracy error relative to the full test set ranges from 0.18% to 2.32%.
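The MMD-GA idea can be sketched in Python: choose k test inputs whose feature distribution is as close as possible, in squared maximum mean discrepancy, to the full set. The following are assumptions:

- the RBF kernel and its `gamma`;
- the GA operators (elitist selection, crossover that samples k indices from the union of two parents, random-swap mutation);
- all parameter values.

The HDBSCAN clustering and per-cluster selection are omitted, so this selects from a single pool.

```python
import math
import random
from typing import List, Sequence

Vec = Sequence[float]


def rbf(x: Vec, y: Vec, gamma: float) -> float:
    """Gaussian (RBF) kernel exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def mmd2(xs: List[Vec], ys: List[Vec], gamma: float = 1.0) -> float:
    """Biased estimate of squared MMD between two samples."""
    kxx = sum(rbf(a, b, gamma) for a in xs for b in xs) / len(xs) ** 2
    kyy = sum(rbf(a, b, gamma) for a in ys for b in ys) / len(ys) ** 2
    kxy = sum(rbf(a, b, gamma) for a in xs for b in ys) / (len(xs) * len(ys))
    return kxx + kyy - 2 * kxy


def kernel_matrix(data: List[Vec], gamma: float = 1.0) -> List[List[float]]:
    """Precompute all pairwise kernel values once."""
    return [[rbf(a, b, gamma) for b in data] for a in data]


def subset_mmd2(K: List[List[float]], idx: List[int]) -> float:
    """Squared MMD between data[idx] and the full set, read from a precomputed kernel matrix."""
    n, k = len(K), len(idx)
    kxx = sum(K[i][j] for i in idx for j in idx) / k ** 2
    kyy = sum(map(sum, K)) / n ** 2
    kxy = sum(K[i][j] for i in idx for j in range(n)) / (k * n)
    return kxx + kyy - 2 * kxy


def ga_select(data: List[Vec], k: int, generations: int = 60, pop_size: int = 20,
              gamma: float = 1.0, seed: int = 0) -> List[int]:
    """Evolve k-index subsets that minimize MMD^2 to the full set; returns sorted indices."""
    rng = random.Random(seed)
    n = len(data)
    K = kernel_matrix(data, gamma)
    fit = lambda ind: subset_mmd2(K, ind)
    pop = [rng.sample(range(n), k) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fit)
        elite = pop[: pop_size // 2]                     # selection: keep the better half
        children = []
        while len(elite) + len(children) < pop_size:
            p1, p2 = rng.sample(elite, 2)
            child = rng.sample(sorted(set(p1) | set(p2)), k)  # crossover
            if rng.random() < 0.3:                            # mutation: swap in an unused index
                unused = [i for i in range(n) if i not in child]
                child[rng.randrange(k)] = rng.choice(unused)
            children.append(child)
        pop = elite + children
    return sorted(min(pop, key=fit))
```

A low MMD means the subset covers the modes of the full test set in proportion. This is why accuracy measured on the subset can stay close to accuracy on the whole set.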