High Performance Computing
-
An optimization method for transmitter equalization based on neural network
- SHEN Huiyi, LI Jinwen, CAO Jijun, LAI Mingche
- 2026, 48(1): 1-10.
With the ever-increasing demand for data transmission bandwidth in data centers and high performance computer systems, the data transmission rates of high-speed interconnection networks keep rising while signal transmission links become increasingly complex. This places higher requirements on the equalization technology for high-speed SerDes (Serializer/Deserializer) serial communication signals. Currently, adaptive equalization can be achieved at the receiver end, but adaptive feed-forward equalization (FFE) at the transmitter end remains challenging and requires manual configuration. To address this issue, this paper proposes a multi-objective optimization method for the transmitter-side FFE coefficients based on neural networks. Firstly, simulation data are collected, and a neural network is used to model the relationship between the FFE tap coefficients and the eye height/width. Subsequently, a multi-objective optimization algorithm is applied to the trained neural network model, enabling rapid determination of optimal FFE circuit tap coefficients. Compared with the traditional single-objective optimization of FFE coefficients based on bit-by-bit simulation, the proposed method achieves a maximum improvement of approximately 25% in eye diagram area, significantly reduces time overhead, and improves optimization efficiency.
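A minimal sketch of the surrogate-plus-search idea the abstract describes (not the authors' implementation): a small regression network is fit to (tap coefficients → eye height, eye width) pairs, and candidate tap settings are then screened for Pareto-optimal eye openings. The synthetic data, network size, and three-tap setup below are all hypothetical placeholders.

```python
# Hypothetical sketch: NN surrogate for (FFE taps -> eye height/width), then Pareto screening.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

# Stand-in for collected simulation data: 3 FFE tap weights -> (eye height, eye width).
taps = rng.uniform(-0.3, 0.3, size=(500, 3))
eye = np.c_[1.0 - (taps ** 2).sum(axis=1),           # toy "eye height"
            0.8 - np.abs(taps).sum(axis=1) * 0.5]    # toy "eye width"

surrogate = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=2000, random_state=0)
surrogate.fit(taps, eye)

# Multi-objective step: evaluate random candidates on the surrogate, keep the Pareto front.
cand = rng.uniform(-0.3, 0.3, size=(2000, 3))
pred = surrogate.predict(cand)                        # columns: height, width

def pareto_front(points):
    keep = []
    for i, p in enumerate(points):
        dominated = np.any(np.all(points >= p, axis=1) & np.any(points > p, axis=1))
        if not dominated:
            keep.append(i)
    return keep

front = pareto_front(pred)
best = front[np.argmax(pred[front, 0] * pred[front, 1])]   # e.g. pick the largest eye area
print("candidate taps:", cand[best], "predicted (height, width):", pred[best])
```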
-
A parallel evolution strategy framework for large-scale system
- ZHANG Han, WANG Xiaoping
- 2026, 48(1): 11-19.
The evolution strategy (ES) algorithm is an efficient optimization method suited to problems where gradient information is unavailable or difficult to obtain, and it is widely applied in tasks such as reinforcement learning and black-box optimization. As the scale and complexity of problems increase, the sampling size of the ES algorithm also grows, and so does the available computational parallelism. For large-scale systems, a new parallel ES algorithm framework is proposed, primarily focusing on fault-tolerant computing and communication overhead during ultra-large-scale parallel execution of the algorithm. To address these issues, a high-concurrency reduction mechanism is introduced, along with a low-overhead fault-tolerance method tailored to the algorithm's characteristics. Experimental results demonstrate that the parallel efficiency of the new framework on large-scale systems exceeds 54.7%, and when the parallel scale expands to tens of thousands of nodes, its parallel efficiency is 23% higher than that of OpenAI-NES.
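For context, the gradient estimator underlying this family of evolution strategies (including OpenAI-NES) is the standard score-function form; a parallel framework of the kind described here essentially distributes the n fitness evaluations and the subsequent reduction:

$$\nabla_\theta \, \mathbb{E}_{\epsilon \sim \mathcal{N}(0, I)}\, F(\theta + \sigma \epsilon) \;\approx\; \frac{1}{n\sigma} \sum_{i=1}^{n} F(\theta + \sigma \epsilon_i)\, \epsilon_i,$$

where θ are the policy parameters, σ is the sampling noise scale, and each of the n perturbed evaluations can be assigned to a different node, which is why the sampling size and the required parallelism grow together.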
-
OBCC: An operator-based code complexity measurement method to overcome the exascale programming wall in the post-Moore era
- ZHANG Xiaozhe, CHEN Tao, XIAO Tiaojie, ZHANG Xiang, BAO Weimin, GONG Chunye
- 2026, 48(1): 20-27.
In the post-Moore era, there is a lack of measurement standards for the “programming wall” faced by exascale computing. As an inherent attribute of software code, code complexity serves as the foundation for code understanding, optimization, and pricing. To address the limitations of existing code complexity measurement methods in high-performance computing (HPC) applications, this paper proposes absolute code complexity and relative code complexity, both based on the number of operators and the lines of code (LOC). Specifically, absolute complexity is the total number of operators in the code, while relative complexity is the ratio of absolute complexity to lines of code. Experimental verification on 43 software codebases shows that this method can reasonably evaluate the complexity of different types of code, especially in the field of scientific computing. Among the tested codes, llvm and the linux kernel rank first and second in absolute complexity, with 33 million and 23 million operators respectively; jellyfin-media-player, spheral, and llvm top the list in relative complexity, with values of 4.54, 3.9, and 3.12 respectively. This method provides a new perspective for the analysis, comparison, and pricing of different codebases, and offers an objective, quantifiable standard for measuring the “programming wall” in exascale computing.
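In symbols, the two measures the abstract defines are simply

$$C_{\text{abs}} = \sum_{o \in O} \operatorname{count}(o), \qquad C_{\text{rel}} = \frac{C_{\text{abs}}}{\text{LOC}},$$

where O is the set of operators appearing in the code and LOC is the number of lines of code; for example, the quoted relative complexity of 4.54 for jellyfin-media-player means roughly 4.54 operators per line of code on average.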
-
Characteristics analysis and runtime prediction of jobs in supercomputer
- YANG Hongzhen, CHENG Wei, DU Liang, HUANG Dan, ZENG Chuxuan, XIAO Nong
- 2026, 48(1): 28-39.
Job logs of high-performance computing (HPC) clusters can be used to analyze system workloads, identify periodic patterns in system usage, correlations among job characteristics, and user behavior patterns. This analysis further supports the development of a runtime prediction model, reducing the error in estimated job runtimes and improving the performance of backfilling scheduling. Existing prediction algorithms primarily focus on improving the average prediction accuracy of job runtimes but overlook scenarios where predicted values fall below actual runtimes (underprediction), which may cause the scheduler to prematurely terminate running jobs and thereby reduce the effective utilization of system resources. To address this issue, based on an analysis of the long-term trends and correlations of HPC job characteristics, this paper proposes an ensemble learning model to predict job runtimes and introduces an ordered extended maximum strategy to adjust the ensemble model's predictions. Experimental results demonstrate that the proposed job runtime prediction model significantly reduces the underprediction rate while maintaining high prediction accuracy, and exhibits good stability and generalization capabilities.
-
A module placement algorithm based on deep reinforcement learning for fully programmable valve array biochip
- CHEN Ziyang, CHEN Jun, ZHU Yuhan, LIU Genggeng, HUANG Xing
- 2026, 48(1): 40-50.
As a novel continuous-flow microfluidic biochip, the fully programmable valve array (FPVA) biochip boasts high flexibility and programmability. As an experimental platform, it offers enhanced manipulation flexibility and enables personalized experimental workflow configurations. However, with advances in chip manufacturing processes, the integration level of FPVA biochips has become very high; combined with their high degree of freedom, this increases the difficulty of programming and designing FPVA biochips. Module placement is a critical step in biochip design. Previous studies typically employ heuristic algorithms for placement, which often yield limited results for discrete problems and pose challenges in parameter tuning. Designing an efficient, user-friendly algorithm better suited to discretized module placement can improve the overall efficiency of the chip design process. Deep reinforcement learning (DRL) offers advantages in efficiency, adaptability, and flexibility: through continuous interaction with the environment, agents self-train and adjust, swiftly adapting to complex variations and requirements to find optimal or near-optimal strategies. Compared with heuristic algorithms, DRL can better adapt to the environment and find a globally optimal placement solution. Therefore, this paper proposes a DRL-based module placement algorithm for FPVA biochips. It constructs an interactive environment for DRL agents within FPVA chips and employs a double deep Q-network to build a module placement decision model. Leveraging the rapid iteration capability of agents, it efficiently completes large-scale integrated module placement tasks on FPVA biochips. Moreover, by designing concurrency constraints and placement-area constraints to determine the concurrency between modules and restrict the placement area on the chip, the placement scheme better conforms to real-world scenarios, ensuring its correctness and feasibility. Comparative experiments against state-of-the-art algorithms across multiple test cases demonstrate that the proposed algorithm generates module placement schemes with shorter pre-routing wirelength and fewer unit-reuse instances, thus providing a high-quality placement scheme for subsequent routing stages.
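The double deep Q-network used for the placement decision model follows the standard double-DQN update, in which action selection and action evaluation use separate networks; the placement-specific state, action, and reward designs are the paper's own and are not reproduced here:

$$y_t = r_t + \gamma\, Q\!\left(s_{t+1},\, \arg\max_{a} Q(s_{t+1}, a; \theta);\, \theta^{-}\right),$$

where θ are the online network parameters and θ⁻ the periodically updated target-network parameters.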
Computer Network and Information Security
-
TPM design and application based on PUF
- SHI Jiangyong, GAO Zhiyuan, LIU Tianyi, LIU Wei, GUO Zhenbin, ZHANG Yongding, LI Shaoqing
- 2026, 48(1): 51-60.
Existing trusted platform modules (TPMs) primarily rely on a single RSA public-private key pair as the foundation of the trusted root, with this RSA key pair permanently stored within the TPM chip. Consequently, this design may expose the system to physical-level attacks, such as physical analysis and side-channel analysis, making it difficult to effectively guarantee system security. To address this issue, this paper proposes the use of a physically unclonable function (PUF) as the trusted root. By leveraging the secure characteristics of PUFs, including their physical tamper resistance, randomness, and unpredictability, a PUF-based TPM architecture is designed and implemented. Furthermore, this paper remedies the security vulnerabilities in key generation algorithms and the inadequacies in authentication mechanisms identified in existing research. The improved design is then applied to trusted boot verification and secure firmware updates, thereby significantly enhancing the defense capabilities against security threats in trusted computing environments. The security of the proposed protocol is thoroughly analyzed using BAN logic and the automated protocol verification tool AVISPA. Additionally, trusted-boot experiments are conducted on a Zynq-7000 series development board. The results demonstrate that the proposed method enhances the security of key generation algorithms and effectively reduces the threat of adversaries compromising the system by tampering with the bootloader and firmware update data. Performance evaluation results indicate that the average duration of the entire authentication process in the proposed protocol is merely 0.06 seconds, showcasing its superior performance.
-
A QR code information hiding scheme design based on overlapping Hamming code
- ZHANG Lina, HOU Minghui, XIN Peng, LIU Miao, YUE Hengyi
- 2026, 48(1): 61-69.
QR codes are widely used due to their large information capacity, fast decoding speed, and error tolerance. However, because their decoding rules are publicly available, they are prone to privacy leakage. To address this issue, this paper leverages the fundamental properties of QR codes to propose an information-hiding scheme based on overlapping Hamming codes that protects sensitive information within QR codes. The scheme uses a two-level grayscale QR code, transformed from a standard black-and-white QR code, as the carrier, reducing the extent of modifications at the information-hiding locations and enhancing the invisibility of the secret information. Additionally, an overlapping (16, 11) Hamming code algorithm and a corresponding codeword modification rule table are designed to hide a greater number of secret bits. Compared with existing approaches, the proposed scheme achieves superior secret payload capacity and embedding efficiency while simplifying the processes of information hiding and extraction.
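To make the Hamming-code idea concrete, here is the textbook (7,4)-Hamming matrix-embedding step (hide 3 secret bits in 7 cover bits by flipping at most one bit). The paper's overlapping (16, 11) construction and grayscale-QR carrier are more involved, so treat this purely as an illustration of syndrome-based hiding, not the proposed scheme.

```python
# Illustrative matrix embedding with the (7,4) Hamming parity-check matrix:
# hide 3 secret bits in 7 cover bits by changing at most one cover bit.
import numpy as np

# Columns of H are the binary representations (MSB first) of 1..7.
H = np.array([[int(b) for b in f"{i:03b}"] for i in range(1, 8)]).T   # shape (3, 7)

def embed(cover, secret):
    cover = cover.copy()
    syndrome = H @ cover % 2
    diff = (syndrome + secret) % 2                   # syndrome change we need
    if diff.any():
        col = int("".join(map(str, diff)), 2) - 1    # column of H equal to diff
        cover[col] ^= 1                              # flip exactly one bit
    return cover

def extract(stego):
    return H @ stego % 2

cover = np.array([1, 0, 1, 1, 0, 0, 1])
secret = np.array([1, 0, 1])
stego = embed(cover, secret)
assert (extract(stego) == secret).all()
print("cover:", cover, "stego:", stego, "extracted:", extract(stego))
```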
-
A generic perturbation-based defense framework for backdoor attacks
- RAO Yue, MA Xiaoning, CHENG Zhongfeng
- 2026, 48(1): 70-78.
Recent studies have shown that deep neural networks (DNNs) are vulnerable to backdoor attacks, which are stealthy and powerful enough to make a model output the results expected by the attacker. To address the problem that current defenses against backdoor attacks require high computational overhead while also degrading model accuracy, a generic perturbation-based defense framework is proposed that combines backdoor detection with backdoor elimination. The detection phase generates, over the sample set, generic perturbations that cause the model to misclassify benign samples without affecting backdoor samples, and efficiently detects backdoor samples by comparing the changes in the model's output before and after the perturbations are added to the samples under test. In the elimination stage, the detected backdoor samples are reconstructed using the random primary-color overlay method and, after deduplication, mixed with benign samples to retrain the backdoored model. The framework is validated on the MNIST, Fashion-MNIST, and CIFAR-10 datasets with respect to different trigger designs, poisoning ratios, and specific-label attacks. Experimental results demonstrate that the framework not only significantly reduces the success rate of backdoor attacks under various conditions but also has minimal impact on the classification performance of benign samples. Additionally, compared with previous studies, it shows substantial improvements in defending against specific-label attacks.
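A minimal PyTorch-flavored sketch of the detection logic described in the abstract: apply a universal perturbation that flips the predictions of benign inputs, and flag inputs whose prediction is unchanged as suspected backdoor samples. The tiny model, the random stand-in perturbation, and the threshold-free rule below are placeholders, not the paper's construction.

```python
# Hypothetical sketch of perturbation-based backdoor-sample detection.
import torch
import torch.nn as nn

torch.manual_seed(0)

model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 64), nn.ReLU(), nn.Linear(64, 10))
model.eval()

def detect_backdoor_samples(model, images, perturbation):
    """Flag samples whose predicted label survives the universal perturbation."""
    with torch.no_grad():
        clean_pred = model(images).argmax(dim=1)
        pert_pred = model((images + perturbation).clamp(0, 1)).argmax(dim=1)
    return clean_pred == pert_pred            # True -> suspected backdoor sample

images = torch.rand(8, 1, 28, 28)                  # stand-in batch of inputs
perturbation = 0.2 * torch.randn(1, 1, 28, 28)     # stand-in for the learned generic perturbation
print(detect_backdoor_samples(model, images, perturbation))
```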
-
Design of AES_ll coprocessor based on RISC-V
- HAN Jin, WU Zewei
- 2026, 48(1): 79-88.
With the rapid development of computer technology, the volumes of data storage and computation are continuously increasing, making secure, reliable, and efficient data storage and transmission more important than ever. Among the various encryption algorithms, AES is a widely used symmetric encryption algorithm. The goal of this paper is to improve the AES algorithm to make it more suitable for hardware implementation, reducing hardware area and enhancing processing performance. Firstly, this paper proposes a lightweight AES algorithm (AES_ll) and designs four custom instructions based on the RISC-V instruction set architecture to improve the flexibility of the algorithm and reduce costs. Secondly, a dedicated AES_ll coprocessor is designed, and a verification platform capable of randomly generating plaintexts and corresponding ciphertexts is established to ensure the reliability and stability of the AES_ll hardware implementation under different inputs. Finally, synthesis is conducted under a 28 nm process. Experimental results show that the AES_ll coprocessor achieves a throughput of up to 2.976 Gbit/s with an area of approximately 13.97 kgates, offering a significant advantage in throughput-to-area ratio. The design provides an excellent solution for resource-constrained applications with high demands for encryption and decryption.
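From the figures quoted above, the throughput-to-area ratio referred to in the abstract works out to roughly

$$\frac{2.976\ \text{Gbit/s}}{13.97\ \text{kgates}} \approx 0.213\ \text{Mbit/(s}\cdot\text{gate)}.$$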
-
SM2 two-party collaborative signature for smart home system
- XU Guowei, LIU Dengzhi
- 2026, 48(1): 89-97.
Smart home systems have been widely popularized in people’s daily lives. However, due to the limited computational resources of smart home nodes, numerous issues still exist in achieving privacy protection and key management. To enhance the security of smart home systems, a lightweight signature scheme has been designed. This scheme is constructed based on the SM2 cryptographic algorithm and two-party collaborative signing, with key splitting and storage implemented to reduce the risk of key leakage. Meanwhile, the key generation process is integrated with user registration, and authentication parameters are added during the interaction between users and smart gateways for private key generation, thereby further improving the system security without the need to synthesize a complete private key during the signing phase. Finally, security proofs are provided, including unforgeability, anonymity, and storage security. The results of simulated experiments demonstrate that the designed signature scheme is suitable for smart home systems in lightweight environments.
-
Research and implementation of a low-light image enhancement algorithm based on FPGA
- XIAO Jian, LI Zhibin, YANG Jin, CHENG Hongliang, HU Xin
- 2026, 48(1): 98-107.
To address the high computational complexity, difficulty in achieving real-time performance, and other challenges of implementing low-light image enhancement algorithms with software methods such as deep learning, this paper presents an improved Retinex-model-based low-light image enhancement algorithm that is readily deployable on FPGAs. The algorithm begins by converting the input low-light image from the RGB color space to the YCbCr color space. The Y component in this space is then selected as the initial illuminance component and processed with adaptive Gamma correction and bilateral filtering, which brightens the initial illuminance component while simultaneously achieving noise reduction and detail enhancement. Subsequently, the enhanced image is generated based on the Retinex model. The enhanced image is then converted back to the YCbCr color space, where the Y component undergoes multi-scale detail enhancement before being transformed back to the RGB color space as the final enhanced output. Experimental results demonstrate that when comparing the output images of the proposed algorithm deployed on an FPGA with those obtained through algorithm simulation in MATLAB, the structural similarity index measure (SSIM) is close to 1, making the two difficult to distinguish with the naked eye. At a clock frequency of 200 MHz, the algorithm processes a 1280×720 resolution image in approximately 21 ms. Furthermore, when deployed on a domestically produced FPGA, the proposed algorithm exhibits low resource utilization and consumes only 3.357 W of power, meeting low-power requirements and demonstrating significant practical and engineering application value.
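A hedged NumPy/OpenCV sketch of the illumination-adjustment step the abstract outlines (RGB → YCbCr, adaptive gamma plus bilateral filtering on the luminance channel). The gamma schedule and filter parameters are placeholders, and the later Retinex reconstruction and multi-scale detail enhancement are omitted; this is a software model, not the FPGA pipeline.

```python
# Simplified software model of the illumination-adjustment stage (not the FPGA implementation).
import cv2
import numpy as np

def enhance_luminance(bgr):
    ycrcb = cv2.cvtColor(bgr, cv2.COLOR_BGR2YCrCb)
    y = ycrcb[:, :, 0].astype(np.float32) / 255.0

    # Adaptive gamma: darker images get a smaller gamma (stronger brightening).
    gamma = np.clip(np.log(0.5) / np.log(np.clip(y.mean(), 1e-3, 0.99)), 0.3, 1.0)
    y_gamma = np.power(y, gamma)

    # Bilateral filter: smooth noise while preserving edges.
    y_filtered = cv2.bilateralFilter((y_gamma * 255).astype(np.uint8), d=5,
                                     sigmaColor=25, sigmaSpace=5)

    ycrcb[:, :, 0] = y_filtered
    return cv2.cvtColor(ycrcb, cv2.COLOR_YCrCb2BGR)

dark = (np.random.rand(720, 1280, 3) * 60).astype(np.uint8)   # stand-in low-light frame
out = enhance_luminance(dark)
print(out.shape, out.dtype)
```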
-
A multi-path and multi-scale attention network for land cover segmentation
- LI Yan, FAN Xinyu, CHEN Qin
- 2026, 48(1): 108-118.
In recent years, Transformers have made remarkable progress in the field of image recognition, yet they still face challenges in pixel-level segmentation tasks, primarily due to their insufficiently explicit and effective handling of local deviations. To address this issue, this paper proposes a multi-path and multi-scale attention network, named DMANet. By integrating the strengths of convolutional neural networks (CNNs) and Transformers during the encoding phase, the network can simultaneously capture fine-grained local information and extensive global context from images, effectively enhancing its feature extraction capability. The proposed interactive dual-branch structure strengthens feature integration, improving the model's performance in dense prediction tasks. During the decoding phase, cross-layer feature fusion is employed to enhance DMANet's ability to recognize complex objects. Experiments on the Potsdam, GID-15, and L8 SPARCS datasets demonstrate DMANet's strong performance and broad applicability in complex land cover segmentation tasks.
-
A pulmonary airway CT image segmentation method based on a novel adaptive combined loss function
- XIAN Ling, XU Xiuyuan, ZHOU Kai, NIU Hao, GUO Jixiang
- 2026, 48(1): 119-132.
Segmenting pulmonary airways from computed tomography (CT) images is of great importance for the diagnosis and treatment of lung diseases. In recent years, deep learning-based methods for pulmonary airway segmentation have made considerable progress; however, achieving high-precision segmentation remains challenging. The class imbalance present in pulmonary airway CT images severely affects segmentation performance. To address this issue, a novel segmentation method employing an adaptive combined loss function is proposed. Firstly, a local imbalance strategy enhances the model's ability to discriminate foreground voxels. Secondly, radial distance information is integrated into the focal loss function to improve the model's capacity to automatically identify small airways. Finally, topological continuity of the airways is enhanced based on topology sensitivity and topology precision. Experimental results demonstrate that, compared with state-of-the-art models, the proposed method achieves the best performance in terms of the Dice similarity coefficient (DSC), branch detection rate (BD), and tree length detection rate (TLD), thereby improving pulmonary airway segmentation performance.
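The focal loss that the proposed combined loss builds on is the standard form below; the paper's contribution is to modulate it further with radial-distance information and to pair it with topology-sensitive terms, which are not shown here:

$$\mathrm{FL}(p_t) = -\alpha_t \,(1 - p_t)^{\gamma}\, \log(p_t),$$

where p_t is the predicted probability of the true class of a voxel, α_t balances foreground and background voxels, and γ down-weights easy examples.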
-
ASOD-YOLO: An improved aerial small object detection model based on YOLOv8n
- CAO Li, XU Huiying, XIE Gang, LI Yi, HUANG Xiao, CHEN Hao, ZHU Xinzhong
- 2026, 48(1): 133-145.
To address missed and false detections caused by characteristics of UAV images such as scattered object distributions, large size variations, and indistinct features, this paper proposes an improved aerial small object detection model based on YOLOv8n, named ASOD-YOLO. Firstly, the feature fusion network is redesigned: the top-down part of the original feature pyramid structure is replaced with a low-level information distribution (Low-GD) structure, which reduces feature loss while enhancing information fusion across scales. Secondly, the original 20×20 large-object detection head is replaced with a 160×160 small-object detection head to improve the detection of small objects, and the multi-scale cross-layer connections are optimized to provide the detection head with richer semantic information. Meanwhile, a fast Fourier convolution module (FFCBlock) is introduced to reduce the loss of small-object information after downsampling and to strengthen the extraction of global contextual information. Experimental results on the VisDrone aerial small-object dataset show that, compared with the baseline YOLOv8n model, ASOD-YOLO improves mAP@50 by 4.1 percentage points and mAP@50:95 by 2.3 percentage points, with a single-image processing time of only 6.8 ms. These results demonstrate that the proposed ASOD-YOLO model can effectively accomplish aerial small-object detection.
Artificial Intelligence and Data Mining
-
Overview of database query rewrite technology
- 2026, 48(1): 146-161.
The syntax for writing query statements in databases is highly diverse and flexible, with vastly different query formulations possible for the same requirement. The execution performance of queries directly impacts user experience. Query rewriting techniques transform an input query into an equivalent query with superior performance. Given the numerous rewriting rules and complex query environments, designing high-quality query rewriting strategies poses a significant challenge. Traditional query rewriting strategies are either cost-based or heuristic-based; however, achieving optimal query rewriting results in complex query environments remains difficult. With the rise of AI for databases (AI4DB), integrating machine learning methods into query rewriting techniques has become a mainstream approach, enabling further resolution of issues present in traditional query rewriting. Therefore, this paper first elaborates on the relevant technologies, existing problems, and applicable scenarios of traditional query rewriting strategies. Then it introduces machine learning-based query rewriting strategies, with a focus on discussing how they enhance performance. Finally, it discusses the current challenges in query rewriting and offers perspectives on future research directions.
-
GPR: A large language model enhancement method
- GAO Fucai, HE Tingnian, YANG Yang, YANG Jiangwei
- 2026, 48(1): 162-171.
Large language models (LLMs) acquire a wide range of abilities and knowledge from massive amounts of data, but they still suffer from problems such as hallucinations and a lack of specialized domain knowledge, which can be mitigated by introducing an external knowledge graph. A new method called global pruning retrieval (GPR) is proposed for acquiring knowledge from knowledge graphs: it retrieves relevant relations and entities through breadth-first search (BFS) and prunes them from a global perspective to extract the most relevant ones. At the same time, the entities in the question are connected to the relations by the shortest path. The retrieved relations and entities are transformed into prompts and passed to the LLM, guiding it to reason, generate answers, and textualize the reasoning process, making the decision transparent and traceable. Experimental results on multiple datasets show that GPR offers a clear reasoning advantage, and that the retrieved knowledge better alleviates the hallucination and domain-knowledge deficiency problems of LLMs.
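A toy sketch of the retrieve-then-prune pattern described above: breadth-first search expands from the question entities over a small dictionary-encoded knowledge graph, and a scoring function keeps only the most relevant (head, relation, tail) triples before they are rendered into a prompt. The graph, the word-overlap scorer, and the hop limit are all hypothetical stand-ins for the paper's components.

```python
# Hypothetical sketch: BFS retrieval over a knowledge graph with global pruning.
from collections import deque

# Tiny adjacency-list KG: head -> list of (relation, tail).
KG = {
    "Alan Turing": [("field", "computer science"), ("born_in", "London")],
    "computer science": [("subfield", "artificial intelligence")],
    "London": [("capital_of", "United Kingdom")],
}

def relevance(question, triple):
    """Placeholder scorer: word overlap between the question and the triple."""
    words = set(question.lower().split())
    return sum(w in words for part in triple for w in part.lower().replace("_", " ").split())

def retrieve(question, seeds, max_hops=2, keep=3):
    triples, frontier, seen = [], deque((s, 0) for s in seeds), set(seeds)
    while frontier:
        node, depth = frontier.popleft()
        if depth >= max_hops:
            continue
        for rel, tail in KG.get(node, []):
            triples.append((node, rel, tail))
            if tail not in seen:
                seen.add(tail)
                frontier.append((tail, depth + 1))
    # "Global pruning": rank all gathered triples at once, keep only the top few.
    return sorted(triples, key=lambda t: relevance(question, t), reverse=True)[:keep]

prompt_facts = retrieve("Which field did Alan Turing work in?", ["Alan Turing"])
print(prompt_facts)   # these triples would then be serialized into the LLM prompt
```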
-
Autonomous driving trajectory prediction based on enhanced prediction model
- TIAN Hongpeng, CUI Dan, ZHANG Xiaopei
- 2026, 48(1): 172-179.
One of the major challenges in autonomous driving is the real-time prediction of reliable future trajectories for surrounding agents to support optimal path-planning decisions. This paper proposes an agent interaction prediction model named GT-Former. Built upon the Transformer structure, the model integrates a graph convolutional network (GCN) to produce dynamic interaction features among agents. Furthermore, the interaction between the map and agents uses agents' features as query conditions, combining cross-attention and multi-modal attention mechanisms to integrate both mono-modal and multi-modal interaction information, thereby comprehensively capturing the interactions between agents and the various map features. Simulation experiments on the Waymo dataset demonstrate that this integrated strategy enhances the model's multi-agent trajectory prediction accuracy.
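The map-to-agent interaction described above follows the usual cross-attention pattern, with queries taken from the agent features and keys/values from the map features (the multi-modal variant presumably adds one query per predicted trajectory mode; the projection details below are generic, not the paper's):

$$\mathrm{Attn}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right) V,\qquad Q = X_{\text{agent}} W_Q,\; K = X_{\text{map}} W_K,\; V = X_{\text{map}} W_V.$$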
-
Knowledge concept-aware session modeling for knowledge tracing
- WANG Jing, MA Huifang, ZHANG Mengyuan
- 2026, 48(1): 180-190.
Knowledge tracing (KT) aims to dynamically model learners' evolving knowledge states based on their historical learning records and plays a significant role in online education systems. Most existing KT methods treat knowledge states as transition patterns of knowledge-concept mastery levels from one completed exercise to the next, and regard learners' learning records as continuous, uniformly distributed data. In practice, however, learning records can be divided into distinct, shorter sessions. To address this, a method called knowledge concept-aware session modeling for knowledge tracing (KSMKT) is proposed to capture learners' knowledge state changes at a finer granularity. Specifically, learners' historical learning records are divided into shorter sessions from the perspective of knowledge concepts. Subsequently, a fine-grained knowledge state modeling module captures fine-grained interaction dependencies and knowledge state changes within and across sessions. Additionally, a global knowledge proficiency modeling module models learners' knowledge states from an overall perspective. Extensive experiments on three real-world datasets demonstrate that KSMKT outperforms most current baseline methods, proving its effectiveness.