Computer Engineering & Science

An efficient parallel computing framework for earth system models

WANG Dong, LIU Zhuang, HUANG Xiaomeng

2025, 47(10): 1711-1725. doi:

Earth system models (ESMs) are pivotal tools for understanding the mechanisms of past climate and environmental evolution and for projecting future global change scenarios. However, the rapid evolution of computing hardware has made programming, porting, and optimizing these models increasingly difficult. To address these issues, OpenArray 2.0, an automatic parallel computing framework designed for ESMs, decouples model development from the underlying parallel computing architectures by employing custom operator interfaces, implicit parallelism, computational graph optimization, automatic code generation, just-in-time compilation, and dynamic I/O scheduling. OpenArray 2.0 lets users write models in Matlab-like serial syntax while enabling parallel execution across heterogeneous platforms, including x86, Sunway, and GPUs. Models developed with OpenArray 2.0 achieve 75% parallel efficiency on an x86 platform with 19,200 cores, delivering performance close to that of manually optimized code. On the Sunway platform, it demonstrates 70% scalability in a million-core environment, and it also shows high execution efficiency on GPU platforms. As a promising alternative tool for ESM development, OpenArray 2.0 is expected to significantly improve both model development efficiency and computational performance.

Research on state storage and strategic mapping techniques for FPGA-accelerated simulation

RONG Peitao, ZENG Kun, LI Kai, ZHANG Tian, WANG Yongwen

2025, 47(10): 1726-1736. doi:

With the continuous growth of processor design scale, cycle-accurate simulation technology is facing challenges. Traditional software simulators are usually slow, while hardware emulation platforms are often expensive, which puts them out of reach of most academic and industrial research teams. Using FPGAs to accelerate cycle-accurate simulation is regarded as a highly promising method. In recent years, FireSim, an open-source platform for FPGA-accelerated simulation, has not only integrated previous research results in this field but also overcome a series of key obstacles. However, this solution still underutilizes FPGA resources. In particular, model mapping occupies an excessive share of BRAM resources, which limits further expansion of the simulation scale. To solve this problem, new resource management and optimization techniques for FPGA simulation acceleration platforms are proposed, including an automated process for identifying BRAM resource usage and two mapping strategies: migrating BRAM-occupying components to URAM to reduce pressure, and balancing resource utilization through distributed reconstruction and resource-sensitive mapping. These techniques increase the simulation scale on a single FPGA from 16 cores to 32 cores and can theoretically be extended to 64 cores with almost no loss of simulation speed. They effectively improve the scalability of existing platforms and are of great significance for promoting the application of FPGA acceleration in large-scale full-system simulation.

A spiking neural network accelerator based on approximate computing

XU Weikang, SUN Yan, ZHANG Jianmin

2025, 47(10): 1737-1744. doi:

Spiking neural networks (SNNs) simulate biological neurons more closely than conventional neural networks, and their high energy efficiency makes them exceptionally suitable for edge and end-device computing scenarios. However, in applications highly sensitive to power consumption, further reducing power consumption remains a crucial objective. Approximate computing simplifies design by introducing a certain degree of error, offering new opportunities for energy-efficient hardware design in fault-tolerant applications. This paper explores methods for applying approximate computing to SNN accelerators. Firstly, through analysis and experiments tailored to the application characteristics of SNNs, the distribution characteristics of the input data of the numerous adders in SNN accelerators are summarized. Based on these characteristics, an application-sensitive error evaluation metric for approximate arithmetic components, named AARE (application-aware approximation error), is proposed. With this metric and the optimal approximate adder selection strategy introduced in this paper, more appropriate approximate arithmetic components can be selected for specific applications. Building on this, an approximate computing-based SNN hardware accelerator, AxSpike, is implemented using open-source EDA tools and PDKs, along with a corresponding simulator developed using snnTorch. Experimental results demonstrate that the accelerator achieves a 37.32% reduction in power consumption and a 31.26% reduction in area, with only a 3.47 percentage point decrease in accuracy, significantly enhancing the energy efficiency of SNN hardware accelerators.
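To make the idea of an input-dependent approximation error concrete, the following is a minimal sketch of a lower-part-OR style approximate adder and a plain mean relative error computed over a sample of operand pairs. It is an illustration under stated assumptions, not the AxSpike design or the AARE metric, whose definitions are not given in the abstract; the names `loa_add` and `mean_relative_error` are hypothetical.

```python
def loa_add(a: int, b: int, approx_bits: int = 4) -> int:
    """Approximate unsigned addition in the lower-part-OR style (illustrative only)."""
    if approx_bits <= 0:
        return a + b  # no approximate part: exact addition
    mask = (1 << approx_bits) - 1
    # Lower part: a bitwise OR replaces the carry chain.
    low = (a | b) & mask
    # Carry into the exact upper part: AND of the top bits of the two lower parts.
    carry = (a >> (approx_bits - 1)) & (b >> (approx_bits - 1)) & 1
    # Upper part: exact addition.
    high = (a >> approx_bits) + (b >> approx_bits) + carry
    return (high << approx_bits) | low


def mean_relative_error(samples, approx_bits: int = 4) -> float:
    """Average |approx - exact| / exact over operand pairs whose exact sum is nonzero."""
    total, count = 0.0, 0
    for a, b in samples:
        exact = a + b
        if exact == 0:
            continue
        total += abs(loa_add(a, b, approx_bits) - exact) / exact
        count += 1
    return total / count if count else 0.0
```

Because the error depends on which operand pairs actually occur, evaluating the same adder on samples drawn from an application's real input distribution can give a very different figure from a uniform sweep, which is the motivation the abstract gives for an application-aware metric.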

Survey on adaptive routing algorithms for 3D network-on-chip

SHAO Jingbo, NING Jiahong, SU Xinling

2025, 47(10): 1745-1755. doi:

In recent years, with the continuous development of semiconductor manufacturing processes, the integration level of chips has been increasing. As a solution to large-scale on-chip interconnection problems, the three-dimensional network-on-chip (3D NoC) has become a major trend in the development of integrated circuits. However, large-scale communication within systems may cause network congestion, link failures, and excessively high local temperatures, thereby reducing system performance. Congestion control, topology awareness, and hot spot avoidance are therefore key research focuses in routing algorithms. A 3D NoC adaptive routing algorithm dynamically makes routing decisions for data packets according to the network state of the 3D NoC, and such algorithms have become one of the research hotspots in 3D NoC routing. Firstly, this paper introduces the research history of adaptive routing algorithms, expounds the working principles and implementation methods of 3D NoC adaptive routing algorithms, and classifies the algorithms by their design principles. Secondly, under an analysis framework based on routing rules, routing strategies, and adjustment strategies, it analyzes the adaptive routing algorithms proposed in recent years and summarizes their characteristics. Finally, it discusses the challenges facing adaptive routing algorithms and their future development trends.

New attack modes and protection measures for configurable approximate adder

WANG Haonan, WANG Zhen

2025, 47(10): 1756-1766. doi:

Approximate computing circuits are designed for applications with inherent error tolerance. By sacrificing a certain degree of computational accuracy, they gain advantages in performance and energy efficiency. However, recent studies have pointed out that the approximation mechanisms of such circuits may be exploited to create new attacks, and the exploration of attack instances and corresponding detection or protection methods is attracting growing attention. So far, the investigation of security threats to approximate computing circuits is still in its early stages: only a few studies have proposed specific attack examples, and research on attack patterns and protective measures for configurable approximate adders is lacking. There is therefore an urgent need to examine existing configurable approximate adders and identify possible attack methods in order to reveal new security threats. This paper proposes an approximate-precise boundary (APB) attack targeting configurable approximate adders and analyzes two attack modes. In addition, two protective measures are proposed: random selection of approximation types and an approximate configuration authorization circuit. Experimental results show that the approximate configuration authorization circuit can protect against the attacks with an additional area overhead of 10.17%, an additional power consumption overhead of less than 13%, and negligible additional delay.

A real-time scheduling algorithm for video processing tasks under cloud-edge collaboration framework

LI Jiakun, XIE Yulai, FENG Dan

2025, 47(10): 1767-1778. doi:

In video task processing under cloud-edge collaboration, the large number of processing and transmission tasks makes it necessary to consider both the success rate and the processing time of tasks to ensure quality of service. At the same time, various resource costs need to be taken into account to reduce system operation costs. To address these issues, this paper formally models the video task scheduling problem under the cloud-edge collaborative framework and transforms it into a multi-objective optimization problem. For this problem, an algorithm called OCES is proposed. The algorithm sorts tasks within the same time slice to determine task priorities. For each task, it combines task information with the current status of each edge node and the cloud center node, and uses a neural network to select the strategy with the maximum Q-value, thereby determining the node on which the task executes. OCES is based on DDQN and improves the reward function and the strategy selection method. By integrating a noise network into the deep neural network, it prevents the algorithm from converging prematurely to a local optimum. Compared with the state-of-the-art CPSA algorithm, the proposed algorithm reduces the execution cost by 10.56% and 5.85%, respectively, in two scenarios with different average arrival rates and different task types, while achieving similar success rates and completion times.
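The DDQN update on which OCES builds can be sketched as follows. This is the generic Double DQN target only, assuming each action index stands for one candidate execution node; the modified reward function, the strategy selection method, and the noise network of OCES are not described in the abstract and are not reproduced. The function name and parameters are hypothetical.

```python
from typing import Sequence


def double_dqn_target(
    reward: float,
    next_q_online: Sequence[float],
    next_q_target: Sequence[float],
    gamma: float = 0.99,
    done: bool = False,
) -> float:
    """Double DQN target: the online network picks the action, the target network scores it."""
    if done:
        return reward  # no bootstrap value after a terminal transition
    # Action selection uses the online network's Q-values for the next state.
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    # Action evaluation uses the target network, which reduces overestimation bias.
    return reward + gamma * next_q_target[best_action]
```

Separating selection from evaluation is what distinguishes Double DQN from plain DQN, where a single network both chooses and scores the next action.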

A new dynamic ad-hoc network solution for IPSec entities based on routing access

LUO Jin, LIANG Yanliang, CHEN Yang, ZHAO Qi

2025, 47(10): 1779-1786. doi:

With the increasing popularity of IPSec in encrypted transmission applications at the network layer, its end-to-end characteristics have gradually exposed problems such as low networking efficiency and difficulty in configuration, operation, and maintenance in large-scale networking applications. Currently, the mainstream solutions proposed in the industry can alleviate the above problems to a certain extent, but they all have certain limitations. This paper conducts in-depth research on the establishment mechanisms of IPSec SPDs and SADs, as well as routing access technologies, explores the possibility of integrating them, and finally proposes a new dynamic self-organizing network solution for IPSec entities. This solution can effectively improve the self-organizing efficiency of IPSec entities in large-scale networking applications and reduce the pressure of configuration, operation, maintenance, and support.

Localization of object removal by Seam Carving via DCT coefficient analysis

LIN Cong, MA Hongji, SITU Xiaoqing, ZHEN Ronggui, XIAO Hongtao, DENG Yuqiao

2025, 47(10): 1787-1798. doi:

With the advancement of digital image processing, image tampering techniques have grown increasingly diverse and covert, and object removal is a critical manipulation among them. Seam Carving, initially designed for content-aware image resizing, can also be exploited for object removal. To address this tampering technique, this paper proposes a novel forensic method that localizes object removal performed with Seam Carving by analyzing anomalies in DCT coefficients. For the first time, the double quantization effect is applied to the forensics of Seam Carving-based object removal. Specifically, the paper identifies abnormal DCT blocks generated during Seam Carving-based object removal. The proposed method involves three key steps: 1) extracting the quantization matrix and DCT coefficient histograms from JPEG images; 2) estimating the primary quantization matrix and original DCT coefficients from the histograms, followed by generating a posterior probability map of tampered regions using a Bayesian framework; 3) denoising and thresholding the probability map to pinpoint the location of removed objects. Experimental results demonstrate that the proposed method effectively detects and localizes Seam Carving object removal. This method provides a new research direction for tampering forensics.
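The double quantization effect that the method relies on can be illustrated with a small sketch. It quantizes a set of integer coefficients once with step `q2`, and alternatively first with `q1` and then with `q2`, and compares which histogram bins stay empty. The step sizes, the coefficient range, and all function names are assumptions chosen for illustration; this is not the paper's estimation or Bayesian localization procedure.

```python
import math
from collections import Counter


def quantize(value: float, step: int) -> int:
    """Round-half-up quantization to an integer level (avoids banker's rounding)."""
    return math.floor(value / step + 0.5)


def single_histogram(coeffs, q2: int) -> Counter:
    """Histogram of levels after one quantization with step q2."""
    return Counter(quantize(c, q2) for c in coeffs)


def double_histogram(coeffs, q1: int, q2: int) -> Counter:
    """Histogram after quantizing with q1, dequantizing, then quantizing with q2."""
    return Counter(quantize(quantize(c, q1) * q1, q2) for c in coeffs)


def empty_bins(hist: Counter, lo: int, hi: int) -> list:
    """Levels in [lo, hi] that received no coefficient."""
    return [b for b in range(lo, hi + 1) if hist[b] == 0]
```

With coefficients `range(-100, 101)`, a single quantization with step 2 fills every level between -20 and 20, whereas quantizing with step 5 and then step 2 leaves a periodic pattern of empty levels. Periodic artifacts of this kind are what histogram-based double-compression analysis looks for.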

Review on automated debugging of hardware description language code

XU Jianjun, HE Jiayu, WU Jiang, MAO Xiaoguang

2025, 47(10): 1799-1809. doi:

Code defects are common yet critical issues in hardware design. During the development and maintenance phases, defect debugging remains a highly manual and time-consuming task for hardware developers. Freeing hardware developers from arduous debugging tasks has become a pressing need in the field of hardware verification. Consequently, automated debugging technologies for hardware description language (HDL) code defects have emerged and gradually become a research hotspot. To systematically organize the work in this field, this paper surveys and analyzes research on automated debugging technologies for HDL code defects. It elaborates and analyzes the research progress from three aspects: defect analysis, automated defect detection and localization, and automated defect repair. Additionally, it discusses the limitations of current technologies and the challenges they face.

Modeling and simulation of aircraft dynamic virtual removal and installation processes under interference constraints

MA Hongyan, CHEN Jingjie

2025, 47(10): 1810-1818. doi:

To address issues in the current modeling of aircraft virtual removal and installation processes, such as non-standardized expression, rigid procedures, and the inability to meet the diverse dynamic operation needs of trainees, a process modeling method is proposed. Firstly, with the removal and installation objects as the main body, a parameterized model of the virtual removal and installation processes and a set of object attribute tags are constructed. The former standardizes the expression of the virtual removal and installation processes, while the latter record the object attributes and their current states. Then, an interference constraint matrix is built to express the interference constraint relationships between objects, serving as the constraint boundary for training operations. On this basis, a dynamic virtual removal and installation process model of the aircraft under interference constraints is constructed. Finally, the model is verified by taking the removal and installation of a cargo compartment temperature sensor of a certain type of aircraft as an example. The results show that the model is well standardized, supports dynamic operation, and can effectively reduce the modeling workload.
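One way to read the interference constraint matrix is sketched below. The sketch assumes that entry `matrix[i][j] == 1` means object `j` physically blocks the removal of object `i`, so that an object may be removed only after all of its blockers are gone. This encoding, the function names, and the three-part example are assumptions for illustration; the abstract does not define the matrix semantics.

```python
def removable(obj: int, matrix, removed: set) -> bool:
    """An object can be removed if it is still installed and every blocker is already removed."""
    if obj in removed:
        return False
    return all(
        not matrix[obj][other] or other in removed
        for other in range(len(matrix))
    )


def available(matrix, removed: set) -> list:
    """All objects a trainee may legally remove in the current state."""
    return [obj for obj in range(len(matrix)) if removable(obj, matrix, removed)]


def apply_removal(obj: int, matrix, removed: set) -> set:
    """Return the new state after removing obj, or reject an interfering operation."""
    if not removable(obj, matrix, removed):
        raise ValueError(f"object {obj} is blocked or already removed")
    return removed | {obj}
```

For a hypothetical chain of access panel (0), bracket (1), and sensor (2) with `matrix = [[0, 0, 0], [1, 0, 0], [1, 1, 0]]`, only the panel is removable at first. Because the check is made against the current state rather than a fixed step list, any order that respects the constraints is accepted, which matches the dynamic operation the abstract aims for.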

A fast visual detection and tracking algorithm for small UAV targets

DI Jiahao, TIE Junbo, ZHOU Li, WANG Yongwen

2025, 47(10): 1819-1829. doi:

Small unmanned aerial vehicles (UAVs) show great potential in multiple fields, but they can also be misused for illegal mapping, reconnaissance, and interference with aviation order. Effective detection and tracking strategies are therefore urgently needed. Traditional radars have limitations in tracking small UAVs in complex urban environments, while vision-based deep learning methods achieve high accuracy but incur large computational overhead. To address these challenges, this paper proposes a detection and tracking algorithm based on the lightweight YOLOv3-tiny and an interacting multiple model Kalman filter (IMM-KF). YOLOv3-tiny performs low-frequency detection, and IMM-KF realizes tracking through high-frequency prediction and state updates of multiple motion models, which effectively reduces the computational requirements and copes with target loss when the target is occluded. Experimental results show that the detection and tracking accuracy of this algorithm in complex urban environments reaches 98.33%, with a real-time coverage rate of 73.6%, which significantly improves tracking efficiency and stability and meets the needs of UAV supervision.
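The split between low-frequency detection and high-frequency prediction can be sketched with a single constant-velocity Kalman filter in one dimension. This is a deliberate simplification: the IMM-KF in the paper runs several motion models and mixes them, and the detector here is replaced by a dictionary of positions keyed by frame index. The class name, the noise parameters, and the `track` helper are all hypothetical.

```python
class ConstantVelocityKF:
    """1D Kalman filter with state (position, velocity) and position-only measurements."""

    def __init__(self, pos=0.0, vel=0.0, p_var=100.0, q=1e-3, r=0.25):
        self.pos, self.vel = pos, vel
        self.P = [[p_var, 0.0], [0.0, p_var]]  # state covariance
        self.q, self.r = q, r                  # process and measurement noise (assumed)

    def predict(self, dt=1.0):
        """Propagate state and covariance: P <- F P F^T + Q with F = [[1, dt], [0, 1]]."""
        (a, b), (c, d) = self.P
        self.pos += self.vel * dt
        self.P = [
            [a + dt * (b + c) + dt * dt * d + self.q, b + dt * d],
            [c + dt * d, d + self.q],
        ]

    def update(self, z):
        """Correct the state with a measured position z (H = [1, 0])."""
        (a, b), (c, d) = self.P
        s = a + self.r            # innovation variance
        k0, k1 = a / s, c / s     # Kalman gain
        y = z - self.pos          # innovation
        self.pos += k0 * y
        self.vel += k1 * y
        self.P = [[(1 - k0) * a, (1 - k0) * b], [c - k1 * a, d - k1 * b]]


def track(detections: dict, frames: int, dt: float = 1.0) -> list:
    """Predict on every frame; update only on frames where a detection exists."""
    kf = ConstantVelocityKF()
    estimates = []
    for k in range(frames):
        kf.predict(dt)
        if k in detections:
            kf.update(detections[k])
        estimates.append(kf.pos)
    return estimates
```

Frames without a detection, whether skipped to save computation or lost to occlusion, are bridged by the prediction step alone, which is the behavior the abstract attributes to the filtering stage.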

Research on human pose anomaly detection based on spatiotemporal graph attention state space model

LI Hang, CHEN Zhigang, WANG Yijie, ZHANG Xinyu, LEI Jinghong, LIU Lingfeng

2025, 47(10): 1830-1840. doi:

Video anomaly detection is widely applied in fields such as public security, transportation, and healthcare. However, human pose anomaly detection faces issues including susceptibility to environmental influences, difficulty in handling skeleton time series, high computational complexity, and a tendency to neglect important local features in motion regions. To address these problems, a novel model based on the human skeleton, named spatiotemporal graph normalizing flow mixed attention state space model (STG-FAM), is proposed. This model effectively captures temporal dynamic features in skeleton time series by introducing a selective state space model and normalizing flow into the spatiotemporal graph convolutional network. It utilizes a mixed attention mechanism to learn attention weights across channels and spatial domains, thereby enhancing the model’s focus on key nodes and spatiotemporal edges in the temporal skeleton and improving the model’s representational capacity and anomaly detection performance. The effectiveness of the proposed model is demonstrated through experiments on two video anomaly detection datasets: the ShanghaiTech Campus dataset and the UBnormal dataset.

A dual-prior guided attention feature aggregation defogging generative adversarial network

WANG Yan, HU Jinyuan, LIU Jingjing, CHEN Yanyan

2025, 47(10): 1841-1852. doi:

Image defogging is a challenging and active topic in the field of computer vision. Existing defogging methods usually use a single convolutional neural network (CNN) to solve the problem, but such methods lack a detail recovery mechanism and perform poorly under non-uniform fog. To address these two problems, a dual-prior guided attention feature aggregation defogging generative adversarial network is proposed, in which the dark channel prior and the semantic prior respectively guide the recovery of the generalized features and the texture details of the images. The generator uses a parameter-sharing encoder to extract features, adds an attention feature aggregation block (AFAB) to aggregate and enhance multi-scale features, and recovers the fog-free image by decoding the multi-scale features. A multi-scale discriminator then supervises the recovery of the fog-free image. In addition, considering the possibly uneven distribution of fog in the image, a coordinate attention residual block (CARB) is proposed, which adaptively assigns weights so that the network focuses on the important features of the image. At the same time, a coordinate attention dense residual group (CARG) is constructed from three CARBs using residual aggregation, so that residual features can be fully utilized. Experimental results show that the proposed network performs excellently on both synthetic and real foggy image datasets.

Elite golden jackal optimization based on multi-strategy improvement

WU Zhixiang, LIU Jie, QIN Tao, CHEN Changsheng, LI Wei, YANG Jing

2025, 47(10): 1853-1866. doi:

To address the poor convergence accuracy of the golden jackal optimization algorithm and its tendency to fall into local optima when solving optimization problems, an elite golden jackal optimization algorithm based on multi-strategy improvement (EGJO) is proposed. Firstly, elite opposition-based learning is used to select an elite population for the search, which improves the quality and diversity of the population and thus the convergence accuracy and speed of the algorithm. Secondly, the two-sided mirror reflection theory is used to handle individuals that cross the search boundary, solving the problem of uneven population distribution. Thirdly, an adaptive energy factor is proposed to better coordinate exploration and exploitation. Finally, a Cauchy mutation strategy is applied to the best individuals of the population to improve the algorithm's ability to escape local optima. In simulation experiments on 16 typical benchmark functions, convergence, robustness, and the Wilcoxon rank sum test are analyzed comprehensively in a comparison of six optimization algorithms. The experimental results show that the convergence accuracy and speed of EGJO are significantly improved. In addition, two typical engineering problems are optimized, and the results show that the proposed algorithm is feasible and efficient for practical optimization problems.
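The first two strategies can be sketched as follows, under assumptions stated here because the abstract gives no formulas. Elite opposition-based learning is taken in its commonly used form, where the opposite of an elite coordinate x is k(da + db) - x with da and db the per-dimension bounds of the elite set and k a random factor in [0, 1). Mirror reflection is taken to mean folding an out-of-range coordinate back as if both bounds were mirrors. The function names are hypothetical, and reflecting out-of-range opposites (instead of resampling them) is a choice made for this sketch.

```python
import random


def reflect(value: float, low: float, high: float) -> float:
    """Fold an out-of-range coordinate back into [low, high] by mirroring at both bounds."""
    span = high - low
    if span <= 0:
        return low
    offset = (value - low) % (2 * span)
    return low + (offset if offset <= span else 2 * span - offset)


def elite_opposition(elites, rng: random.Random):
    """Generate one opposite solution per elite using the elites' dynamic bounds."""
    dims = len(elites[0])
    lows = [min(e[j] for e in elites) for j in range(dims)]
    highs = [max(e[j] for e in elites) for j in range(dims)]
    k = rng.random()  # random scaling factor in [0, 1)
    return [
        [reflect(k * (lows[j] + highs[j]) - e[j], lows[j], highs[j]) for j in range(dims)]
        for e in elites
    ]


def select_best(candidates, fitness, n: int):
    """Keep the n candidates with the lowest fitness (minimization)."""
    return sorted(candidates, key=fitness)[:n]
```

A typical use would merge the elites with their opposites and keep the best half, for example `select_best(elites + elite_opposition(elites, rng), fitness, len(elites))`.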

An intelligent assembly method based on edge-cloud collaboration and augmented reality

CAO Pengxia, LI Wenxin, HUANG Yibo

2025, 47(10): 1867-1876. doi:

Traditional assembly methods have problems such as difficulty in ensuring efficiency and quality, and poor visualization capabilities, while augmented reality wearable devices also have shortcomings in terms of tracking and registration stability and computing power. To solve these problems, an intelligent assembly method based on edge-cloud collaboration and augmented reality is proposed. The cloud provides accurate pose information for the assembly process through the tracking and registration module, and manages information such as 3D models and assembly processes required for intelligent assembly. Since most targets in the assembly scenes lack textures and vary in size, the tracking and registration module uses the improved YOLOv5s for target detection to obtain the operation object, and then applies the 3D point cloud registration method to obtain the precise pose information of the operation object. The edge terminal, through binocular AR glasses, provides on-site operation environment data, receives the pose information provided by the cloud, and combines the speech recognition module and the augmented reality visualization module to realize the guidance of the assembly process. Experimental verification shows that the method proposed in this paper can effectively solve the problems of accuracy, speed and robustness faced by the augmented reality intelligent assembly system, and realize the intelligent assembly guidance of “what you see is what you operate”.

Improved chimp optimization algorithm based on multi-strategy integration

WANG Yan, WANG Niya, MAO Jianlin, XU Zhihao, LI Dayan

2025, 47(10): 1877-1889. doi:

The chimp optimization algorithm (ChOA) is characterized by high population diversity and fast convergence speed. However, there remains room for improvement in its search capability and its ability to escape local optima. Therefore, this paper proposes an improved chimp optimization algorithm based on multi-strategy fusion. Firstly, a double-cross infinite-fold iterative chaotic map is introduced to initialize the population, enhancing the quality of the initial solutions and facilitating subsequent optimization. Subsequently, a hybrid position update mechanism that combines sine-cosine weight factors with an individual best following strategy is employed to update individual positions, thereby improving the algorithm’s optimization capability and convergence accuracy. Finally, a Cauchy-Gaussian mutation mechanism is introduced to mutate the current best individual, and a greedy selection strategy is used to select the optimal individual, enhancing the algorithm’s ability to escape local optima. In numerical experiments on 10 benchmark functions, the Wilcoxon rank sum test is used to compare the optimization performance of the improved algorithm with that of other algorithms. The results demonstrate that the proposed algorithm outperforms the compared algorithms, and its effectiveness is further validated on 3D path planning problems.
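The final step, mutating the current best individual and keeping the mutant only if it improves, can be sketched as below. The mutation formula is an assumption based on a form that is common in the literature: each coordinate is scaled by 1 + w1·Cauchy(0, 1) + w2·Gauss(0, 1), with w1 = 1 - (t/T)^2 and w2 = (t/T)^2, so that heavy-tailed Cauchy steps dominate early and Gaussian steps dominate late. The abstract does not give the paper's exact expression, and the function names are hypothetical.

```python
import math
import random


def cauchy_gaussian_mutate(best, t: int, t_max: int, rng: random.Random):
    """Perturb each coordinate with a time-weighted mix of Cauchy and Gaussian noise."""
    w_gauss = (t / t_max) ** 2
    w_cauchy = 1.0 - w_gauss
    mutated = []
    for x in best:
        # Standard Cauchy sample via the inverse CDF.
        cauchy = math.tan(math.pi * (rng.random() - 0.5))
        gauss = rng.gauss(0.0, 1.0)
        mutated.append(x * (1.0 + w_cauchy * cauchy + w_gauss * gauss))
    return mutated


def greedy_select(current, candidate, fitness):
    """Keep the candidate only if it is strictly better (minimization)."""
    return candidate if fitness(candidate) < fitness(current) else current
```

Because of the greedy selection, the best fitness can never get worse from one iteration to the next, while the occasional large Cauchy step gives the search a chance to leave a local optimum.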

Research on compiler optimization methods based on source code migration

ZHOU Fang, LIU Maofu, LI Shanzhi

2025, 47(10): 1890-1900. doi:

Compiler optimization aims to enhance the efficiency of code execution on target platforms by applying a series of transformations to the intermediate representation (IR). Traditional methods typically rely on machine learning to analyze IR features and predict the optimal combination of LLVM compiler optimization passes. However, these methods depend on existing compiler optimization strategies and make insufficient use of global information, which limits their scalability. This study adopts deep learning to automatically translate function-level IR from an unoptimized state to the O2 optimization level, treating the optimization process as a translation task. By integrating a dense data flow graph (DDFG), the method extracts global structural information from the IR code, thereby guiding the model to learn code semantics more comprehensively. Experiments using the Transformer model demonstrate that this method can effectively learn to generate IR at the O2 level, and that 86.15% of the function-level optimized code compiles and executes correctly while preserving semantic integrity.
