Computer Engineering & Science

Reproducible matrix decomposition on domestic chip

TANG Tao, JIANG Hao, PENG Lin, QI Haijun, LU Qingfeng

2025, 47(5): 761-774. doi:

The reproducibility of a floating-point program means that the same program produces bitwise-identical numerical results across multiple runs. This property is important for program debugging and for verifying the correctness of numerical results, and it is widely relied on in numerical simulation. However, the result of a floating-point calculation often depends on the order of operations, so dynamic scheduling and out-of-order execution of instructions make reproducibility a challenge. Matrix decomposition algorithms are widely used in numerical simulation applications, and reproducible versions can effectively improve the efficiency of debugging and result analysis in precision-sensitive simulations. Using error-free transformations, three reproducible matrix decomposition algorithms (blocked LU, Cholesky, and QR decomposition) are implemented on top of a reproducible BLAS library and verified on a domestic processor. The experimental results show that the reproducible matrix decomposition algorithms are numerically accurate and reproducible.
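
As a rough illustration of why evaluation order matters and what an error-free transformation provides, the Python sketch below shows Knuth's TwoSum, which returns the rounded sum together with its exact rounding error. It assumes IEEE 754 double precision with round-to-nearest. The name `two_sum` is illustrative; nothing here is taken from the paper or from any reproducible BLAS library.

```python
import math
from fractions import Fraction


def two_sum(a: float, b: float) -> tuple[float, float]:
    """Knuth's TwoSum: return (s, e) with s = fl(a + b) and a + b == s + e exactly."""
    s = a + b
    b_virtual = s - a          # the part of b that actually made it into s
    a_virtual = s - b_virtual  # the part of a that actually made it into s
    b_round = b - b_virtual    # what was lost from b
    a_round = a - a_virtual    # what was lost from a
    return s, a_round + b_round


# Floating-point addition is not associative, so the order of operations matters.
left_to_right = (0.1 + 0.2) + 0.3
right_to_left = 0.1 + (0.2 + 0.3)
order_matters = left_to_right != right_to_left

# TwoSum is "error-free": s + e equals the exact sum when checked in rational arithmetic.
s, e = two_sum(0.1, 0.2)
is_error_free = Fraction(s) + Fraction(e) == Fraction(0.1) + Fraction(0.2)

# math.fsum tracks the lost low-order bits and returns a correctly rounded sum,
# so permuting the inputs does not change the result.
order_independent = math.fsum([0.1, 0.2, 0.3]) == math.fsum([0.3, 0.2, 0.1])
```

Because the rounding error is captured exactly rather than discarded, building blocks of this kind allow a sum to be accumulated so that the final rounded value no longer depends on the order in which terms arrive.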

Design and implementation of a cross-cluster data migration system for computational networks

LI Junzhe, FU Zhenxin, YANG Honghui, MA Yinping, LI Ruomiao, FAN Chun

2025, 47(5): 775-786. doi:

In the construction of computational networks, efficient and reliable data migration between clusters in computing centers in different regions is a key research topic. This paper designs and implements SCOW-SYNC, a high-performance transmission tool based on RSYNC. The main results are as follows. First, SCOW-SYNC uses a queue and thread pool architecture to optimize traditional RSYNC: by establishing multiple TCP connections and transmitting over them in parallel, it improves bandwidth utilization. SCOW-SYNC also supports automatic splitting of large files, dynamic compression, background operation, real-time progress queries, and SSH connection pool management. In testing, SCOW-SYNC achieves a speedup ratio of 125% to 130% compared with RSYNC. Second, to improve transmission security, this paper proposes a reliable cross-cluster transmission system architecture for computing centers. Data transmission occurs only between "transmission nodes" and is encrypted with "transmission keys", which are dynamically checked, generated, and distributed by the "management node". Finally, this paper integrates SCOW-SYNC into SCOW, a high-performance computing portal and management platform, as its cross-cluster transmission module, so that users can perform high-performance data migration between clusters through a browser. The system is deployed in the cross-cluster environment of Peking University using containerization, which improves production efficiency.
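
The combination of large-file splitting and a thread pool can be sketched in a few lines of Python. This is not SCOW-SYNC: a local file copy stands in for the network transfer, and the names `split_ranges`, `copy_range`, and `parallel_copy`, as well as the default chunk size and worker count, are hypothetical choices made for illustration.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def split_ranges(size: int, chunk_size: int) -> list[tuple[int, int]]:
    """Split a file of `size` bytes into (offset, length) chunks."""
    return [(off, min(chunk_size, size - off)) for off in range(0, size, chunk_size)]


def copy_range(src: str, dst: str, offset: int, length: int) -> int:
    """Copy one byte range; each worker opens its own file handles."""
    with open(src, "rb") as fin, open(dst, "r+b") as fout:
        fin.seek(offset)
        fout.seek(offset)
        fout.write(fin.read(length))
    return length


def parallel_copy(src: str, dst: str, chunk_size: int = 1 << 20, workers: int = 4) -> int:
    """Copy `src` to `dst` chunk by chunk using a pool of worker threads."""
    size = os.path.getsize(src)
    # Pre-allocate the destination so every worker can write at its own offset.
    with open(dst, "wb") as f:
        f.truncate(size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(copy_range, src, dst, off, n)
            for off, n in split_ranges(size, chunk_size)
        ]
        return sum(f.result() for f in futures)
```

In a real transfer tool each worker would own a separate connection instead of a file handle, which is where the bandwidth gain from parallel streams would come from.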

A multi-threaded interrupt-free RISC-V processor for low-latency acceleration component control

ZHANG Weiwei, CHEN Hu

2025, 47(5): 787-796. doi:

To meet the demand for controlling low-latency acceleration components, this paper proposes a multi-threaded interrupt-free RV32I microprocessor (MIRV) architecture and its associated software system. MIRV adopts a six-stage, single-issue, in-order pipeline and uses data forwarding to resolve most intra-thread data hazards. The hardware provides register files and program counters for four threads and employs a coarse-grained thread scheduling mechanism that enables zero-overhead thread switching when intra-thread data or control hazards cannot be resolved. Additionally, this paper introduces a unified hardware-software signaling mechanism that leverages dedicated control and status registers (CSRs) to support thread suspension and rapid wake-up on signals from external acceleration components. Software-based signal handling is implemented to achieve multi-thread synchronization and mutual exclusion. After synthesis, MIRV occupies 1,811 LUTs and achieves a 210 MHz clock frequency. Compared to PicoRV32 and DarkRISCV, MIRV demonstrates a higher operating frequency and superior performance. We implemented a producer-consumer-based LED chaser control test case in C on the MK7160FA development board. In this experiment, the latency from hardware timer signal generation to the software-driven external LED control signals was only 10 clock cycles, validating MIRV's low-latency response to external hardware events. With low hardware resource consumption, high performance, and high-level language programmability, MIRV is well suited as a controller for various low-latency acceleration components.

A near-data processing architecture for data-intensive applications

XIE Yang, LI Chen, CHEN Xiaowen

2025, 47(5): 797-810. doi:

In the era of big data, multi-core processors face significant challenges when handling data-intensive applications, including low data locality, high memory access latency, and inefficient core utilization. Near-data processing (NDP) holds great potential for reducing memory latency and improving computational efficiency. This paper proposes a loosely-coupled near-data processing architecture (LcNDP), deployed at both the shared cache level and the memory controller of multi-core processors. The key innovations are: (1) offloading memory access tasks from compute cores so that computation and memory operations execute in parallel, thereby hiding memory latency; and (2) processing streaming data in near-data compute units to reduce both the computational and the memory overhead on the cores. Experimental results demonstrate that, compared to traditional multi-core architectures, LcNDP achieves an average 43% reduction in latency. When benchmarked against conventional NDP-enhanced multi-core designs, it further delivers a 23% average latency improvement.

A novel low-overhead latch resistant to triple-node-upsets

XU Hui, TANG Lin, MA Ruijun, LIANG Huaguo, HUANG Zhengfeng

2025, 47(5): 811-822. doi:

Under advanced nanoscale semiconductor processes, the continuous scaling down of transistor feature sizes and the increasing level of integration have made radiation-induced triple-node upsets increasingly prominent. To mitigate the impact of radiation particles on circuit reliability, a novel low-overhead NLC-TNUTL latch resistant to triple-node upsets is proposed. The design combines dual-mode redundancy technology with an interlocking mechanism based on the polarity inversion principle of transient pulses and input-separated inverters. HSPICE simulations and PVT variation analyses demonstrate that, compared to state-of-the-art radiation-hardened latches with equivalent fault tolerance, the proposed latch exhibits lower power consumption, reduced delay, and smaller area overhead. Additionally, it shows moderate sensitivity to threshold voltage, supply voltage, and temperature fluctuations while maintaining excellent cost-effectiveness.

An adaptive cache management method for multi-layer recursive DNS

CHEN Chuyi, LUO Xiongfei, YAN Baotong, FENG Yuxuan, MA Ke, QIAO Ying

2025, 47(5): 823-831. doi:

The domain name system (DNS) is a core infrastructure of the internet, and its service quality and efficiency directly impact the operation of the internet. To optimize DNS performance and improve domain name resolution efficiency, this paper proposes an adaptive cache management method for multi-layer recursive DNS. This method dynamically adjusts the cache content of domain name servers based on changes in DNS traffic, thereby enhancing the cache hit rate of the DNS and reducing domain name resolution time. Experiments demonstrate that the designed multi-layer recursive DNS with adaptive cache management achieves a higher cache hit rate and shorter DNS response latency compared to traditional multi-layer recursive DNS, delivering significant acceleration effects.

An information hiding scheme of two-level QR code using extended Hamming code

ZHANG Lina, XIN Peng, HOU Minghui, LIU Miao, YUE Hengyi

2025, 47(5): 832-842. doi:

The quick response code (QR code) is a type of two-dimensional barcode known for its fast decoding speed and strong error correction capability, and it has been widely applied in daily life. However, because the encoding rules of QR codes are publicly available, earlier schemes that use QR codes as a carrier for secret information have suffered from leakage of the secret information and limited payload capacity. To address these problems, this paper proposes a new information hiding scheme for QR codes. The scheme combines a Sudoku matrix to design a new mapping function and extends the traditional (7,4) Hamming code to develop a corresponding flipping rule table. While ensuring that the public information in the QR code is still decoded correctly, it achieves a higher level of decryption difficulty and greater information hiding capacity. Experimental analysis shows that the constructed codeword mapping function can prevent attackers from recovering the secret information from the carrier QR code. Compared with existing steganography schemes, this scheme improves both the embedding payload capacity and the decryption difficulty coefficient.
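
For readers unfamiliar with the code family mentioned above, the sketch below shows the textbook extension of the (7,4) Hamming code with one overall parity bit, giving an (8,4) code that corrects single-bit errors and detects double-bit errors. It only illustrates the standard construction. The paper's Sudoku-based mapping function and flipping rule table are not reproduced, and the function names and status strings are hypothetical.

```python
def encode(data: list[int]) -> list[int]:
    """Encode 4 data bits as [p1, p2, d1, p4, d2, d3, d4, p0]."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # covers codeword positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers codeword positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # covers codeword positions 4, 5, 6, 7
    word = [p1, p2, d1, p4, d2, d3, d4]
    p0 = 0
    for bit in word:   # overall parity bit of the extended code
        p0 ^= bit
    return word + [p0]


def decode(word: list[int]) -> tuple[list[int], str]:
    """Return (data bits, status), where status is ok, corrected, or double_error."""
    bits = list(word[:7])
    syndrome = 0
    for position in range(1, 8):
        if bits[position - 1]:
            syndrome ^= position  # XOR of the positions of all set bits
    overall = 0
    for bit in word:
        overall ^= bit
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        # Odd number of flips: assume one. syndrome == 0 means p0 itself flipped.
        if syndrome:
            bits[syndrome - 1] ^= 1
        status = "corrected"
    else:
        # Non-zero syndrome with even overall parity: two bits flipped.
        status = "double_error"
    return [bits[2], bits[4], bits[5], bits[6]], status
```

A data-hiding scheme can exploit the fact that a deliberately flipped bit still decodes to the same public message, which is the general intuition behind pairing an error-correcting code with a flipping rule.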

A WSN data stream anomaly detection algorithm based on GATv2-TCN joint optimization

SU Yuhang, MA Jun, FAN Jinyu, CHEN Bohang, ZHOU Jiacheng, YIN Boran

2025, 47(5): 843-850. doi:

In sensor networks, anomaly detection in data streams enables timely fault detection and alerting, ensuring the safe and reliable operation of the system. However, wireless sensor network (WSN) data stream anomaly detection still faces two major challenges: 1) the complex correlations among different time series need to be further explored; 2) anomaly samples in datasets with extremely unbalanced normal/anomaly distributions are difficult to detect. This paper proposes an anomaly detection algorithm based on GATv2-TCN (Graph Attention Network version 2-Temporal Convolutional Network). GATv2 and TCN are used to model complex relationships in both the feature and the temporal dimensions, and the prediction and reconstruction modules are optimized. Four datasets are employed to validate and analyze the performance of the proposed algorithm. Experiments show that the proposed algorithm achieves high F1 and AUC scores and, in particular, outperforms baseline models across various metrics on unbalanced datasets, demonstrating its effectiveness in WSN data stream anomaly detection.

A slice-level vulnerability detection method based on hyperbolic graph convolutional neural network

CHEN Xu, CHEN Zixiong, JING Yongjun, WANG Shuyang, SONG Jifei

2025, 47(5): 851-863. doi:

Addressing the challenges in the field of source code vulnerability detection, particularly the shortcomings of existing methods in accurately embedding code graphs and capturing their complex hierarchical structures, this paper proposes an innovative slice-level source code vulnerability detection method based on hyperbolic graph convolutional neural network (HGCN), termed VulDHGCN. This method integrates the powerful expressive capabilities of graph convolutional neural networks and hyperbolic geometry to more comprehensively embed and preserve the structural features of source code, effectively reducing information distortion during the code graph embedding process. To comprehensively evaluate the effectiveness of VulDHGCN, three traditional rule-based static vulnerability detection methods and three advanced model-based vulnerability detection methods are selected as comparison baselines. Experimental results demonstrate that VulDHGCN outperforms the baseline methods across multiple key performance indicators. Specifically, VulDHGCN achieves accuracy, precision, recall, and F1 scores of 96.52%, 92.31%, 85.12%, and 88.57%, respectively. Compared to the baseline vulnerability detection methods, VulDHGCN exhibits a significant advantage with an improvement in F1 score ranging from 6.62% to 153.92%. This not only validates the effectiveness of the VulDHGCN method but also provides a new perspective and approach for the further application of deep learning in the field of source code vulnerability detection.
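
The reported F1 score can be cross-checked from the reported precision and recall, since F1 is their harmonic mean. The short Python check below uses only the figures quoted in the abstract. The function name `f1_score` is illustrative.

```python
def f1_score(precision: float, recall: float) -> float:
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Figures quoted in the abstract, expressed as fractions.
reported_precision = 0.9231
reported_recall = 0.8512
reported_f1 = 0.8857

computed_f1 = f1_score(reported_precision, reported_recall)
# The computed value agrees with the reported 88.57% to two decimal places.
is_consistent = abs(computed_f1 - reported_f1) < 0.00005
```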

Multi-scale fully aggregated network for spatiotemporal fusion of remote sensing images

YU Zhiyuan, SONG Huihui

2025, 47(5): 864-874. doi:

Spatiotemporal fusion aims to generate remote sensing images with high spatiotemporal resolution. Most current spatiotemporal fusion models use convolution operations for feature extraction and cannot model the correlation of global features, which limits their ability to capture long-range dependencies. In addition, the large difference in spatial resolution between the input images makes it very difficult to reconstruct detailed texture. To solve these problems, a multi-scale full aggregation network model for spatiotemporal fusion of remote sensing images is proposed in this paper. First, an improved Transformer encoder structure is introduced to learn the local and global temporal features in the images, and it effectively extracts the temporal and spatial texture information they contain by modeling pixel interaction in the spatial and channel dimensions. Second, a multi-scale hierarchical aggregation module, comprising local convolution, mesoscale self-attention, and global self-attention, is designed to provide full-scale feature extraction, which helps to compensate for feature loss during model reconstruction. Finally, adaptive instance normalization and a weight fusion module are used to learn the texture transfer and local changes from the coarse image to the fine image, generating a fused image with global spatiotemporal correlation. Comparative experiments were conducted between the proposed model and five representative spatiotemporal fusion models on two benchmark datasets, CIA and LGC. Experimental results demonstrate that the proposed model outperformed all baseline models across five evaluation metrics.
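
Adaptive instance normalization, mentioned above, re-scales one feature map so that its mean and standard deviation match those of another. The sketch below applies the standard formula, AdaIN(x, y) = σ(y) · (x − μ(x)) / σ(x) + μ(y), to a single flattened channel. How the paper assigns the coarse and fine images to x and y is not stated in the abstract, so the argument names `content` and `style` are assumptions, as is the stabilizing constant `eps`.

```python
from statistics import fmean, pstdev


def adain(content: list[float], style: list[float], eps: float = 1e-8) -> list[float]:
    """Shift and scale `content` so its mean and std match those of `style`."""
    mu_c, sigma_c = fmean(content), pstdev(content)
    mu_s, sigma_s = fmean(style), pstdev(style)
    # Normalize the content statistics, then apply the style statistics.
    return [sigma_s * (x - mu_c) / (sigma_c + eps) + mu_s for x in content]
```

In a network this operation is applied per channel and per sample, which lets one branch impose its first- and second-order statistics on the features of another.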

Efficient digital halftone calculation based on Floyd-Steinberg error diffusion

LIAN Kaicheng, YANG Chen, ZHU Jiawei, CHAI Zhilei

2025, 47(5): 875-884. doi:

The Floyd-Steinberg error diffusion algorithm, the mainstream digital halftone algorithm in industry, suffers from severe data dependency, low parallelism, and poor real-time performance when processing increasingly large images. To address these issues, an efficient computation algorithm is proposed. First, a pre-generated lookup table of pixel-error diffusion values is used to avoid repeatedly computing the error and its diffusion. Second, memory access is optimized through an efficient data structure based on row buffering. Third, a single instruction, multiple data (SIMD) parallel method for error accumulation is proposed, which uses the AVX-512 instruction set to accumulate errors in the same direction for multiple pixels in parallel, making better use of the CPU's vector registers. Finally, a multi-core data parallelism method with edge error-constrained column blocking is implemented to eliminate the errors caused by data dependency at block boundaries during data-parallel processing. Experimental results demonstrate that the proposed algorithm exhibits good scalability, with computational performance increasing linearly with the optimal number of parallel cores. Compared with the traditional Floyd-Steinberg error diffusion algorithm, when processing a 5,120×5,120 grayscale image on a 16-core Intel Core™ i7-11700 CPU platform, the proposed algorithm achieves a 15-fold performance improvement, completing the task in just 23 ms. This better meets the needs of industrial high-speed printing for large-scale, super-large format, ultra-high resolution, and varied content.
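
For reference, the baseline being accelerated is the classic serial Floyd-Steinberg algorithm, sketched below in plain Python. This is the textbook form only. The lookup table, row buffering, AVX-512 accumulation, and column blocking described above are not shown, and the function name and the threshold of 128 are illustrative choices.

```python
def floyd_steinberg(gray: list[list[int]], threshold: int = 128) -> list[list[int]]:
    """Binarize an 8-bit grayscale image (list of rows) with Floyd-Steinberg diffusion."""
    height, width = len(gray), len(gray[0])
    work = [[float(v) for v in row] for row in gray]  # working copy that absorbs errors
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            old = work[y][x]
            new = 255 if old >= threshold else 0
            out[y][x] = new
            err = old - new
            # Each pixel depends on errors pushed by its left and upper neighbours,
            # which is the data dependency that limits parallelism.
            if x + 1 < width:
                work[y][x + 1] += err * 7 / 16
            if y + 1 < height:
                if x > 0:
                    work[y + 1][x - 1] += err * 3 / 16
                work[y + 1][x] += err * 5 / 16
                if x + 1 < width:
                    work[y + 1][x + 1] += err * 1 / 16
    return out
```

Because a pixel cannot be quantized until the errors from its left, upper-left, upper, and upper-right neighbours have arrived, a naive split of the image across cores changes the result at block boundaries.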

An image encryption algorithm based on double random phase encoding with double chaotic system and compressed sensing

ZHAO Xueyan, ZHANG Zhao, JIA Jingwen, ZHOU Hongyan, CHEN Xuebo

2025, 47(5): 885-893. doi:

A novel image encryption algorithm is proposed that combines a 1D Logistic chaotic system, a 4D new hyperchaotic system (NHS), compressed sensing (CS), double random phase encoding (DRPE), and the 2D discrete cosine transform (DCT). First, the 2D DCT is used to obtain a sparse representation of the gray image, and the resulting sparse matrix is scrambled by index-sort scrambling. Second, the scrambled matrix is measured using CS, with the measurement matrix generated by the 4D NHS. Finally, a second encryption stage is carried out with DRPE, using the Logistic chaotic map and the 4D NHS as the double key, to obtain the final encrypted image. The algorithm makes full use of the advantages of CS to achieve compression and encryption at the same time, and combining CS with DRPE not only reduces storage space and transmission bandwidth but also improves the security of the encryption. Simulation experiments and comparative analysis show that the proposed image encryption algorithm has good security, robustness, and decryption quality.
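
One common way to realize index-sort scrambling is to sort a chaotic sequence and use the resulting index order as a permutation. The Python sketch below does this with the logistic map x_{n+1} = r · x_n · (1 − x_n). It is an assumption about the general technique, not the paper's exact procedure: the parameter r = 3.99, the seed 0.3141, the burn-in length, and all function names are illustrative.

```python
def logistic_sequence(x0: float, r: float, n: int, burn_in: int = 100) -> list[float]:
    """Iterate the logistic map, discarding `burn_in` transient values first."""
    x = x0
    for _ in range(burn_in):
        x = r * x * (1 - x)
    seq = []
    for _ in range(n):
        x = r * x * (1 - x)
        seq.append(x)
    return seq


def index_sort_permutation(seq: list[float]) -> list[int]:
    """Indices that would sort the chaotic sequence; this serves as the key-derived permutation."""
    return sorted(range(len(seq)), key=seq.__getitem__)


def scramble(values: list, perm: list[int]) -> list:
    """Move values[perm[i]] to position i."""
    return [values[i] for i in perm]


def unscramble(scrambled: list, perm: list[int]) -> list:
    """Invert `scramble` given the same permutation."""
    original = [None] * len(perm)
    for dst, src in enumerate(perm):
        original[src] = scrambled[dst]
    return original
```

The receiver regenerates the same permutation from the shared seed and parameter, so only those values need to be kept secret rather than the permutation itself.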

Cross-modal image emotion perception captioning based on generative adversarial network

YANG Chunmiao, WANG Yang, HAN Liying, SUN Hebin

2025, 47(5): 894-901. doi:

Image captioning is a cross-modal task that aims to produce text conforming to the image content based on visual information. Although progress has been made in image captioning, there is still room for improvement in capturing fine-grained affective semantic features and in the emotional subtlety of the descriptions. To address this problem, a model based on a generative adversarial network is proposed to generate aspect-level emotional language descriptions. With an encoder-decoder structure integrating a two-modal attention mechanism as the generator and a convolutional neural network as the discriminator, the accuracy of the model in cross-modal emotion matching and the reliability of the generated emotional statements are improved. Transfer learning and the RMSProp optimization algorithm are introduced to improve the interpretability of the model. Finally, experiments on the MSCOCO and SentiCap datasets show that the model exhibits excellent convergence performance and attains a high accuracy rate.

A feature fusion recommendation model based on attention mechanism

MA Handa, LI Tengfei

2025, 47(5): 902-911. doi:

To address current challenges in recommendation systems, including the difficulty of obtaining feature information and the lack of effective methods for representing the weights of feature information, this study proposes a recommendation model based on the attention mechanism and feature fusion, named FFADeepCF_SPS. First, to address inadequate feature representation, Factorization Machines (FM) are employed to fuse features, extending them from a one-dimensional to a high-dimensional space to obtain low-order feature representations. A Deep Neural Network (DNN) is then used to learn high-order features, and the two types of features are combined through a fully connected layer to obtain the required feature representation. Second, to address the excessive weight skew of the single-head attention mechanism, a multi-head attention mechanism is adopted, in which the input is divided among multiple heads whose attention weights are calculated separately. The results from the heads are then concatenated and passed through a linear transformation to obtain the final output. Finally, combining these two points, a recommendation model based on the attention mechanism and feature fusion is constructed. To validate the effectiveness of the model, comparative experiments and ablation studies are conducted on four public datasets against baseline models such as GMF, DeepCF_SPS, and CNN-BiLSTM. The experimental results show that the proposed model outperforms the baseline models in terms of the MSE, RMSE, and MAE evaluation metrics across datasets of different sizes.

An evolutionary reinforcement learning algorithm based on stochastic symmetric search

DI Jian, WAN Xue, JIANG Limei

2025, 47(5): 912-920. doi:

The introduction of evolutionary algorithms has greatly improved the performance of reinforcement learning algorithms. However, existing evolutionary reinforcement learning (ERL) algorithms still suffer from problems such as susceptibility to deceptive rewards, a tendency to converge to local optima, and poor stability. To address these problems, a stochastic symmetric search strategy is proposed. It acts directly on the policy network parameters and, taking the center of the policy network parameters as a reference, uses the optimal policy network parameters to guide the optimization and update of the global policy network parameters. It is supplemented by gradient optimization to guide the agent toward diversified exploration. Experimental results on five continuous robot motion control tasks in MuJoCo show that the proposed algorithm outperforms previous evolutionary reinforcement learning algorithms and converges faster.

Research on EEG signal emotion analysis based on asymmetric spatial features

WANG Ying, YANG Qing, WANG Xiangyu, ZHANG Yong

2025, 47(5): 921-930. doi:

The asymmetry of the brain affects EEG emotion analysis, but many studies have not considered this property. Taking the spatial asymmetry of the brain into account, this paper proposes a hybrid model that uses a multi-scale convolutional neural network to extract EEG spatial features reflecting the left-right asymmetry of the brain, then uses a bidirectional long short-term memory network to extract temporal features, and finally learns the relationships between features through a multi-head self-attention mechanism. The proposed model is experimentally validated on the publicly available DEAP dataset. The accuracy and F1-score for classifying the arousal dimension are 93.11% and 93.46%, respectively, while those for the valence dimension are 92.12% and 93.27%. Furthermore, the model is validated on the publicly available MAHNOB-HCI dataset, achieving an accuracy and F1-score of 98.58% and 97.98% for the arousal dimension, and 98.76% and 98.25% for the valence dimension. The results demonstrate that the proposed model exhibits certain advantages in EEG-based emotion recognition. In addition, ablation experiments confirm the significance of the asymmetric spatial layer.

Research on Chinese—traditional Mongolian cross-lingual summarization methods in low-resource scenarios

BAN Qi, YUN Jing, DENG Lei

2025, 47(5): 931-939. doi:

Cross-lingual summarization aims to generate a summary in a target language (such as traditional Mongolian) given a source document in another language (such as Chinese). Typically, traditional multi-task frameworks employ sequence-to-sequence networks with multiple decoders, each dedicated to a specific task. However, when a document is translated from one language into another, these structures cannot effectively capture and understand the relationships and differences between the two languages because of their different morphological and structural characteristics. This is particularly evident for traditional Mongolian, whose complex morphological changes and diverse word formation patterns make it challenging to learn and process language features under low-resource conditions. To address this challenge, we propose a cross-lingual summarization model that embeds consistency learning into a multi-task framework. Consistency is modeled by calculating a distance metric on the difference between the probability distributions of the source language summary and the generated target language summary. The cross-lingual summarization model is then optimized under the constraints of both cross-entropy loss and consistency loss. Furthermore, we built a Chinese-Mongolian cross-lingual summarization dataset. The competitive ROUGE scores obtained on this dataset demonstrate the effectiveness of the proposed model in resource-poor conditions.

Heterogeneous ensemble learning with feature subspace augmentation for imbalanced data

CHEN Lifang, BAI Yun, SHI Yonghui, DAI Qi

2025, 47(5): 940-950. doi:

For imbalanced data, traditional classifiers tend to favor the majority class at the expense of accuracy on the minority class, which degrades overall performance. To address this issue, a heterogeneous ensemble learning algorithm with feature subspace augmentation (HEL-FSA) for imbalanced data is proposed. First, the XGBoost algorithm is used to learn feature importance and select the important features, which form a feature subspace of the dataset. Second, the SMOTE algorithm is used to generate new samples within this feature subspace, yielding more balanced training data. Third, five classifiers, namely Logistic Regression, Decision Tree, Multi-Layer Perceptron, Support Vector Machine, and XGBoost, are employed as base models, and the heterogeneous base models are fused using the if_any algorithm. Experimental results on nine imbalanced datasets verify the feasibility of the proposed algorithm. Additionally, when applied to cervical cancer risk prediction, the proposed algorithm enhances the ability to understand and predict cervical cancer risk.
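
The oversampling step can be illustrated with a minimal SMOTE-style sketch: each synthetic sample is placed at a random point on the segment between a minority sample and one of its k nearest minority neighbours. The sketch assumes the feature subspace has already been selected, so the XGBoost importance ranking and the if_any fusion are not shown. The function name `smote` and its parameters are hypothetical.

```python
import random


def smote(minority: list[tuple[float, ...]], k: int = 2, n_new: int = 4, seed: int = 0):
    """Generate `n_new` synthetic minority samples by neighbour interpolation."""
    rng = random.Random(seed)

    def dist2(p, q):
        # Squared Euclidean distance is enough for ranking neighbours.
        return sum((a - b) ** 2 for a, b in zip(p, q))

    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k nearest minority neighbours of the chosen sample, excluding itself.
        neighbours = sorted(
            (p for p in minority if p is not base), key=lambda p: dist2(base, p)
        )[:k]
        neighbour = rng.choice(neighbours)
        lam = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(b + lam * (n - b) for b, n in zip(base, neighbour)))
    return synthetic
```

Interpolating only within the selected features keeps synthetic samples from being shaped by dimensions the importance ranking judged uninformative, which appears to be the motivation for oversampling inside the subspace.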
