Computer Engineering & Science

Design of modular arithmetic acceleration for privacy computing

LIU Hongwei1, ZHI Liang2, QIN Mengyuan1, CHEN Mingzhi1, DONG Wenkuo1, HAO Qinfen1

2025, 47(8): 1331-1342. doi:

Privacy computing technology is a crucial means of ensuring data security in data centers. With the advancement of quantum computing, lattice-based post-quantum algorithms and fully homomorphic encryption algorithms have gradually gained prominence. In these algorithms, modular arithmetic is one of the most widely used nonlinear operations, primarily employed to prevent overflow during computation. This paper addresses the modular arithmetic used extensively in privacy computing and cryptographic applications, proposing a hardware-software co-design acceleration framework implemented on FPGA platforms via PCIe interfaces. The framework effectively masks communication latency and supports modular operations of up to 2,048 bits, including modular multiplication and modular exponentiation, to serve data center scenarios with privacy computing requirements. While existing research focuses primarily on the modular operations themselves, the co-designed framework delivers a comprehensive acceleration solution that encompasses not only computational cores but also data interfaces, hardware-software interaction mechanisms, and communication latency mitigation. Finally, we implement a tailored acceleration application for a specific telecom operator scenario, experimentally demonstrating the performance advantages of the proposed system.
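
For readers unfamiliar with the operations named in this abstract, the following is a minimal software sketch of modular exponentiation built from repeated modular multiplication (the textbook square-and-multiply method). It is not the paper's FPGA design; the function name `mod_exp` and the choice of algorithm are assumptions made purely for illustration.

```python
def mod_exp(base: int, exponent: int, modulus: int) -> int:
    """Compute (base ** exponent) % modulus by binary square-and-multiply.

    Illustrative only: hardware accelerators typically use specialised
    multipliers rather than this plain software loop.
    """
    if modulus <= 0:
        raise ValueError("modulus must be positive")
    if exponent < 0:
        raise ValueError("exponent must be non-negative")

    result = 1 % modulus          # handles modulus == 1
    base %= modulus
    while exponent > 0:
        if exponent & 1:          # current exponent bit is set
            result = (result * base) % modulus   # modular multiplication
        base = (base * base) % modulus           # modular squaring
        exponent >>= 1
    return result


# Small worked value: 4^13 mod 497
small = mod_exp(4, 13, 497)  # 445
```

Python integers are arbitrary precision, so the same function accepts 2,048-bit operands without modification, which makes it convenient as a reference model when checking an accelerated implementation.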

Design and implementation of an instrumentation tool based on FT-X DSP tracing

WEI Zhen1, 2, YUAN Yulei1, LIU Yuehui1, 2, MO Jiasheng1, 2, HU Xiao1, 2

2025, 47(8): 1343-1353. doi:

Program instrumentation techniques, which include both dynamic and static approaches, are primarily employed for dynamic analysis during program execution. They are widely applied in areas such as vulnerability discovery, defect detection, and performance analysis and optimization, serving as a key method for collecting program execution paths and analyzing function calls. In embedded systems, traditional instrumentation methods often face challenges due to constraints such as the absence of an operating system, complex architectures, and limited memory. This paper focuses on static instrumentation algorithms, addressing the instrumentation requirements of embedded system debugging scenarios. In addition to introducing the fundamental principles of program instrumentation and systematically analyzing typical current instrumentation methods, we design and implement Dbtrace, an instrumentation tool based on FT-X DSP tracing. Furthermore, to address the overhead issue, we comprehensively measure the execution time overhead and code expansion rate of different instrumentation schemes, comparing them with an uninstrumented program. Experimental results demonstrate that Dbtrace can effectively track and record program execution trace information while reducing memory usage and instrumentation overhead, providing an efficient solution for instrumentation debugging in embedded systems.

Analysis and optimization of the strip vertical differential structures

ZHANG Huan, LI Tao, HU Jin

2025, 47(8): 1354-1363. doi:

With the continuous increase in differential signal rates, the impact of traditional differential via discontinuities on signal integrity is becoming increasingly noticeable. To solve the problems of signal reflection, attenuation, and impedance discontinuity in the vertical direction of multi-layer PCBs, research on new vertical differential structures is urgently needed. Firstly, the three-dimensional electromagnetic simulation software HFSS is used to construct two types of strip vertical differential structure, Model I and Model II, as well as the traditional differential via model. Secondly, the transmission performance of the two strip vertical differential structures is simulated and analyzed, and Model I is found to outperform Model II. Thirdly, the influence of structural parameters on the electrical transmission parameters of the Model I strip vertical differential structure is analyzed. Finally, the electrical transmission parameters of the Model I structure and the traditional differential via are compared and analyzed, and verified with time-domain eye diagrams. The results show that the transmission performance of Model I can be improved by reducing the size of the surface connection pad, the depth of the vertical conductor, and the length of the stub, and by increasing the drill hole diameter and the anti-pad size. Compared with the traditional differential via transition, the Model I transition increases the eye height and eye width by 4.47% and 4.31%, respectively, and reduces the jitter by 57.16%, showing that Model I has better time-domain transmission performance.

Comparison and analysis of TAGE-based and neural-based branch predictors

ZHENG Weiwei, ZHENG Zhong, CHEN Wei, LU Hongyi

2025, 47(8): 1364-1380. doi:

With the increasing demand for processor performance, superscalar and deeply pipelined techniques have been widely adopted in modern microprocessors to enhance instruction-level parallelism. However, conditional branch instructions pose a challenge to continuous pipeline execution, limiting the potential for parallel instruction processing. To address this control hazard, branch prediction techniques have been developed, with the core objective of speculatively determining the direction and target address of branch instructions, thereby minimizing the pipeline stalls they cause. This paper presents a comparative analysis of two mainstream families of branch predictors, TAGE-based and neural-based, under a unified performance evaluation framework. Experimental results demonstrate that different branch predictors exhibit distinct preferences for specific traces, suggesting that hybrid prediction mechanisms could further unlock prediction potential. Additionally, the influence of execution context on branch prediction performance cannot be overlooked, particularly in multi-process environments. Furthermore, this paper reveals that current CNN-based predictors exhibit unstable performance when handling complex branch patterns, and their overall accuracy has yet to surpass that of the baseline TAGE-SC-L predictor, indicating a need for further optimization.

Research on Retimer structure and key technologies for Chiplet interconnection

SUN Yubo1, ZHOU Hongwei2, 3, SUN Xingyu2, 3, HE Xingyang2, 3, SONG Zhaoyang2, 3, CHEN Zhiqiang2, 3

2025, 47(8): 1381-1390. doi:

Connecting multiple dies through Chiplet interconnect interfaces has become the mainstream approach to chip design in the post-Moore era. The Chiplet interconnection interface circuit is used only for interconnecting multiple Chiplets within a single package, over an extremely short transmission distance. In large-scale computing systems, multiple chips must be combined to build larger-scale computing nodes, so achieving long-distance, board-level interconnection of Chiplets located in different chips has become a very important issue. Intel and others have defined a Retimer for Chiplet interconnection interfaces in the Universal Chiplet Interconnect Express (UCIe) specification, but the architectural details are not disclosed. In China, research on Retimers for Chiplet interconnection interfaces is still lacking. In conjunction with the formulation of the independent Chiplet interconnection interface standard, this paper proposes a Retimer architecture (D2C_Retimer) that bridges Chiplet interconnection and chip interconnection: it supports conversion of the die-to-die (D2D) interface into a chip-to-chip (C2C) interface, realizing board-level interconnection of Chiplets across chips. Through key technologies such as the Retimer's reliable transmission mechanism, its credit mechanism, and a hierarchical sideband transmission link, the design not only achieves compatibility with the independent Chiplet interconnection standard but also offers advantages in credit management and reliable transmission. Experiments show that the implemented Retimer can realize long-distance, cross-package interconnection between Chiplets without changing the existing independent interconnection standard, which has significant reference and engineering value for improving the domestic Chiplet interconnection ecosystem.

A large-scale scan of IPv6 IP-ID

HUANG Fengyuan, YANG Yifan, YU Bo, YANG Zhenzhong, CAI Zhiping, HOU Bingnan

2025, 47(8): 1391-1398. doi:

In IPv6 networks, the Internet Protocol identification (IP-ID) field, which supports fragmentation and reassembly of network-layer datagrams, no longer appears as a fixed header field but is instead carried in an extension header for flexible use. In recent years, researchers have exploited the IPv6 fragmentation mechanism to induce IPv6 target hosts to generate IP-IDs and to perform tasks such as alias prefix resolution, demonstrating that the IP-ID field in IPv6 networks can still leak information and pose security risks. Since existing IP-ID exploitation methods rely on simple, predictable IP-ID types, probing whether the IP-ID types of IPv6 devices on the Internet are predictable is of significant importance for IPv6 network security and asset assessment. This paper proposes a method for probing IPv6 devices on the Internet and classifying them into different types. Among the nearly 5 million IPv6 addresses that responded, 41.1% still used predictable IP-IDs, indicating that IPv6 networks are not immune to fragmentation-based and IP-ID-based attacks and that a considerable number of IPv6 devices remain at high security risk.
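
To make the notion of a "predictable" IP-ID concrete, here is a minimal sketch of how a sequence of identification values collected from one host might be labelled. The three labels, the function name `classify_ipid`, and the `max_step` threshold are all assumptions for illustration; the abstract does not state the paper's actual categories or decision rules.

```python
ID_SPACE = 2 ** 32  # the IPv6 Fragment header Identification field is 32 bits wide


def classify_ipid(samples, max_step=1000):
    """Label a list of consecutive IP-ID samples from a single host.

    Hypothetical rule set (not taken from the paper):
      - "constant":    every sample is identical
      - "incremental": each sample exceeds the previous one by a small
                       positive step (modulo 2**32), so the next value
                       is easy to guess
      - "random":      anything else
    """
    if len(samples) < 3:
        raise ValueError("need at least three samples to judge a pattern")

    # Differences are taken modulo 2**32 so counter wrap-around is handled.
    steps = [(b - a) % ID_SPACE for a, b in zip(samples, samples[1:])]

    if all(step == 0 for step in steps):
        return "constant"
    if all(0 < step <= max_step for step in steps):
        return "incremental"
    return "random"
```

Under this sketch, both "constant" and "incremental" hosts would count as predictable, since an observer can anticipate the next identifier from earlier ones.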

A searchable encrypted electronic medical record data sharing scheme in the blockchain

LI Yahong1, 2, LI Zhewei1, LI Qiang1, WANG Caifen3, ZHANG Xuejun1

2025, 47(8): 1399-1407. doi:

To address the problems of data security, storage, and sharing in electronic medical records, a searchable encrypted electronic medical record data sharing scheme based on blockchain is proposed. Firstly, the scheme uses a cloud server to store electronic medical records and re-encrypts the corresponding ciphertexts to enable data sharing among different medical institutions. Secondly, the blockchain is used to store indexes. In the search phase, a smart contract is invoked on the consortium blockchain to execute the keyword ciphertext search, which realizes secure storage of indexes and reduces the risk of malicious searches by semi-honest third parties. At the same time, the scheme hides conditions in the re-encryption keys to ensure data confidentiality, so that agents cannot learn any information about the conditions. Finally, analysis shows that the scheme is lightweight and has significant advantages in computational and communication overhead.

CNN-ViTAMR: A Transformer-based automatic modulation recognition algorithm and its lightweight implementation

LIU Chang, XU Weixia

2025, 47(8): 1408-1416. doi:

With the rapid development and widespread adoption of technologies such as the Internet of Things (IoT), 5G communications, wireless ad hoc networks, and unmanned swarm systems, automatic modulation recognition (AMR) has found extensive application in wireless communications, radar signal processing, electronic warfare, and other domains, while progressively reaching edge intelligent terminal devices. Consequently, the development and implementation of lightweight intelligent modulation recognition algorithms has emerged as one of the critical challenges in the field of communications. Traditional modulation recognition models based on CNNs and RNNs fail to accurately capture the global characteristics of signals and thus exhibit certain limitations in AMR tasks. In recent years, the Transformer, leveraging the global feature extraction capability of its built-in multi-head self-attention mechanism, has broken through the generalization constraints of DNN models and achieved significant breakthroughs in time-series information processing. To address these challenges, this paper proposes an AMR model based on the Transformer structure. The model embeds a CNN-based tokenization module into the Transformer, combining the Transformer’s global information extraction ability with the local time-series features retained inside each token, thereby ensuring recognition accuracy. At the same time, because the model has a small number of parameters, it is suitable for deployment on edge devices. Evaluation results on the Zynq UltraScale+ MPSoC platform demonstrate that, compared with a software implementation running on a higher-frequency CPU platform, the FPGA-based hardware acceleration solution achieves a speedup of up to 2.47× while operating at a lower clock frequency.

A novel malicious domain detection approach based on multi-perspective spatiotemporal alignment learning

JIN Xueqi1, 2, XU Hongquan3, HUANG Yinqiang4, SUN Zhihua5

2025, 47(8): 1417-1424. doi:

To address the insufficient use of domain name string information and the loss of global encoding features in current malicious domain detection methods, this paper proposes a novel malicious domain detection approach based on multi-perspective spatiotemporal alignment learning. Firstly, the domain name string is embedded into an image, and a denoising autoencoder network combined with a convolutional neural network (CNN) is employed to encode the domain name string into textual and visual feature spaces, constructing a multi-perspective feature set. Next, the feature maps are downsampled into feature layers of different scales, and gradient information is learned through layer-by-layer iterative training to enhance the semantic representation capability of the features. Finally, a cross-attention mechanism is introduced to align the textual and visual feature maps. A prototype set is constructed by applying global average pooling to the aligned feature maps, enabling rapid determination of the legitimacy of a test domain by associating its features with the prototypes. Extensive experiments on public datasets, covering both binary and multi-class classification tasks, demonstrate the superiority of the proposed approach.

BF-YOLO: An improved small object detection algorithm based on YOLOv8

PU Xiaoli, LAI Huicheng, GAO Guxue

2025, 47(8): 1425-1436. doi:

To address the issues of low detection accuracy and large model size in existing object detection algorithms for UAV-captured images, this paper proposes an improved YOLOv8-based object detection algorithm named BF-YOLO. Firstly, the output detection layer of the network is reconstructed to enhance its capability for detecting small objects. Secondly, receptive field attention convolution is introduced to replace the standard convolution, enabling the network to focus on object location information and improving its ability to learn object features. Additionally, a multi-scale feature extraction module is designed, using multiple grouped convolution units to capture object information at different receptive fields, thereby reducing the number of parameters while improving detection accuracy. Finally, a weighted bidirectional feature fusion method is incorporated into the neck network to enhance multi-scale feature fusion, boosting the model’s ability to recognize objects of varying scales. Experimental results on the VisDrone-DET2019 dataset demonstrate that the improved algorithm achieves a 7.3% increase in mAP50 compared with YOLOv8s while reducing the model’s parameter count by 67.1%, effectively balancing detection accuracy and model size.

An improved multi-scale fusion YOLOv7-tiny algorithm based on Ghost efficient layer aggregation network

OUYANG Yuxuan, ZHANG Rongfen, LIU Yuhong, PENG Yaopan

2025, 47(8): 1437-1448. doi:

To address the common issues of excessive parameters, slow inference speed, limited detection performance, and difficulty in deploying neural networks on edge devices, this paper proposes an improved YOLOv7-tiny algorithm. Firstly, based on the structural characteristics of the original model, a Ghost-ELAN module is introduced to compress the model substantially. Secondly, Ghost Bottleneck-2 replaces the convolutions in the neck of the network, further reducing the scale of the model. Then, the multi-scale fusion module Ghost-SPPCSPC is used to improve the model’s understanding of feature information, and the output layer convolution is replaced by GhostConv, which reduces the redundancy of ordinary convolution and makes maximum use of the semantic information in the network. Finally, transfer learning is employed to enhance generalized feature learning and improve model performance. Experimental results demonstrate that the improved model reduces parameter count and model size by 57.19% and 55.28%, respectively, achieving substantial compression over the original model while improving accuracy. With an inference speed of 278, the proposed model is fast, efficient, and lightweight, making it highly suitable for deployment on edge devices.

Generative image detection based on fine-grained local artifacts

YUAN Chengsheng 1, 2, CHEN Jinrui 1, 2, XU Chenwei 3, LIU Qingcheng 1, 2, FU Zhangjie 1, 2

2025, 47(8): 1449-1458. doi:

With the rapid development of artificial intelligence technologies, images generated by models such as generative adversarial networks (GANs) and diffusion models have become so realistic that it is difficult for the human eye to judge their authenticity. Existing detection techniques perform well under specific conditions, but their generalization ability is usually unsatisfactory when facing images generated from unknown models and data. To address these problems, this paper proposes a two-branch framework based on fine-grained local artifacts, which fully exploits both the global spatial features of the image and the feature information extracted from multiple local regions. The fine-grained spatial-domain artifacts caused by upsampling operations, which are common in generated images, are exploited and combined with the global structural information and local detail information of the image to enhance the generalization ability of the detection model across different scenarios. With this strategy, the proposed method is able to analyze image content more comprehensively and identify the unique fingerprints of synthetic images, and it shows stronger robustness and accuracy in identifying AI-synthesized images. Experimental results show that the proposed method performs well on datasets generated by various GANs and diffusion models, further verifying its generalization ability.

Dynamic spatial Transformer and multi-level fusion algorithm for retinopathy grading

LIANG Liming, ZHONG Yi, KANG Ting, JIN Jiaxin

2025, 47(8): 1459-1469. doi:

To address the issues of misgrading and insufficient focus on lesion edge information in diabetic retinopathy images, a retinopathy grading algorithm combining a dynamic spatial Transformer and multi-level fusion is proposed. Firstly, the retinal images are processed by the PVT v2 backbone network for initial extraction of lesion information. Secondly, a contour enhancement module is introduced in the first three layers of the network to highlight lesion edge features, thereby improving the algorithm’s localization perception of lesion pixels. Thirdly, a dynamic spatial attention module is designed at the network’s lower layers to effectively connect global and local spatial information, enhancing the algorithm’s ability to extract deep semantic information. Finally, a multi-level gated fusion module is constructed to filter out non-diagnostic information while performing multi-level fusion of diagnostic information, further improving the accuracy of retinopathy grading. Experiments show that the QWK on the IDRID and APTOS 2019 datasets is 91.71% and 89.89%, respectively, while the Acc on IDRID and the AUC on APTOS 2019 are 79.61% and 93.06%, respectively. The experimental results demonstrate that the proposed algorithm has significant application value in the field of retinopathy grading.
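
QWK in the abstract above refers to the quadratic weighted kappa, a standard agreement measure for ordinal grades. The following is a minimal sketch of the usual definition, with integer grades from 0 to `num_classes - 1`; the function name and argument layout are assumptions for illustration and are not taken from the paper's code.

```python
def quadratic_weighted_kappa(y_true, y_pred, num_classes):
    """Quadratic weighted kappa between two lists of integer grades.

    Disagreements are penalised by (i - j)^2 / (num_classes - 1)^2, so a
    prediction two grades off costs four times as much as one grade off.
    """
    if num_classes < 2:
        raise ValueError("num_classes must be at least 2")
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")

    n = len(y_true)
    # Observed confusion matrix: rows are true grades, columns are predictions.
    observed = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        observed[t][p] += 1

    row_totals = [sum(row) for row in observed]
    col_totals = [sum(observed[i][j] for i in range(num_classes))
                  for j in range(num_classes)]

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(num_classes):
        for j in range(num_classes):
            weight = (i - j) ** 2 / (num_classes - 1) ** 2
            expected = row_totals[i] * col_totals[j] / n  # chance agreement
            weighted_observed += weight * observed[i][j]
            weighted_expected += weight * expected

    if weighted_expected == 0:
        raise ValueError("kappa is undefined when all grades fall in one class")
    return 1.0 - weighted_observed / weighted_expected
```

A value of 1 means perfect agreement with the reference grades and 0 means agreement no better than chance, so the percentages reported in the abstract correspond to kappa values of roughly 0.92 and 0.90.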

Evidence span prediction based on bidirectional superposition attention in DBQA

TURDI Tohti1, 2, LUO Changhong1, 2, ASKAR Hamdulla1, 2

2025, 47(8): 1470-1482. doi:

Document-based question answering (DBQA) generally relies solely on the one-way matching relationship between documents and questions to locate evidence spans and generate answers. However, capturing concise evidence spans is difficult when facing semantic challenges such as distant interference and multiple answer words. To address this issue, an evidence span prediction model, ESP-BSA, based on a bidirectional superposition attention mechanism is proposed. Firstly, the implicit interaction of text semantics is enriched by cross-matching the question with the text. Secondly, soft evidence label pairs are designed based on the heterogeneity of the evidence distribution to represent the forward and backward evidence scores. Finally, the evidence scores at each position in the bidirectional stacked sequence are superposed to obtain evidence spans that better meet the contextual requirements. Experimental results demonstrate that the proposed model improves the precision of evidence span prediction and the accuracy of question answering in complex contexts, as evidenced by improvements in both the Span-F1 and Span-EM metrics over baseline models.

A link prediction model based on dense convolution and multi-feature perception

LIU Jinzhu, ZHANG Dong, LI Guanyu

2025, 47(8): 1483-1492. doi:

ConvE applies a convolutional neural network (CNN) to link prediction tasks, and its outstanding performance has attracted significant attention in academia. However, CNN-based models such as ConvE still extract graph structural information inadequately and fail to consider the multi-feature attributes of relations in knowledge graphs. To fully leverage graph structural features and the multi-feature properties of relations, this paper proposes a novel link prediction model, ComConvR, which extracts multi-feature representations of relations and incorporates dense convolutional blocks into the CNN. This enhancement strengthens the network’s feature extraction capability and enables multi-feature fusion for link prediction. Experiments on four benchmark datasets demonstrate the effectiveness of ComConvR, and ablation studies and key parameter analyses validate the efficiency and contribution of the dense convolutional blocks.

Scientific document query expansion based on multi-dimensional meta-paths in a knowledge graph

XU Jianmin, TONG Simeng, ZHANG Guofang

2025, 47(8): 1493-1502. doi:

To address the limitations of existing scientific document query expansion methods, such as insufficient use of document information and failure to exploit inter-document relationships effectively, a scientific document query expansion method based on multi-dimensional meta-paths in a knowledge graph is proposed. Firstly, the pseudo-relevance feedback document set is processed to obtain a set of candidate expansion terms. Then, based on an analysis of the scientific document knowledge graph, appropriate meta-paths are identified to represent the relationships between user queries and candidate expansion terms, and multi-dimensional semantic relevance scores between them are calculated from the different meta-path associations between nodes. Finally, the multi-dimensional semantic relevance scores and the weights of the candidate expansion terms in the pseudo-relevance feedback document set are fused to select the final expansion terms, thereby achieving query expansion. Experimental results show that, compared with existing query expansion methods, the proposed method improves mAP, DCG, and NDCG by at least 9.21%, 10%, and 11.7%, respectively.
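
Two of the ranking metrics named in this abstract, DCG and NDCG, can be sketched in a few lines. The version below uses the linear-gain form, in which each relevance grade is divided by log2(rank + 1); other variants exist (for example with gain 2^rel - 1), and the abstract does not say which one the paper uses, so treat the formula choice and the function names as assumptions.

```python
import math


def dcg(relevances, k=None):
    """Discounted cumulative gain of a ranked list of relevance grades.

    relevances[0] is the grade of the top-ranked document. If k is given,
    only the first k results are scored.
    """
    ranked = relevances if k is None else relevances[:k]
    # Position i (0-based) is discounted by log2(i + 2), i.e. log2(rank + 1).
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(ranked))


def ndcg(relevances, k=None):
    """DCG normalised by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(relevances, reverse=True), k)
    return dcg(relevances, k) / ideal if ideal > 0 else 0.0
```

Because NDCG divides by the best achievable DCG for the same set of grades, it lies between 0 and 1 and can be compared across queries with different numbers of relevant documents, which plain DCG cannot.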

Low-resource multi-dialect Tibetan synthesis method based on Tibetan character components

WANG Jiawen1, 2, GAO Dingguo1, 2, NI Qiong1, 2, BA Guo1, 2

2025, 47(8): 1503-1510. doi:

Tibetan speech synthesis is an important research direction in the field of artificial intelligence and has significant implications for promoting the development and innovation of Tibetan language information processing. This paper proposes a corpus processing method based on Tibetan character components, aiming to reduce the difficulty of text processing, and adopts an end-to-end speech synthesis model to explore two low-resource multi-dialect Tibetan synthesis schemes. The experiments show that the proposed method can achieve multi-dialect speech synthesis with a single model trained on mixed datasets, improve the naturalness and expressiveness of the speech, and achieve an average MOS of 4.56 for speech quality.

A probabilistic linguistic multi-attribute decision-making method based on a novel parametric distance

HUANG Shuai1, WANG Pei2, SHEN Zhen3

2025, 47(8): 1511-1520. doi:

A probabilistic linguistic term set (PLTS), composed of linguistic terms and their probability information, can effectively express uncertainty. To handle PLTSs that contain different numbers of linguistic terms, this paper proposes a normalization method based on the greatest common divisor that makes all terms have the same probability. Subsequently, a new parameterized distance is designed. By setting parameter values to represent terms of different scales, it overcomes the limitation that existing distance formulas rely on linguistic subscripts or specific scaling functions. In addition, for the case where attribute weights are unknown, this paper constructs a hybrid weight model that combines information entropy and dispersion to calculate the weights. Finally, a PLTS multi-attribute decision-making method is proposed by integrating the TOPSIS method, and the effectiveness and superiority of the method are verified with a subway site selection example. The results show that the method has strong applicability in both theory and practice.
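
One plausible reading of the greatest-common-divisor normalization mentioned above is sketched below: find the largest probability unit that divides every term's probability, then split each term into equal-probability copies of that unit. This is an interpretation of a single sentence in the abstract, not the paper's published procedure; the function name `normalize_plts` and the `(term, probability)` representation are assumptions.

```python
from fractions import Fraction
from functools import reduce
from math import gcd


def normalize_plts(plts):
    """Expand a PLTS so that every entry carries the same probability.

    plts is a list of (term, probability) pairs whose probabilities are
    exact rationals (Fraction, int, or strings such as "0.4"). Floats are
    best avoided because they are not exactly representable.
    """
    if not plts:
        raise ValueError("PLTS must contain at least one term")
    probs = [Fraction(p) for _, p in plts]
    if any(p <= 0 for p in probs):
        raise ValueError("probabilities must be positive")

    # Put all probabilities over a common denominator ...
    denom = reduce(lambda a, b: a * b // gcd(a, b),
                   (p.denominator for p in probs))
    numerators = [int(p * denom) for p in probs]
    # ... so that the GCD of the numerators gives the common unit.
    unit = Fraction(reduce(gcd, numerators), denom)

    expanded = []
    for (term, _), p in zip(plts, probs):
        copies = int(p / unit)  # exact, because unit divides p
        expanded.extend([(term, unit)] * copies)
    return expanded
```

For instance, the set {s1 (0.4), s2 (0.6)} has common unit 0.2 and expands to two copies of s1 and three copies of s2. Passing the terms of several PLTSs through the same unit would give each expanded set the same length whenever their total probabilities match, which is presumably what allows term-by-term distance calculations.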
