High Performance Computing
-
Design and implementation of a distributed shared buffer switch based on Crossbar structure
- YANG Qianming, SHAO Jingjie, ZENG Pin, YUAN Meng, SONG Zhuoqin, DENG Qiuyan, ZHANG Jianfeng, WANG Yong
-
2025, 47(6): 951-957.
The performance of a switch is determined by its architectural implementation, including the switching fabric, caching mechanism, and concurrent multi-port read/write operations. As the number of switch ports and per-port rates increase, enhancing the multi-port data forwarding performance of switches has become an important research topic. To meet the demands of multi-port data forwarding and non-blocking internal data exchange, this paper proposes a distributed shared buffer architecture for Ethernet switches based on a Crossbar structure. First, it adopts a fully connected Crossbar-based input caching structure to ensure non-blocking input for multi-port data. Second, the switching fabric innovatively employs a distributed shared caching approach to improve data exchange rates. Finally, the design is simulated and verified on an FPGA development board. The results demonstrate that, compared to traditional switches, the proposed multi-port switch architecture with parallel read/write operations supports high-capacity data forwarding and effectively enhances data transmission bandwidth.
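To make the arbitration idea concrete, the minimal Python sketch below models one scheduling round of a generic input-queued crossbar, where each output port grants at most one input per cycle in round-robin order. It is an abstract software model under this sketch's own assumptions (queue contents, pointer handling), not the paper's distributed shared-buffer fabric or its FPGA implementation.

```python
# Abstract model of one crossbar arbitration cycle (illustrative only).
from collections import deque

def schedule_cycle(input_queues, num_outputs, rr_pointer):
    """Return {output: input} grants for one cycle; distinct outputs are
    served in parallel, and each output advances its round-robin pointer."""
    grants = {}
    n = len(input_queues)
    for out in range(num_outputs):
        for offset in range(n):
            i = (rr_pointer[out] + offset) % n
            q = input_queues[i]
            # Grant if this input's head-of-line packet targets `out`
            # and the input has not already been granted this cycle.
            if q and q[0] == out and i not in grants.values():
                grants[out] = i
                rr_pointer[out] = (i + 1) % n
                break
    return grants

queues = [deque([0, 1]), deque([1]), deque([0])]   # head values = target output port
print(schedule_cycle(queues, num_outputs=2, rr_pointer=[0, 0]))  # {0: 0, 1: 1}
```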
-
FT-Format: A configurable hardware code fast formatting tool
- CHEN Guixiang, LIU Sheng, GUO Yang
-
2025, 47(6): 958-967.
Ensuring the proper formatting of hardware code is easily overlooked yet crucial in integrated circuit design, as it directly impacts code readability and maintainability. While existing formatting tools have gained widespread application, they have inherent limitations, especially for hardware description languages. To bridge this gap, after evaluating mainstream formatting tools, this paper proposes FT-Format, a Python-based hardware code formatting tool that enables efficient and rapid formatting while allowing user-customizable adjustments. To quantitatively assess the tool's processing quality, two self-checking algorithms for formatting errors are designed. Experimental results demonstrate that FT-Format achieves an average processing speed of 25 381 lines of code per second and passes validation via the self-checking algorithms. Furthermore, equivalence verification confirms that FT-Format maintains the logical consistency of hardware code throughout the formatting process.
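For illustration only, the sketch below shows the kind of rule-driven re-indentation a hardware-code formatter performs, written in Python (the language the abstract attributes to FT-Format). The keyword sets, indent unit, and function name are assumptions of this sketch, not FT-Format's actual rules or API.

```python
import re

INDENT = "  "  # assumed configurable indentation unit

OPEN = re.compile(r"\b(begin|case|module|function|task)\b")
CLOSE = re.compile(r"\b(end|endcase|endmodule|endfunction|endtask)\b")

def format_verilog(lines):
    """Re-indent Verilog-style lines based on block open/close keywords."""
    depth, out = 0, []
    for raw in lines:
        line = raw.strip()
        if CLOSE.search(line) and not OPEN.search(line):
            depth = max(depth - 1, 0)          # closing keyword dedents itself
        out.append(INDENT * depth + line if line else "")
        if OPEN.search(line) and not CLOSE.search(line):
            depth += 1                          # opening keyword indents what follows
    return out

src = ["module top(input clk);", "always @(posedge clk) begin",
       "q <= d;", "end", "endmodule"]
print("\n".join(format_verilog(src)))
```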
-
Designing and optimizing RISC-V instruction set functionality based on multi-operand acceleration
- ZHANG Yu er, XI Yuhao, LIU Peng
-
2025, 47(6): 968-975.
The RISC-V architecture, with its open and modular instruction set architecture (ISA) design, facilitates the integration of customized instructions tailored to specific applications and their software ecosystems, enabling efficient processing of complex algorithms and repetitive operations. However, designing acceleration instructions for RISC-V processors presents significant challenges, primarily due to limitations on operand count. Traditional acceleration methods typically adopt a 2-input-1-output model, which restricts the flexibility and efficiency of complex operations. To address these limitations, this paper proposes a multi-operand acceleration mechanism that breaks the conventional 2-input-1-output constraint by providing a flexible interface for multiple inputs and outputs. The mechanism is validated through benchmark tests on an FPGA platform, including the SHA-256, SHA-1, and FIR/IIR filter algorithms, conducted on Western Digital's open-source RISC-V VeeR EH1 core. Experimental results demonstrate a performance improvement of up to 14% while keeping hardware overhead at or below 3%. Compared to traditional 2-input-1-output acceleration methods, the proposed enhanced instruction set design significantly improves the processing efficiency of RISC-V cores, demonstrating its capability in embedded computing and domain-specific acceleration applications.
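As a purely illustrative software "golden model" of the multi-operand idea, the sketch below defines a hypothetical 3-input, 1-output fused multiply-add operation and uses it in an FIR kernel; the actual instruction encodings, interface, and VeeR EH1 integration from the paper are not reproduced here.

```python
# Hypothetical fused op: rd = (rs1 * rs2 + rs3) mod 2^32, i.e. three register
# inputs and one output, of the kind a multi-operand interface could expose.
MASK32 = 0xFFFFFFFF

def fmadd32(rs1: int, rs2: int, rs3: int) -> int:
    """Software golden model of the assumed fused multiply-add instruction."""
    return (rs1 * rs2 + rs3) & MASK32

def fir(samples, taps):
    """FIR filter written around the fused op: one call per tap."""
    out = []
    for n in range(len(samples)):
        acc = 0
        for k, c in enumerate(taps):
            if n - k >= 0:
                acc = fmadd32(c, samples[n - k], acc)
        out.append(acc)
    return out

print(fir([1, 2, 3, 4], [1, 1, 1]))  # [1, 3, 6, 9]
```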
-
An automated physical compiler for multi-port register files
- MING Tianbo, LIU Biwei, HU Chunmei, WU Zhenyu, SONG Ruiqiang, SONG Fangfang
-
2025, 47(6): 976-987.
In the design of application-specific microprocessors, designers need to iteratively experiment with different architectural parameters to achieve optimal application support. Multi-port register files, as core components, still rely on full-custom design or traditional compiler-assisted design. However, these methods often struggle to balance high performance requirements with design flexibility, making it difficult to achieve co-optimization with the architecture. This paper proposes a physical compiler for multi-port register files, which can automatically and quickly generate register file circuits and layouts with a specified capacity and port count. Additionally, this paper proposes an optimized port structure to enhance the parallel access performance of the register file, and a performance-driven heuristic algorithm to achieve optimized placement and routing results. Experimental results show that the proposed compiler can generate register files within tens of hours, meeting co-optimization requirements and achieving a 31.5% speed improvement and 28.8% power reduction compared to full-custom designs, as well as 20.7% higher speed and 33.9% lower power consumption relative to traditional compiler-assisted designs.
-
ReHuff: A Huffman coding hardware architecture based on ReRAM
- ZHENG Daowen, ZHOU Yikai, TANG Yibin, LIU Bosheng, WU Jigang
-
2025, 47(6): 988-997.
With the rapid expansion of data volume in various application scenarios such as deep learning, the hardware overhead of communication and storage has significantly increased. Against this backdrop, the importance of compression methods has grown substantially. Huffman coding is one of the most representative and widely used compression methods, known for effectively compressing data and saving storage space without compromising data integrity. However, due to the limitations of hierarchical memory storage, traditional hardware solutions for Huffman coding face challenges of high latency and energy consumption. This paper proposes a hardware architecture named ReHuff, which leverages resistive random-access memory (ReRAM) to enable in-memory Huffman encoding, and designs a ReRAM-based Huffman coding mapping method to extract valid data. To address the mismatch between variable-length encoded data and fixed-length ReRAM blocks during mapping, a dual-stage variable-length data selection and segmentation approach is proposed, adapting to the architectural design to integrate variable-length outputs, thereby reducing energy consumption and improving ReRAM utilization efficiency. Simulation results demonstrate that the proposed design outperforms representative benchmarks, improving performance by 18.6 times and reducing energy consumption by 82.4%.
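For context, the sketch below is a plain software Huffman encoder, i.e., the baseline computation that an in-ReRAM design accelerates; the ReRAM mapping, dual-stage segmentation, and energy model from the paper are not modeled here.

```python
import heapq
from collections import Counter

def huffman_codes(data: bytes) -> dict:
    """Build a prefix code (symbol -> bit string) from symbol frequencies."""
    heap = [[freq, i, {sym: ""}] for i, (sym, freq) in enumerate(Counter(data).items())]
    heapq.heapify(heap)
    if len(heap) == 1:                                  # degenerate single-symbol input
        return {sym: "0" for sym in heap[0][2]}
    tiebreak = len(heap)
    while len(heap) > 1:
        lo = heapq.heappop(heap)
        hi = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in lo[2].items()}   # left subtree gets prefix 0
        merged.update({s: "1" + c for s, c in hi[2].items()})
        heapq.heappush(heap, [lo[0] + hi[0], tiebreak, merged])
        tiebreak += 1
    return heap[0][2]

data = b"abracadabra"
codes = huffman_codes(data)
encoded = "".join(codes[b] for b in data)
print(codes, f"{len(encoded)} bits vs {8 * len(data)} uncompressed")
```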
Computer Network and Information Security
-
Research on OpenVPN protocol subversion attack technology
- LI Ziyu, HE Jun, LIU Yixi
-
2025, 47(6): 998-1007.
OpenVPN, as a typical application for channel encryption, faces severe threats from large-scale surveillance and subversion attacks on its security. By studying the communication characteristics of the four stages of the protocol and conducting a detailed analysis of the attack surfaces and conditions at each stage, this paper constructs a security game model and attack framework for subversion attacks targeting the data encryption and decryption stages of the OpenVPN protocol. To address the challenge that traditional models struggle to accurately assess attack effectiveness in complex real-world communication environments, a definition of attack advantage is proposed, with data decryption probability serving as the primary evaluation criterion. This aids in a more precise quantitative assessment of attack effectiveness under the influence of different factors. Based on this, an IV (Initialization Vector) replacement attack method is designed and instantiated, and the fundamental properties of the attack method are proven. A systematic analysis and comparison of attack advantages against different encryption algorithms of the OpenVPN protocol are conducted, and specific mitigation measures are proposed.
-
A cross-chain decentralized identity authentication scheme based on relay chain
- DENG Haotian, WU Tong, ZHANG Chuan, ZHU Liehuang
-
2025, 47(6): 1008-1017.
With the rapid development of blockchain technology, hundreds of different blockchain platforms have emerged. The heterogeneous characteristics of these platforms have brought a new challenge: how to perform cross-chain identity authentication between heterogeneous blockchain systems to ensure the security of cross-chain interactions? Due to the heterogeneity of the underlying technologies and cryptographic systems used by different blockchain platforms, existing identity authentication technologies struggle to adapt to cross-chain identity authentication and mostly suffer from issues such as single-point-of-failure risks, insufficient privacy protection, and low verification efficiency. To address these problems, this paper proposes a decentralized cross-chain identity authentication scheme based on a relay chain, whose innovation is reflected in three aspects: firstly, by combining relay chain technology with decentralized identity authentication technology, a decentralized identity management system is constructed to achieve decentralized cross-chain identity authentication; secondly, privacy-preserving verification of identity credentials is realized based on zero-knowledge proofs, effectively guarding against privacy tracing risks in cross-chain interactions; finally, a relay chain dynamic sharding protocol is designed to improve system throughput by processing verification tasks in parallel. Experimental evaluation shows that this scheme is complete in functionality and highly efficient, providing a feasible path toward secure interoperability of the cross-chain ecosystem.
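A toy Python sketch of the relay-registration idea only: a source chain registers a hash of an identity credential on a relay ledger, and a target chain accepts the credential if its hash is found there. The data fields are invented, and the zero-knowledge-proof and sharding components of the scheme are deliberately not modeled.

```python
import hashlib
import json

relay_ledger = set()   # stand-in for relay-chain state

def _digest(credential: dict) -> str:
    """Canonical hash of a credential (deterministic JSON serialization)."""
    return hashlib.sha256(json.dumps(credential, sort_keys=True).encode()).hexdigest()

def register_credential(credential: dict) -> str:
    """Source chain side: anchor the credential hash on the relay ledger."""
    digest = _digest(credential)
    relay_ledger.add(digest)
    return digest

def verify_on_target_chain(credential: dict) -> bool:
    """Target chain side: accept the credential if its hash is anchored."""
    return _digest(credential) in relay_ledger

cred = {"did": "did:example:alice", "chain": "A", "role": "user"}
register_credential(cred)
print(verify_on_target_chain(cred))   # True
```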
-
A survey on artificial intelligence based congestion control
- LI Tianyun, LI Tao, WEN Dong, YANG Hui, ZHANG Yutao, LUO Xin, DONG Dezun
-
2025, 47(6): 1018-1027.
With the rapid development of network applications and the increasing diversification of network scenarios, the design of congestion control algorithms faces unprecedented challenges. Artificial intelligence (AI) methods, leveraging their robust adaptability and decision-making capabilities, have become a focal point for both academia and industry. Consequently, AI-based network congestion control algorithms have emerged. This paper systematically reviews recent advancements in AI-based network congestion control research, analyzing technical approaches, application scenarios, training, and experimentation. Building on this analysis, future research directions are also explored.
-
A hybrid access control model based on smart contracts for the Internet of Vehicles environment
- WEI Kexin, LI Leixiao, SI Qin, SHI Jianping
-
2025, 47(6): 1028-1040.
Traditional access control in the Internet of Vehicles (IoV), which aims to provide safer road driving and efficient traffic management, suffers from problems such as the inability to dynamically authorize vehicle permissions, the inability to authorize resources in a fine-grained manner, and the difficulty of tracing the historical communication between the two communicating parties. To solve these problems, a hybrid access control model (BARV-BAC) combining role-based access control (RBAC) and attribute-based access control (ABAC) is proposed. Firstly, role-attribute rules, attribute authorization rules, and access control policies are established. Secondly, a role publisher smart contract (Role publisher-SC) and a resource smart contract (RE-SC) are designed to achieve dynamic and fine-grained management of the IoV. In addition, a digital signature is used to verify the ownership of roles and the reliability of the model. The experimental results show that, in the simulated road environment, the average delay of the model is less than 200 ms, a significant improvement over traditional access control models, and the smart contract cost is significantly reduced compared with other access control models. This further verifies that BARV-BAC offers high efficiency, practicality, and security for IoV access control.
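To illustrate the hybrid RBAC+ABAC decision logic in isolation, the Python sketch below lets a role grant coarse permissions and an attribute rule refine them; the roles, attributes, and thresholds are invented, and the paper's smart contracts (Role publisher-SC, RE-SC) and on-chain signatures are not modeled.

```python
ROLE_PERMISSIONS = {                      # RBAC part: role -> allowed actions
    "emergency_vehicle": {"read_traffic", "request_priority"},
    "private_vehicle": {"read_traffic"},
}

def abac_ok(attrs: dict, action: str) -> bool:
    """ABAC part: assumed attribute rules refining the role's permissions."""
    if action == "request_priority":
        return attrs.get("speed_kmh", 0) > 0 and attrs.get("siren_on", False)
    return attrs.get("registered", False)

def access_decision(role: str, attrs: dict, action: str) -> bool:
    """Grant only if both the role check and the attribute check pass."""
    return action in ROLE_PERMISSIONS.get(role, set()) and abac_ok(attrs, action)

print(access_decision("emergency_vehicle",
                      {"speed_kmh": 60, "siren_on": True, "registered": True},
                      "request_priority"))   # True
```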
-
MATLAB-to-Python code converter based on AST
- GUO Rui, XU Wenhao, XIE Pengzhi, YANG Wei, SONG You
-
2025, 47(6): 1041-1049.
MATLAB is widely used at various stages of industrial product development. However, in practical engineering applications, the mechanistic models built in MATLAB often need to be decoupled from the MATLAB runtime environment and integrated into real-world engineering systems, so a tool for rapidly converting MATLAB models into deployable engineering solutions is required. To address this, this paper proposes M2P (MATLAB-to-Python), a converter based on the abstract syntax tree (AST). The converter transforms MATLAB source code into an AST, analyzes it, and applies substitution rules to generate equivalent Python code. Comparative experiments demonstrate that the proposed converter achieves superior conversion performance compared to existing MATLAB-to-Python converters.
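The toy sketch below illustrates the rule-substitution step in miniature: a handful of MATLAB built-ins are mapped to NumPy calls, and the result is validated and emitted through Python's standard ast module (ast.unparse requires Python 3.9+). The rule table and function names are this sketch's assumptions, not M2P's implementation.

```python
import ast
import re

RULES = {"zeros": "np.zeros", "ones": "np.ones", "length": "len"}

def convert_stmt(matlab_line: str) -> str:
    """Convert a single 'lhs = func(args);' MATLAB statement to Python."""
    lhs, func, args = re.match(r"\s*(\w+)\s*=\s*(\w+)\((.*)\);?\s*$", matlab_line).groups()
    if func in ("zeros", "ones"):
        call = f"{RULES[func]}(({args}))"      # MATLAB size arguments become one tuple
    else:
        call = f"{RULES.get(func, func)}({args})"
    tree = ast.parse(f"{lhs} = {call}")        # validate the result as a Python AST
    return ast.unparse(tree)

print(convert_stmt("A = zeros(3,4);"))   # A = np.zeros((3, 4))
print(convert_stmt("n = length(x);"))    # n = len(x)
```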
-
Formal verification of Stokes’ theorem and its applications
- LIU Yongmei, WANG Guohui, GUAN Yong, ZHANG Jingzhi, SHI Zhiping, DONG Lu
-
2025, 47(6): 1050-1061.
Stokes' theorem is one of the fundamental theorems of field theory, with extensive applications in fluid mechanics, electromagnetics, and other domains. However, in practical applications, the satisfaction of its prerequisite conditions is often not rigorously verified, which introduces certain risks. Therefore, it is necessary to formally verify Stokes' theorem. This paper constructs a formal model of Stokes' theorem based on its mathematical definition. By analyzing the mathematical proof process of the theorem, this paper derives the derivation methodology for its formal verification. Following the analysis, construction, and verification objectives, this paper completes the formal proof of the theorem. Finally, the proven Stokes' theorem is applied to the validation of pipeline flow design models.
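For reference, the classical identity being formalized is the standard textbook statement below; the paper's formal model inside the theorem prover is not reproduced here.

```latex
% For a piecewise-smooth oriented surface S with boundary curve \partial S
% and a continuously differentiable vector field \mathbf{F}:
\oint_{\partial S} \mathbf{F} \cdot \mathrm{d}\mathbf{r}
  = \iint_{S} \left( \nabla \times \mathbf{F} \right) \cdot \mathrm{d}\mathbf{S}
```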
-
Bounded model checking of HyperCTL* based on QBF
- MING Zhiyong, WANG Yisong, FENG Renyan
-
2025, 47(6): 1062-1070.
Model checking of hyperproperties is an important research topic in formal verification. HyperCTL* extends Computation Tree Logic (CTL*) by explicitly quantifying over multiple execution paths of a system. To address the high time complexity of HyperCTL* model checking, this study first proposes a bounded semantics for HyperCTL*, followed by a bounded model checking algorithm based on quantified Boolean formulas (QBF). The correctness of this algorithm is analyzed, and a prototype HyperCTL* bounded model checking tool, named Hybmc, is implemented. Experimental results demonstrate that the bounded model checking efficiency of Hybmc significantly outperforms that of HyperQube, a bounded model checking tool for HyperLTL.
-
A traffic sign detection model based on coordinate convolution and optimal transport assignment
- XIONG Changzhen, LI Xiyu, WANG Pangwei
-
2025, 47(6): 1071-1078.
To address the issue of failed traffic sign detection caused by factors such as small scale, high-speed movement, and adverse weather conditions, a traffic sign detection model based on coordinate convolution and optimal transport label assignment is proposed. To take into account embedded platform deployment and detection speed, the proposed model builds upon the YOLOv5s detection model. Firstly, spatial information is perceived and the feature representation capability for small-scale traffic signs is enhanced by utilizing coordinate convolution with additional coordinate channels. Secondly, an optimal transport assignment method is employed to seek globally optimal object label assignments, reducing the number of ambiguous bounding boxes and improving the utilization of training data. Finally, a SIoU loss function incorporating angle loss is utilized to enhance the convergence speed and detection capability of predicted bounding boxes. Experimental evaluations of the proposed model on the CCTSDB and TSRD traffic sign datasets demonstrate significant improvements over the original YOLOv5s model. Compared with the YOLOv7 model, the proposed model achieves a 2.35% increase in mAP_0.5 and a 1.45% increase in mAP_0.5:0.95 on the TSRD dataset, while performing on par with YOLOv7 on the CCTSDB dataset. Moreover, the proposed model is more than 2.5 times faster than YOLOv7 on both datasets, highlighting its excellent detection accuracy and speed.
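The PyTorch sketch below shows the coordinate-convolution idea in isolation: two normalized coordinate channels are concatenated to the input before an ordinary convolution. Layer sizes are arbitrary and the YOLOv5s integration is not shown; this is a generic CoordConv layer, not the paper's exact module.

```python
import torch
import torch.nn as nn

class CoordConv2d(nn.Module):
    """Conv2d over the input plus two normalized coordinate channels."""
    def __init__(self, in_ch, out_ch, **kw):
        super().__init__()
        self.conv = nn.Conv2d(in_ch + 2, out_ch, **kw)

    def forward(self, x):
        b, _, h, w = x.shape
        ys = torch.linspace(-1, 1, h, device=x.device).view(1, 1, h, 1).expand(b, 1, h, w)
        xs = torch.linspace(-1, 1, w, device=x.device).view(1, 1, 1, w).expand(b, 1, h, w)
        return self.conv(torch.cat([x, xs, ys], dim=1))

layer = CoordConv2d(3, 16, kernel_size=3, padding=1)
print(layer(torch.randn(2, 3, 32, 32)).shape)  # torch.Size([2, 16, 32, 32])
```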
-
Detection and recognition of concrete cracks in underground engineering based on improved YOLOv8 model
- ZHOU Fengjun, KANG Huaiqiang, GAO Shen, LI Feng, SUN Yunhou, GAO Hang, MA Pengsheng
-
2025, 47(6): 1079-1089.
Cracks on concrete surfaces in underground engineering are one of the key factors affecting construction safety. Accurate and efficient crack detection can help prevent safety incidents to a certain extent. To address this issue, an improved YOLOv8 model for detecting and identifying cracks on concrete surfaces is proposed. Firstly, the backbone network of YOLOv8 is enhanced by incorporating dilated convolutions to improve feature extraction for shallow-layer targets. Secondly, the CBAM (convolutional block attention module) is introduced to strengthen the model's ability to capture crack features. Thirdly, the neck network structure of YOLOv8 is modified to address the challenges that small-target features are extremely small and their weak textures are difficult to learn. Finally, the feature fusion method in the neck network is optimized. Experimental results show that the improved YOLOv8 model achieves a 36.94% increase in Precision, a 49.18% increase in Recall, and a 51.74% increase in mAP (mean average precision). The enhanced model is better suited to concrete crack detection in complex scenarios and further improves the recognition performance for small targets in challenging environments.
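As a reference for the attention component, the sketch below is a minimal CBAM-style block (channel attention followed by spatial attention) in PyTorch; the reduction ratio, kernel size, and where it would be inserted into YOLOv8 are assumptions, not taken from the paper.

```python
import torch
import torch.nn as nn

class CBAM(nn.Module):
    """Channel attention (shared MLP over avg/max pooling) then spatial attention."""
    def __init__(self, ch, reduction=16, spatial_kernel=7):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(ch, ch // reduction), nn.ReLU(),
                                 nn.Linear(ch // reduction, ch))
        self.spatial = nn.Conv2d(2, 1, spatial_kernel, padding=spatial_kernel // 2)

    def forward(self, x):
        b, c, _, _ = x.shape
        avg = self.mlp(x.mean(dim=(2, 3)))
        mx = self.mlp(x.amax(dim=(2, 3)))
        x = x * torch.sigmoid(avg + mx).view(b, c, 1, 1)          # channel attention
        s = torch.cat([x.mean(dim=1, keepdim=True),
                       x.amax(dim=1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial(s))                 # spatial attention

print(CBAM(32)(torch.randn(1, 32, 40, 40)).shape)  # torch.Size([1, 32, 40, 40])
```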
-
Virtual fire simulation of fire extinguishers based on Unity 3D
- XUE Jinyun, ZHOU Zhipeng, XUE Huiqi, YI Xinwu, LI Zhihui, LIU Zhigao
-
2025, 47(6): 1090-1096.
Traditional firefighting training requires personnel to engage in hands-on practice with real fires, which not only demands significant space and financial resources but also carries the risk of casualties. With the advancement of virtual reality (VR), leveraging its ability to simulate realistic environments in which users complete target tasks has become a key research focus. By utilizing Unity 3D's powerful 3D rendering capabilities and flexible scripting system, combined with a particle system to simulate flames and fire extinguisher sprays, the training environment for firefighters can be authentically recreated. To enhance simulation realism, the principles of dynamics are employed to model the initial trajectory of the fire extinguisher spray, while a Gaussian diffusion model is proposed to analyze the later-stage dispersion of the spray in the air. Experimental results demonstrate that the principles of dynamics and the Gaussian diffusion model successfully simulate the dynamic changes of the fire extinguisher spray. Additionally, collisions between particles are utilized to realize the fire-extinguishing process, allowing users to intuitively observe the impact of the spray on the flames.
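The sketch below illustrates the Gaussian-diffusion idea for the late-stage spray: the agent concentration around a drifting puff center follows a 3-D Gaussian whose variance grows with time. The formula is the standard Gaussian puff; the parameter values, and how the paper couples such a model to Unity particles, are assumptions of this sketch.

```python
import math

def puff_concentration(x, y, z, t, q=1.0, wind=(1.0, 0.0, 0.0), k=0.05):
    """Gaussian puff released at the origin at t = 0 (t must be > 0).

    q    - released mass (arbitrary units)
    wind - constant advection velocity of the puff center
    k    - diffusion coefficient controlling spread (sigma^2 = 2*k*t)
    """
    sigma2 = 2.0 * k * t
    cx, cy, cz = (wind[0] * t, wind[1] * t, wind[2] * t)
    r2 = (x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2
    return q / ((2 * math.pi * sigma2) ** 1.5) * math.exp(-r2 / (2 * sigma2))

# Concentration 1 m downwind of the nozzle after 1 s:
print(puff_concentration(1.0, 0.0, 0.0, 1.0))
```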
-
An image calibration method for pointer instruments based on improved STN
- QU Haicheng, ZHANG Wang, TIAN Pengfei
-
2025, 47(6): 1097-1105.
To address issues in pointer meter calibration tasks, such as excessive tilt and rotation angles and the unsatisfactory performance of conventional calibration methods, this paper proposes an improved STN-based image calibration method for pointer instruments. The method employs a front-end network model (ASTN-FP) to predict the homography parameters and pointer angles of meter images. By incorporating an adaptive transformation layer and a feature pyramid structure, it enhances the model's learning capability for multi-scale meters and improves network performance. During the training phase, a Sim2Real strategy is adopted, where synthetic datasets are used for initial training, followed by fine-tuning with real-world data. In the calibration stage, homography transformation and perspective transformation are combined to strengthen the model's ability to handle complex transformations. Validation experiments conducted on both simulated and real-world data demonstrate that, compared to mainstream image calibration methods, the proposed method achieves significant improvements in calibration efficiency and average calibration time, and reaches a recognition accuracy of 95.3% on the calibration data, verifying its effectiveness.
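To show the calibration step itself, the OpenCV sketch below rectifies an image once four dial-corner correspondences are available; the corner coordinates are made up, and the network (ASTN-FP) that would predict them is not part of this sketch.

```python
import cv2
import numpy as np

img = np.zeros((240, 320, 3), dtype=np.uint8)                      # stand-in meter image
src = np.float32([[60, 40], [270, 70], [250, 210], [40, 190]])     # assumed predicted dial corners
dst = np.float32([[0, 0], [200, 0], [200, 200], [0, 200]])         # fronto-parallel target square

H = cv2.getPerspectiveTransform(src, dst)                          # 3x3 homography
rectified = cv2.warpPerspective(img, H, (200, 200))                # warped, upright dial
print(H.shape, rectified.shape)                                    # (3, 3) (200, 200, 3)
```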
Artificial Intelligence and Data Mining
-
Research on named entity recognition for radar maintenance based on the ERNIE model
- ZENG Chuizhen, CUI Liangzhong, MA Wenzhuo
-
2025, 47(6): 1106-1113.
In the construction of a knowledge graph for radar maintenance, the strong specialization and scarcity of annotated datasets pose significant challenges in training named entity recognition (NER) models, with traditional model training failing to meet application requirements. Building upon the BiGRU-CRF model, this paper introduces a pre-trained model and proposes the ERNIE-BiGRU-CRF model. First, taking a specific radar model as an example, maintenance data were collected and preprocessed. The doccano platform was used for manual annotation, resulting in over 1,100 labeled NER data entries in the radar maintenance domain. Next, dynamic word embeddings for the radar maintenance training data were obtained using the ERNIE pre-trained model, while BiGRU captured bidirectional semantic information. Finally, the most reasonable sequence labeling results were derived through CRF constraints. Experimental results show that, with limited training data, the proposed model achieves strong recognition performance. Compared to BiGRU-CRF and BiLSTM-CRF models, it demonstrates an improvement in F1-score, effectively addressing the issues of insufficient training data and suboptimal training performance in the radar maintenance domain. This model holds practical value for the automated construction of knowledge graphs in radar maintenance.
-
An automatic Tibetan dialect identification method integrating multiple features
- GAZANG Cairang, GAO Dingguo, RENQING Dongzhu
-
2025, 47(6): 1114-1120.
Tibetan dialects are numerous and exhibit significant internal differences, making research on their automatic identification valuable in the fields of speech processing, criminal investigation, public security, and linguistics. Currently, common methods for Tibetan dialect identification rely on various acoustic features and deep learning models trained on big data. However, traditional acoustic features fail to effectively characterize the subtle distinctions among Tibetan dialects, and deep learning struggles to achieve high-precision dialect recognition on small-scale datasets. To address this issue, this paper proposes an automatic Tibetan dialect identification method that integrates multiple features. The method combines Mel-frequency cepstral coefficients (MFCC), Gammatone frequency cepstral coefficients (GFCC), and short-time energy (STE) values containing voicing information to construct an information-fused feature system. A bidirectional long short-term memory (Bi-LSTM) network is employed to identify major Tibetan dialects such as U-Tsang, Amdo, and Kham. Experimental results show that the proposed multi-feature fusion method improves accuracy by 10.73%, 10.78%, and 59.48% compared to single-feature methods using MFCC, GFCC, and short-time energy, respectively, ultimately achieving a recognition accuracy of 94.89%. This effectively validates the efficacy and practicality of the proposed method.
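A feature-fusion sketch under stated assumptions: MFCCs from librosa are stacked with a log short-time-energy track, frame by frame. GFCC extraction and the Bi-LSTM classifier from the paper are not reproduced, and the frame/hop sizes are arbitrary.

```python
import numpy as np
import librosa

def fused_features(path, sr=16000, n_mfcc=13, frame=400, hop=160):
    """Return an (n_mfcc + 1, T) matrix of MFCC rows plus one log-energy row."""
    y, sr = librosa.load(path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc,
                                n_fft=frame, hop_length=hop)        # (n_mfcc, T)
    frames = librosa.util.frame(y, frame_length=frame, hop_length=hop)
    ste = np.log1p((frames ** 2).sum(axis=0, keepdims=True))        # (1, T')
    t = min(mfcc.shape[1], ste.shape[1])                            # align frame counts
    return np.vstack([mfcc[:, :t], ste[:, :t]])

# feats = fused_features("utterance.wav")   # shape: (14, num_frames)
```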
-
DRG medical Q&A research based on both knowledge and data
- XU Chun, SUN Enwei, WANG Xiaojie
-
2025, 47(6): 1121-1132.
Real electronic medical record data covering diagnosis related groups (DRG) coding are too scarce to support language models in learning text features, and existing disease coding models struggle to explain their results on complex text. Therefore, this paper designs a medical question answering system model, GLM-2B-DRAGON (generative language model-deep bidirectional language-knowledge graph pretraining), that integrates a medical knowledge graph with a large language model. Firstly, the ChatGLM-6B model is employed to extract and update medical entities and entity relationships, yielding a knowledge graph, DRG-Net, that covers medical knowledge such as DRG coding. Secondly, a cross-modal encoder is used to jointly encode the QA pairs and the knowledge graph, realizing a complementary bidirectional text-graph information flow that captures the characteristics of medical text. Finally, the interpretability of the answer results is verified through visual analysis of the path weights of the knowledge graph. The experimental results show that the proposed system model is superior to existing knowledge-graph-enhanced language models on the public dataset CommonsenseQA and the self-built medical dataset MedicalQA.
-
Tibetan long text classification by fusing denoising fine-tuning and graph attention mechanism
- JING Rong, WAN Fucheng, HUANG Rui, YU Hongzhi, MA Ning
-
2025, 47(6): 1133-1140.
In Tibetan long text classification tasks, the issue of long-distance dependencies is particularly prominent. Meanwhile, multilingual pre-trained models exhibit certain biases when handling Tibetan text classification tasks. To address these issues, this paper proposes a Tibetan long text classification method based on the pre-trained language model CINO-Large, which integrates denoising fine-tuning and a graph attention network. Firstly, the In-trust loss function is introduced into CINO-Large to enhance the model’s generalization ability in downstream tasks through task-adaptive loss. Secondly, sliding windows and linear classification are introduced into graph structure modeling to selectively increase document-document edges, thereby improving the feature distinguishability among nodes. Finally, the graph attention mechanism is utilized to capture the importance of different nodes in the graph, completing the Tibetan long text classification task. On the TNCC news long text dataset, the classification accuracy of the proposed method reaches 71.66%. Compared to the pre-trained language model CINO-Large, the accuracy, precision, and F1 score of the proposed model are improved by 1.77%, 2.67% and 2.03%, respectively. For some subclasses that are difficult to classify, the F1 score of the proposed method can be significantly improved by approximately 20%.