High Performance Computing
Selection of sparse matrix multiplication algorithms based on supervised learning
- PENG Lin, ZHANG Peng, CHEN Junfeng, TANG Tao, HUANG Chun
2025, 47(03): 381-391.
Mainstream row-by-row sparse matrix multiplication algorithms, including SPA, HASH, and ESC, exhibit significant performance disparities across different sparse matrices: no single algorithm is optimal for all matrices, none consistently achieves the best performance across different non-zero element scales, and the gap between any single algorithm and the per-matrix optimal choice is significant. To this end, a selection model for sparse matrix multiplication algorithms based on supervised learning is proposed. Using a given set of matrices as the data source, features of the sparse matrices are extracted, and performance data obtained by running SPA, HASH, and ESC is used for training and validation. The resulting model can select the optimal algorithm for a new dataset based solely on the features of the sparse matrix. Experimental results show that the model achieves a prediction accuracy of over 91% and an average performance of 98% of the optimal selection, more than 1.55 times the performance of any single algorithm. It can be integrated into practical library functions and has good generalization ability and practical value.
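For illustration, a minimal sketch of the kind of feature-based selector the abstract describes, assuming hypothetical structural features and a random-forest model (the paper's actual feature set, model choice, and training data are not specified here):

```python
# Sketch only: hypothetical features and stand-in labels; real labels would
# come from offline timings of SPA/HASH/ESC on a training matrix set.
import numpy as np
from scipy.sparse import random as sparse_random
from sklearn.ensemble import RandomForestClassifier

ALGORITHMS = ["SPA", "HASH", "ESC"]

def extract_features(A, B):
    """Cheap structural features of the operand matrices (illustrative)."""
    row_nnz = A.getnnz(axis=1)
    return [A.shape[0], A.shape[1], A.nnz,
            row_nnz.mean(), row_nnz.std(), row_nnz.max(),
            B.nnz / max(B.shape[0], 1)]

rng = np.random.default_rng(0)
X_train = rng.random((200, 7))           # placeholder feature vectors
y_train = rng.integers(0, 3, size=200)   # placeholder "fastest kernel" labels
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

A = sparse_random(1000, 1000, density=0.01, format="csr", random_state=0)
B = sparse_random(1000, 1000, density=0.01, format="csr", random_state=1)
print("selected kernel:", ALGORITHMS[int(clf.predict([extract_features(A, B)])[0])])
```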
Parallel file system network driver based on Tianhe inter-connection system
- DONG Yong, WU Huijun, YANG Lihua, ZHANG Wei, WANG Ruibo, ZHOU Enqiang
2025, 47(03): 392-399.
The parallel file system is an essential component of the software stack in high performance computing systems, and a driver designed for the high-speed network is crucial for a parallel file system to provide efficient data access. A parallel file system network driver based on the Tianhe high-speed interconnect network (TH-Express), named GLND, is designed and implemented. GLND is optimized in three areas: parallelization, communication protocol, and fault tolerance. It achieves high throughput through VP-level parallelism combined with balanced pipeline partitioning; it adaptively selects the underlying communication protocol based on factors such as message size and implements a NUMA-aware memory management mechanism; and it employs an adaptively adjustable timeout mechanism to prevent spurious timeouts at the software layer from affecting the completion of communication operations. Experimental results show that, under the same hardware conditions, GLND improves write bandwidth by an average of 23.69% and read bandwidth by an average of 79.25% compared to TCP.
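GLND's internal protocol logic is not published in the abstract; the following sketch only illustrates the general idea of size-based protocol selection, with thresholds and protocol names chosen purely for illustration:

```python
# Illustrative size-based protocol selection; thresholds are assumptions,
# not GLND's actual cutoffs.
EAGER_MAX = 4 * 1024         # small: copy into a pre-posted receive buffer
RENDEZVOUS_MIN = 64 * 1024   # large: handshake, then zero-copy RDMA transfer

def choose_protocol(msg_len: int) -> str:
    if msg_len <= EAGER_MAX:
        return "eager"        # avoids handshake latency for small messages
    if msg_len >= RENDEZVOUS_MIN:
        return "rendezvous"   # avoids intermediate copies for large messages
    return "segmented"        # mid-size: pipelined through bounce buffers

for n in (512, 16 * 1024, 1024 * 1024):
    print(f"{n:>8} B -> {choose_protocol(n)}")
```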
Optimization of MPI_Barrier based on the offloading characteristics of Tianhe-2
- ZHU Qi, DAI Yi, PENG Jintao, XIE Min, LIANG Chongshan, LIU Peng, YANG Bo, LIU Jie
2025, 47(03): 400-411.
Barrier, as a fundamental operation in message passing interface (MPI) programs, is one of the critical mechanisms ensuring the correct execution of programs. Existing Barrier implementations suffer from two main defects: significant redundant data-path transmission overhead during inter-node synchronization, and numerous cache misses during intra-node synchronization. To address these performance limitations, this paper proposes two optimization techniques tailored to the collective communication offload features of TH-Express, the customized network of Tianhe-2: Barrier acceleration based on the GLEX NIC, and shared-memory flag-bit rearrangement. These techniques reduce synchronization overhead between nodes and improve shared-memory-based synchronization efficiency within nodes. Based on these optimizations, this paper redesigns the MPI_Barrier algorithm and integrates it into the MPI communication library. The proposed scheme is evaluated with micro-benchmarks and real applications at the National Supercomputing Center in Changsha at scales of up to 7168 nodes. Experimental results show that the optimized MPI_Barrier collective operation achieves a speedup of 1.3 to 14.5 times, and in application-level real-load evaluations the performance improvement reaches up to 54%.
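The flag-bit rearrangement itself lives inside the MPI runtime; as a rough illustration of the underlying idea, the toy sketch below pads per-rank flags to cache-line boundaries so that a rank updating its own flag does not invalidate the cache lines other spinning ranks are polling (the 64-byte line size is an assumption):

```python
import numpy as np

CACHE_LINE = 64   # assumed cache-line size in bytes
NRANKS = 8

# One flag per rank, each padded to a full cache line. Packed flags would
# share one line, so every update would invalidate every poller's copy.
flags = np.zeros((NRANKS, CACHE_LINE), dtype=np.uint8)

def arrive(rank: int, epoch: int) -> None:
    flags[rank, 0] = epoch            # touches only this rank's line

def all_arrived(epoch: int) -> bool:
    return bool((flags[:, 0] == epoch).all())

for r in range(NRANKS):
    arrive(r, 1)
print(all_arrived(1))                 # True once every rank has arrived
```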
QTorch: A quantum-classical hybrid machine learning framework built on a standalone quantum programming language
- CHEN Wenjin
2025, 47(03): 412-421.
In recent years, quantum computing systems have demonstrated quantum supremacy on specific sampling problems, marking humanity's entry into the noisy intermediate-scale quantum (NISQ) era. Quantum machine learning (QML) algorithms have garnered significant attention due to their potential to leverage quantum supremacy for problems of practical significance, making them a prominent topic in quantum computing research. However, efficiently describing and compiling QML algorithms with existing hybrid quantum-classical machine learning frameworks remains a significant challenge and hinders algorithm development. This paper addresses this challenge by introducing QTorch, a quantum-classical hybrid machine learning framework built on PyTorch, an open-source classical machine learning framework, in conjunction with a standalone quantum programming language. QTorch incorporates automatic differentiation techniques tailored to real quantum hardware and to quantum-classical hybrid machine learning algorithms, and it introduces two key optimizations designed to improve time performance: parallel training optimization and parameter substitution optimization. A series of experiments validates QTorch's capabilities and advantages. The results demonstrate that QTorch is an efficient platform for developing and implementing quantum-classical hybrid machine learning algorithms, propelling advancements in the field of QML.
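The abstract does not spell out the hardware-compatible differentiation technique; the standard choice for variational circuits is the parameter-shift rule, shown here for reference. For a gate generated by a Pauli rotation with parameter $\theta$, the gradient of an observable expectation is obtained from two shifted circuit evaluations:

$$\frac{\partial}{\partial\theta}\langle H\rangle_{\theta} \;=\; \frac{1}{2}\left(\langle H\rangle_{\theta+\pi/2} \;-\; \langle H\rangle_{\theta-\pi/2}\right),$$

which requires no access to the quantum state and therefore works on real devices as well as simulators.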
Reinforcement learning control for data center refrigeration systems
- WEI Dong, JIA Yuchen, HAN Shaoran
2025, 47(03): 422-433.
The refrigeration system in a data center operates continuously throughout the year, and its energy consumption cannot be ignored; moreover, traditional PID control methods struggle to achieve system-wide energy savings. To address this, a reinforcement learning control strategy for data center refrigeration systems is proposed, with the control objective of enhancing the overall energy efficiency of the system while meeting cooling requirements. A two-layer hierarchical control structure is designed. The upper optimization layer introduces the multistep prediction-deep deterministic policy gradient (MP-DDPG) algorithm, which leverages DDPG to handle the multi-dimensional continuous action space of the refrigeration system and to determine the water valve opening of the air handling unit and the optimal setpoint for each loop in the chilling station system; multistep prediction improves algorithm efficiency and overcomes the impact of large system delay during real-time control. The lower field control layer uses PID control to make the controlled variables track the optimal setpoints from the optimization layer, achieving performance optimization without disrupting the existing field control system. To address the challenge of real-time control with model-free reinforcement learning, a system prediction model is first constructed, the reinforcement learning controller is trained offline through interaction with this model, and online real-time control is then applied. Experimental results show that, compared to the traditional DDPG algorithm, the learning efficiency of the controller is improved by 50%. Compared to PID and MP-DQN (multistep prediction-deep Q network), the system's dynamic performance is improved, and the overall energy efficiency is increased by approximately 30.149% and 11.6%, respectively.
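MP-DDPG's exact formulation is in the paper; the sketch below only illustrates the multistep-prediction idea of rolling a learned plant model forward several steps to score a candidate action before it is applied to the real, slow-responding system (the model, reward, and policy here are toy stand-ins):

```python
import numpy as np

def multistep_value(model, reward, state, action, policy, n=5, gamma=0.99):
    """Score `action` by rolling a learned one-step predictor n steps ahead,
    compensating for large plant delay (illustrative, not the paper's MP-DDPG)."""
    s, a, ret, discount = state, action, 0.0, 1.0
    for _ in range(n):
        s = model(s, a)                  # learned system prediction model
        ret += discount * reward(s, a)
        discount *= gamma
        a = policy(s)                    # actor proposes the next action
    return ret

# Toy first-order plant tracking a 22 degrees-C supply-air setpoint.
model  = lambda s, a: 0.9 * s + 0.5 * a
reward = lambda s, a: -(s - 22.0) ** 2
policy = lambda s: float(np.clip(22.0 - s, -1.0, 1.0))
print(multistep_value(model, reward, 26.0, -1.0, policy))
```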
Computer Network and Information Security
RCGNN: Robustness certification for graph neural networks under graph injection attacks
- WANG Yuheng, LIU Qiang, WU Xiaojie
2025, 47(03): 434-447.
In recent years, graph neural networks (GNNs) have been widely applied in fields such as anomaly detection, recommendation systems, and biomedicine. Despite their excellent performance on specific tasks, many studies have shown that GNNs are susceptible to adversarial perturbations. To mitigate the vulnerability of GNNs to adversarial examples, some researchers have proposed robustness certification defenses against graph modification attacks, aiming to enhance the ability of GNN models to resist malicious perturbations in that scenario. However, the robustness of node classification models under graph injection attacks (GIA) has not been widely explored. Facing this challenge, we extend the sparsity-aware randomized smoothing mechanism and design RCGNN, a randomized smoothing-based robustness certification method for the GIA scenario. To align the noise perturbation space with GIA attack behaviors, we pre-inject malicious nodes and restrict perturbations to the vicinity of these nodes, and we improve the noise perturbation function to increase the certification ratio and expand the maximum certification radius. Comparative experiments on real datasets demonstrate that RCGNN achieves robustness certification for node classification tasks in the GIA scenario and outperforms the sparsity-aware randomized smoothing mechanism in both certification ratio and maximum certification radius.
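RCGNN's sparse-aware noise distribution and radius computation are the paper's contribution; the sketch below shows only the generic randomized-smoothing certification loop it builds on (Gaussian-style bound, fixed sample budget, crude normal-approximation confidence interval):

```python
import numpy as np
from scipy.stats import norm

def certify(base_classifier, x, perturb, n=1000, alpha=0.001):
    """Majority vote over random perturbations; certify when the top class's
    lower-bounded probability exceeds 1/2 (generic sketch, not RCGNN's
    GIA-specific noise model or radius)."""
    votes = np.bincount([base_classifier(perturb(x)) for _ in range(n)])
    top = int(votes.argmax())
    p_hat = votes[top] / n
    p_low = p_hat - norm.ppf(1 - alpha) * np.sqrt(0.25 / n)  # crude lower bound
    if p_low > 0.5:
        return top, norm.ppf(p_low)     # certified radius in noise-sigma units
    return -1, 0.0                      # abstain

clf = lambda v: int(v.sum() > 0)
noise = lambda v: v + np.random.default_rng().normal(0.0, 1.0, size=v.shape)
print(certify(clf, np.ones(8), noise, n=500))
```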
Log anomaly detection based on Transformer and Text-CNN
- YIN Chunyong, ZHANG Xiaohu
2025, 47(03): 448-458.
Log data, one of the most important data resources in software systems, records detailed information about system operation, and automated log anomaly detection is crucial for maintaining system security. With the widespread application of large language models in natural language processing, many Transformer-based log anomaly detection methods have been proposed; however, traditional Transformer-based methods struggle to capture the local features of log sequences. To address this issue, this paper proposes LogTC, a log anomaly detection method based on Transformer and Text-CNN. Firstly, logs are converted into structured log data through rule matching while preserving the effective information in log statements. Secondly, log statements are divided into log sequences using fixed windows or session windows, according to log characteristics. Thirdly, natural language processing technology, specifically Sentence-BERT, is used to generate semantic representations of log statements. Finally, the semantic vectors of the log sequences are input into the LogTC model for detection. Experimental results show that LogTC can effectively detect anomalies in log data and achieves good results on two datasets.
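The abstract fixes the pipeline but not the architecture details; this PyTorch sketch wires a window of log-message embeddings (e.g., Sentence-BERT vectors) through a Transformer encoder for global context and a Text-CNN head for local n-gram features, with all dimensions assumed rather than taken from the paper:

```python
import torch
import torch.nn as nn

class TransformerTextCNN(nn.Module):
    """Transformer encoder for global sequence context plus a Text-CNN head
    for local n-gram features over log-message embeddings (dims assumed)."""
    def __init__(self, d_model=384, n_heads=4, n_layers=2,
                 kernel_sizes=(2, 3, 4), n_filters=64, n_classes=2):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.convs = nn.ModuleList(
            [nn.Conv1d(d_model, n_filters, k) for k in kernel_sizes])
        self.fc = nn.Linear(n_filters * len(kernel_sizes), n_classes)

    def forward(self, x):                    # x: (batch, window, d_model)
        h = self.encoder(x)                  # global dependencies
        h = h.transpose(1, 2)                # (batch, d_model, window)
        pooled = [c(h).relu().max(dim=2).values for c in self.convs]
        return self.fc(torch.cat(pooled, dim=1))

logits = TransformerTextCNN()(torch.randn(8, 20, 384))  # 8 windows of 20 logs
print(logits.shape)                                      # torch.Size([8, 2])
```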
A network intrusion detection method based on graph heat kernel diffusion convolution
- JING Yongjun, WANG Hao, SHAO Kun, WANG Xiaofeng
2025, 47(03): 459-471.
Network intrusion detection is a crucial means of protecting computing resources and data from cyber-attacks. In recent years, deep learning-based methods have made significant progress in intrusion detection, but challenges remain, such as effective feature extraction and over-reliance on manually annotated data. To address these issues, a semi-supervised intrusion detection method based on graph heat kernel diffusion convolution is proposed. The method builds a host interaction graph using source and destination IP addresses as nodes and their interactions as edges. By fusing network flow statistics with latent graph structural features, it leverages graph heat kernel diffusion to aggregate neighborhood information. The resulting node representations significantly improve downstream intrusion detection tasks, enhancing the accuracy of identifying anomalous nodes and malicious connections. Experiments on the CIC-IDS-2017 and CIC-IDS-2018 datasets demonstrate that the proposed method can effectively capture the complex topological structures and node relationships in network traffic data, and can learn low-dimensional node embeddings using only a small number of flow features and little label information. Furthermore, cluster analysis and visualization of the node representations reveal the community structure and connection characteristics of attack nodes, providing valuable references for preventing novel or evolving attacks.
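For reference, the standard graph heat kernel underlying this kind of diffusion (the paper's exact propagation rule may differ): given graph Laplacian $L$ and diffusion time $t$,

$$H_t = e^{-tL} = \sum_{k=0}^{\infty}\frac{(-t)^{k}}{k!}\,L^{k},$$

and node features $X$ are aggregated over diffusing neighborhoods as $X' = H_t X$, with $t$ controlling how far information spreads across the host interaction graph.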
A cross-domain mutual authentication scheme based on Chebyshev chaotic map for V2G
2025, 47(03): 472-484.
Existing V2G (vehicle-to-grid) authentication schemes fail to consider the mobility of vehicles and therefore cannot effectively complete cross-domain authentication when vehicles communicate with grid servers in different regions. In addition, when a vehicle arrives at a destination domain for cross-domain authentication, it must communicate frequently with the server in its registered domain, making cross-domain authentication inefficient. To address these issues, a V2G cross-domain mutual authentication scheme based on the Chebyshev chaotic map is proposed. In the proposed scheme, the physical security of the aggregator is first quickly verified, and a temporary session key between the aggregator and the grid server is negotiated to complete the first layer of authentication, ensuring the confidentiality of information transmitted during node queries on the blockchain. The second layer of authentication, between the aggregator and the cross-domain vehicle, is completed using authentication parameters stored on a smart card, which streamlines the communication between the vehicle and the aggregator in the registered domain and further reduces communication overhead. Theoretical analysis and experimental results demonstrate that the proposed scheme achieves secure and efficient cross-domain authentication.
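For reference, the standard Chebyshev chaotic-map properties such schemes rest on (not the paper's full protocol): the map $T_n(x)=\cos(n\arccos x)$ on $[-1,1]$ satisfies the semigroup property

$$T_r(T_s(x)) = T_{rs}(x) = T_s(T_r(x)),$$

so two parties holding secrets $r$ and $s$ can exchange $T_r(x)$ and $T_s(x)$ and both derive the shared key $T_{rs}(x)$, in the style of Diffie-Hellman; a common motivation is lower computational cost than modular exponentiation.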
A compressive sensing image reconstruction network based on iterative shrinkage thresholding and deep learning
- XU Wen, YU Li
2025, 47(03): 485-493.
To address the low reconstruction fidelity and weak network generalization ability of deep learning-based compressive sensing reconstruction algorithms, a compressive sensing image reconstruction network (EH-ISTANet) based on iterative shrinkage thresholding and deep learning is proposed. The model consists of three parts: an extraction subnetwork, an initialization subnetwork, and an enhancement reconstruction subnetwork. An attention mechanism cooperates with a neighborhood mapping module to feed the obtained features into the enhancement module, thereby enhancing the edges and textures of the reconstructed image. The reconstruction stage mimics the unfolding of the traditional iterative shrinkage thresholding algorithm, and each stage can flexibly model the measurement matrix and dynamically adjust the step size in the gradient descent step. Experiments verify that the model improves the peak signal-to-noise ratio on different datasets at different sampling rates and outperforms other models in generalization ability and reconstruction accuracy. At a compressive sensing rate of 10%, the model's average peak signal-to-noise ratio on five test sets is improved by 1.69 dB, 4.36 dB, and 1.93 dB compared with the CSNet, AMP-Net, and AMP-Net-BM models, respectively.
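For reference, the classical iteration the network unfolds: with measurement matrix $\Phi$, measurements $y=\Phi x$, step size $\rho$, and threshold $\theta$, each ISTA stage computes

$$x^{(k+1)} = \operatorname{soft}\!\left(x^{(k)} - \rho\,\Phi^{\top}\!\left(\Phi x^{(k)} - y\right),\, \theta\right), \qquad \operatorname{soft}(u,\theta) = \operatorname{sign}(u)\,\max(|u|-\theta,\,0),$$

and the unfolded network makes $\rho$ and $\theta$ learnable per stage, matching the abstract's dynamically adjusted gradient step.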
A lightweight face super-resolution reconstruction method based on pulse attention mechanism
- LI Jiao, GAO Leiyi, ZHANG Ruixin, WU Yue, DENG Hongxia
2025, 47(03): 494-503.
Research on deep learning-based face super-resolution has made significant progress in recent years. However, a key challenge in this field is how to effectively restrict model complexity while preserving fine, natural facial texture details during restoration, which is crucial for deploying network models on lightweight devices. Therefore, a lightweight face super-resolution reconstruction method based on a pulse attention mechanism is proposed. The new pulse attention mechanism integrates multi-round global information extracted by a pulse-coupled neural network into the window self-attention mechanism, using both global and local information to improve the network's learning ability, and a generative adversarial network structure is used to build a progressive generator based on window self-attention to keep the method lightweight. Experimental results on the CelebA and Helen datasets show that this method performs excellently on the LPIPS and MPS perceptual evaluation metrics; compared with methods of the same parameter magnitude, it achieves significant improvements across all metrics and exhibits superior subjective visual quality.
A visual SLAM method based on improved instance segmentation for indoor dynamic scenes
- LIANG Rongguang, YUAN Jie, ZHAO Yingying, CAO Xuewei
2025, 47(03): 504-512.
To address data association mismatches of visual SLAM in dynamic scenes and false detections in instance segmentation, an indoor dynamic point feature detection method based on improved instance segmentation is proposed. Firstly, the YOLOv7-seg algorithm is improved with a double gradient path aggregation network (D-ELAN) and a dilated attention mechanism (DwCBAM) to obtain accurate contours of dynamic objects in the current image frame. Secondly, dynamic feature points are eliminated from the SLAM front-end image frames after the object class is determined. Finally, the static points are used to construct an error optimization model. Experimental results show that the improved algorithm increases mAP by 2.3% on average compared to YOLOv7-seg, and on the TUM dataset the method reduces the SLAM absolute trajectory error by 95.91% on average compared to ORB-SLAM2.
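A toy sketch of the rejection step only: once the improved segmenter returns instance masks and classes, keypoints falling inside the mask of any dynamic-class instance are discarded before pose optimization (the mask format and class list here are assumptions):

```python
import numpy as np

DYNAMIC_CLASSES = {"person", "cat", "dog"}      # assumed dynamic classes

def filter_keypoints(keypoints, masks, labels):
    """Keep only keypoints outside every dynamic instance mask.
    keypoints: (N, 2) pixel (x, y); masks: (M, H, W) bool; labels: length M."""
    keep = np.ones(len(keypoints), dtype=bool)
    xs = keypoints[:, 0].astype(int)
    ys = keypoints[:, 1].astype(int)
    for mask, label in zip(masks, labels):
        if label in DYNAMIC_CLASSES:
            keep &= ~mask[ys, xs]               # drop points inside the mask
    return keypoints[keep]                      # static points feed pose optimization

kps = np.array([[10, 12], [100, 80]])
masks = np.zeros((1, 120, 200), dtype=bool)
masks[0, 70:90, 90:110] = True                  # one "person" instance
print(filter_keypoints(kps, masks, ["person"]))  # keeps only [[10 12]]
```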
Emotional color transfer combining image decomposition and self-sparse fuzzy clustering
- XIE Bin, LI Yanwei, YANG Shumin, XU Yan, WANG Guanchao
2025, 47(03): 513-523.
Aiming at the lack of layering, blurred details, and poor visual effect of traditional emotional color transfer methods, a new transfer method combining image decomposition and self-sparse fuzzy clustering is proposed. Firstly, to better preserve image detail, a cartoon-texture decomposition based on a low-rank texture prior is introduced to divide the source image into a smoothed map containing the main colors and a texture map with local information. Secondly, self-sparse fuzzy clustering is used to obtain the main representative colors and corresponding segmentation regions of the smoothed map, enabling the result image to better retain the structure of the source image. Finally, an adaptive brightness-correction anti-overflow strategy is designed, and on this basis a new emotional color transfer method is proposed to make the result image more consistent with human visual characteristics. Experimental results show that the proposed method produces higher-quality transfer results and performs better in both subjective and objective evaluations.
Artificial Intelligence and Data Mining
Optimization of speech enhancement based on mismatch negativity latency
- JI Chenguo, JIA Hairong, PEI Yijing, DUAN Shufei
2025, 47(03): 524-533.
To address the mismatch between existing speech enhancement loss functions and evaluation indices, the performance of a speech enhancement algorithm is improved by combining an EEG-component-based speech evaluation index with the loss function. Firstly, it is verified that the latency of the mismatch negativity (MMN), an EEG component, can be used as an objective speech evaluation index. An MMN latency function is proposed and connected to the signal-to-noise ratio, solving the problem that commonly used evaluation indices cannot be used directly as loss functions to optimize speech enhancement algorithms. Secondly, the latency function is trained jointly with the learning objectives of a traditional neural network and is continuously optimized through training. Finally, the latency function is applied to the loss function of the discriminator of a generative adversarial network; combined with a Conformer, the network can effectively capture long-term dependencies and extract local features in both the time and frequency dimensions. Experimental results show that the speech enhancement algorithm effectively improves speech characteristics when the objective EEG-component-based measures are used, and the effectiveness of the proposed algorithm is verified in terms of speech enhancement quality, intelligibility, and distortion.
Node classification with graph structure prompt in low-resource scenarios
- CHEN Yuling, LI Xiang
2025, 47(03): 534-547.
Text-attributed graphs have increasingly become a hotspot in graph research. In traditional graph neural network (GNN) research, node features are typically shallow features derived from text information or manually designed features, such as those from the skip-gram and continuous bag-of-words (CBOW) models. In recent years, the advent of large language models (LLMs) has profoundly changed natural language processing (NLP), and these changes have begun to permeate GNNs: recent graph-related work introduces language representation models and large language models to generate new node representations, aiming to mine richer semantic information. In existing work, most models still adopt traditional GNN architectures or contrastive learning approaches. Among contrastive learning methods, since traditional node features and the node representations generated by language models are not produced by a unified model, they face the challenge of dealing with two vectors that lie in different vector spaces. Based on these challenges and considerations, a model named GRASS is proposed. Specifically, in the pre-training task, the model introduces text information expanded by large language models, which is contrasted with textual information processed by graph convolution. In downstream tasks, to reduce the cost of fine-tuning, GRASS aligns the format of downstream tasks with that of the pre-training tasks. As a result, GRASS performs well on node classification tasks without fine-tuning, especially in low-shot scenarios: in the 1-shot setting, compared with the best baseline, GRASS improves by 6.10%, 6.22%, and 5.21% on the Cora, Pubmed, and ogbn-arxiv datasets, respectively.
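The pre-training objective contrasts LLM-expanded text representations with graph-convolved ones; a generic symmetric InfoNCE sketch of that kind of alignment (temperature and dimensions are assumptions, not GRASS's exact loss):

```python
import torch
import torch.nn.functional as F

def info_nce(z_graph: torch.Tensor, z_text: torch.Tensor, tau: float = 0.2):
    """Symmetric InfoNCE aligning graph-side and LLM-side embeddings of the
    same nodes; row i of each batch is the same node (generic sketch)."""
    z_g = F.normalize(z_graph, dim=1)
    z_t = F.normalize(z_text, dim=1)
    logits = z_g @ z_t.t() / tau                   # (B, B) cosine similarities
    targets = torch.arange(z_g.size(0))
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.t(), targets))

loss = info_nce(torch.randn(16, 128), torch.randn(16, 128))
print(float(loss))
```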
Driver identification model based on driving context-aware
- YANG Lin, ZHANG Lei, LIU Bailong, LIANG Zhizhen, ZHANG Xuefei
2025, 47(03): 548-560.
With increasing awareness of privacy protection, identifying drivers from vehicle trajectory data has become a hot topic in vehicle data analysis. However, existing models struggle to accurately capture the relationship between driving style and driving context, resulting in low identification accuracy. Therefore, a driving context-aware driver identification model (CDIM) is proposed. CDIM uses trajectory data to compute vehicle motion features and obtains travel routes through road network matching. It designs a road segment information embedding module based on a bidirectional Transformer, which generates an embedding for each road segment in the travel route by fusing features of adjacent road segments. A convolutional cross-modal attention fusion module then combines road segment features with motion features, achieving efficient fusion of the two, and external factor features are incorporated to comprehensively capture the influence of driving context on driving style. Experimental results on public datasets show that CDIM achieves an identification accuracy of 68.54%, an improvement of 8.14% and 4.81% over RM-Driver and Doufu, respectively, demonstrating higher driver identification accuracy.
A container truck prediction model for ports based on multi-source heterogeneous fusion and spatiotemporal graph convolutional network
- XUE Guixiang, CHEN Yuang, LIU Yu, ZHENG Qian, SONG Jiancai
2025, 47(03): 561-570.
-
Timely and accurate container truck prediction algorithms are crucial to the scheduling optimization and resource allocation of port logistics systems. Because the arrival volume of container trucks is affected by many complex factors, such as the traffic condition of the adjacent road, weather, and port operation plan, it shows highly nonlinear and complex characteristics. Traditional traffic flow prediction methods are complicated by effectively integrating the influence of internal and external factors and accurately extracting their spatial and temporal correlations. Regarding this matter, a hybrid container truck prediction model based on multi-source heterogeneous fusion and spatiotemporal graph convolutional network (MHF-STGCN) is proposed, which adopts the attention mechanism to adaptively extract the critical information from multi-source heterogeneous historical data of port traffic flow and mine its dynamic spatiotemporal evolution characteristics. Multi-source data fusion decreases the models MAE by 34.99% and RMSE by 31.10% compared to single traffic data. Detailed comparative experimental results show that the model significantly outperforms the baseline model in terms of MAE, RMSE, and R-Square.