High Performance Computing
-
DeepFlame: An open-source platform for reacting flow simulations empowered by deep learning and high-performance computing
- MAO Run-ze, WU Zi-heng, XU Jia-yang, ZHANG Yan, CHEN Zhi,
-
2024, 46(11): 1901-1907.
-
In recent years, deep learning has been widely recognized as a reliable approach to accelerating reacting flow simulations. In recent work, the authors developed an open-source platform named DeepFlame, which supports machine learning libraries and algorithms during the simulation of reacting flows, and used it to compute chemical reaction source terms with deep neural networks (DNNs). This paper focuses on optimizing the platform for high-performance computing. Firstly, to fully harness the acceleration potential of DNNs, it implements support for multi-GPU parallel inference in DeepFlame, develops intra-node partitioning algorithms and a master-slave communication structure, and completes the migration to Graphics Processing Units (GPUs) and Deep Computing Units (DCUs). Furthermore, it implements the solution of partial differential equations and the construction of discrete sparse matrices on GPUs based on the Nvidia AmgX library. Finally, it evaluates the computational performance of the updated DeepFlame on a CPU-GPU/DCU heterogeneous architecture. The results indicate that a single GPU card alone can achieve a speedup of up to 15 times when simulating a reactive Taylor-Green Vortex (TGV).
-
Gyration: A packet deflection congestion control algorithm based on RTT measurement
- LU Ping-jing, YU Jia-ren, YUAN Guo-yuan
-
2024, 46(11): 1908-1915.
-
Efficient congestion control has always been a critical challenge in datacenter networks. Accurate measurement of Round Trip Time (RTT) is the cornerstone of RTT-based reactive congestion control algorithms. Building on the Swift congestion control algorithm, this paper proposes Gyration, a packet deflection congestion control algorithm based on RTT measurement. Gyration incorporates the delay of deflected packets into the RTT calculation, which enables a more accurate assessment of network congestion conditions. Experimental results demonstrate that, compared to Swift, under heavy-load traffic patterns such as Cache Follower, Data Mining, Web Search, and Web Server, Gyration reduces flow completion time (FCT) by 20%, 80%, 13%, and 60%, respectively, and increases throughput by 38%, 6%, 15%, and 2%, respectively. This indicates that Gyration provides more timely and precise congestion control for datacenter networks, effectively mitigating congestion within these networks.
-
Design and implementation of an efficient memory allocation algorithm based on TLSF algorithm
- CHEN Zhao-hui, DUAN Xiong
-
2024, 46(11): 1916-1923.
-
In embedded systems, where memory resources are limited, the performance and fragmentation rate of the memory allocator are crucial factors. The primary algorithm currently employed is TLSF (two-level segregated fit), but it poses certain issues in embedded systems, such as external fragmentation caused by small memory allocations and internal fragmentation resulting from large memory allocations. To address these issues, the TLSF algorithm is optimized in two ways: (1) For small memory allocations, a static memory pool (POOL) algorithm is introduced to resolve the external fragmentation that arises from numerous small allocations in TLSF; (2) For large memory allocations, a DBL (divided by level) memory allocation strategy based on hierarchical division is proposed to tackle internal fragmentation. Experiments show that using the optimized TLSF algorithm (DBL+POOL) for memory management makes better use of memory resources, thereby enhancing the performance and reliability of embedded systems.
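As background for the abstract above, the following is a minimal sketch of the index mapping used by baseline TLSF, not of the DBL or POOL strategies the paper proposes. The constant `SL_INDEX_LOG2` and the function names `tlsf_mapping` and `bucket_width` are illustrative assumptions, and sizes below the smallest class are simply rejected here.

```python
SL_INDEX_LOG2 = 4  # assumed: 2**4 = 16 second-level lists per first-level class


def tlsf_mapping(size: int) -> tuple[int, int]:
    """Map a block size to (first-level, second-level) free-list indices."""
    if size < (1 << SL_INDEX_LOG2):
        raise ValueError("size below the smallest class handled by this sketch")
    fl = size.bit_length() - 1  # floor(log2(size)) selects the power-of-two class
    # Linear subdivision of the range [2**fl, 2**(fl+1)) into 16 slots.
    sl = (size >> (fl - SL_INDEX_LOG2)) - (1 << SL_INDEX_LOG2)
    return fl, sl


def bucket_width(fl: int) -> int:
    """Number of distinct sizes that share one second-level list in class fl."""
    return 1 << (fl - SL_INDEX_LOG2)


print(tlsf_mapping(460))                   # (8, 12)
print(bucket_width(8), bucket_width(20))   # 16 65536
```

The bucket width grows with the size class, so in this baseline mapping large requests are grouped far more coarsely than small ones.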
-
A method for improving the robustness of mixed-precision optimization based on floating-point error analysis
- YU Heng-biao, YI Xin, LI Sheng-guo, LI Fa, JIANG Hao, HUANG Chun
-
2024, 46(11): 1924-1930.
-
Floating-point arithmetic is a typical numerical solution model for high-performance computing. Mixed-precision optimization enhances performance and reduces energy consumption by decreasing the precision of floating-point variables in programs. However, existing automatic mixed-precision optimization techniques are limited by low robustness, meaning that the optimized programs fail to meet the result accuracy constraints for given inputs. To address this issue, a method for improving the robustness of mixed-precision optimization based on floating-point error analysis is proposed. Firstly, inputs that can trigger imprecise calculations in the program are identified through floating-point error analysis. Then, based on these error-triggering inputs, the precision configurations are evaluated to guide the search for highly robust mixed-precision configurations. Experimental results show that for typical floating-point applications, this method can improve the robustness of mixed-precision optimization by an average of 62%.
-
Parallel implementation of a 3D-HEVC intra prediction algorithm based on dynamic self-reconfiguration structure
- YANG Hang, SHAN Rui, YANG Kun, CUI Xin-yue
-
2024, 46(11): 1931-1939.
-
Implementations of intra prediction algorithms for 3D high efficiency video coding (3D-HEVC) on dedicated hardware have certain limitations: they cannot fulfill the need for flexible and autonomous switching among multiple modes of the intra prediction algorithm. This leads to poor encoding performance and low utilization of hardware resources. To address this issue, a novel implementation method of the 3D-HEVC intra prediction algorithm on a programmable dynamically self-reconfigurable array processor is proposed. This method, based on the dynamic self-reconfiguration mechanism, utilizes a programmable controller to collect the execution states of the array in real time and autonomously issue new tasks once the current task is completed. By achieving hardware-level autonomous reconfiguration for different prediction mode mapping schemes, the algorithm can switch modes flexibly. Compared with related work, experimental results show that while flexibility is enhanced, hardware resources are reduced by 49% and computational latency is decreased by 29.2%. When the test sequences are subjected to the entire intra-frame loop test, the results demonstrate good image quality.
-
A low-jitter Retimer circuit for high-performance computer optical interconnection
- LIU Qing, WANG He-ming, Lv Fang-xu, ZHANG Geng, Lv Dong-bin
-
2024, 46(11): 1940-1948.
-
With the significant increase in communication bandwidth, low jitter, as a crucial indicator of signal transmission quality in multi-scenario applications, has become an important research direction in signal integrity. The 56 Gbaud Retimer chip serves as the key component in optical interconnection data transmission for high-performance computers, and its jitter performance also restricts the overall performance of the optical module in high-performance computers. To address the poor jitter performance of traditional high-speed Retimer chips, a low-jitter Retimer circuit with a data rate exceeding 100 Gbps is proposed for the first time. This Retimer circuit, based on the CDR+PLL architecture, is integrated into a fiber optic repeater and features equalization and full-rate retiming functions. By adopting a jitter elimination filter circuit, it achieves excellent output data jitter performance under high-noise input signals, providing technical support for solving the issue of high output data jitter caused by direct sampling and forwarding in traditional Retimers. The design of the low-jitter Retimer circuit based on the CDR+PLL architecture was completed using TSMC 28 nm CMOS technology. Simulation results show that when the input is 112 Gbps PAM4, the output data jitter of the Retimer is 741 fs, representing a 31.4% reduction compared to traditional Retimer structures.
-
A heterogeneous differential synchronous parallel training algorithm
- HUANG Shan, WU Yu-fan, L He-xuan, DUAN Xiao-dong,
-
2024, 46(11): 1949-1959.
-
Back propagation neural network (BPNN) is widely used in fields such as behavior recognition and prediction due to its advantages including strong nonlinearity, self-learning capability, adaptability, and robust fault tolerance. With the upgrade and optimization of models and the accelerated growth of data volume, parallel training architectures based on big data distributed computing frameworks have become mainstream. Apache Flink, as a new generation of big data computing frameworks, is widely applied due to its high throughput and low latency characteristics. However, due to the accelerated pace of hardware upgrades and different purchase batches, Flink clusters in real-life scenarios are mostly heterogeneous, meaning that computing resources within the cluster are unbalanced. Existing BPNN parallel training models cannot address the issue of high-performance nodes idling during the training process due to this unbalanced computing resource distribution. Additionally, in a heterogeneous environment, as the number of nodes increases, so does the communication overhead between nodes during BPNN parallel training. The traditional mini-batch gradient descent method possesses precise optimization capabilities, but the combination of random model initialization and precise mini-batch gradient descent characteristics leads to slow convergence speeds in BPNN parallel training. To address the aforementioned issues, this paper aims to accelerate BPNN parallel training speed and improve BPNN parallel training efficiency in a heterogeneous environment by proposing the heterogeneous micro-difference synchronous parallel training (HMDSPT) algorithm. This algorithm scores node performance based on variations in performance within a heterogeneous environment and dynamically allocates data in proportion through a data partitioning module in real-time, ensuring that node performance is directly proportional to the amount of data allocated to each node. 
This approach reduces the idling time of high-performance nodes.
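The proportional data allocation described in this abstract can be illustrated with a minimal sketch. It assumes each node already has a numeric performance score; how HMDSPT computes those scores is not given here, and the function name `allocate_by_score` and the largest-remainder rounding are illustrative choices rather than the paper's method.

```python
def allocate_by_score(scores: list[float], total_samples: int) -> list[int]:
    """Split total_samples across nodes in proportion to their scores."""
    total = sum(scores)
    raw = [total_samples * s / total for s in scores]  # ideal fractional shares
    shares = [int(r) for r in raw]                     # round each share down
    leftover = total_samples - sum(shares)
    # Hand the remaining samples to the nodes with the largest fractional parts.
    order = sorted(range(len(scores)), key=lambda i: raw[i] - shares[i], reverse=True)
    for i in order[:leftover]:
        shares[i] += 1
    return shares


print(allocate_by_score([1.0, 2.0, 1.0], 100))  # [25, 50, 25]
```

A node scored twice as fast receives twice as many samples, so under this assumption all nodes would finish a batch at roughly the same time.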
Computer Network and Information Security
-
A blockchain-based crowdsourcing incentive mechanism
- YANG Song, WANG Xin-ru, LI Fan, ZHU Lie-huang, ZHAO Bo
-
2024, 46(11): 1960-1970.
-
Crowdsourcing refers to the utilization of collective intelligence to collect, process, infer, and determine a vast amount of useful information, holding significant potential in areas such as service ratings, surveys, voting, and the industrial Internet of Things. A crowdsourcing system involves three stakeholders: the platform, workers, and task requesters. Traditional crowdsourcing systems are incentive-incompatible, and due to a lack of trust, all data transmitted between requesters and workers requires a remote centralized platform to act as a credit intermediary, which brings issues such as network congestion and privacy breaches. To address these issues, this paper proposes a trust-based crowdsourcing incentive mechanism, encompassing a “reward-penalty” model for crowdsourcing workers, a commission mechanism between requesters and master nodes, and a smart contract solution for the prisoner's dilemma of resource exchange among master nodes. This multi-party incentive mechanism is realized by constructing smart contracts on master nodes in an edge environment. A low-cost, real-time, and high-volume transaction channel is established through the off-chain transaction implementation of the lightning network, solving the trust issues and transaction efficiency problems between master nodes and workers. Finally, the effectiveness of the proposed crowdsourcing incentive mechanism and its implementation approach is verified through multi-dimensional comparative simulation experiments.
-
An optimal placement mechanism for software-defined networking controllers based on genetic algorithm and clustering
- WANG Bing-bin, TANG Zhen-zhou
-
2024, 46(11): 1971-1978.
-
In a logically centralized but physically distributed multi-controller software-defined networking (SDN) environment, the placement of controllers directly impacts network performance, including latency and load balancing. Therefore, the multiple controllers placement (MCP) problem is a crucial issue in SDN. Based on this analysis, a heuristic SDN MCP mechanism that integrates the genetic algorithm (GA) and k-medoid clustering algorithm, termed the GA-K-Medoids MCP mechanism, is proposed. This mechanism aims to minimize the propagation delay between controllers and switches, as well as among controllers. The performance of the proposed MCP mechanism is evaluated using two common network topologies, Internet2 OS3E and Palmetto, and compared with other mechanisms. Simulation results demonstrate that the GA-K-Medoids MCP can provide an effective and low-latency controller placement solution for multi-controller SDN.
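The two delays this mechanism aims to minimize can be made concrete with a small sketch. It only evaluates a candidate placement; the genetic algorithm and k-medoids search are not shown. The function name `placement_cost`, the use of averages, and the toy distance matrix are assumptions for illustration.

```python
from itertools import combinations


def placement_cost(dist: list[list[float]], controllers: list[int]) -> tuple[float, float]:
    """Return (avg switch-to-nearest-controller delay, avg inter-controller delay)."""
    n = len(dist)
    # Each switch is served by its closest controller.
    switch_delay = sum(min(dist[s][c] for c in controllers) for s in range(n)) / n
    pairs = list(combinations(controllers, 2))
    inter_delay = sum(dist[a][b] for a, b in pairs) / len(pairs) if pairs else 0.0
    return switch_delay, inter_delay


# Toy topology (assumed): four nodes on a line, delay equals hop distance.
dist = [[abs(i - j) for j in range(4)] for i in range(4)]
print(placement_cost(dist, [1, 2]))  # (0.5, 1.0)
print(placement_cost(dist, [0, 3]))  # (0.5, 3.0)
```

Both placements serve switches equally well in this toy case, but the second spreads the controllers further apart, which is the kind of trade-off a combined objective has to weigh.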
-
Edge-disjoint path pair selection for the frame replication and elimination mechanism in time-sensitive networking
- HU Shao-liu, CAI Yue-ping
-
2024, 46(11): 1979-1988.
-
Industrial internet applications, such as industrial automation control systems, pose stricter performance requirements on networks, including bounded low latency, low jitter, and high reliability. Traditional Ethernet's best-effort forwarding struggles to meet the deterministic transmission demands of the industrial internet. Time-sensitive networking (TSN), being standardized by the IEEE 802.1 working group, enhances Ethernet's capabilities in time synchronization, deterministic flow scheduling, and reliability. The frame replication and elimination for reliability (FRER) mechanism improves the reliability of TSN by transmitting identical frames in parallel over two disjoint paths with the same source and destination nodes and eliminating duplicate frames at the destination node. However, this mechanism has two main issues: firstly, path selection does not consider the inherent reliability of the paths; secondly, completely disjoint path pairs may not exist. This paper addresses these issues by constructing a path reliability model and proposing a calculation method based on edge-disjoint path pairs. Simulation results show that the proposed method effectively improves path reliability while reducing frame delay jitter. When the network load is 0.9, compared to traditional FRER and FRER-MPC, the proposed method reduces delay jitter by 15.6% and 11.19%, respectively.
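To illustrate why the reliability of each path matters when choosing a pair, here is a minimal sketch using the textbook series/parallel reliability formulas. It assumes links fail independently and ignores node failures; the paper's own reliability model is not given in the abstract, and the function names are hypothetical.

```python
from math import prod


def path_reliability(link_reliabilities: list[float]) -> float:
    """A path works only if every link on it works (series system)."""
    return prod(link_reliabilities)


def pair_reliability(path_a: list[float], path_b: list[float]) -> float:
    """With edge-disjoint paths, delivery fails only if both paths fail."""
    fail_a = 1.0 - path_reliability(path_a)
    fail_b = 1.0 - path_reliability(path_b)
    return 1.0 - fail_a * fail_b


short_pair = pair_reliability([0.99, 0.99], [0.99, 0.99, 0.99])
lossy_pair = pair_reliability([0.99, 0.99], [0.90, 0.90, 0.90])
print(short_pair > lossy_pair)  # True
```

Under these assumptions, two candidate pairs that are both edge-disjoint can still differ noticeably in end-to-end reliability.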
-
Transmission bottleneck localization based on QoS-QoE prediction
- MA Xin-yu, LI Tong, CAO Jing-kun, WU Bo, SUN Yong-qian, ZHAO Yi
-
2024, 46(11): 1989-1996.
-
In real-time audio and video transmission, QoS (Quality of Service) metrics reflect the network conditions perceived at the server side, while QoE (Quality of Experience) metrics directly embody the satisfaction level of users with video services. Although QoE metrics are of greater concern to service providers, cloud service providers often cannot obtain QoE data in real time due to issues such as interface adaptation and user privacy protection, making it difficult to predict and optimize potential QoE anomalies in a timely manner. Given the existing mapping relationship between QoS and QoE, this paper proposes a model that utilizes server-side QoS metrics to detect bottlenecks in QoE metrics, aiming to reduce the workload of operation and maintenance personnel and improve network optimization efficiency. The model employs an imbalanced decision tree for QoS-QoE prediction to achieve QoE anomaly detection. Furthermore, an LSTM regression model is utilized for causal analysis to locate bottlenecks. Experiments show that this model achieves high accuracy in QoE anomaly detection and can identify QoS metrics that significantly impact transmission outcomes.
-
A personalized differential privacy protection scheme for multidimensional data of participatory sensing devices
- WANG Tian-yang, LI Xiao-hui, CHEN Hong-yang
-
2024, 46(11): 1997-2006.
-
With the rise of participatory sensing technology, the scale and diversity of personal devices participating in data collection have continued to increase, leading to the emergence of a vast amount of sensitive multidimensional numerical data, which has exacerbated the risk of privacy leakage. To address this issue, a personalized differential privacy protection scheme for multidimensional numerical data from participatory sensing devices is proposed. This scheme minimizes the mean squared error by designing a personalized privacy budget allocation scheme within a certain range and optimizing the sampling dimension of DPM (differential privacy mechanism). Based on this, PDPM (personalized dimensional partition mechanism) is designed to improve data availability and reduce the mean squared error after perturbation. Finally, experiments conducted on two real-world datasets verify that the proposed method significantly reduces the mean squared error of numerical data while protecting user privacy. Therefore, the proposed scheme provides a better balance between privacy protection and data availability.
-
A multi-target tracking algorithm based on Gamma distribution Bayesian RCS estimation
- LI Bo, WANG Jian, LI Jia-yu, LU Zhe-jun
-
2024, 46(11): 2007-2016.
-
To address the issue of track mixing in multi-target tracking algorithms under dense target scenarios, this paper proposes a multi-target tracking algorithm based on Bayesian radar cross section (RCS) estimation using the Gamma distribution, which incorporates RCS information to assist in tracking. Firstly, the target RCS state and measurement filtering process are presented. A non-stationary autoregressive Gamma process is used to model the state dynamics, enabling Bayesian RCS estimation during the time update. Then, Bayesian RCS estimation is introduced into the probability hypothesis density (PHD) filter, resulting in the PHDwRCS filter, which enables tracking of dense targets. To address the limitations of PHD-based filters in real-time track formation and low tracking accuracy, RCS estimation is further integrated into the Track-before-Detect (TPHD) filter, yielding the TPHDwRCS filter, which achieves effective tracking of dense targets. Computer simulation experiments demonstrate that the proposed algorithm can effectively implement Bayesian RCS estimation. The PHDwRCS and TPHDwRCS filters incorporating RCS information can accurately track dense targets, resulting in improved quantitative error performance based on the generalized optimal subpattern assignment (GOSPA) metric. This approach mitigates the problem of track mixing to a certain extent.
-
An RGB-D visual SLAM system based on a lightweight object detection network
-
2024, 46(11): 2017-2026.
-
RGB-D SLAM is a technology that utilizes depth cameras to achieve simultaneous localization and mapping (SLAM). Traditional visual SLAM systems are based on the assumption of a static environment, yet dynamic objects often exist in real-world scenarios, potentially leading to significant deviations in the pose estimation of SLAM systems. To address this issue, this paper proposes a SLAM system based on lightweight YOLOv8s object detection. This system employs Socket communication to transmit object detection results to the SLAM system, which then utilizes the Depth Value-RANSAC geometric algorithm to eliminate dynamic feature points within the detected bounding boxes, thereby enhancing the positioning accuracy of the SLAM system in dynamic environments. The experiments were validated on the TUM dataset, and the results indicate that the system's accuracy is significantly improved compared to ORB-SLAM2. Compared to other SLAM systems, varying degrees of improvement in accuracy and real-time performance were observed.
-
Semantic segmentation of foggy driving scenes based on learnable image filter
- XU Xin, LI Ruo-shi, YUAN Ye, LIU Na
-
2024, 46(11): 2027-2034.
-
Although deep learning-based semantic segmentation methods have achieved excellent results on traditional driving datasets, low-quality images captured under foggy conditions remain challenging. To address this issue, this paper proposes a learnable image filter (LIF) module, aiming to leverage the intrinsic characteristics of driving scene images under varying fog densities to improve semantic segmentation in foggy driving conditions. The LIF module consists of a hyperparameter prediction module (HPM) and an image filtering module (IFM), where the hyperparameters of the filter in the IFM are predicted by the HPM. This paper jointly learns the HPM and the semantic segmentation network in an end-to-end manner, ensuring that the HPM can learn appropriate IFM parameters to enhance images for segmentation in a weakly supervised manner. Taking DeepLabV3+, PSPNet, and RefineNet as baselines, respectively, experiments were conducted on a mixed dataset of Cityscapes and Foggy Cityscapes. The mean intersection over union (MIoU) scores of the baselines with the learnable image filter module are 63.14%, 60.45%, and 61.41%, representing improvements of 3.03%, 1.52%, and 1.69% over the baselines, respectively. The experimental results demonstrate the effectiveness and generality of the proposed module.
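For readers unfamiliar with the reported metric, the following minimal sketch computes mean intersection over union from a confusion matrix in the standard way. The function name `mean_iou` and the toy matrix are illustrative and not taken from the paper.

```python
def mean_iou(conf: list[list[int]]) -> float:
    """Mean IoU from a confusion matrix with conf[true_class][predicted_class] counts."""
    n = len(conf)
    ious = []
    for k in range(n):
        tp = conf[k][k]
        fp = sum(conf[i][k] for i in range(n)) - tp  # predicted k, actually another class
        fn = sum(conf[k]) - tp                       # actually k, predicted another class
        denom = tp + fp + fn
        if denom:  # skip classes absent from both prediction and ground truth
            ious.append(tp / denom)
    return sum(ious) / len(ious)


# Toy two-class example: 3 correct and 1 wrong pixel for each class.
print(mean_iou([[3, 1], [1, 3]]))  # 0.6
```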
-
MCL based multi-rate point cloud action recognition
- LI Tao, WANG Song, XIE Tian, MA Ya-tong
-
2024, 46(11): 2035-2044.
-
To address the issues of voxel data occupying a large amount of memory space and the limited action information that can be extracted by a single network, a multiple choice learning (MCL) based multi-rate point cloud action recognition model is proposed. Firstly, the preprocessing method of point cloud data is optimized, reducing the overall size of the point cloud data by half. Secondly, an MCL-based multi-rate point cloud action recognition model is introduced, which takes the MCL framework as the main structure and incorporates a confidence loss function and generalized distillation. The confidence loss is used to determine the “teacher” and “student” networks during knowledge distillation. The “teacher” network is subjected to generalized distillation to guide the “student” network, enabling information fusion between networks operating at different rates. This model was evaluated on the publicly available MMActivity dataset and Pantomime dataset, achieving accuracies of 91.3% and 95.2%, respectively. The experimental results validate the effectiveness of the proposed model.
-
A smoke recognition method based on CNN and Transformer feature fusion
- FU Yan, YANG Xu, YE Ou
-
2024, 46(11): 2045-2052.
-
Currently, many smoke recognition algorithms suffer from high false alarm rates, partly because most existing convolutional neural networks (CNNs) mainly focus on local information in smoke images during feature extraction, neglecting the global features of smoke images. This bias towards local information processing can easily lead to misjudgments when dealing with variable and complex smoke images. To address this issue, it is necessary to capture the global features of smoke images more accurately, thereby improving the accuracy of smoke recognition algorithms. Therefore, this paper proposes a dual-branch smoke recognition method, TCF-Net, which combines the Inception and Transformer structures. This model is improved to enrich feature diversity while reducing channel redundancy. Additionally, the self-attention mechanism from Transformer is introduced, combining its ability to learn global context information with CNNs' capacity to learn local relative position information. During feature extraction, a feature coupling unit (FCU) is embedded so that the local features and global information in the two branches interact continuously, maximizing the retention of both local and global information and enhancing the performance of the algorithm. The proposed algorithm can classify video frames into three states: black smoke, white smoke, and no smoke. Experimental results show that the improved network can better extract smoke features, reducing the false alarm rate while increasing the accuracy to 97.8%, confirming the excellent performance of the algorithm.
Artificial Intelligence and Data Mining
-
Inverse reinforcement learning algorithm based on D2GA
- DUAN Cheng-long, YUAN Jie, CHANG Qian-kun, ZHANG Ning-ning
-
2024, 46(11): 2053-2062.
-
Aiming at the difficulty in obtaining expert demonstrations and the low utilization rate of generated samples in the traditional generative adversarial reinforcement learning, a double discriminator generative adversarial (D2GA) inverse reinforcement learning algorithm based on hindsight experience replay (HER) is proposed. In this algorithm, HER automatically synthesizes positive expert-like samples, and conducts adversarial training with negative samples generated by D2GA and reinforcement learning algorithm soft actor-critic (SAC). Based on the solved optimal reward function, SAC is used to solve the optimal strategy. The proposed D2GA algorithm is compared with the classical inverse reinforcement algorithm on four tasks in the Fetch environment. The results show that the success rate of D2GA in completing the task in relatively few rounds can reach ideal performance without available demonstration data, which is better than the current popular inverse reinforcement learning algorithm.
-
Construction and algorithm research of emergency material distribution model based on deprivation
- PENG Pin, WANG Xin-yue
-
2024, 46(11): 2063-2070.
-
This paper focuses on the emergency material scheduling problem after sudden natural disasters, taking into account the victims' perceived psychological distress and post-disaster road conditions. Drawing on scarcity theory, a distress function for the victims is established. With the goal of minimizing both the psychological distress cost of victims and the emergency material transportation cost, a multi-objective emergency material scheduling model for sudden natural disasters is constructed. The model is solved by a fast non-dominated sorting genetic algorithm. Finally, a case study of an area affected by the Wenchuan earthquake is conducted to verify the effectiveness of the model and algorithm.