High Performance Computing
-
State of the art analysis of China HPC 2024
- ZHANG Yun-quan, DENG Li, YUAN Liang, YUAN Guo-xing
- 2024, 46(12): 2091-2098.
In this paper, according to the latest China HPC TOP100 rank list released by CCF TCHPC in late November, the total performance trends of the China HPC TOP100 and TOP10 of 2024 are presented. Following this, the characteristics of performance, manufacturer, and application area are analyzed separately in detail.
-
Design and FPGA implementation of a high-precision double step branching hybrid CORDIC algorithm
- CHEN Xiao-wen, RUI Zhi-chao, ZHU Qi-jin, DONG Yu, MENG Yu
- 2024, 46(12): 2099-2108.
The CORDIC (coordinate rotation digital computer) algorithm is an approach used for computing trigonometric functions and other mathematical operations. It is widely applied in complex fields such as digital signal processing and computer graphics. The CORDIC algorithm, which only requires addition, subtraction, and shift operations, is particularly suited for hardware implementation. A limitation of the traditional CORDIC algorithm is its excessive number of iterations. Many studies have aimed to optimize this, but these optimizations often increase hardware overhead and may lead to precision loss. To address this, this paper proposes an optimized CORDIC algorithm based on the hybrid CORDIC algorithm and the double step branching CORDIC algorithm, called the high-precision double step branching hybrid CORDIC (HD CORDIC) algorithm. This algorithm reduces the number of iterations to N/4+1 (where N is the number of micro-rotation angles and the bit width), presents a new partitioning formula for the hybrid radix set to achieve a high precision of ε<2^-(N-2), close to that of the basic CORDIC algorithm (ε<2^-(N-1)), and does not require the calculation of the scaling factor K. The HD CORDIC algorithm employs a pipelined architecture with only N/4+3 pipeline stages (versus N+2 for the basic CORDIC algorithm without scaling-factor compensation). The algorithm was implemented in hardware using Verilog and synthesized on the XILINX Zynq-7000 xc7z100ffv900-2 FPGA platform. Experimental results show that when the input angle bit width is 16, the operating frequency is 315.66 MHz and only 6 clock cycles are needed to complete one sine & cosine function operation. Compared with the XILINX CORDIC IP, the HD CORDIC algorithm reduces the processing time by 59.13%, the LUT overhead by 55.74%, the register overhead by 80.24%, and the power consumption by 35.99%.
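For readers unfamiliar with the baseline the paper improves on, the basic rotation-mode CORDIC recurrence can be sketched as follows. This is a minimal floating-point sketch of the classical algorithm, not the proposed HD CORDIC; in fixed-point hardware the multiplications by 2^-i become bit shifts.

```python
import math

def cordic_sin_cos(theta, n_iters=16):
    """Rotation-mode CORDIC for theta in [-pi/2, pi/2]: returns
    (cos(theta), sin(theta)) using only add/subtract and scaling by
    2**-i (a bit shift in fixed-point hardware)."""
    # Micro-rotation angles atan(2^-i) and the constant scaling factor K
    # are precomputed, exactly as a hardware lookup table would be.
    angles = [math.atan(2.0 ** -i) for i in range(n_iters)]
    K = 1.0
    for i in range(n_iters):
        K /= math.sqrt(1.0 + 2.0 ** (-2 * i))

    x, y, z = 1.0, 0.0, theta
    for i, a in enumerate(angles):
        d = 1.0 if z >= 0 else -1.0          # rotation direction per step
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * a                           # drive the residual angle to 0
    return K * x, K * y

c, s = cordic_sin_cos(0.5)
```

With N iterations the residual angle is bounded by atan(2^-(N-1)), which is the ε<2^-(N-1) precision the abstract quotes for the basic algorithm.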
-
A method for converting mesh files from open-source pre-processing software to universal finite element solvers
- TIAN Zhuo, DING Jia-xin, ZHANG Chang-you, SHAO Yun-xia
- 2024, 46(12): 2109-2116.
Finite element analysis (FEA) software enables numerical simulation of products, reducing the number of experiments, lowering research and development costs, and accelerating the innovative design cycle. It is one of the core components of industrial software. However, the fact that over 95% of users in China rely on foreign commercial FEA software has become one of the critical bottlenecks in the development of industrial software. Mesh generation technology is a key aspect of FEA software, as it discretizes the computational domain for subsequent finite element solutions. Yet, current research and development in FEA software primarily focuses on solvers, while mesh generation mostly relies on commercial or open-source software. The output file formats of open-source mesh generation software often lack universality, preventing compatibility with finite element solvers. Therefore, this paper investigates and implements a method for converting mesh files from open-source preprocessing software to universal finite element solvers. It analyzes and compares the differences in data organization between mesh files and numerical solution files, achieving unification and standardization. The converted mesh files are compatible with mainstream commercial and open-source finite element solvers, exploring a feasible path toward the autonomy of industrial simulation software.
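As an illustration of the kind of conversion involved, the sketch below parses the $Nodes/$Elements sections of a Gmsh ASCII v2 mesh file and re-emits them in a flat, solver-agnostic layout. The `NODE`/`ELEM` record format is invented here for illustration and is not the paper's actual target format.

```python
def parse_msh2(text):
    """Parse the $Nodes/$Elements sections of a Gmsh ASCII v2 mesh into
    a neutral in-memory form: {node_id: (x, y, z)} and a list of
    (element_type, [node_ids]) pairs."""
    lines = iter(text.splitlines())
    nodes, elements = {}, []
    for line in lines:
        if line.strip() == "$Nodes":
            for _ in range(int(next(lines))):
                parts = next(lines).split()
                nodes[int(parts[0])] = tuple(float(v) for v in parts[1:4])
        elif line.strip() == "$Elements":
            for _ in range(int(next(lines))):
                parts = next(lines).split()
                etype, ntags = int(parts[1]), int(parts[2])
                elements.append((etype, [int(v) for v in parts[3 + ntags:]]))
    return nodes, elements

def to_neutral(nodes, elements):
    """Emit one flat NODE/ELEM record per line, the kind of lowest-
    common-denominator layout a universal solver front end could read."""
    out = ["NODE %d %g %g %g" % ((nid,) + xyz) for nid, xyz in sorted(nodes.items())]
    out += ["ELEM %d %s" % (etype, " ".join(map(str, conn)))
            for etype, conn in elements]
    return "\n".join(out)

SAMPLE = """$Nodes
3
1 0 0 0
2 1 0 0
3 0 1 0
$EndNodes
$Elements
1
1 2 2 0 1 1 2 3
$EndElements"""
nodes, elements = parse_msh2(SAMPLE)   # element type 2 = 3-node triangle
neutral = to_neutral(nodes, elements)
```

A real converter must also map element-type codes and node-ordering conventions between formats, which is where most of the incompatibility the abstract describes actually lives.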
-
Bowtie 2-NUMA: Gene sequence alignment application with NUMA architecture adaptability
- WANG Qiang, SUN Yan-jie, QI Xing-yun, XU Jia-qing
- 2024, 46(12): 2117-2127.
Bowtie 2, one of the most widely used second-generation sequencing alignment tools in bioinformatics, is computationally intensive. How to perform adaptive optimization based on the architecture of multi-core platforms to improve parallel efficiency has become an urgent problem. This paper first analyzes the diversity of non-uniform memory access (NUMA) architectures and the structural bottlenecks of Bowtie 2 under various NUMA architectures, including memory access congestion and low last-level cache hit rates. Then, based on the performance characteristics of Bowtie 2 on different NUMA platforms, optimizations are carried out in three aspects: index replication, memory allocation, and data partitioning, leading to the proposal of Bowtie 2-NUMA. Finally, experiments show that Bowtie 2-NUMA can adaptively optimize for the architecture of different computing platforms, thereby improving parallel efficiency.
-
Constructing and analyzing deep learning task dataset for R&D GPU clusters
- LUO Jing, YE Zhi-sheng, YANG Ze-hua, FU Tian-hao, WEI Xiong, WANG Xiao-lin, LUO Ying-wei
- 2024, 46(12): 2128-2137.
In recent years, with the growing demand for training deep learning models, research institutions and enterprises have established shared GPU clusters to reduce costs and improve efficiency. Existing research mainly focuses on task scheduling and resource allocation in enterprise-level GPU clusters. However, this paper focuses on the Pengcheng Cloud Brain I, a research and development GPU cluster, by monitoring and collecting key indicators during task runtime. It constructs a dataset for deep learning training tasks, named the Pengcheng Cloud Brain I Task Dataset, which includes fine-grained time-series resource usage information for tasks. This dataset represents the first publicly available dataset tailored for R&D GPU clusters. It reveals the phenomenon of low resource utilization in R&D GPU clusters and provides a basis and reference for designing schedulers with high resource utilization for R&D GPU clusters, thereby promoting research on task scheduling and resource allocation mechanisms.
-
OASIS: An interference-aware online scheduling algorithm for deep learning jobs
- JING Chao, BI Yu-shen
- 2024, 46(12): 2138-2148.
Since GPUs can accelerate the processing of deep learning jobs, many researchers aim to reduce job completion time by improving GPU utilization. Different from the traditional approach of dedicating GPU resources to a single job, this paper considers job colocation (i.e., executing multiple jobs simultaneously on the same GPU to improve GPU utilization and reduce job completion time) and proposes an interference-aware online scheduling algorithm for deep learning jobs (OASIS). The algorithm first uses an improved machine learning approach to construct a prediction model of the resources required by jobs under colocation. Then, to calculate the interference values between jobs, a job combination model is designed; the interference values it produces are used to proactively adjust the job scheduling strategy and avoid ineffective scheduling, thereby reducing job completion time. Finally, experiments are conducted in a real-world environment, and the results show that compared with the classical FCFS, MBP, and SJF algorithms, the proposed OASIS algorithm not only reduces the average total job completion time by 5.7%, but also decreases the average energy consumption by 4.0%. These results demonstrate the effectiveness and superiority of the proposed algorithm.
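The colocation idea can be illustrated with a toy greedy placement rule. This is a hypothetical sketch, not the paper's OASIS algorithm: the `interference` lookup stands in for the paper's learned prediction model, and `max_interf` is an assumed threshold for refusing "ineffective" placements.

```python
def schedule(job, gpus, interference, max_interf=0.3):
    """Greedy interference-aware placement: put `job` on the GPU whose
    worst predicted pairwise interference with already-running jobs is
    smallest; refuse any placement above `max_interf` so colocation is
    never counterproductive."""
    def worst(gpu):
        # `interference` is a symmetric lookup keyed by sorted job pairs
        return max((interference[tuple(sorted((job, r)))] for r in gpu["jobs"]),
                   default=0.0)
    score, best = min((worst(g), i) for i, g in enumerate(gpus))
    if score > max_interf:
        return None              # defer the job rather than colocate badly
    gpus[best]["jobs"].append(job)
    return best

gpus = [{"jobs": ["a"]}, {"jobs": ["b"]}]
interference = {("a", "c"): 0.5, ("b", "c"): 0.1}
placed = schedule("c", gpus, interference)   # lands on the low-interference GPU
```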
Computer Network and Information Security
-
Vulnerability analysis and verification of 5G-AKA authentication mechanism
- HAN Xiao-xuan, ZHOU Wen-an, HAN Zhen
- 2024, 46(12): 2149-2157.
Research on the security of authentication mechanisms has been an important concern in mobile communication, and each generation of mobile communication standards has developed different authentication and key agreement (AKA) mechanisms. With the diversification of terminal types and access scenarios in 5G IoT, 3GPP has developed a unified user security access authentication mechanism, 5G-AKA, which has nevertheless been found to be vulnerable. In this paper, by analyzing the request parameters and response contents in the bidirectional authentication process of 5G-AKA, the risk of leakage of the subscription permanent identifier (SUPI) is identified, and a SUPI eavesdropping attack model is designed. Based on the UERANSIM and Open5GS testing platforms, this paper designs the network topology and experimental scenarios, simulating signaling traffic to validate the aforementioned model.
-
Malicious behavior detection method based on iFA and improved LSTM network
- SHEN Fan-fan, TANG Xing-yi, ZHANG Jun, XU Chao, CHEN Yong, HE Yan-xiang
- 2024, 46(12): 2158-2170.
In recent years, the scale and performance of data platforms and systems have expanded rapidly, making security increasingly critical. Existing malicious behavior detection schemes based on deep learning lack optimization algorithms tailored to their models, and thus lack self-optimization capabilities. This paper proposes a malicious behavior detection method called iFA-LSTM (improved firefly algorithm and improved long short-term memory network), which leverages an improved firefly algorithm and an improved LSTM network to effectively perform binary classification of malicious behaviors. The proposed method is validated on the UNSW-NB15 dataset. In single-attack binary classification experiments, the method achieves an average recognition accuracy of 99.56%, while in mixed-attack binary classification experiments, the average recognition accuracy reaches 98.79%. The experiments also demonstrate the effectiveness of the iFA. The proposed method can detect malicious behaviors quickly and effectively, holding great promise for application in security monitoring and recognition of malicious behaviors.
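For context, one iteration of the standard firefly algorithm (the baseline that iFA improves on) can be sketched as follows; `beta0`, `gamma`, and `alpha` are the usual attractiveness and randomization parameters, and the fitness function here is an assumed toy objective.

```python
import math
import random

def firefly_step(pos, fitness, beta0=1.0, gamma=1.0, alpha=0.2):
    """One synchronous iteration of the standard firefly algorithm:
    every firefly moves toward each brighter (lower-fitness) one with
    attractiveness beta0*exp(-gamma*r^2), plus an alpha-scaled random
    walk. `fitness` is minimized."""
    new = [p[:] for p in pos]
    for i in range(len(pos)):
        for j in range(len(pos)):
            if fitness(pos[j]) < fitness(pos[i]):        # j outshines i
                r2 = sum((a - b) ** 2 for a, b in zip(pos[i], pos[j]))
                beta = beta0 * math.exp(-gamma * r2)
                for d in range(len(pos[i])):
                    new[i][d] += (beta * (pos[j][d] - pos[i][d])
                                  + alpha * (random.random() - 0.5))
    return new

fit = lambda p: sum(v * v for v in p)      # sphere function, optimum at origin
moved = firefly_step([[0.0, 0.0], [2.0, 0.0]], fit, alpha=0.0)
```

With `alpha=0` the step is deterministic: the dimmer firefly at (2, 0) moves toward the brighter one at the origin, while the best firefly stays put.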
-
A distributed location anonymization method based on multi-blockchain collaboration
- YANG Xu-dong, LI Qiu-yan, GAO Ling, LIU Xin, DENG Ya-ni
- 2024, 46(12): 2171-2185.
In recent years, researchers have conducted in-depth studies on location anonymity-based privacy protection methods amid the problem of privacy leakage in location-based services (LBS). However, these studies overlook the performance and security bottlenecks inherent in the anonymization process during collaboration, as well as the potential for privacy leakage in anonymous sets due to attacks leveraging semantic knowledge. To address these issues, this paper proposes a distributed anonymous location privacy protection method based on multi-blockchain collaboration, integrating the concepts of cross-chain collaboration across multiple blockchains and k-anonymity. First, to tackle the privacy leakage caused by centralized anonymization, this paper presents a method for selecting anonymous collaboration users based on cross-chain collaboration between private and public blockchains. Second, to ensure the reliability of user collaboration behavior during anonymization and the correctness of cross-chain data transmission, an anonymous collaboration consensus mechanism is designed. Last, to mitigate privacy leakage arising from overlooked individual-related semantics, this paper devises an anonymous set construction method that combines differential privacy mechanisms with semantic diversity entropy for selecting anonymous locations. Experiments conducted on real-world datasets demonstrate that the proposed method effectively enhances the semantic privacy security of locations, outperforming existing methods in terms of privacy and usability.
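The k-anonymity building block the method rests on can be illustrated with a minimal cloaking sketch. This is illustration only: it omits the blockchain collaboration, consensus, and semantic-diversity components that are the paper's actual contributions.

```python
def cloak(user, others, k):
    """k-anonymity cloaking: group the user with the k-1 nearest
    collaborator locations and return the axis-aligned rectangle
    (min_x, min_y, max_x, max_y) covering all k points; the LBS query
    then carries this region instead of the exact position."""
    if len(others) < k - 1:
        raise ValueError("not enough collaborators for k-anonymity")
    nearest = sorted(
        others,
        key=lambda p: (p[0] - user[0]) ** 2 + (p[1] - user[1]) ** 2)[:k - 1]
    xs = [p[0] for p in [user] + nearest]
    ys = [p[1] for p in [user] + nearest]
    return (min(xs), min(ys), max(xs), max(ys))

region = cloak((0.0, 0.0), [(1.0, 0.0), (0.0, 1.0), (5.0, 5.0)], k=3)
```

An attacker who knows the semantics of the covered area (e.g. that all k points fall inside a hospital) can still infer sensitive facts, which is precisely the semantic-knowledge attack the paper's semantic diversity entropy is designed to resist.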
-
An intrusion detection model for vehicular networks based on optimized feature stacking and ensemble learning
- LIU Pei, LIU Chang-hua, LIN Qiao-ling
- 2024, 46(12): 2186-2195.
With the increasing complexity of in-vehicle networks and the diversity of vehicle-to-everything (V2X) connections, the cybersecurity risks faced by the internet of vehicles (IoV) have significantly escalated. Addressing the issues of insufficient feature extraction and inaccurate classification in existing intrusion detection systems, a novel intrusion detection model for IoV based on feature stacking and ensemble learning is proposed. The model slices one-dimensional traffic data into segments based on feature steps, stacks them into images along the third dimension, and utilizes the VGG19 model to extract specific types of features, the Xception model to capture intra-channel and inter-channel information, and the Inception model to process complex image categories and obtain multi-scale information. These three models are then integrated into the CS-IDS model. The proposed model was tested on the open-source IoV dataset Car-Hacking and the traffic dataset CIC-IDS2017, achieving F1 scores of 99.97% and 96.44%, respectively. Moreover, the model can complete detection of a single traffic flow within 12 ms, demonstrating the effectiveness and usability of the proposed CS-IDS model.
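The feature-stacking step can be illustrated as follows: a 1-D traffic feature vector is sliced into fixed-length windows and stacked as image rows, giving the 2-D input that CNN backbones such as VGG19 or Xception expect. This is a minimal sketch under assumed `step`/`height` hyperparameters; the paper's exact slicing rule may differ.

```python
import numpy as np

def stack_features(flow, step, height):
    """Slice a 1-D feature vector into `height` windows of length `step`
    and stack them as rows of a 2-D 'image', zero-padding the tail."""
    img = np.zeros((height, step), dtype=np.float32)
    for row in range(height):
        seg = flow[row * step:(row + 1) * step]
        img[row, :len(seg)] = seg          # partial last window is padded
    return img

img = stack_features(np.arange(10, dtype=np.float32), step=4, height=3)
```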
-
Long-term object tracking based on template update and redetection
- XU Shu-ping, WEI Hao-bo, SUN Yang-yang, WAN Ya-juan
- 2024, 46(12): 2196-2204.
In order to solve the problem of targets frequently disappearing and reappearing due to occlusion and out-of-view movement in long-term tracking scenes, a long-term target tracking algorithm based on template update and redetection (LTUSiam) is designed. Firstly, on top of the base tracker Siamese region proposal network (SiamRPN), a three-level cascaded gated recurrent unit is introduced to judge the target state and adaptively choose the right time to update the template information. Secondly, a redetection algorithm based on template matching is proposed: a candidate region extraction module relocates the target's position and size, and the evaluation score sequence is used to judge target loss and determine the tracking state of the next frame. Experiments show that the success rate and precision of LTUSiam on the LaSOT dataset reach 0.566 and 0.556 respectively, and its F1-score on the VOT2018_LT dataset is 0.644, demonstrating better robustness in handling target loss and reappearance and effectively improving long-term tracking performance.
-
An MFFBSNet crowd counting algorithm based on multi-scale feature fusion and background suppression
- ZHAO Jia-bin, XU Hui-ying, ZHU Rong, CHEN Bin, WANG Xiao-lin, ZHU Xin-zhong
- 2024, 46(12): 2205-2214.
Aiming at the problems of scale variation, uneven distribution, and background occlusion of dense crowds in complex scenes, a crowd counting algorithm MFFBSNet based on multi-scale feature fusion and background suppression is proposed. The first 13 layers of the visual geometry group network VGG-16 are utilized as the front end of the network. An atrous spatial pyramid pooling (ASPP) module and a lightweight pyramid split attention (PSA) mechanism are introduced to construct a multi-scale feature fusion module, which addresses the problem of scale variation in dense crowds. In the middle of the network, spatial and channel attention mechanisms are incorporated to refine the feature maps, highlighting the head regions in the image. The back end of the network employs atrous convolution, which enlarges the receptive field without losing image resolution, to generate a background segmentation attention map; this suppresses background noise and enhances the quality of the crowd density map. Experimental results on three public datasets, namely ShanghaiTech, UCF_CC_50, and NWPU-Crowd, demonstrate that the proposed MFFBSNet achieves higher counting accuracy than methods such as MCNN, SwitchCNN, and CSRNet.
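For background, counting networks of this family (MCNN, CSRNet, and the proposed MFFBSNet) are typically trained against ground-truth density maps built by placing a normalized Gaussian at each annotated head, so that the map integrates to the crowd count. A minimal sketch, with a fixed `sigma` assumed (some methods instead use geometry-adaptive kernels):

```python
import numpy as np

def density_map(shape, heads, sigma=4.0):
    """Build a ground-truth crowd density map: one normalized 2-D
    Gaussian per annotated head (x, y), so the map sums to the count."""
    h, w = shape
    yy, xx = np.mgrid[0:h, 0:w]
    dmap = np.zeros(shape, dtype=np.float64)
    for x, y in heads:
        g = np.exp(-((xx - x) ** 2 + (yy - y) ** 2) / (2.0 * sigma ** 2))
        dmap += g / g.sum()      # each head contributes exactly 1 to the total
    return dmap

dmap = density_map((64, 64), [(10, 10), (40, 30), (55, 50)])
```

At inference time the predicted count is simply the sum of the network's output density map.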
-
An object contour tracking method based on Siamese network
- LI Hao
- 2024, 46(12): 2215-2226.
Accurate scale estimation poses a challenge in object tracking, with existing methods plagued by high computational complexity, numerous hyperparameters, and low accuracy. To address these issues, this paper proposes a Siamese segmentation network for object tracking utilizing object contours. The network consists of a Siamese sub-network and a contour segmentation network, and eliminates the need to predefine anchor boxes based on prior knowledge, thereby reducing the number of hyperparameters. Furthermore, a multi-point regression-based object contour tracking method is implemented. This method models object tracking through region classification and contour regression, enabling the simultaneous acquisition of various object states, including upright bounding boxes, rotated bounding boxes, and contours. The tracking process is as follows: first, the Siamese sub-network estimates the initial bounding box of the object; second, the feature vector of the initial bounding box is transformed into an object contour by the contour segmentation network; finally, the final bounding box is fitted from the object contour. Experimental results on the OTB-2015 (Success=70%), VOT-2020 (EAO=52%), TrackingNet (AUC=78.9%), and LaSOT (AUC=64.1%) datasets demonstrate that the proposed tracking method outperforms existing advanced object tracking methods in terms of tracking performance.
-
A multi-layer mask recognition method for Tangut characters
- MA Jin-lin, YAN Qi, MA Zi-ping
- 2024, 46(12): 2227-2238.
Aiming at the problem of poor recognition ability of existing methods for fuzzy and mutilated Tangut characters, a Tangut character recognition model MMSFTR is proposed. Firstly, a multi-layer mask learning strategy is introduced to extract key character features in a hierarchical manner, helping the model understand the internal structure of Tangut characters more efficiently and improving its ability to describe their complex features. Secondly, a multi-scale feature fusion module is designed to extract richer multi-scale features. Then, a channel adaptive attention module is proposed to better select and focus on information from specific channels, and a mask attention module is designed to improve the model's perception capabilities. Finally, a feature enhancement module is designed to optimize the multi-level features of the network and enhance deep-level features. Through the collaborative work of these four modules, MMSFTR achieves strong recognition performance. Experimental results show that MMSFTR achieves a recognition accuracy of 99.40% on the TCD-E dataset, effectively enhancing the recognition of fuzzy and mutilated Tangut characters.
Artificial Intelligence and Data Mining
-
A word-pair relationship modeling method for aspect-based sentiment information extraction in dialogue text
- ZENG Tao, WANG Jing-jing, ZHANG Han, LIU Yi-ding
- 2024, 46(12): 2239-2251.
Aspect-based sentiment analysis aims to capture fine-grained sentiment information contained in text and has drawn considerable attention due to its wide applications. However, traditional research in aspect-based sentiment analysis predominantly relies on non-interactive review texts, with limited investigation into aspect-based sentiment analysis in interactive dialogue contexts. Addressing this gap, this paper proposes a joint extraction task for aspect-based sentiment information in interactive dialogue scenarios. The task aims to extract complete fine-grained sentiment triplets consisting of target aspects, opinion expressions, and corresponding sentiment polarities, thereby obtaining comprehensive sentiment information from the final utterance in an interactive dialogue. To this end, this paper devises an end-to-end extraction method based on word-pair relation modeling, wherein the relationships between word pairs are modeled to map dialogue text onto a directed graph, transforming the decoding process into a search for specific cyclic structures within the graph. To enhance the accuracy of word-pair relationship modeling, this paper introduces a novel model architecture that integrates relative distance information and dialogue turn information when constructing word-pair relationship representations, and utilizes multi-granularity 2D convolution to enhance interaction between word pairs. Additionally, this paper proposes a dynamic loss weighting method to effectively mitigate the imbalanced category distribution of word-pair relations in dialogue texts. Experimental results demonstrate that the proposed method outperforms strong baseline methods, achieving an average F1 score improvement of 7.70% and a maximum improvement of 15.05%.
-
A contradiction separation unit resulting deduction method and its application
- CAO Feng, XIE Yu, YI Jian-bing, LI Jun
- 2024, 46(12): 2252-2260.
First-order logic automated theorem proving is an important branch of artificial intelligence. In order to improve the deduction efficiency of unit resulting resolution, a new unit resulting deduction method based on multi-clause, dynamic, and synergized deduction is proposed, named the contradiction separation unit resulting deduction method. Its definition, deduction method, deduction advantage analysis, and algorithm implementation are given in detail. The proposed method allows two or more clauses to be involved in each deduction step, allows multiple non-unit clauses to participate in one unit resulting deduction, and can better handle long clauses. The proposed deduction algorithm can select the optimal clause under a given strategy, dynamically set the variable unification complexity, and optimize the deduction search path through a backtracking mechanism. Taking the problems from the last two years' international prover competitions (500 problems each) and the most difficult problems with a rating of 1 from the TPTP benchmark database as test objects, Eprover equipped with the proposed contradiction separation unit resulting deduction algorithm solves 10 more theorems than the original Eprover on each of the two competition sets. It solves 17 and 13 theorems, respectively, that the original Eprover cannot solve, and solves 9 theorems with a rating of 1 that cannot be solved by any other prover. The experimental results show that the proposed contradiction separation unit resulting deduction method can be effectively applied to first-order logic automated theorem proving.
-
A decoupled contrastive clustering integrating attention mechanism
- LIU He-bing, KONG Yu-jie, XI Lei, SHANG Jun-ping
- 2024, 46(12): 2261-2270.
To address the issue of negative-positive coupling between positive and negative samples in contrastive clustering, a decoupled contrastive clustering method integrating an attention mechanism (DCCIAM) is proposed. Firstly, data augmentation techniques are employed to expand the image data and obtain positive and negative sample pairs. Secondly, a convolutional block attention module (CBAM) is integrated into the backbone network to make the network pay more attention to target features, and the augmented image data is fed into the backbone network to obtain features. Subsequently, the features are passed through a neural network projection head to calculate the instance loss and clustering loss separately. Finally, feature representation and cluster assignment are performed by combining the instance loss and clustering loss. To validate the effectiveness of DCCIAM, experiments are conducted on the public image datasets CIFAR-10, STL-10, and ImageNet-10, achieving clustering accuracies of 80.2%, 77.0%, and 90.4%, respectively. The results demonstrate that the decoupled contrastive clustering method integrating an attention mechanism performs well in image clustering.
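The decoupling idea, removing the positive pair from the contrastive denominator so the positive and negative terms no longer couple, can be sketched as follows. This is an illustrative NumPy version of a decoupled InfoNCE-style instance loss, not DCCIAM's exact formulation.

```python
import numpy as np

def decoupled_contrastive_loss(z1, z2, tau=0.5):
    """Instance-level contrastive loss with the positive pair removed
    from the denominator (the 'decoupling'). z1, z2 are two augmented
    views of the same batch, L2-normalized, shape [n, d]."""
    n = z1.shape[0]
    z = np.concatenate([z1, z2])             # 2n embeddings
    sim = np.exp(z @ z.T / tau)
    np.fill_diagonal(sim, 0.0)               # drop self-similarity
    total = 0.0
    for i in range(2 * n):
        j = (i + n) % (2 * n)                # index of i's positive view
        negatives = sim[i].sum() - sim[i, j] # decoupled: positive excluded
        total += -np.log(sim[i, j] / negatives)
    return total / (2 * n)

z = np.eye(4)                                # 4 orthonormal embeddings
loss = decoupled_contrastive_loss(z, z)      # identical views: easy positives
```

Unlike the standard NT-Xent loss, the decoupled form is not bounded below by zero, and its gradient with respect to the positive similarity no longer shrinks as the negatives grow easier.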
-
An automatic ocular artifact removal algorithm based on channel selection and adaptive entropy threshold
- LI Yi-lin, ZHOU Biao
- 2024, 46(12): 2271-2280.
To enhance the effectiveness of removing ocular artifacts from electroencephalogram (EEG) signals, an automatic ocular artifact removal algorithm is proposed that combines fast independent component analysis (FastICA) and heuristic wavelet thresholding (HWT), using fuzzy entropy as the criterion for identifying ocular artifacts. Firstly, a channel selection algorithm is employed to reduce the dimensionality of the original EEG signals, improving computational efficiency. Subsequently, the FastICA algorithm decomposes the selected EEG signals into independent components. Then, fuzzy entropy analysis is conducted to identify the independent components containing ocular artifacts. Next, the HWT algorithm is applied to eliminate the ocular artifact content from the identified components while preserving the useful EEG signal. Finally, inverse wavelet transform and inverse ICA reconstruction are performed to obtain artifact-free EEG signals. The proposed algorithm was validated on the BCI Competition IV dataset. The results indicate that, compared with existing algorithms, it performs well across multiple performance metrics, with a signal-to-noise ratio (SNR) improvement of approximately 12% over existing kurtosis-based artifact identification algorithms.
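Fuzzy entropy, the criterion used here to flag ocular components, can be sketched as follows: slow, regular waveforms such as eye-blink components score low, while desynchronized neural activity scores high. This is a plain NumPy version with the common parameter choices m=2 and r=0.2·std; the paper's exact settings may differ.

```python
import numpy as np

def fuzzy_entropy(x, m=2, r=0.2, p=2):
    """Fuzzy entropy: low for regular signals (e.g. a clean sinusoid),
    high for irregular ones (e.g. white noise). The tolerance r is
    scaled by the signal's standard deviation, the usual convention."""
    x = np.asarray(x, dtype=float)
    tol = r * x.std()
    def phi(m):
        n = len(x) - m
        # Length-m templates with their own baseline (mean) removed
        tpl = np.array([x[i:i + m] - x[i:i + m].mean() for i in range(n)])
        d = np.max(np.abs(tpl[:, None, :] - tpl[None, :, :]), axis=2)
        sim = np.exp(-(d ** p) / tol)        # fuzzy membership function
        np.fill_diagonal(sim, 0.0)
        return sim.sum() / (n * (n - 1))
    return np.log(phi(m)) - np.log(phi(m + 1))

rng = np.random.default_rng(0)
fe_noise = fuzzy_entropy(rng.standard_normal(400))
fe_sine = fuzzy_entropy(np.sin(np.linspace(0, 8 * np.pi, 400)))
```

In an artifact-removal pipeline, independent components whose fuzzy entropy falls below a threshold would be treated as ocular candidates and passed to the wavelet-thresholding stage rather than discarded outright.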