High Performance Computing
-
Large-scale 3D electromagnetic modeling in frequency domain using integration equation method
- XIAO Tiao-jie, ZHOU Feng, ZHENG Xuan-yu, LIU Jian, CHEN Lin, LIU Jie, YI Ming-kuan, CHEN Xu-guang, GONG Chun-ye, YANG Bo, GAN Xin-biao, LI Sheng-guo, ZUO Ke
-
2023, 45(11):
1901-1910.
doi:
-
Abstract
-
Frequency-domain geoelectrical electromagnetic methods are widely used in exploring the Earth's deep structure, petroleum exploration, and environmental and engineering surveying, and the accuracy and efficiency of their numerical simulation directly affect the interpretation of data. However, three-dimensional numerical simulation of frequency-domain electromagnetic fields currently suffers from low accuracy, long computation time, and limited computational scale. This paper uses the integral equation method with a direct solver to improve solution accuracy, and adopts multi-level, multi-granularity hybrid parallelism and distributed storage to greatly reduce computation time and expand the computational scale. The resulting fast, high-precision, and highly scalable three-dimensional simulation method includes parallel processing across frequency points, parallel filling of the impedance matrix, and parallel direct solution of the equations. First, the theoretical framework of the integral equation method and its parallel implementation are introduced in detail. Then, typical cases are selected to verify the correctness of the program by comparison with previously published results. Finally, scalability is tested on a large-scale example with 16 frequencies, 16×12,495 unknowns, and 861 observation points. Relative to a single node with 32 processes, a run on 256 nodes with 8,192 processes achieves a speedup of 69.69 and a parallel efficiency of 27.22%. This large-scale parallel algorithm is applicable to both geomagnetic and controlled-source audio geomagnetic integral equation methods.
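The reported parallel efficiency follows from the reported speedup if efficiency is taken as measured speedup divided by the ideal speedup, that is, the ratio of resources (256 nodes against 1 node). The short check below assumes that definition; the function name `parallel_efficiency` is illustrative and does not come from the paper.

```python
def parallel_efficiency(speedup: float, baseline_nodes: int, scaled_nodes: int) -> float:
    """Return measured speedup divided by the ideal (linear) speedup."""
    ideal_speedup = scaled_nodes / baseline_nodes
    return speedup / ideal_speedup


# Figures quoted in the abstract: speedup 69.69 going from 1 node to 256 nodes.
efficiency = parallel_efficiency(69.69, baseline_nodes=1, scaled_nodes=256)
print(f"{efficiency:.2%}")
```

Under this definition the result agrees with the 27.22% stated in the abstract.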
-
Distributed Kriging interpolation algorithm optimization for large region carbon satellite data
- ZHOU Xiao-hua, WANG Xue-zhi, ZHOU Yuan-chun, MENG Zhen
-
2023, 45(11):
1911-1921.
doi:
-
Abstract
-
To address the long computation time and the difficulty of parallel acceleration when the original Kriging algorithm is used to interpolate carbon satellite data at a large regional scale, the Kriging algorithm and its key parts are restructured and optimized. The whole interpolation process is broken into several fine-grained operations, which are then organized into a distributed DAG workflow according to their dependencies and data features. Finally, a distributed computing framework based on a two-tier scheduling structure is designed to accelerate the interpolation workflow on a distributed computing cluster. Experiments show that the proposed methods and framework perform Kriging interpolation efficiently at different regional scales, and that the efficiency advantage over Spark is more significant at the large regional scale.
-
A high-speed meteorology data management system based on non-volatile memory
- CHEN Chao, GU Qing-feng
-
2023, 45(11):
1922-1928.
doi:
-
Abstract
-
As typical big data, meteorological data are large in scale, fast-growing, and complex in type. Among them, meteorological model data are continuously generated by high-performance computers and must also serve various complex queries from meteorological services such as weather forecasting. The management of meteorological model data therefore faces severe space and performance challenges. Non-volatile memory (NVM) is an emerging type of storage medium that has been put into practical use in recent years. NVM offers high performance, high storage density, and non-volatility. However, it also has many special hardware characteristics, which require software-hardware co-design to exploit its performance advantage. Therefore, a high-speed meteorological data management system is designed around the hardware characteristics of NVM. It achieves 114.2% higher performance than the current memory data management system, while also providing larger storage capacity and lower cost per byte.
-
Design and optimization of scalar memory access unit in VLIW DSPs
- ZHENG Kang, LI Chen, CHEN Hai-yan, LIU Sheng, FANG Liang
-
2023, 45(11):
1929-1940.
doi:
-
Abstract
-
In recent years, the speed gap between processors and memories has grown steadily with the development of integrated circuit technology, and memory has increasingly become the bottleneck that limits the performance of computing systems. DSPs for embedded and low-power applications differ from general-purpose CPUs in architecture and application scenarios, so the memory access design of CPUs cannot meet the memory access requirements of DSPs. To address the requirements of Very Long Instruction Word (VLIW) DSPs for real-time memory access, ordering and fixed delay, and efficient data consistency, a scalar memory access unit suitable for DSPs is designed. The configurable design meets the real-time memory access requirements of DSPs. The ID-based ordering mechanism satisfies the ordering and fixed-delay requirements of VLIW with a storage overhead of 87.5 B. The write-back operation, designed for data consistency, is accelerated by searching for the leading one in hardware. The time consumed by the optimized write-back operation is 26.4%, 51.3%, and 76.2% of the basic overhead of the progressive scan method when 25%, 50%, and 75% of the cache lines need to be written back, respectively. The consistency write-back time is proportional to the number of lines concerned, regardless of the cache capacity.
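The contrast between a progressive scan and a leading-one search can be illustrated with a small software model. This is only a sketch under stated assumptions: the abstract does not describe the hardware, so the bit-vector representation of dirty lines and both function names are hypothetical. The model counts one step per line examined (progressive scan) or per leading one found (leading-one search).

```python
def dirty_lines_progressive(dirty_bits: int, num_lines: int):
    """Baseline model: examine every cache line index in turn."""
    lines, steps = [], 0
    for index in range(num_lines):
        steps += 1  # one step per line, dirty or not
        if (dirty_bits >> index) & 1:
            lines.append(index)
    return lines, steps


def dirty_lines_leading_one(dirty_bits: int):
    """Model of leading-one search: jump straight to the highest set bit."""
    lines, steps = [], 0
    while dirty_bits:
        steps += 1  # one step per dirty line only
        top = dirty_bits.bit_length() - 1  # position of the leading one
        lines.append(top)
        dirty_bits &= ~(1 << top)  # clear it and search again
    return lines, steps


# 64-line cache with three dirty lines (hypothetical example data).
bits = (1 << 3) | (1 << 10) | (1 << 63)
print(dirty_lines_progressive(bits, 64))
print(dirty_lines_leading_one(bits))
```

In this model the step count of the leading-one search depends only on the number of dirty lines, which matches the abstract's statement that write-back cost is independent of cache capacity.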
-
A learnt clause evaluation algorithm based on recent literal polarity assignment
- FENG Xin-yan, WU Guan-feng, ZHANG Ding-rong, WANG Ke-ming
-
2023, 45(11):
1941-1948.
doi:
-
Abstract
-
To keep the learnt clause database at a manageable size and perform unit propagation at reasonable cost during SAT solving, the solver must evaluate the learnt clauses and remove those that are not useful. A new method for evaluating clause usefulness, covering the analysis and deletion of learnt clauses, is therefore needed for dynamic management of the learnt clause database, so that the clauses most effective for solving are retained and solving efficiency is improved. This paper captures the recent polarity assignments of the literals in learnt clauses and combines them with progress saving, a literal-polarity heuristic commonly used during backtracking in modern solvers, to infer the relevance of a given learnt clause to the remaining search steps. The clause evaluation algorithms of two state-of-the-art Conflict Driven Clause Learning (CDCL) solvers, Glucose and MapleLCMDistChronoBT, are improved and tested. The experimental results show that this clause evaluation strategy based on recent literal polarity assignment generally improves the solving efficiency of serial and parallel CDCL solvers and effectively reduces the excessive time the original solvers spend on some problems. In addition, 2 more Conjunctive Normal Form (CNF) files are solved compared with the advanced solvers, and the average solving time per file decreases by 13~34 seconds.
Computer Network and Information Security
-
Recent progress of new technologies for satellite network transport protocol optimization
- LIANG Xiang-bin, ZHAO Bao-kang, PENG Wei
-
2023, 45(11):
1949-1959.
doi:
-
Abstract
-
Satellite networks have attracted extensive attention from academia and industry because of their wide coverage, high bandwidth, strong survivability, and natural broadcast capability. However, owing to long propagation delay, high bit error rate, and asymmetric bandwidth, the traditional TCP protocol performs poorly in satellite networks. In recent years, with the rapid development and large-scale deployment of giant constellations represented by "Starlink", satellite networks have become highly dynamic, which poses a severe challenge to high-performance transmission in satellite constellation networks. This paper summarizes the research progress of new technologies for satellite network transport protocol optimization that have emerged in recent years. In particular, the new technologies represented by multi-path transmission, QUIC, and intelligent transmission optimization are compared and analyzed in depth.
-
QUIC encryption and decryption offloading based on data processing unit
- WANG Ji-chang, LÜ Gao-feng, LIU Zhong-pei, YANG Xiang-rui
-
2023, 45(11):
1960-1969.
doi:
-
Abstract
-
Research on QUIC, an emerging transport protocol that parallels TCP, has followed the TCP research approach. The mainstream direction is hardware offloading, which moves computation-intensive functional modules to network devices and replaces host CPU computation with hardware processing. However, although hardware offloading delivers high performance, its poor generality means it cannot guarantee user programmability. To overcome this limitation, this paper proposes a software offloading model, NanoBPF, which is a protocol offloading model based on a RISC-style many-core DPU (Data Processing Unit). By modifying the bootloader's startup code, it boots eBPF (extended Berkeley Packet Filter) code as the runtime environment and offloads, in software, the encryption and decryption modules that account for high CPU utilization in the protocol stack. These modules are written in a high-level language (C) and compiled into custom BPF (Berkeley Packet Filter) bytecode that is dynamically loaded into the DPU. The throughput and fairness of the prototype system are validated using local and Docker-based network topologies. The results show that software offloading of message encryption and decryption increases the message throughput of the protocol stack by nearly 13% and, under certain conditions, ensures link fairness with TCP.
-
A fog target detection algorithm fusing high-resolution network
- ZHANG Qian, CHEN Zi-qiang, SUN Zong-wei, LAI Jing-an
-
2023, 45(11):
1970-1981.
doi:
-
Abstract
-
To address false and missed detections in foggy weather, where images are blurred and targets are difficult to distinguish, a target detection algorithm that fuses a high-resolution network, named High Resolution Cascade RCNN (HR-Cascade RCNN), is proposed. First, the algorithm adopts HRNet as the feature extraction network for Cascade RCNN and connects parallel sub-networks of different resolutions to extract multi-scale feature information, thus reducing information loss during downsampling and enhancing the semantic representation of targets. Second, the CIoU loss function replaces the original Smooth L1 loss function, introducing a penalty term that measures the consistency between the aspect ratios of the ground-truth and detected bounding boxes, which improves the convergence of the network and the localization accuracy of detected bounding boxes. Finally, SoftNMS is adopted to improve the candidate box selection mechanism, reducing the false negative rate in situations such as vehicle occlusion and enhancing the detection ability of the network. Experimental results on the real foggy-weather dataset RTTS and the synthetic foggy-weather dataset Foggy Cityscapes show that, compared with the original Cascade RCNN, HR-Cascade RCNN improves mAP by 5.9% and 3%, respectively.
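As a rough illustration of the loss mentioned above, the following sketch computes the commonly published form of CIoU for one pair of axis-aligned boxes: 1 − IoU, plus a normalized center-distance term, plus an aspect-ratio penalty. It is not taken from the paper; the function name, the `(x1, y1, x2, y2)` box format, and the `eps` stabilizer are assumptions made for this example.

```python
import math


def ciou_loss(box, gt, eps=1e-9):
    """CIoU loss for two boxes given as (x1, y1, x2, y2) with x1 < x2, y1 < y2."""
    x1, y1, x2, y2 = box
    gx1, gy1, gx2, gy2 = gt
    w, h = x2 - x1, y2 - y1
    gw, gh = gx2 - gx1, gy2 - gy1

    # Intersection over union.
    inter_w = max(0.0, min(x2, gx2) - max(x1, gx1))
    inter_h = max(0.0, min(y2, gy2) - max(y1, gy1))
    inter = inter_w * inter_h
    union = w * h + gw * gh - inter
    iou = inter / (union + eps)

    # Squared distance between box centers.
    rho2 = ((x1 + x2) / 2 - (gx1 + gx2) / 2) ** 2 + ((y1 + y2) / 2 - (gy1 + gy2) / 2) ** 2

    # Squared diagonal of the smallest box enclosing both.
    c2 = (max(x2, gx2) - min(x1, gx1)) ** 2 + (max(y2, gy2) - min(y1, gy1)) ** 2 + eps

    # Aspect-ratio consistency penalty and its trade-off weight.
    v = (4 / math.pi ** 2) * (math.atan(gw / gh) - math.atan(w / h)) ** 2
    alpha = v / (1 - iou + v + eps)

    return 1 - iou + rho2 / c2 + alpha * v


print(ciou_loss((0, 0, 2, 2), (0, 0, 2, 2)))  # identical boxes: loss close to zero
print(ciou_loss((0, 0, 1, 1), (2, 2, 3, 3)))  # disjoint boxes: loss above 1
```

Unlike Smooth L1 on box coordinates, this loss stays informative for non-overlapping boxes because the center-distance term still provides a gradient.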
-
An efficient and high-precision 3D gaze estimation method based on MLP
- WU Zhi-hao, ZHANG De-jun, WU Yi-qi, CHEN Yi-lin
-
2023, 45(11):
1982-1990.
doi:
-
Abstract
-
With the wide application of convolutional neural networks (CNNs) in computer vision and the release of many 3D gaze datasets, research on 3D gaze estimation that combines appearance-based methods with deep learning has received increasing attention. However, because of the complex structure of CNNs, such methods still need improvement for applications with strict real-time requirements. Recent studies have shown that MLP models with simpler structures can achieve performance comparable to the current best CNN and Transformer models. Inspired by this, an efficient and high-precision 3D gaze estimation method based on MLP is proposed. The MLP model extracts features from face and binocular images and then fuses them to derive the 3D gaze. Experiments show that, for the 31 subjects with different appearance characteristics in the MPIIFaceGaze and EyeDiap datasets, the proposed method, UM-Net, achieves gaze estimation accuracy comparable to CNN-based methods and has a clear advantage in gaze estimation speed. It therefore has good application prospects in fields with strict real-time requirements.
-
A multi-scale feature fusion network based fast CU partitioning in HEVC intra coding
- LIU Yu-mo, LIU Jian-fei, HAO Lu-guo, ZENG Wen-bin
-
2023, 45(11):
1991-1998.
doi:
-
Abstract
-
High Efficiency Video Coding (HEVC) significantly improves coding efficiency but increases coding complexity, especially in the quadtree-based coding unit (CU) partitioning process, so fast CU partitioning is an important research topic. A multi-scale feature fusion network can achieve fast HEVC CU partitioning. Therefore, the UcuNet network structure is designed by combining U-Net with CU partitioning features. Asymmetric convolution (AC) and the CBAM attention mechanism are used to enhance the feature extraction of pixels at different scales. To train the deep learning model sufficiently, original videos of different resolutions and the corresponding encoding information are collected to build a large-scale dataset. Finally, the model is embedded into the HEVC coding architecture to predict the CU partitioning result in advance, which effectively reduces the coding complexity of CU partitioning by eliminating the recursive rate distortion optimization (RDO) calculation of the original CU partitioning method. Compared with the official HEVC test model (HM16.20), the proposed UcuNet reduces the average coding time by 68.13% while BD-BR is only decreased by 2.63%.
-
A graph similarity computation model based on adaptive structure aware pooling graph matching
- JIA Kang, LI Xiao-nan, LI Guan-yu
-
2023, 45(11):
1999-2007.
doi:
-
Abstract
-
Graph similarity computation is one of the core operations in many graph-related tasks such as graph similarity search, graph classification, and graph clustering. Since computing the exact distance or similarity between two graphs is typically NP-hard, a neural-network-based Adaptive Structure Aware Pooling graph Matching Network (ASAPMN) model is proposed. ASAPMN calculates the similarity between any pair of graph structures in an end-to-end way. In particular, ASAPMN uses a novel self-attention network along with a modified GNN formulation to capture the importance of each node in a given graph. It also learns a sparse soft cluster assignment for nodes at each layer to effectively pool the subgraphs into the pooled graph. On the pooled graph pairs, a node-graph matching network is used to learn cross-level interactions between each node of one graph and the whole of the other graph. Comprehensive experiments on four public datasets empirically demonstrate that the proposed model outperforms state-of-the-art baselines, with different gains, on graph-graph classification and regression tasks.
-
iSFF-DBNet: An improved text detection algorithm in e-commerce images
- LI Zhuo-xuan, ZHOU Ya-tong
-
2023, 45(11):
2008-2017.
doi:
-
Abstract
-
To address the problem that existing text detection models cannot accurately locate text in e-commerce images because of complex backgrounds and variable text region shapes, an improved text detection model, named Iterative Self-selective Feature Fusion DBNet (iSFF-DBNet), is proposed. First, after features are extracted by the backbone network, an attention mechanism is introduced into the construction of the Feature Pyramid Network (FPN), and an Iterative Self-selective Feature Fusion (iSFF) module is proposed to enhance the feature extraction ability of the model. Then, a bilinear upsampling module is introduced to improve the adaptive performance of the differentiable binarization module. Experimental results show that, compared with the standard DBNet model, the recall and F-score of the improved model increase by 6.0% and 2.4%, respectively, on the text detection task of the ICPR MTWI 2018 web-scale image dataset. Compared with other text detection models, this model achieves a balance between accuracy and recall and detects text more accurately.
Artificial Intelligence and Data Mining
-
A hierarchical graph attention network text classification model that integrates label information
- YANG Chun-xia, MA Wen-wen, XU Ben, HAN Yu
-
2023, 45(11):
2018-2026.
doi:
-
Abstract
-
Single-label text classification based on hierarchical graph attention networks currently has two main limitations. First, existing models cannot effectively extract text features. Second, few studies highlight text features through the connection between text and labels. To address these two issues, a hierarchical graph attention network text classification model that integrates label information is proposed. The model constructs an adjacency matrix based on the relevance between sentence keywords and topics, and then uses a word-level graph attention network to obtain vector representations of sentences. Starting from randomly initialized target vectors, it uses max pooling to extract sentence-specific target vectors, so that the resulting sentence vectors carry more distinct category features. After the word-level graph attention layer, a sentence-level graph attention network is used to obtain new text representations with word weight information, and pooling layers are used to obtain the feature information of the text. In addition, GloVe pre-trained word vectors are used to initialize vector representations of all text labels, which are then fused with the feature information of the text to reduce the loss of original features and to obtain feature representations that distinguish different texts. Experimental results on five public datasets (R52, R8, 20NG, Ohsumed, and MR) show that the classification accuracy of the model significantly exceeds that of other mainstream baseline models.
-
An intuitionistic fuzzy three-way decision method based on perturbation dominance relationship
- ZHOU Shi-ji, TANG Xiao, ZHAO Rong-le, LIANG Yan-ling
-
2023, 45(11):
2027-2035.
doi:
-
Abstract
-
To address the overly strict requirements of the existing intuitionistic fuzzy dominance relationship, which cause large loss and incomplete use of evaluation information, an intuitionistic fuzzy perturbation dominance relationship consistent with the characteristics of intuitionistic fuzzy sets is proposed, using the disturbance degree between intuitionistic fuzzy sets. A perturbation dominance class with weaker requirements than the equivalence class is then obtained, so that evaluation information is used more fully, and its related properties are discussed. Next, because the conditional probability in existing intuitionistic fuzzy three-way decision methods is expressed by real numbers, which loses uncertain information, a method for calculating the conditional probability expressed by intuitionistic fuzzy numbers is proposed based on the perturbation dominance relationship, and the three-way decision and multi-attribute decision rules are given. Finally, an example verifies the effectiveness of the method, and the sensitivity of the dominance degree and risk avoidance coefficient is analyzed.
-
A particle swarm optimization algorithm based on variable-scale black hole and population migration
- XU Wen-jun, WANG Xi-huai
-
2023, 45(11):
2036-2046.
doi:
-
Abstract
-
To address the slow convergence and premature convergence of particle swarm optimization (PSO), a PSO algorithm based on variable-scale black hole and population migration, named IRBHPSO, is proposed. The variable-scale black hole is introduced to balance the global exploration and local optimization of the algorithm. A displacement coefficient based on a hybrid strategy is introduced into the position update to increase the convergence speed in early iterations and the local optimization ability in later iterations. The Butterfly Optimization Algorithm (BOA) based on population migration is integrated into PSO as a local operator to mitigate PSO's slow convergence and its tendency to fall into local optima. IRBHPSO, PSO, and other related algorithms are simulated on 12 benchmark test functions, and the Wilcoxon rank sum test is performed. The results show that IRBHPSO has better convergence accuracy, convergence speed, and stability.
-
Review of recommendation based on heterogeneous information network
- WANG Chun-bo, WEN Ji-wen
-
2023, 45(11):
2047-2059.
doi:
-
Abstract
-
Recommendation plays an important role in satisfying users' information needs and alleviating information overload. Heterogeneous information networks contain rich semantics and provide a new approach to recommendation optimization. Based on domestic and international research on recommendation in heterogeneous information networks, this paper conducts bibliometric and visual analysis with SATI, Ucinet, NetDraw, and SPSS to identify the current research focuses and progress. According to the clustering results of literature keywords, previous research is mainly based on clustering, random walk, meta-path, matrix factorization, and network embedding algorithms, and is applied in recommendation scenarios such as academic research, points of interest, Web services, social friends, patent trading, and news. There is still considerable room for development in recommendation research based on heterogeneous information networks. Future research can address dynamic recommendation, deep network representation learning, and wider applications.
-
A TextRank automatic summarization generation algorithm based on co-occurrence keywords
- YAN Hong-can, LI Bo-chu, GU Jian-tao
-
2023, 45(11):
2060-2069.
doi:
-
Abstract
-
When generating summaries, the traditional TextRank algorithm considers only the similarity between sentences and neglects the similarity between articles themselves, and the generated summaries often repeat information. Therefore, a TextRank algorithm based on co-occurrence keywords is proposed. The sentences of the article are represented as vectors by the word2vec model. Taking the category of the article into account, the co-occurrence keywords of articles in that category are used as parameters in the iterative calculation of sentence weights. The sentence weights obtained by iteration are then corrected using sentence length, keyword count, and other information. The experimental results show that the proposed algorithm improves the comprehensiveness and accuracy of the generated summaries. In addition, the algorithm uses maximal marginal relevance (MMR) to remove redundancy from the summaries, which alleviates the problem of repeated information.
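The redundancy-removal step can be sketched with the textbook greedy form of MMR, which repeatedly picks the sentence that maximizes relevance minus similarity to sentences already chosen. This is a generic illustration, not the paper's implementation: the function name `mmr_select`, the trade-off weight `lam`, and the toy scores are assumptions, and in the paper's setting the relevance scores would presumably be the corrected TextRank sentence weights.

```python
def mmr_select(relevance, similarity, k, lam=0.7):
    """Greedily pick up to k sentence indices by maximal marginal relevance.

    relevance[i]     -- score of sentence i (for example, a TextRank weight)
    similarity[i][j] -- similarity between sentences i and j, in [0, 1]
    lam              -- trade-off: 1.0 is pure relevance, 0.0 is pure diversity
    """
    selected = []
    candidates = list(range(len(relevance)))
    while candidates and len(selected) < k:
        def mmr_score(i):
            # Redundancy is the highest similarity to any already selected sentence.
            redundancy = max((similarity[i][j] for j in selected), default=0.0)
            return lam * relevance[i] - (1 - lam) * redundancy

        best = max(candidates, key=mmr_score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Toy data: sentences 0 and 1 are near-duplicates, sentence 2 is distinct.
relevance = [0.9, 0.85, 0.5]
similarity = [
    [1.0, 0.95, 0.1],
    [0.95, 1.0, 0.1],
    [0.1, 0.1, 1.0],
]
print(mmr_select(relevance, similarity, k=2, lam=0.5))
```

With these toy scores the second pick is the distinct sentence rather than the near-duplicate, even though the near-duplicate has the higher relevance score.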
-
A fault feature extraction method based on FCEEMD composite screening
- ZHOU Cheng-jiang, JIA Yun-hua, ZHANG Yu-kuan, LU Jun
-
2023, 45(11):
2070-2077.
doi:
-
Abstract
-
To address the defects of fast ensemble empirical mode decomposition (FEEMD) and of intrinsic mode function (IMF) selection methods in feature extraction, a fault feature extraction method based on fast complementary ensemble empirical mode decomposition (FCEEMD) and composite screening is proposed. First, pairs of white noise with opposite signs are introduced to neutralize the residual noise in FEEMD and suppress mode aliasing, yielding a series of IMFs. Second, a composite screening model is constructed from the energy and correlation coefficients, and the signal is reconstructed from the effective IMFs obtained by screening. Finally, the periodic pulse features contained in the reconstructed signal are extracted by Hilbert envelope demodulation to diagnose the bearing fault. Analysis of the Case Western Reserve bearing data shows that the method extracts bearing fault features efficiently and accurately, and it has reference value and application prospects for fault diagnosis of rotating machinery.
-
A differential mutation and territorial search equilibrium optimizer and its application in robot path planning
- ZHANG Bei, MIN Hua-song, ZHANG Xin-ming
-
2023, 45(11):
2078-2090.
doi:
-
Abstract
-
Equilibrium Optimizer (EO) is a recently proposed and effective metaheuristic algorithm, but it suffers from insufficient search ability, poor operability, and low search efficiency when solving complex optimization problems. Therefore, this paper proposes an improved EO, namely Differential mutation and Territorial search EO (DTEO). First, a differential mutation method with territorial search is proposed to update the concentration of the best particle. Then, an elite-worst individual differential mutation strategy is proposed to strengthen the worst individual. Finally, a differential mutation strategy with information sharing and a simplified version of the EO concentration update are proposed and integrated dynamically to update the concentrations of the other particles, which improves the operability and search ability of the algorithm and reduces the running time. Experimental results on complex functions from the CEC2014 test set demonstrate that, compared with EO and other strong algorithms, DTEO has stronger search ability, higher efficiency, and better operability. Experimental results on robot path planning also show that DTEO is more competitive.