High Performance Computing
-
Research and implementation of a Flink-oriented load balancing task scheduling algorithm
- LI Wen-jia, SHI Lan, JI Hang-xu, LUO Yi-peng
-
2022, 44(07):
1141-1151.
doi:
-
Abstract
(
152 )
PDF (1192KB)
(
153
)
-
Apache Flink is one of the mainstream big data distributed computing engines, and task scheduling is a key issue in distributed computing systems. Due to the heterogeneity of clusters and the different complexity of operators, uneven load will inevitably appear in the big data computing system Flink. To solve this problem, a load balancing task scheduling algorithm based on resource feedback, named RFTS, is proposed. Through the three modules (real-time resource monitoring, area division, and task scheduling algorithm based on glowworm swarm optimization), the tasks in the waiting queue in the over-loaded machine are allocated to the lighter-loaded machines, so as to reduce the load unevenness of the entire cluster and improve the cluster utilization and execution efficiency of the system. Finally, through the experimental verification based on the TPC-C and TPC-H datasets, the results show that the load balancing task scheduling algorithm based on resource feedback (RFTS) can effectively improve the performance of the Apache Flink computing system in terms of execution time and throughput.
-
A parallel large dataset generator based on MPI
- GE Xu-ran, LIU Yang, CHEN Zhi-guang, XIAO Nong
-
2022, 44(07):
1152-1161.
doi:
-
Abstract
(
141 )
PDF (1763KB)
(
148
)
-
The speed of big data processing and analysis algorithms in optimization research is often limited by the size of the dataset. In the case of insufficient data volume, the communication time of the algorithm is often higher than the real calculation time, and the real effect cannot be verified. Therefore, a large dataset generator is designed to provide benchmark datasets for parallel big data processing and analysis algorithms running on supercomputers. Firstly, a parallel random number generator is constructed using MPI parallel programming technology. On this basis, artificial datasets with controllable scale and complexity are implemented which mainly includes classification and clustering datasets, regression datasets, manifold Learning datasets, factorization datasets, etc. Besides, the I/O system of the large dataset generator is designed. The system provides interfaces for MPI-I/O parallel read and write datasets. It also sets the distribution and mapping rules of the dataset between different processes and realizes the data access between different nodes through point-to-point communication. Experimental results show that the parallel large dataset generator effectively improves the efficiency and scale of data generation, and provides high-quality, large-scale test datasets for big data processing and analysis algorithms.
-
Implementation of cryptographic instructions for general purpose processors
- CHEN Zi-yu, HE Jun, GUO Xiang-yu
-
2022, 44(07):
1162-1170.
doi:
-
Abstract
(
117 )
PDF (932KB)
(
138
)
-
This paper introduces the international mainstream cryptographic algorithms AES and SHA, and summarizes the development status of cryptographic algorithm instructions in the current mainstream general processor architecture. In order to improve the performance of general purpose processor in the field of cryptographic security, an extended instruction set of AES and SHA cryptographic algorithm for general purpose processor is designed, and an instruction execution unit of AES and SHA cryptographic algorithm that can be fully pipeline execute are realized, evaluated and optimized. The operating frequency of the instruction execution unit can reach 2.0 GHz, the total area is 17 644 μm2, and the total power consumption is 59.62 mW. Compared with the software implementation with the gene- ral instructions, the minimum speedup for AES cryptographic algorithm is 8.90 times, the minimum speedup for SHA cryptographic algorithm is 4.47 times and its speedup can reach 19.30 times in the case of fully pipelined execution of instructions, which significantly improves the performance of the processor in executing AES and SHA cryptographic algorithms, It is expected to be applied to general purpose processors and further enhance the competitiveness of general purpose processor chips in the field of cryptographic security applications. In addition, the cryptographic algorithm instruction execution unit can also be encapsulated into an IP specially used to support cryptographic algorithm and applied to a special chip in the field of cryptographic security.
-
Error tracing and location technology in multi-processor cache coherence verification
- LI Hui, JU Peng-jin, JI Yong-xing
-
2022, 44(07):
1171-1180.
doi:
-
Abstract
(
104 )
PDF (1495KB)
(
102
)
-
Taking the verification of a domestic multi-processor system as an example, based on the Transaction Based Verification (TBV) technology, this paper proposes and implements an automatic error tracking and positioning technology that can be applied to simulation verification. Through the transaction-level modeling of the processors specific functional process, various related request responses, memory access addresses and data streams in the verification environment, the transaction-level information library generated by the verification environment is recorded and generated. Based on the above information, the algorithm realizes the automatic tracking and positioning of the errors, which significantly shortens the error positioning time and improves the debugging efficiency of the simulation verification in the multi-processor system. Based on TBV, verifiers can describe the coverage points of Cache consistency with complex processes at a higher level than the design components. This transaction-level coverage description can compensate the deficiency that the original code coverage and functional coverage are limited to the module and component level, and is useful for comprehensiveness and sufficiency of verification.
-
Cluster job runtime prediction based on NR-Transformer
- CHEN Feng-xian
-
2022, 44(07):
1181-1190.
doi:
-
Abstract
(
122 )
PDF (1140KB)
(
139
)
-
Job scheduling of high-performance clusters is usually implemented by the job scheduling system. Filling in the job running time accurately can greatly improve the efficiency of job scheduling. Existing research usually uses machine learning for prediction, and the prediction accuracy and practicality can be further improved. In order to further improve the accuracy of cluster job running time prediction, cluster job logs are firstly clustered, and job category information is added to job features. Secondly, the job log data is modeled and predicted using the attention-based NR-Transformer network. In data processing, according to the correlation with the prediction target, the integrity of the feature and the validity of the data, 7-dimensional features are selected from the historical log dataset, the dataset is divided into multiple job sets according to the length of the job running time, and then each job set is trained and predicted separately. The experimental results show that, compared with traditional machine learning and BP neural network, its timing neural network structure has better prediction performance, and NR-Transformer has better performance on each job set.
-
A lightweight collective communication library for distributed deep learning
- WANG Xiao-yu, DONG De-zun
-
2022, 44(07):
1191-1198.
doi:
-
Abstract
(
142 )
PDF (924KB)
(
130
)
-
Collective communication operations are widely used in distributed training, especially AllReduce operations are used to synchronize model parameters on each node. In order to obtain higher accuracy, the scale of datasets and neural network models is getting larger and larger, and the communication overhead between nodes accounts for a large proportion in the training process and becomes a bottleneck for accelerating training. There have been many optimizations for collective operations in this scene, such as communication scheduling and gradient quantization, but they typically focus on the rational employ instead of the operations themselves. Actually, there are mismatches between the collective operations and distributed training applications. For example, the latter does not require all nodes to synchronize gradients simultaneously while the former does. This makes researches on collective communication in distributed training necessary. However, we found that current communication frameworks in distributed training are inappropriate, because of their complex architecture and vast codes. To overcome this difficulty, a lightweight collective communication library is designed and implemented for analyzing and improving the collective operations in distributed training conveniently. It supports the mainstream frameworks, and comes with a clean architecture. This makes researchers to implement custom communication operations efficiently, and these operations can be applied to mainstream experimental environments for wider impact. Our collective communication library is evaluated by pure collective operations and distributed deep learning applications respectively in various network situations. The experiments show that the library can achieve similar performance to the MPI, and can be used as an collective communication library for analyzing and researching gradient synchronization in distributed training.
-
An automatic calculation system of arbitrary waveform THD for small signal amplification circuit
- GUI Dan, YU Zong-jie
-
2022, 44(07):
1199-1206.
doi:
-
Abstract
(
108 )
PDF (2008KB)
(
122
)
-
Total Harmonic Distortion (THD) is an important index to measure the nonlinear degree of channel. The automatic measurement device applied in arbitrary output waveform identification and total harmonic distortion of the audio small signal amplification system has become an effective means to detect the signal amplification performance. In order to accurately calculate the total harmonic distortion of the amplified waveform, FFT technique is adopted. In order to improve the system efficiency, high frequency sampling and half extraction techniques are adopted. Dynamic control relays, resistance-capacitance components and multi-stage amplification are used to reasonably adjust the position of the static operating point of the amplification system and the size of the input waveform to realize the output of different types of distorted waveforms. Based on microcontroller stm32f103c6, the combination switch completes the collection, identification, calculation and display of arbitrary waveform in one-button, greatly improving the working efficiency of small signal amplification systems. Matlab algorithm simulation, Multisim circuit simulation and practical test show that, the automatic calculation device of arbitrary waveform THD for small signal amplification systems consisting of multi-stage transistor amplification consistency adjustment circuit and microcontroller FFT operations can accurately and real-time identify waveforms and calculate THD. The one-button switching design is convenient to operate, saves cost, and has practical value.
Computer Network and Znformation Security
-
A network traffic classification method based on clustering and noise
- PANG Xing-long, ZHU Guo-sheng, YANG Shao-long, LI Xiu-yuan
-
2022, 44(07):
1207-1215.
doi:
-
Abstract
(
98 )
PDF (1303KB)
(
120
)
-
Because the real network traffic data inevitably cause wrong labeling in label labeling, the label data are inevitably polluted by noise, that is, the observed label of the sample is different from the real label. In order to reduce the negative impact of noise labels on the classification accuracy of the classifiers, this experiment considers two cases of wrong labeling: wrong labeling of correct label type and wrong spelling of label type. A network traffic classification method based on label noise correction is proposed. The method uses clustering and weight division to evaluate and repair the observation samples, and experiments are carried out on two network traffic datasets. The experimental results show that, compared with the three tag noise repair algorithms STC, CC and ADE, the proposed repair algorithm has a certain improvement on the final classification results under the interference of different noise proportions. On the NSL-KDD data set, the average tag correction rates are increased by 23.00%, 7.58% and 2.05% respectively; Similarly, on the MOORE data set, the average correction rates of tags are increased by 35.12%, 10.40% and 4.71% respectively. The proposal has good classification stability in the final classification model.
-
An AoI minimization scheme of UAV-assisted wireless sensor networks
- ZHAO Yu-hua, JIA Xiang-dong, HU Hai-xia, JING Le-tian
-
2022, 44(07):
1216-1222.
doi:
-
Abstract
(
126 )
PDF (952KB)
(
149
)
-
Aiming at the problem of information freshness in the process of massive data processing in wireless sensor networks, based on the constraints of UAV flight speed, altitude, collision avoidance, and reliable transmission, this paper uses Age of Information (AoI) as the assessment parameter to propose a non-convex optimization strategy for AoI minimization that combines collection point selection, trajectory optimization, and UAV working time trade-off. Taking multiple UAVs' transmitting energy to multiple sensor nodes and collecting sensor data under the same frequency band as a scenario, the information collection process of multiple UAVs in three-dimensional space is simulated and verified. Through the Successive Convex Approximation (SCA) optimization algorithm, the established non-convex problem is transformed into a convex optimization problem for solution. Finally, the optimal collection point, optimal flight strategy, allocation trade-off index of energy delivery time and transmission time during the UAV flight are obtained. The experimental results show that the optimal solution obtained by the scheme can effectively minimize the system AoI.
-
Simulation of overseas anti-terrorism transportation and delivery based on multi-agent
- DONG Peng, SHI Huai-bin, SHI Bo-yuan, ZHANG Qi-xiao
-
2022, 44(07):
1223-1231.
doi:
-
Abstract
(
77 )
PDF (2026KB)
(
95
)
-
Aiming at the problems of high-cost, untimely and insufficient overseas material transportation and delivery, in order to effectively plan the overseas anti-terrorism offshore supply scheme, this paper explores the efficient security mode of transportation and delivery, and establishes an overseas delivery scenario for anti-terrorism operations of bases in a certain area. By closely combining the characteristics of overseas transportation and delivery, Anylogic multi-agent is used to carry out the simulation is carried out, and two sets of overseas delivery schemes are compared. The results show that the model has good applicability for overseas material transportation and delivery, which can reduce the delivery cost, and also has certain reference significance for the exploration of overseas delivery modes.
-
A privacy protection scheme for parking services based on fog computing
- KOU Bang-yan, CAO Su-zhen, Lv Jia
-
2022, 44(07):
1232-1238.
doi:
-
Abstract
(
100 )
PDF (711KB)
(
110
)
-
Parking services provide drivers with the convenience of finding free parking spaces, but most existing parking solutions upload parking requests to cloud servers for processing and analysis, resulting in communication delays and user privacy leakage. In response to the above problems, a privacy protection scheme for parking services based on fog computing is proposed. The scheme realizes the matching of parking spaces between drivers and parking space owners through fog nodes, and uses the elliptic curve-based signature technology to achieve identity authentication. Bloom filters and fuzzy extractors are used to are used to query free parking spaces without exposing user private information. A signed electronic wallet is used to realize the payment function. In addition, security analysis and simulation show that this scheme not only satisfies security and privacy, but also has low computational cost.
-
A 3D point cloud feature learning network based on feature channel and spatial position attentions
- WU Yi-qi, HAN Fang, ZHANG De-jun, HE Fa-zhi, CHEN Yi-lin
-
2022, 44(07):
1239-1246.
doi:
-
Abstract
(
182 )
PDF (704KB)
(
143
)
-
The classification and part segmentation of point cloud models are the basic tasks of 3D point cloud data processing, and the core is to obtain point cloud features that can effectively represent 3D models. This paper proposes a 3D point cloud feature learning network that introduces attention mechanisms. The network adopts a hierarchical point cloud feature extraction method. In the process of hierarchical feature extraction, the feature channel attention mechanism is adopted to obtain the correlation among channels, and the key channel information is enhanced. The spatial position attention mechanism is adopted to obtain the attention weight of each point based on the spatial information of the points. The enhanced point cloud feature is obtained by combining two or more attention mechanisms. Based on this feature, multi-level feature extraction is performed to obtain the final point cloud features for downstream tasks. Shape classification and part segmentation experiments are performed on ModelNet40 and ShapeNet datasets, respectively. The experimental results show that the proposed method can achieve high-precision and robust 3D point cloud shape classification and segmentation.
-
A space-aware adversarial neural network for 3D reconstruction of point cloud
- LU Lin-peng, GUAN Bo-liang, LIN Shu-jin
-
2022, 44(07):
1247-1255.
doi:
-
Abstract
(
97 )
PDF (1567KB)
(
163
)
-
In order to overcome the quality problem of reconstructed meshes caused by the restrictive factors such as sparse and uneven density or wrong normal vector of original point cloud data, a three-dimensional reconstruction method of point cloud based on Countermeasure network is proposed by using the characteristics of weight sharing in adversarial neural network and adversarial training process.Firstly, the predictor is used to predict the offset of the edge of the mesh model, so as to obtain the displacement of each vertex, and perform the vertex redirection of the topology preservation to obtain a new mesh. Secondly, the point cloud classifier in the discriminator is used to extract the high-dimensional features of the original point cloud data and the sampled point set of the mesh surface, and the spatial perception discrimination based on the high-dimensional features is performed to distinguish the original point cloud from the samples point set data. Finally, the output data of the predictor and the discriminator are correlated by the adversarial training method, and the network model is optimized through multiple iterations to obtain a three-dimensional mesh model that meets the spatial characteristics of the point cloud. Experiments are carried out on different point cloud datasets, and the results are displayed by using MeshLab software. The results show that the method can reconstruct the 3D mesh model that meets the spatial information of point cloud, and overcome the mesh quality problems caused by poor point cloud data.
-
Color image copy-move forgery detection based on region division and quaternion
- WEI Wei-yi, WANG Wan-ru, ZHAO Yi-fan, CHEN Guo
-
2022, 44(07):
1256-1264.
doi:
-
Abstract
(
84 )
PDF (1442KB)
(
95
)
-
Aiming at the problem that the insufficient extraction of feature points in existing forgery detection methods leads to low accuracy of forgery detection and poor recognition rate of feature points descriptor, a color image copy-move forgery detection algorithm based on color moment region division and quaternion Hu moment is proposed. Firstly, an adaptive morphological reconstruction algorithm is adopted to perform superpixel segmentation on the image, and then a density clustering algorithm is used to adaptively divide the image into regions. Secondly, a key point extraction method is proposed to obtain uniform SIFT feature points, and then a local Gaussian pyramid is constructed in a novel color image quaternion representation method to extract the Hu moment features. Finally, after matching features using the 2NN, the paper proposes to locate the copy-move forgery region by the Delaunay triangle algorithm. Experimental results on public datasets show that this method can effectively locate the forgery region.
-
Object detection research based on lightweight neural network
- HUANG Zhi-qiang, LI Jun, ZHANG Shi-yi
-
2022, 44(07):
1265-1272.
doi:
-
Abstract
(
86 )
PDF (1042KB)
(
156
)
-
Due to the huge amount of parameters of the YOLOv4 neural network with CSPDarknet53 as the backbone, the detection accuracy and speed will be reduced when it is transplanted to small devices such as mobile phones. In order to improve the detection speed and control the detection accuracy within a reasonable range, this paper proposes to change the original 53-layer neural network to a 15-layer one, and optimizes its clustering algorithm. The K-means++ clustering algorithm is introduced to analyze the data set to generate an anchor box that satisfies the detection conditions. LeakyReLU activation function with a certain slope in the negative interval is used to replace the Sigmoid activation function with vanishing gradients, thereby enhancing the learning ability of the shallow network. At the same time, considering that the center distance and the aspect ratio between the Bounding Box and the Anchor Box have a certain correlation, The corresponding penalty term is added to the original loss function to generate the LCIoU loss function, so that the loss function has a better directionality of the gradient drop during back propagation. Experimental results show that the improved CSPDarknet15 neural network in the VOC2007 data set has an average detection accuracy of 83.94%, and the detection time of a picture is 3 625 ms. Compared with the CSPDarknet53 neural network, the detection speed is increased by 54.43%, which can meet the speed and accuracy requirements of real-time detection of small devices.
Artificial Intelligence and Data Mining
-
Robot path planning based on an improved A* algorithm and an improved dynamic window method
- GUO Yuan-yuan, YUAN Jie, ZHAO Ke-gang
-
2022, 44(07):
1273-1281.
doi:
-
Abstract
(
214 )
PDF (1518KB)
(
177
)
-
Aiming at the efficiency of path planning of mobile robots in complex environments (includ- ing static and dynamic environments), a hybrid algorithm combining an improved A* algorithm and an improved dynamic window method is proposed. Aiming at the problem of insufficient security of the traditional A* algorithm, the obstacle avoidance strategy is adopted to optimize the selection of nodes to increase the safety of the path. Aiming at the problem of many turning points, the recursive dichotomy optimization strategy is adopted to remove redundant nodes and reduce the number of turns. Aiming at the problem of insufficient path smoothness in a static environment, the dynamic inscribed circle smoothing strategy is used to optimize the polyline angle to a radian angle to increase the smoothness of the path. In the traditional dynamic window method, when there are obstacles near the target point, the planning effect is not good and it is easy to fall into the local optimum in the concave groove obstacle. The distance deviation and trajectory deviation are introduced into the original evaluation function. Finally, the proposed improved A* algorithm and hybrid algorithm are simulated and compared with other algorithms in static and dynamic environments respectively. The results show that, compared with the traditional hybrid algorithm, the proposal reduces the path length and running time in the temporary obstacle environment by 13.2% and 65.8%, respectively, and reduces the path length and running time in the mobile obstacle environment by 13.9% and 44.9%, respectively. The proposed algorithm improves the efficiency of path planning in complex environments.
-
The upper bound of phase transition point of multi literal satisfiability problem
- LU Lei, WANG Xiao-feng, LIANG Chen, ZHANG Jiu-long
-
2022, 44(07):
1282-1290.
doi:
-
Abstract
(
107 )
PDF (664KB)
(
90
)
-
The satisfiability problem (SAT) refers to whether there is a set of boolean variable assignments that make at least one literal in each clause of the conjunctive normal form (CNF) formula true. The multi literal SAT problem refers to whether there is a set of boolean variable assignments that make at least two literals in each clause of the CNF formula true. Obviously, this problem is still an NP difficult problem. Defining as the ratio of the number of clauses in the CNF formula to the number of variables, we study the upper bound α* of phase transition point of the problem. If α strictly exceeds α*, the multi literal satisfaction problem is almost certainly unsatisfiable. A simple inference of the first moment can be used to prove α*=-ln 2/ln1-(k+1)/2k. α*=1 when k=3. The local maximum technique proposed by Kirousis et al is used to improve the upper bound α* of phase transition point of the multi literal 3-SAT problem to 0.719 3. Finally, a large amount of data is selected for experimental verification, and the results show that the theoretical results are consistent with the experimental results.
-
A fairness-aware ridesharing pricing and matching algorithm
- LIN Yan-jia, WU Ji-gang, WU Jia-xin, CHEN Long
-
2022, 44(07):
1291-1301.
doi:
-
Abstract
(
85 )
PDF (1326KB)
(
96
)
-
In a ridesharing scenario, multiple passengers with similar itineraries and time schedules take one vehicle to reduce travel cost, increase vehicle occupancy and ease traffic congestion. However, the existing studies ignore the effect of inequitable charging standard and malicious bidding behavior on passengers ridesharing Quality of Experience (QoE). Considering the constraints of cost, vehicle capacity and detour distance , this paper proposes the problem of maximizing the fairness of matching results, and models the process of ridesharing pricing and matching as a two-stage Stackelberg game. Aiming at the above problems, a request division algorithm based on K-means++ is proposed to narrow the match- ing range and improve the matching efficiency. On the premise of satisfying all participant constraints, an iterative algorithm DPMA based on two-stage Stackelberg game is designed and its convergence is proved theoretically. Simulation experiments are carried out on the New York taxi dataset, and the convergence of the algorithm DPMA is verified under different parameter settings. Compared with two existing algorithms, DPMA improves the fairness index by 34.03% and 24.42%, respectively, while ensuring the drivers benefits. The experimental results show that the designed mechanism can effectively avoid malicious bidding among drivers and improve the fairness of ridesharing matching.
-
A highly robust gait recognition method based on CSI
- HAO Zhan-jun, QIAO Zhi-qiang, DANG Xiao-chao, DUAN Yu
-
2022, 44(07):
1302-1312.
doi:
-
Abstract
(
103 )
PDF (1848KB)
(
135
)
-
Aiming at the problems of high computational cost, high equipment cost, and low robustness of existing indoor personnel gait recognition methods, this paper proposes a highly robust indoor personnel gait recognition method based on Channel State Information (CSI), called WiKown. This method uses the fast Fourier transform to set an energy indicator to monitor the occurrence of walking behavior. The collected CSI gait data are filtered and de-noised. Then, the characteristic values are extracted in the form of the sliding window and the observation sequence is established. Finally, Gaussian Mixture Model (GMM) is used to superimpose and fit the sequence, and then Hidden Markov Model (HMM) is used to calculate the probability of observation sequence to generate the gait parameter model. The method can effectively identify the gait information. In the real multi-person environment of corridors, laboratories, and halls, the average recognition rate of the WiKown method for a single persons gait can reach 92.71%. The experimental results show that, compared with the decision tree, dynamic time structuring, and long-duration memory network methods, this method can effectively identify gait information and improve the recognition accuracy and robustness.
-
A text similarity calculation method based on multiple related information interaction
- YUAN Ye, LIAO Wei
-
2022, 44(07):
1313-1320.
doi:
-
Abstract
(
111 )
PDF (827KB)
(
118
)
-
Text similarity calculation is one of core tasks in natural language processing. Traditional text similarity calculation methods only consider structured features or semantic of the text. The lack of in-depth analysis of multiple text features leads to low performance. Therefore, a text similarity calcu- lation method based on multiple related information interaction is proposed, which adds cosine correlation characteristics to the text embedding matrix. The self-attention mechanism is used to consider the text relevance of itself and word dependence. Further, the alternate co-attention machanism is used to extract the se-mantic interaction information between texts, so deeper and richer text representations from different perspectives indicate obtained. The experimental results on two datasets show that the F1 values of the proposed method are 0.916 1 and 0.769 5 respectively, and indicate the proposed method outperforms the benchmark method.