High Performance Computing
-
Implementation and optimization of image processing algorithms based on ARMv8 CPUs
- WEI Cun-yang, JIA Hai-peng, ZHANG Yun-quan, QU Guo-yuan, WEI Da-zhou, ZHANG Guang-ting
-
2022, 44(10):
1711-1720.
-
Color space conversion, image scaling, and image filtering are common algorithms in image processing and are widely used in digital media, data communication, biomedicine, aerospace, and other fields. Although the open source OpenCV library provides these algorithms on ARM processors, the platform lacks a high-performance image processing library with the same precision as the Intel IPP library. Therefore, according to their computation and memory access characteristics, the algorithms are divided into three categories: data-independent algorithms, data-sharing algorithms, and irregular memory access algorithms. A system of optimization methods for the different types of algorithms on the ARMv8 computing platform is proposed, and a high-performance image processing algorithm library based on the ARMv8 computing platform is developed. Its accuracy is comparable to that of the Intel IPP library. In terms of performance, a series of optimization methods, including algorithm optimization, memory access optimization, SIMD optimization, and assembly instruction optimization, greatly improves the performance of the image processing algorithms. The experimental results show that on the Huawei Kunpeng 920 computing platform, the performance of the CvtColor, Filter, and Resize modules is significantly improved compared with the OpenCV algorithm library.
-
Optimizations of mesh renumbering for unstructured finite-volume computational fluid dynamics
- ZHANG Yong, ZHANG Xi, WAN Yun-bo, HE Xian-yao, ZHAO Zhong, LU Yu-tong
-
2022, 44(10):
1721-1729.
-
Mesh renumbering, or reordering, is an important means of improving the CPU and GPU parallel computing efficiency of Computational Fluid Dynamics (CFD). For unstructured meshes, data storage is irregular, so indirect data access leads to large memory access delays. In GPU parallel computing in particular, indirect data access causes non-aligned memory access, which amplifies the impact of memory access latency. To address this, the Reverse Cuthill-McKee mesh reordering method is used to optimize the data locality of unstructured meshes, and a face renumbering method is designed. Test cases show that mesh reordering does not affect the final calculation result. The impact of mesh reordering on the performance of unstructured solvers on CPU and GPU is compared and analyzed. For CPU computing, the running time of some hotspot functions is reduced by about 20%, and the overall running time is reduced by 15%~20%. For GPU computing, the running time of most hotspot functions is reduced by 35%~60%, and the overall running time of the program is reduced by about 40%.
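The idea behind Reverse Cuthill-McKee reordering can be illustrated with a minimal sketch. It assumes the mesh is given as a symmetric cell adjacency dictionary; this input format and the helper names `reverse_cuthill_mckee` and `bandwidth` are hypothetical and are not taken from the paper. The sketch only shows how a breadth-first, lowest-degree-first numbering brings neighboring cells closer together in memory; the paper's face renumbering method is not reproduced.

```python
from collections import deque


def reverse_cuthill_mckee(adjacency):
    """Return a cell ordering (list of old indices) from a basic RCM pass.

    adjacency: dict mapping each cell index to its neighbor indices
    (assumed symmetric; hypothetical input format for this sketch).
    """
    degree = {v: len(set(nbrs)) for v, nbrs in adjacency.items()}
    visited = set()
    order = []
    # Handle every connected component, starting from low-degree cells.
    for start in sorted(adjacency, key=lambda v: (degree[v], v)):
        if start in visited:
            continue
        visited.add(start)
        queue = deque([start])
        while queue:
            v = queue.popleft()
            order.append(v)
            # Visit unvisited neighbors in order of increasing degree.
            unvisited = set(adjacency[v]) - visited
            for w in sorted(unvisited, key=lambda u: (degree[u], u)):
                visited.add(w)
                queue.append(w)
    order.reverse()  # the "reverse" step of Reverse Cuthill-McKee
    return order


def bandwidth(adjacency, order):
    """Largest index distance between neighboring cells under an ordering."""
    new_index = {old: new for new, old in enumerate(order)}
    return max(
        (abs(new_index[v] - new_index[w]) for v in adjacency for w in adjacency[v]),
        default=0,
    )


# A chain of five cells whose original numbering is scrambled: 0-3-1-4-2.
mesh = {0: [3], 3: [0, 1], 1: [3, 4], 4: [1, 2], 2: [4]}
original_order = list(range(5))
rcm_order = reverse_cuthill_mckee(mesh)
print(bandwidth(mesh, original_order), bandwidth(mesh, rcm_order))
```

A smaller bandwidth means that neighboring cells sit closer together after renumbering, which is the data locality effect the abstract refers to.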
-
An online game user churn prediction method based on Spark platform
- HU Yan-fang, XIONG Wen, GAO Wei
-
2022, 44(10):
1730-1737.
-
With the widespread popularity of the mobile Internet, the domestic online game market has become increasingly saturated, and the cost of acquiring new users continues to rise for game companies. Preventing the loss of existing users has therefore become the focus of marketing. This paper predicts user churn based on real game log data. First, user features are extracted and computed from the log data. Second, a set of important features is selected by weight. Finally, a binary classification model is constructed with the features as input and churn as output. Six common algorithms, including random forest, support vector machine, multi-layer perceptron, gradient boosting decision tree, and logistic regression, are comprehensively compared. The experimental results show that the random forest algorithm performs best, with a model prediction accuracy of 91%.
-
Optimization of median filtering algorithm based on ARM architecture
- MU Ming-ren, JIA Hai-peng, ZHANG Yun-quan, DENG Ming-sen, QU Guo-yuan, WEI Da-zhou, ZHANG Guang-ting
-
2022, 44(10):
1738-1746.
-
Median filtering is an effective method for reducing salt-and-pepper noise in image processing. Its core is to calculate the median of all pixels in the current filtering window. Median filtering is robust: when a few pixels of an image are changed, the filtering result is barely affected even if the change in value is large. Once the filtering window has traversed the whole image, the median filtering of the image is complete. The key to the median filtering algorithm is to choose the median algorithm that obtains the median in the shortest time. To this end, an adaptive median algorithm is proposed and implemented. It automatically selects the median algorithm with the best performance according to the filtering window radius and data type, and uses the ARM NEON instruction set for optimization and acceleration. Experimental results show that the proposed adaptive median filtering algorithm significantly outperforms OpenCV, with an average performance improvement of 20%.
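The window median at the core of the method can be shown with a plain reference sketch. This is not the paper's adaptive or NEON-optimized algorithm: it simply sorts each window through `statistics.median`, and the function name `median_filter` and the border handling by coordinate clamping are assumptions made for illustration.

```python
from statistics import median


def median_filter(image, radius=1):
    """Apply a (2*radius+1) x (2*radius+1) median filter to a 2D list.

    Borders are handled by clamping coordinates to the image edge
    (an assumption of this sketch, not a detail from the paper).
    """
    height, width = len(image), len(image[0])
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            window = [
                image[min(max(y + dy, 0), height - 1)][min(max(x + dx, 0), width - 1)]
                for dy in range(-radius, radius + 1)
                for dx in range(-radius, radius + 1)
            ]
            # The window size is odd, so the median is an actual pixel value.
            out[y][x] = median(window)
    return out


# A flat image with one "salt" pixel (255) and one "pepper" pixel (0).
noisy = [
    [10, 10, 10, 10],
    [10, 255, 10, 10],
    [10, 10, 0, 10],
    [10, 10, 10, 10],
]
filtered = median_filter(noisy)
print(filtered)
```

Both outliers disappear because each 3x3 window contains at most two corrupted pixels out of nine, so the middle value of the sorted window remains the background value.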
-
Quick customization for RISC-V processor based on FPGA
- LU Song, JIANG Ju-ping, REN Hui-feng
-
2022, 44(10):
1747-1752.
-
With the rise of the open instruction set RISC-V, a number of open source and commercial soft cores have emerged, which are used in fields such as IoT hardware, embedded systems, artificial intelligence chips, security devices, and high-performance computers. Balancing performance, power consumption, and chip area requires an instruction set that can be easily tailored and extended and that is supported by the software development environment. To this end, this paper proposes a quick customization method for RISC-V processors that consists of adding custom instructions, extending ALU functional units, connecting control signals and data paths, verifying the prototype on an FPGA, customizing the cross compiler, and testing applications. Taking matrix calculation acceleration as an example, a customized instruction for the vector inner product is designed on the open source IP Hummingbird E203, and prototype verification is completed on an FPGA. The matrix calculation benchmark shows that the performance of the customized RISC-V processor is significantly improved. For matrix multiplication, the speedup reaches 5.3~7.6.
-
Job failure prediction based on user behavior on supercomputers
- TANG Yang-kun, XIAN Gang, YANG Wen-xiang, YU Jie, ZHANG Xiao-rong, WANG Yao-bin
-
2022, 44(10):
1753-1761.
-
The scale of supercomputers is expanding. Meanwhile, the complexity of scientific applications is also increasing, which leads to many job failures on supercomputers. These failed jobs waste resources and prolong the waiting time of queued jobs, which seriously affects the reliability of the system. If failed jobs can be predicted in advance, measures can be taken to improve system resource utilization and execution efficiency, which is very important for future exascale supercomputers. Therefore, this paper attempts to predict job failures from known traditional features and constructed features, and to find the features and processing methods that reflect users' work behavior patterns and submission behavior patterns. By combining behavior features and traditional features, a comprehensive framework based on tree-structured models is proposed to predict job failure. The prediction experiments show that the comprehensive prediction framework outperforms single-model prediction, and the comparative experiments show that it predicts better than other related methods.
Computer Network and Information Security
-
AFP-based link prediction of directed weighted attention flow network
- MA Man-fu, JIANG Lu-juan, LI Yong, ZHANG Qiang, FAN Yan-jun, DENG Xiao-fei
-
2022, 44(10):
1762-1770.
-
Personalized recommendation systems are widely used to reduce information overload, provide personalized services, and assist users in decision-making. Link prediction is one of the important methods of personalized recommendation. Traditional heuristic link prediction methods consider only the graph structure characteristics of the network and make no use of explicit and implicit feature information, and most of them are based on undirected, unweighted networks. To address these shortcomings, this paper proposes AFP, a link prediction method based on the collective attention flow network and the R-GCN method. The different edge directions between two nodes in the attention flow network are abstracted into two types of edge relations. The attention mechanism is introduced to learn the node attributes and edge attributes in the network, and the network's graph structure characteristics, implicit features, and explicit features are comprehensively considered. A scoring function is used to obtain the probability that a triple holds, and the link prediction problem is transformed into a binary classification problem, thus predicting the possibility that the edges between nodes belong to a certain type of relationship. Experiments show that, compared with 6 benchmark models such as GCN and GAT, this method improves accuracy, precision, recall, and other evaluation indicators.
-
Trusted data circulation between alliances based on supervision of blockchain
- DING Yan, WANG Chuang, FENG Liao-liao, WANG Feng, CHANG Jun-sheng
-
2022, 44(10):
1771-1780.
-
With the promotion and application of big data technology in the open network environment, more and more institutional cooperation alliances have emerged. The sharing and circulation of data has become an important resource sharing method within an alliance, especially in joint-work scenarios such as banking and medical care. Only by sharing the data held by the various institutions can alliances form comprehensive and clear business views and effectively improve the efficiency of their services and production. However, when data is shared between organizations, controlling and tracing the flow of data between stakeholders with different interests is a problem that must be resolved to support efficient and credible data circulation. Therefore, this paper proposes a trusted data circulation system based on blockchain supervision, which guarantees that the circulation information of data can be traced and cannot be tampered with, thereby promoting a secure and trustworthy ecosystem of data sharing and circulation. On this basis, to cope with low transmission efficiency and network instability in a large-scale network environment, a trusted agent is introduced into the system as the data service interface, which improves both security and performance. Finally, a prototype system based on Hyperledger Fabric is implemented. The experimental results verify that the system has good scalability in terms of data transmission and user request responses.
-
A distributed privacy-preserving data mining framework based on rational cryptography
- CHENG Xiao-gang, GUO Ren, ZHOU Chang-li
-
2022, 44(10):
1781-1787.
-
Privacy protection is an important issue in data mining. Adding noise to the data can protect privacy to some extent, but the added noise reduces the accuracy of the result. This paper proposes an efficient distributed privacy-preserving framework based on rational cryptography. The framework assumes that each party is rational, rather than malicious or honest as in the usual cryptographic setting. Based on this assumption, we show that many data mining functions can be realized efficiently in a distributed way with a semi-honest third party.
-
A traceable hierarchical attribute-based encryption scheme with hidden access policy
- TANG Guang-zhen, CHEN Zhuo
-
2022, 44(10):
1788-1794.
-
In traditional attribute-based encryption schemes, users may share their private key with multiple users who have the same attributes without fear of being held accountable. In addition, the information contained in the access policy may disclose the user's privacy. To solve these problems, this paper proposes a traceable hierarchical attribute-based encryption scheme with hidden access policy. The scheme is constructed on an access tree under a composite-order bilinear group and has flexible expressive ability. Random elements of the composite-order subgroup are inserted into the access policy to conceal the policy. The user ID is added to the private key computation to make illegal users traceable. A hierarchical authorization system is used to reduce the computational load of a single authority and to improve security and efficiency. The experimental results and efficiency comparison show that this scheme has advantages in the computational time cost of encryption and decryption, and that it supports hidden access policies and the tracing of users who violate the rules, thus greatly improving the security of the scheme.
-
A fast paper edge detection method based on cross-layer feature fusion
- XU Kun, ZHAO Qi-wen, XU Yuan, LIU You-quan
-
2022, 44(10):
1795-1803.
-
To meet the real-time and robustness requirements of paper detection in common paper-pen interaction, a fast paper detection method based on edge detection is proposed. In the edge detection stage, a fast paper edge detection method based on cross-layer feature fusion is put forward. Linear bottleneck inverted residual blocks and efficient channel B-ECA blocks are added to the HED backbone, which greatly reduces the number of parameters and increases the weight of salient features. The features of all stages and all layers are fused in order to retain more edge features. The high-level features are upsampled and fused across layers with the low-level features to solve the problem of edge blur. Training and testing on the self-made MPDS data set show that, compared with the original HED method, the proposed method increases the ODS and OIS by 8.1% and 6.6% respectively, and improves the detection speed from 22.08 FPS to 39.02 FPS. In the paper extraction stage, a paper extraction method based on the paper structure is proposed. After thinning the paper edges with non-maximum suppression, detecting and filtering lines, and extracting the paper vertices based on structural constraints, the image containing only the paper is extracted. The experimental results show that the paper extraction method can quickly and accurately extract the entire paper image in various complex desktop environments and occlusion situations, which provides a basis for common paper-pen interaction.
-
Research on trajectory tracking control of wheeled mobile robot
- ZHANG Xiao-jun, LIU Hao-xue
-
2022, 44(10):
1804-1811.
-
To address parameter perturbation and internal and external disturbances in wheeled mobile robots, a novel sliding mode control algorithm based on an adaptive extended state observer is proposed. An adaptive virtual velocity controller is used to estimate unknown parameters of the system, a sliding mode controller is used to suppress parameter perturbations and internal and external disturbances, and a nonlinear extended state observer is used to observe system perturbations and reduce the chattering of the control inputs, so that the trajectory tracking error converges rapidly. Lyapunov theory is adopted to prove the stable convergence of the control system. The proposed algorithm is compared with the traditional adaptive inversion sliding mode algorithm, and the results show the effectiveness and robustness of the proposed control strategy.
-
Mask wearing detection and recognition based on the improved YOLOv3
- REN Xiao-kang, LIU Xing-xing
-
2022, 44(10):
1812-1821.
-
The COVID-19 epidemic is still rampant around the world. Wearing masks can effectively block the spread of the novel coronavirus, and mask wearing detection can promptly remind people in public places to wear masks. To address this need and the difficulty of small-scale target detection, an improved network model, Face_mask Net, based on the YOLOv3 algorithm is proposed for mask wearing detection. The network model trained by the YOLOv3 algorithm has a low detection rate for small targets, the same IoU value cannot reflect whether the prediction frame and the target frame intersect, and the traditional NMS often produces false suppression under occlusion. The algorithm in this paper therefore improves the residual block and neural network structure, introduces the SPP module and the CSPNet network module, and adopts DIoU as the loss function and DIoU-NMS as the classifier. The experimental results show that Face_mask Net can effectively improve the target detection accuracy, and the average precision AP75 is 58.05%, which is 4.11 percentage points higher than that of the network model trained by the YOLOv3 algorithm.
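The difference between IoU and DIoU can be made concrete with a small sketch. It uses the commonly cited DIoU definition, DIoU = IoU - d^2 / c^2, where d is the distance between the two box centers and c is the diagonal of the smallest box enclosing both; whether the paper uses exactly this form is an assumption, and the function name `iou_and_diou` is hypothetical.

```python
def iou_and_diou(box_a, box_b):
    """Return (IoU, DIoU) for two boxes given as (x1, y1, x2, y2).

    Assumes x1 < x2 and y1 < y2 for both boxes.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Overlap area (zero when the boxes do not intersect).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    iou = inter / union

    # Squared distance between the box centers.
    center_dist_sq = ((ax1 + ax2) / 2 - (bx1 + bx2) / 2) ** 2 + (
        (ay1 + ay2) / 2 - (by1 + by2) / 2
    ) ** 2
    # Squared diagonal of the smallest box enclosing both boxes.
    diag_sq = (max(ax2, bx2) - min(ax1, bx1)) ** 2 + (
        max(ay2, by2) - min(ay1, by1)
    ) ** 2
    return iou, iou - center_dist_sq / diag_sq


target = (0, 0, 2, 2)
near = (3, 0, 5, 2)   # does not overlap the target, but is close to it
far = (8, 0, 10, 2)   # does not overlap the target and is far away

iou_near, diou_near = iou_and_diou(target, near)
iou_far, diou_far = iou_and_diou(target, far)
print(iou_near, iou_far, diou_near, diou_far)
```

Both non-overlapping predictions have an IoU of zero, so IoU alone cannot rank them, while DIoU is higher for the nearer box. A loss of the form 1 - DIoU therefore still provides a useful signal when the boxes do not intersect.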
-
Pavement crack detection based on curvature filters and N-P criterion
- WANG Mo-chuan, HE Li, HU Cheng-xue, TAO Jian, ZHANG De-jin
-
2022, 44(10):
1822-1831.
-
To solve the problem of discontinuous detection of asphalt pavement cracks, a crack detection method based on curvature filters and the N-P criterion is proposed. Combining the minimal rectangular and trigonometric tangent planes, an improved curvature filter based on the regular energy function is used to eliminate random noise and smooth the texture. Suspected crack targets are extracted with a coarse-to-fine segmentation methodology, and geometric characteristics are applied to remove noise such as blocks and spots, so as to locate the crack and obtain the crack segments. On this basis, using the position and direction information of the crack segments, the N-P criterion-based method is adopted to connect the endpoints of the crack segments and obtain the complete crack data. The results show that the proposed algorithm can effectively detect cracks such as transverse cracks, longitudinal cracks, block cracks, and turtle cracks with high detection precision. The integrity of crack detection is more than 90.5%.
Artificial Intelligence and Data Mining
-
Slotting optimization of low-level manual picking warehouse
- LUO Man-ling, LIN Hai, LIU Wei
-
2022, 44(10):
1832-1843.
-
Warehousing costs account for a large proportion of the total cost of modern logistics. Reasonable slotting optimization is the core of improving warehouse picking efficiency and reducing costs. By analyzing the outbound process of low-level manual picking warehouses and considering influencing factors such as how well products sell, the relationships between products, and shelf location, a slotting optimization algorithm based on a community division algorithm is designed. First, the algorithm builds undirected weighted networks based on product relevance and uses a community division algorithm to divide them multiple times. Second, products are stored on the shelves with the community as the unit, and the shelves are filled in an adjustment phase. Finally, the optimal product placement is selected from multiple placements based on evaluation indicators. The evaluation index is established based on three optimization goals: shortening the walking path, alleviating congestion, and reducing the number of pickers. Experiments show that the proposed slotting optimization algorithm has significant advantages over the other solutions compared, in terms of both time consumption and the quality of the product placement.
-
An improved sparrow search optimization algorithm and its application
- YIN De-xin, ZHANG Da-min, CAI Peng-chen, QIN Wei-na
-
2022, 44(10):
1844-1851.
-
When searching for the optimal solution of an objective function, the sparrow search algorithm (SSA) has poor population diversity, easily falls into local optima, and has low solution accuracy on multi-dimensional functions. To solve these problems, an improved sparrow search optimization algorithm (ISSA) is proposed. Firstly, the population is initialized with the opposition-based learning strategy to increase the population diversity. Secondly, the step factor is dynamically adjusted to improve the solution accuracy of the algorithm. Finally, the Levy strategy is introduced into the position update formula of the sparrows responsible for reconnaissance and early warning, to improve the algorithm's ability to search globally and to jump out of local extrema. ISSA, SSA, and other algorithms are tested on 8 test functions to evaluate the solution accuracy, and the Wilcoxon rank sum test is carried out. The experimental results show that ISSA has higher search performance. ISSA is also applied to the spectrum allocation of cognitive radio, where the experimental results show that it achieves better system benefit and fairness than other algorithms, which verifies the feasibility of ISSA in practice.
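The opposition-based learning initialization mentioned above can be sketched as follows. The sketch uses the common textbook form, in which the opposite of a candidate x in [lower, upper] is lower + upper - x and the fittest half of the combined set is kept. The paper's exact variant, the function names, and the sphere test function are assumptions made for illustration.

```python
import random


def sphere(x):
    """Simple test objective (to be minimized); chosen only for illustration."""
    return sum(v * v for v in x)


def obl_initialize(pop_size, dim, lower, upper, fitness, seed=0):
    """Opposition-based initialization of a population.

    Draws pop_size random candidates, builds their opposites with
    lower + upper - x, and keeps the pop_size fittest of the combined set.
    """
    rng = random.Random(seed)
    population = [
        [rng.uniform(lower, upper) for _ in range(dim)] for _ in range(pop_size)
    ]
    opposites = [[lower + upper - v for v in ind] for ind in population]
    candidates = population + opposites
    candidates.sort(key=fitness)  # smaller objective value is better
    return candidates[:pop_size]


population = obl_initialize(pop_size=10, dim=3, lower=-5.0, upper=10.0, fitness=sphere)
print(min(sphere(ind) for ind in population))
```

Because the random candidates themselves remain in the pool, the best individual after opposition-based initialization is never worse than the best individual of the plain random population drawn with the same seed.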
-
An improved flocking control for multi-agent system with time-varying communication delay
- KOU Qiao-yuan, YUAN Jie
-
2022, 44(10):
1852-1860.
-
To address time-varying communication delay and unknown interference in uncertain nonlinear second-order multi-agent systems, this paper proposes a robust adaptive flocking control law. To give second-order multi-agent systems better anti-interference ability, a robust adaptive operator is designed based on the state information of each agent's position and speed, and flocking of the system is achieved under time-varying communication delay and disturbance. The Lyapunov-Krasovskii method is used to construct the energy function, with which the network connectivity of the multi-agent system is proved, the speed of each agent is shown to converge to the speed of the virtual leader, and the convergence condition of the multi-agent system with time-varying communication delay is given. The simulation results show that the multi-agent system achieves rapid convergence and forms a stable topology under different interference intensities and communication delays, which proves that the proposed method is correct and effective.
-
Converting sign language to emotional speech
- WANG Wei-zhe, GUO Wei-tong, YANG Hong-wu
-
2022, 44(10):
1869-1876.
-
To solve the problem of communication between speech-impaired people and healthy people, a neural network-based method for converting sign language to emotional speech is proposed. Firstly, a gesture corpus, a facial expression corpus, and an emotional speech corpus are established. Then, a deep convolutional neural network is used to recognize gestures and facial expressions. Mandarin vowels and consonants are used as synthesis units to train two speaker-adaptive emotional speech acoustic models, one based on a deep neural network and one based on a mixed long short-term memory network. Finally, the context-dependent labels of gesture semantics and the emotion labels corresponding to facial expressions are input into the emotional speech synthesis model to synthesize the corresponding emotional speech. The experimental results show that the gesture recognition accuracy and the facial expression recognition accuracy are 95.86% and 92.42%, respectively, and the average mean score of the synthesized emotional speech is 4.15. The synthesized emotional speech also has a high degree of emotional expression and can be used for communication between speech-impaired people and healthy people.