Computer Engineering & Science

GNU Octave performance optimization based on Just-In-Time compilation

MO Shu-heng, LU Sheng-you, HUANG Dan, LU Yu-tong

2022, 44(12): 2091-2101. doi:

GNU Octave is free, open-source numerical computing software that is almost fully compatible with MATLAB. However, the experimental LLVM-based Just-In-Time (JIT) compiler built into Octave supports JIT compilation of only a small fraction of code and cannot effectively overcome Octave's poor performance. This paper explores the performance optimization of Octave based on its JIT compiler. First, the overall working principle of the JIT compiler and its type inference system are analyzed. Second, the current application scope of the JIT compiler and the performance improvement it brings to Octave code are evaluated. Then, targeting built-in function calls, indexing operations, and arithmetic-logic operations, existing features are repaired and functionality is enhanced to improve Octave's performance. The experimental results show that the optimization scheme based on the JIT compiler not only effectively expands the application scope of the JIT compiler, but also brings a 56x-283x performance improvement to Octave code execution. In addition, 16 types of defects in the JIT compiler are summarized, which is of practical significance for further optimizing the performance of Octave.

A CUDA-based data-parallel processing method in industrial blockchain

CHEN Qiang, TAN Lin, WANG Yun-li, XIAO Jing

2022, 44(12): 2102-2110. doi:

An industrial blockchain data transaction system enables parties to conduct transactions safely without an intermediary, simplifying the transaction process and reducing transaction costs. To store large-scale industrial data on a blockchain, the common approach is to store the hash value of the source data on the blockchain as metadata, while the source data itself is stored locally or in the cloud. On the one hand, traditional hash calculation methods struggle to meet the efficiency requirements of storing large-scale industrial data on a blockchain. On the other hand, since only the metadata is stored on the blockchain, the data demander cannot confirm whether the source data itself is complete during the transaction. Therefore, this paper designs a CUDA-based data-parallel processing method, which speeds up the hash calculation for large-scale industrial data through appropriate data chunking, thread layout and other techniques, and improves the efficiency of storing such data on a blockchain. Moreover, based on this method, a two-party data integrity verification model is constructed. The data demander can verify the integrity of the source data from the proof information without obtaining the source data during the transaction, which reduces the communication cost of transmitting useless data. For large-scale industrial data, the proposed method increases hash calculation efficiency by at least 22%. In addition, the security analysis shows that, in the special case where the data owner holds the signature private key, the data demander can verify the integrity of the source data during the transaction.
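
As a rough illustration of the chunking idea in this abstract, the following Python sketch hashes fixed-size chunks independently and then hashes the concatenated chunk digests. It is not the paper's CUDA implementation: the function names, the choice of SHA-256, the thread pool, and the way chunk digests are combined are all assumptions made for this example.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def chunk_digests(data: bytes, chunk_size: int) -> list[bytes]:
    """Hash each fixed-size chunk independently (hypothetical helper)."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    # Chunks do not depend on each other, so they can be hashed concurrently.
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda c: hashlib.sha256(c).digest(), chunks))


def root_digest(digests: list[bytes]) -> str:
    """Combine the chunk digests into one value (an assumed combination rule)."""
    return hashlib.sha256(b"".join(digests)).hexdigest()
```

Because each chunk digest depends only on its own chunk, changing any byte of the data changes one chunk digest and therefore the combined value.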

Optimization of loop unrolling based on instruction Cache and register pressure

WANG Cui-xia, HAN Lin, LIU Hao-hao

2022, 44(12): 2111-2119. doi:

Loop unrolling is a common compiler optimization technique that can effectively reduce loop overhead, improve instruction-level parallelism and register locality, and improve the execution efficiency of loops. However, excessive loop unrolling causes instruction cache overflow and increases register pressure, while too little loop unrolling wastes potential performance improvement opportunities. Therefore, finding an appropriate unroll factor is the core problem in the study of loop unrolling. Based on the open-source compiler GCC, the loop unrolling problem is analyzed in depth. In view of the influence of instruction cache and register resources on loop unrolling, a method for calculating the loop unrolling factor based on instruction cache and register pressure is proposed and implemented in the GCC compiler. Experiments on Sunway and Hygon platforms show that, compared with the current loop unrolling factor calculation method in GCC, this method obtains more effective unrolling factors and improves program performance. The average performance of SPEC CPU 2006 is increased by 2.7% and 3.1%, respectively, and that of NPB-3.3.1 is increased by 5.4% and 6.1%.
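
To make the transformation concrete, here is a minimal sketch of a loop before and after unrolling by a factor of 4. Python is used only to make the rewrite visible; a compiler such as GCC applies it to its own internal representation of the loop. The function names and the unroll factor are assumptions chosen for the example, not values from the paper.

```python
def sum_rolled(values):
    """Original loop: one addition and one loop-control step per element."""
    total = 0
    for v in values:
        total += v
    return total


def sum_unrolled_by_4(values):
    """Same computation with the loop body replicated four times."""
    n = len(values)
    total = 0
    i = 0
    limit = n - n % 4  # largest multiple of 4 not exceeding n
    while i < limit:
        # Four elements per iteration: fewer loop-control steps,
        # but a larger loop body (more code, more live values).
        total += values[i] + values[i + 1] + values[i + 2] + values[i + 3]
        i += 4
    while i < n:  # remainder loop for the leftover 0 to 3 elements
        total += values[i]
        i += 1
    return total
```

The larger body is what ties the unroll factor to instruction cache capacity and register pressure: each extra copy of the body adds instructions and, in compiled code, values that must be kept in registers at the same time.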

A dynamic self-reconfigurable implementation method of HEVC intra prediction algorithm

CUI Xin-yue, JIANG Lin, YANG Kun, HUI Chao, HU Chuan-zhan, ZHAO Jing

2022, 44(12): 2120-2127. doi:

Implementations of the intra-frame prediction algorithm in High Efficiency Video Coding (HEVC) on dedicated hardware cannot switch flexibly between application scenarios such as HD and mobile video, resulting in poor coding performance and low utilization of hardware resources. To solve this problem, a new method for implementing the intra-frame prediction algorithm on a reconfigurable array processor is proposed. The method is based on a state monitoring mechanism. When an idle processing unit is detected, a new execution task is delivered to it, and flexible switching between different mapping schemes is realized according to the execution state of the processing units, achieving dynamic self-reconfiguration of the algorithm execution process. The experimental results show that, compared with the implementation of the intra prediction algorithm on a dedicated processor, hardware resources are reduced by 33.6% and the number of clock cycles is reduced by 16.2%. Compared with the test results of the official HM16.7 software, the average image quality is improved.

A parallel Monte Carlo tree search algorithm for multi-agent game

GUAN Yan-xia, LIU Xun-yun, LIU Yun-tao, XIE Min, XU Xin-hai

2022, 44(12): 2128-2133. doi:

Monte Carlo tree search is a commonly used reinforcement learning algorithm, but the exponential growth of its dynamic space during a game limits improvements in its learning efficiency. To optimize Monte Carlo tree search through parallelism, a parallel Monte Carlo tree search algorithm based on the transfer of winning rate estimates is proposed. The improved parallel game search framework consists of one main process and several sub-processes: the sub-processes perform exploration, and the main process makes decisions according to the winning rate estimates transmitted by the sub-processes. Experiments on the multi-agent game platform Pommerman show that the parallel algorithm improves resource utilization, game-winning rate, and decision-making efficiency over the traditional Monte Carlo tree search algorithm.
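
The decision step of the main process can be sketched as follows. The abstract does not say how the estimates from different sub-processes are combined, so pooling wins and visits per action is purely an assumption of this example, as are the function name and the report format.

```python
def choose_action(worker_reports):
    """Pick the action with the highest pooled winning rate.

    worker_reports: one dict per sub-process, mapping
    action -> (wins, visits). This format is hypothetical.
    """
    totals = {}
    for report in worker_reports:
        for action, (wins, visits) in report.items():
            w, v = totals.get(action, (0, 0))
            totals[action] = (w + wins, v + visits)

    def win_rate(action):
        wins, visits = totals[action]
        return wins / visits if visits else 0.0

    return max(totals, key=win_rate)
```

For example, with two workers reporting `{"left": (3, 10), "right": (6, 10)}` and `{"left": (8, 10), "right": (1, 10)}`, the pooled rates are 11/20 for `left` and 7/20 for `right`, so `left` is chosen.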

Extended codes of several classes of linear codes and their applications in secret sharing

2022, 44(12): 2134-2139. doi:

Minimal linear codes are widely used in constructing secret sharing schemes with safe and efficient access structures. The extended codes of several classes of linear codes are studied, and their parameters and weight distributions are calculated. It is proved that all the extended codes are minimal ones and thus can be used to construct secret sharing schemes. Besides, some optimal or almost optimal codes are given.

Forgery attack on the authenticated encryption algorithm Pyjamask

HE Shui-yu, WEI Yue-chuan, PAN Feng, CHANG Li-peng

2022, 44(12): 2140-2145. doi:

Pyjamask is one of the candidate algorithms shortlisted for the second round of the LWC competition. Its simple structure, light weight, high efficiency, and the good parallel computing ability of its nonlinear components have attracted the attention of many cryptographers. At present, there is relatively little research on the security of this algorithm, and a new round of security evaluation is urgently needed. Based on the characteristics of Pyjamask's structure and adjustment parameters, this paper proposes a method for forging plaintexts, which can accurately forge authentication tags. Theoretical analysis shows that, when one set of plaintexts is selected, the success probability is 1 with negligible data and time complexity, and when s+1 sets of plaintexts are selected, the success probability is also 1 but with high chosen-data requirements.

Pairing-free identity-based public key encryption with equality test

DING Bin-bin, CAO Su-zhen, DING Xiao-hui, DOU Feng-ge, MA Jia-jia

2022, 44(12): 2146-2152. doi:

Public key encryption with equality test can determine whether ciphertexts on a cloud server that were encrypted with different public keys contain the same plaintext. Most public key encryption schemes with equality test based on the traditional PKI system are implemented with bilinear pairings, whose computation is cumbersome and inefficient. At the same time, as the number of users increases, the generation, application, issuance and revocation of a large number of certificates become more and more onerous, which makes the system harder to maintain and operate sustainably. To address this problem, this paper proposes a pairing-free identity-based public key encryption scheme with equality test. The scheme determines a straight line through two points constructed from the plaintext information, and uses this line to implement the encryption, decryption, authorization and equality test processes, which removes the dependence on bilinear pairings and improves computational efficiency. Meanwhile, the scheme is built on an identity-based cryptosystem, which solves the complex certificate management problem of the traditional PKI system. Under the CDH and DDH assumptions in the random oracle model, the proposed scheme achieves OW-ID-CCA and IND-ID-CCA security.

A fingerprint recognition algorithm based on improved Stacking ensemble learning

SU Fu, LUO Hai-bo

2022, 44(12): 2153-2161. doi:

To address the reduced generalization ability and low accuracy of traditional convolutional neural networks in multi-sensor fingerprint recognition, an improved Stacking algorithm is proposed. Firstly, AlexNet is improved: depthwise separable convolution is introduced to reduce the number of parameters and speed up training, spatial pyramid pooling is introduced to improve the network's ability to obtain global information, batch normalization is introduced to speed up convergence and improve accuracy on the test set, and global average pooling replaces the fully connected layer to prevent overfitting. Then, DenseNet and the improved AlexNet are used as the base learners of Stacking to classify fingerprints and obtain prediction results. Finally, each model trained with the same base learner is weighted according to its prediction accuracy, and the prediction results are then classified by the meta-classifier. The improved Stacking algorithm is tested on a multi-sensor fingerprint database, and the final recognition accuracy is 98.43%, which is 20.05% higher than AlexNet and 4.25% higher than DenseNet.
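
The accuracy-weighting step can be illustrated with a small sketch. It covers only the idea of weighting each model's prediction by its accuracy; the meta-classifier described in the abstract is left out, and the function name, inputs, and the use of a simple weighted vote are assumptions of this example rather than the paper's procedure.

```python
from collections import defaultdict


def weighted_vote(predictions, accuracies):
    """Return the label with the largest accuracy-weighted support.

    predictions: one predicted label per model.
    accuracies: each model's validation accuracy, used as its weight.
    """
    scores = defaultdict(float)
    for label, accuracy in zip(predictions, accuracies):
        scores[label] += accuracy
    return max(scores, key=scores.get)
```

With this rule a single accurate model can outvote several weak ones: predictions `["a", "b", "b"]` with accuracies `[0.9, 0.3, 0.4]` give a total weight of 0.9 for `a` against roughly 0.7 for `b`.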

Empirical analysis of the impact of failure detection rate on software reliability

SUN Zhi-chao, ZHANG Ce, JIANG Wen-qian, LIU Kai-wei, FAN Miao-miao, LI Wen-yu, WEN Ya-fei

2022, 44(12): 2162-2173. doi:

The fault detection rate is one of the main parameters of a software reliability model, and different forms of the fault detection rate behave differently. This paper focuses on the impact of the fault detection rate on software reliability and proposes two empirical analysis schemes based on information entropy and the superior-inferior distance decision algorithm: one uses a single reliability growth model and a single failure data set with multiple fault detection rates, and the other uses multiple reliability growth models and multiple failure data sets with multiple fault detection rates. The aim is to comprehensively analyze the effect of the fault detection rate. According to the experimental analysis, for a single reliability model and a single data set, the impact of the fault detection rate on software reliability is mainly related to the failure data set, and the performance of different fault detection rates differs considerably across data sets. With multiple data sets and multiple software reliability models, the software reliability models corresponding to the power-function and S-type fault detection rates perform better overall, while the models corresponding to the exponential fault detection rate perform poorly. The research offers guidance for selecting parameters in software reliability modeling and for determining the optimal release time.

Survey on fuzzy testing technologies

NIU Sheng-jie, LI Peng, ZHANG Yu-jie

2022, 44(12): 2173-2186. doi:

As software system security receives more and more attention, fuzz testing, a security testing technology for vulnerability detection, has become increasingly widely used and increasingly important due to its high degree of automation and low false alarm rate. After continuous improvement in recent years, fuzz testing has made considerable progress in both technical development and application innovation. Firstly, we briefly explain the related concepts and basic theories of fuzzing, summarize the application of fuzz testing in various fields, and analyze the corresponding fuzz testing solutions according to the needs of vulnerability mining in different fields. Then, we focus on the important developments in fuzz testing in recent years, including the improvement and innovation of testing tools, frameworks, systems, and methods. We also analyze and summarize the innovative methods and theories adopted in each of these developments, as well as the advantages and disadvantages of each tool and system. Finally, from the perspectives of protocol reverse engineering applications, cloud platform construction, integration of emerging technologies, research on anti-fuzzing countermeasures, and fuzzing tool integration, we suggest directions for further research on fuzz testing.

A lightweight software fault localization method based on statement complexity

HE Hai-jiang

2022, 44(12): 2187-2195. doi:

In program debugging, spectrum-based fault localization (SBFL), which relies on the program spectrum, can provide effective help. To improve the performance of SBFL, a software fault localization method based on learning to rank is proposed, which combines the program spectrum with static attributes of code statements. The optimal fault localization model is learned by a linear ranking support vector machine. The static attributes of code statements include the numbers of program entities such as local variables, class attributes, logical operators, and method invocations. On 22 real faulty projects developed in C, C++ and Java, the fault localization model was trained in a cross-project manner. Experimental results confirm that the proposed method reduces the worst-strategy EXAM by 37.1% and the average-strategy EXAM by 22.6% compared with the best SBFL technique. Three types of lightweight features of program statements are also compared: structured categories, variable spectrum and static attributes. The time complexity of the proposed method is low, and it can recommend a ranked sequence of potentially faulty statements in real time.
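
For readers unfamiliar with SBFL, the following sketch ranks statements from a program spectrum using the Ochiai formula, one widely used SBFL suspiciousness measure. It is a baseline illustration only: the abstract does not say which SBFL formulas were used, and the learning-to-rank model and static attributes of the proposed method are not shown. The function names and the input format are assumptions.

```python
import math


def ochiai(ef, ep, nf):
    """Ochiai suspiciousness: ef / sqrt((ef + nf) * (ef + ep)).

    ef: failing tests that execute the statement
    ep: passing tests that execute the statement
    nf: failing tests that do not execute the statement
    """
    denom = math.sqrt((ef + nf) * (ef + ep))
    return ef / denom if denom else 0.0


def rank_statements(coverage, outcomes):
    """Rank statements from most to least suspicious.

    coverage: dict mapping statement -> list of bools (executed by test i?)
    outcomes: list of bools, True where test i failed
    """
    total_failed = sum(outcomes)
    scores = {}
    for stmt, covered in coverage.items():
        ef = sum(1 for c, failed in zip(covered, outcomes) if c and failed)
        ep = sum(1 for c, failed in zip(covered, outcomes) if c and not failed)
        scores[stmt] = ochiai(ef, ep, total_failed - ef)
    return sorted(scores, key=scores.get, reverse=True)
```

A statement executed only by failing tests scores highest, which is why the spectrum alone is already informative; the abstract's method adds static statement attributes on top of such spectrum information.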

Real-time flame detection with improved YOLO v4-tiny

WANG Guan-bo, ZHAO Yi-fan, LI Bo, YANG Jun-dong, DING Hong-wei

2022, 44(12): 2196-2205. doi:

To address the large number of parameters and the high demands on hardware computing power in real-time flame detection, a lightweight real-time flame detection model based on an improved YOLO v4-tiny is proposed. Firstly, the parameters of the model are pruned. Secondly, improved Receptive Field Blocks (CSP-RFBs) are added to the shallow layers of the model to enlarge their receptive field. Thirdly, the CSP-ResNet framework is improved, and an "hourglass CSP-ResNet" that is faster and more accurate is proposed. Finally, a modified Spatial Pyramid Pooling (SPP) is adopted in the deep layers of the model to further fuse multiple receptive fields. The experimental results show that the accuracy of the improved YOLO v4-tiny model reaches 48.5%, which is 15.5% better than the original model. The computational cost of the model and the size of its weight file are 2.45 BFLOPs and 16.3Mb, which are 63.9% and 30.6% less than the original model, respectively. The frame rate on the NVIDIA Jetson Xavier mobile development board reaches 49.6 FPS, which is 21.9% better than the original model.

An improved industrial defect data augmentation method based on pix2pix

LUO Yue-tong, DUAN Chang, JIANG Pei-feng, ZHUO Bo

2022, 44(12): 2206-2212. doi:

Object detection methods based on deep learning are widely used in industrial inspection. To address the shortage of industrial defect data, an improved defect data augmentation algorithm based on pix2pix is proposed. To strengthen the attention of the generator and the discriminator to the defect area in the image, the following improvements are made to pix2pix: (1) Only the defect area of the image, rather than the entire image, is used as the input of the discriminator, which enhances the generator's attention to the defect area. At the same time, the discriminator uses a smaller convolution kernel to extract the features of the defect area. (2) The average adversarial loss over all defect regions in the image is used as the adversarial loss of the image, so that the network pays more attention to learning the features of defect regions. The experimental results on an industrial LED defect dataset show that the defects generated by the proposed method look more realistic, achieve a lower FID, and effectively improve the accuracy of defect detection based on the RetinaNet algorithm.

Signature spatial improved temporal graph convolutional network

ZHAO Yi

2022, 44(12): 2213-2219. doi:

To address the problems that the joint adjacency graph of the GCN in the Spatial Temporal Graph Convolutional Network (ST-GCN) does not easily learn the semantic information between distal joints and that the TCN describes temporal information insufficiently, signature-based preprocessing is introduced to enhance the data, and a signature spatial improved temporal graph convolutional network (SSIT-GCN) is proposed. Firstly, the time series of human joint locations are fed into the signature layer for preprocessing and transformed into a multi-dimensional path by various embedding algorithms. The multi-dimensional path is divided into multiple paths, and the signature features of each path are calculated. Secondly, ST-GCN is improved to obtain the spatial improved temporal graph convolutional network (SIT-GCN): the adjacency matrix of the GCN is redesigned, deconvolution replaces zero padding to keep the size of the TCN unchanged, and a 1×1 convolution kernel is introduced to increase nonlinearity. Finally, the signature features replace the original data as the input of SIT-GCN to obtain the final result. The experimental results show that the signature-based SSIT-GCN greatly improves the training accuracy, reduces the training time, and offers better recognition ability and speed for dynamic gesture recognition.

A coastline edge detection network based on deep learning

LI Zhong-rui, CUI Bin-ge, YANG Guang, ZHANG Hao-qing

2022, 44(12): 2220-2229. doi:

Dynamic monitoring of coastlines is of great significance to the planning and management of coastal zones. Because of the complex sea and land environment, the spectral characteristics of the sea-land boundary in remote sensing images are not distinct, which leads to inaccurate positioning of the extracted coastline. This paper proposes a deep convolutional neural network (EWNet) that combines a semantic segmentation network and an edge detection network. The network contains two branch streams. The semantic segmentation stream extracts hierarchical semantic information and guides the edge detection stream in obtaining coastline semantic information. The edge detection stream uses the semantic segmentation stream to refine the edge semantic information. Experimental results on GF-1 remote sensing images show that, compared with several recent models, EWNet obtains more accurate coastline boundary extraction results.

Unsupervised domain-adapted machine translation based on improving the quality of pseudo-parallel sentence pairs

XIAO Ni-ni, JIN Chang, DUAN Xiang-yu

2022, 44(12): 2230-2237. doi:

The good performance of a neural machine translation system depends on a large amount of in-domain bilingual parallel data. Domain adaptation is a good solution when data for a specific domain is sparse or non-existent. Unsupervised domain adaptation strategies fine-tune pre-trained translation models on a generated pseudo-parallel corpus. However, existing methods do not sufficiently consider the semantic and emotional characteristics of the languages, resulting in many errors and much noise in the target-domain translation, which harms the cross-domain performance of the model. To alleviate this problem, this paper improves the quality of pseudo-parallel sentence pairs from both the model side and the data side, so as to improve the domain adaptation ability of the model. Firstly, a more reasonable pre-training strategy is proposed to learn more general textual representations from out-of-domain data, in order to enhance the generalization capability of the model and improve the accuracy of the generated in-domain pseudo-corpus. Then, sentence sentiment features are used for posterior filtering, in order to improve the quality of the pseudo-parallel corpus. The experimental results show that, compared with a strong back-translation baseline system, this method increases the BLEU score by 1.25 and 1.38 in the Chinese-English and English-Chinese translation experiments, respectively, thus effectively improving translation performance.

A simplex-guided sparrow search algorithm based on improved search mechanism

LIU Cheng-han, HE Qing

2022, 44(12): 2238-2245. doi:

To address the low convergence accuracy, slow convergence speed and tendency to fall into local minima of the basic sparrow search algorithm on optimization problems, this paper proposes a simplex-guided sparrow search algorithm with an improved search mechanism. Firstly, because the finder search process is too random, the finder search mechanism is improved to increase the convergence speed and accuracy of the algorithm. Secondly, the reconnaissance mechanism of the sparrow search algorithm is improved to strengthen its ability to escape local minima. Finally, in each iteration, simplex method operations are applied to some individuals with poor fitness to improve the searching ability of the algorithm. Performance comparisons on eight benchmark test functions and some CEC2014 test functions, together with Wilcoxon rank-sum test analysis, verify the robustness of the improved algorithm.

A speech emotion recognition method using mixed distributed attention mechanism and hybrid neural network

CHEN Qiao-hong, YU Ze-yuan, JIA Yu-bo

2022, 44(12): 2246-2254. doi:

To address the many irrelevant features and low accuracy of existing speech emotion recognition, a speech emotion recognition method based on a mixed distributed attention mechanism and a hybrid neural network is proposed. The method uses two channels: a convolutional neural network and a bidirectional long short-term memory network extract the spatial and temporal features of speech, respectively. The outputs of the two networks are then used as the input matrix of the multi-head attention mechanism. At the same time, considering the low-rank distribution problem of the existing multi-head attention mechanism, the attention calculation method is improved: the low-rank distribution and the similarity of the output features of the two neural networks are superimposed by mixed distribution. After the normalization operation, all the subspace results are concatenated. Finally, the output is classified through the fully connected layer. The experimental results show that the proposed method achieves higher accuracy than other existing models, verifying its validity.

A multi-objective fireworks algorithm with two-archive coevolution for low-carbon cold chain distribution optimization of vaccines

SHEN Xiao-ning, YOU Xuan, CHEN Qing-zhou, PAN Hong-li, HUANG Yao

2022, 44(12): 2255-2265. doi:

A constrained multi-objective optimization model for the low-carbon cold chain distribution of vaccines is established. It minimizes the corporate transportation cost, which includes the cost of carbon emissions, and customer dissatisfaction, subject to constraints on the number of available vehicles, vehicle capacity and time windows. A discrete two-archive-based multi-objective fireworks algorithm is proposed. A decoding method that satisfies the constraints on the number of available vehicles and vehicle capacity is adopted, and a partial mapping explosion operator is designed. A feasible solution archive and an infeasible solution archive are maintained for coevolution, and a feasibility search is performed on the infeasible solution archive. Experimental results show that, compared with existing algorithms, the proposed algorithm can effectively obtain a set of Pareto non-dominated solutions with better convergence and distribution.
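
The notion of Pareto non-dominated solutions used in this abstract can be shown with a short sketch for minimization objectives. It only filters a given list of objective vectors; the fireworks algorithm, the two archives, and constraint handling are not modeled, and the function names are assumptions of this example.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def non_dominated(points):
    """Keep the objective vectors that no other vector dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]
```

For instance, treating each tuple as (transport cost, customer dissatisfaction), the set `[(1, 5), (2, 2), (3, 3), (5, 1)]` reduces to `[(1, 5), (2, 2), (5, 1)]`, because `(3, 3)` is worse than `(2, 2)` in both objectives.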
