Computer Engineering & Science

State of the art analysis of China HPC 2021

YUAN Guo-xing, ZHANG Yun-quan, YUAN Liang

2021, 43(12): 2091-2097. doi:

Based on the China HPC TOP100 list released in November 2021, this paper presents the aggregate performance trends of the China HPC TOP100 and TOP10 systems for 2021. It then analyzes in detail the characteristics of performance, manufacturer, and application area.

A symbolic simulator for agile hardware design

ZOU Hong-ji, LI Tun, LUO Dan, FANG Yu-de

2021, 43(12): 2098-2104. doi:

In the agile hardware design methodology, a domain-specific hardware description language is often used for RTL modeling, which brings new challenges for design verification. To support verification techniques such as (bounded) model checking and equivalence checking, a symbolic simulator is designed and implemented for PyRTL and its intermediate representations. This paper introduces the design principles, conversion rules, and other key techniques of the symbolic simulator. Experimental results show the correctness of the implemented simulator.

Evaluation of OpenCL computing software stack

ZHU Hao, ZHOU Bo-yang, LU Xue-shan, DU Yi-mo

2021, 43(12): 2105-2114. doi:

With the development of intelligent computing and big data applications, the demand for accelerators such as GPUs is increasing. Computing software stacks such as CUDA and OpenCL are the key to making full use of GPU hardware performance. Considering the future portability and implementation of software stacks on domestic fundamental OS and hardware platforms (such as Phytium CPU and Kylin OS), this paper focuses on open-source OpenCL software stacks. The performance of OpenCL applications on different platforms is tested and analyzed. The performance of GPU computing on different OpenCL software stacks, such as Mesa and ROCm, is evaluated, as is the impact of the drivers and kernels in the software stack on GPU computing. The tests cover the time overhead of the various stages of an OpenCL computation, including compilation, data transfer, and kernel execution. The evaluation finds that using GPUs for accelerated computing is both more urgent and more suitable on domestic platforms. ROCm is an ideal open-source OpenCL software stack with better performance and stability, and, compared with closed-source software stacks, it can be further optimized.

Overview on ocean mesoscale eddy detection and identification based on machine learning

ZHANG Jia-hao, DENG Ke-feng, NIE Teng-fei, REN Kai-jun, SONG Jun-qiang

2021, 43(12): 2115-2125. doi:

Ocean mesoscale eddies are a key mesoscale phenomenon: they play an important role in ocean circulation and in material and energy transport, and they significantly affect the safety of ship navigation and hydroacoustic communication. Efficient and accurate detection and identification of ocean mesoscale eddies are of great research value for both physical oceanography and ocean exploitation. Traditional eddy detection and identification methods rely on a single threshold chosen from expert experience. With the rise of deep learning, current machine learning methods show certain advantages in the accuracy and automation of eddy detection and identification. This paper summarizes and comparatively analyzes the existing machine learning-based detection and identification methods to provide systematic knowledge and a reference basis for further research on ocean mesoscale eddy detection and identification.

A parallel FPGA SAT solver based on incomplete algorithm

LI Tie-jun, MA Ke-fan, ZHANG Jian-min

2021, 43(12): 2126-2130. doi:

The Boolean satisfiability (SAT) problem is a key problem in computer theory and application. This paper proposes a parallel SAT solver on FPGA based on an incomplete algorithm. The algorithm uses a multi-threaded strategy to reduce the waiting time of related components and improve the efficiency of the solver. In addition, different threads use data storage structures that share addresses and clause information to reduce the resource overhead of on-chip memory. When all data is stored in the FPGA's on-chip memory, the solver achieves its best performance. The experimental results show that, compared with the single-threaded solver, the proposed solver achieves a speedup of more than 2 times.
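
For readers unfamiliar with incomplete SAT algorithms, the following is a minimal software sketch of one well-known example, WalkSAT-style stochastic local search. The abstract does not say which incomplete algorithm the FPGA solver uses, so the choice of WalkSAT, the function names, and the parameters (`max_flips`, `noise`, `seed`) are all assumptions made for illustration.

```python
import random


def satisfied(clause, assignment):
    """A clause is a list of non-zero ints: v means variable v is True, -v means it is False."""
    return any(assignment[abs(lit)] == (lit > 0) for lit in clause)


def walksat(clauses, num_vars, max_flips=10_000, noise=0.5, seed=0):
    """Return a satisfying assignment (dict var -> bool), or None if none is found.

    The search is incomplete: returning None does not prove unsatisfiability.
    """
    rng = random.Random(seed)
    assignment = {v: rng.random() < 0.5 for v in range(1, num_vars + 1)}

    def broken_after_flip(var):
        # Count unsatisfied clauses if `var` were flipped, then undo the flip.
        assignment[var] = not assignment[var]
        broken = sum(not satisfied(c, assignment) for c in clauses)
        assignment[var] = not assignment[var]
        return broken

    for _ in range(max_flips):
        unsat = [c for c in clauses if not satisfied(c, assignment)]
        if not unsat:
            return assignment
        clause = rng.choice(unsat)
        if rng.random() < noise:
            var = abs(rng.choice(clause))  # random-walk move
        else:
            var = min((abs(lit) for lit in clause), key=broken_after_flip)  # greedy move
        assignment[var] = not assignment[var]

    # Check once more in case the last flip produced a solution.
    return assignment if all(satisfied(c, assignment) for c in clauses) else None


# (x1 or x2) and (not x1 or x3) and (not x2 or not x3) and (x1 or not x3)
clauses = [[1, 2], [-1, 3], [-2, -3], [1, -3]]
model = walksat(clauses, num_vars=3)
```

Each flip only needs the clauses that contain the flipped variable, which is one reason local-search solvers map well onto parallel hardware; the sketch above recomputes everything for clarity rather than speed.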

Artificial bee colony algorithm for matrix multiplication problem

ZHUANG He-lin, YANG Huo-gen, XIA Xiao-yun, LIAO Wei-zhi

2021, 43(12): 2131-2138. doi:

As a basic operation in computer science and mathematics, matrix multiplication is widely used in scientific research and engineering calculation. Determining the minimum number of multiplications required to compute the product of two matrices is one of the most important unsolved problems in computer algebra. In this paper, the matrix multiplication problem is modeled as a combinatorial optimization problem and then solved with the artificial bee colony heuristic search algorithm. An improved artificial bee colony algorithm with a circle traversal method is proposed to avoid repeated searches of the same neighborhood of a solution. The effectiveness of the proposed algorithm is verified by numerical experiments on the 2×2 matrix multiplication problem. Experimental results show that the proposed algorithm can quickly find a decomposition-based multiplication scheme for the 2×2 matrix product.
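
To make the 2×2 problem concrete, the sketch below checks the classical Strassen scheme, which computes a 2×2 product with 7 scalar multiplications instead of 8. It is shown only as a known example of the kind of scheme such a search looks for; the abstract does not state which scheme the bee colony algorithm actually finds, and the function names here are illustrative.

```python
def strassen_2x2(A, B):
    """Multiply two 2x2 matrices using Strassen's 7 multiplications."""
    (a11, a12), (a21, a22) = A
    (b11, b12), (b21, b22) = B
    m1 = (a11 + a22) * (b11 + b22)
    m2 = (a21 + a22) * b11
    m3 = a11 * (b12 - b22)
    m4 = a22 * (b21 - b11)
    m5 = (a11 + a12) * b22
    m6 = (a21 - a11) * (b11 + b12)
    m7 = (a12 - a22) * (b21 + b22)
    return [[m1 + m4 - m5 + m7, m3 + m5],
            [m2 + m4, m1 - m2 + m3 + m6]]


def naive_2x2(A, B):
    """Reference product using the usual 8 multiplications."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
product = strassen_2x2(A, B)  # [[19, 22], [43, 50]]
```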

Design and FPGA implementation of YOLOv3-tiny hardware acceleration

CHEN Hao-min, YAO Sen-jing, XI Yu, ZHANG Fan, XIN Wen-cheng, WANG Long-hai, REN Chao

2021, 43(12): 2139-2149. doi:

YOLOv3-tiny offers excellent object detection capability, but the model still demands substantial computing power, which makes it difficult to deploy in embedded applications. This paper proposes a hardware acceleration method for YOLOv3-tiny and implements it on an FPGA platform. First, for the fixed-point design of the network, data accuracy and resource consumption are taken as design indicators, and different fixed-point strategies are determined from statistics on the data distribution in the model and a division of data types. Second, for the parallel design of the network, the computational characteristics of convolutional neural networks are analyzed, and a scalable, general-purpose hardware computing unit is designed using loop reordering, loop tiling, loop unrolling, and array partitioning. Third, for the pipeline design, both inter-layer and intra-layer aspects are studied: based on the direction of inter-layer data flow and the division of tasks within a layer, a flexible pipelined computing architecture is designed. Finally, experiments on the Xilinx XC7Z020CLG400-1 platform show that, compared with a single-core ARM-A9 processor at 667 MHz, the proposed design achieves a calculation speed as high as 290.56.
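
The fixed-point step can be illustrated with a small sketch: pick the number of fractional bits from the observed value range, then quantize with saturation. The 16-bit word length, the helper names, and the rule for choosing the format are assumptions for illustration; the abstract does not give the paper's actual strategies.

```python
def choose_frac_bits(values, total_bits=16):
    """Largest fractional bit count that still represents max |value| without saturating."""
    peak = max(abs(v) for v in values)
    int_bits = 0
    while (1 << int_bits) <= peak:
        int_bits += 1
    return total_bits - 1 - int_bits  # one bit is reserved for the sign


def to_fixed(x, frac_bits, total_bits=16):
    """Quantize a float to a signed fixed-point integer, saturating at the word limits."""
    lo = -(1 << (total_bits - 1))
    hi = (1 << (total_bits - 1)) - 1
    q = int(round(x * (1 << frac_bits)))
    return max(lo, min(hi, q))


def to_float(q, frac_bits):
    """Convert a fixed-point integer back to a float."""
    return q / (1 << frac_bits)


weights = [0.5, -1.25, 3.7]          # hypothetical layer weights
frac_bits = choose_frac_bits(weights)
restored = [to_float(to_fixed(w, frac_bits), frac_bits) for w in weights]
```

Layers whose values span a narrow range can keep more fractional bits, which is one way statistics on the data distribution can lead to different formats for different data types.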

Path selection of multi-hop network secure transmission based on wireless energy harvesting

HUO Yuan-lian, XU Xiao-peng, ZHENG Hai-liang

2021, 43(12): 2150-2156. doi:

For multi-hop transmission networks with wireless energy harvesting, a path selection method is proposed for multi-hop, multi-path transmission in a full-duplex relay network with multiple eavesdroppers and multiple power nodes. Firstly, the proposed system model transmits information in scenarios with multiple eavesdroppers and power nodes, and selects the best path between the source transmitter and the receiver as the data transmission path. Secondly, to enhance system performance, each relay collects energy from the radio frequency signal sent by the power node and uses the harvested energy for data transmission on the next hop. Finally, the exact expression of the end-to-end outage probability of the proposed method under Rayleigh fading channels is derived. The Monte Carlo simulation fits the theoretical analysis curves well, indicating the correctness of the theoretical derivation and the superior performance of the proposed method.
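
The comparison of a closed-form outage probability against Monte Carlo simulation can be sketched on a much simpler textbook model. The code below assumes independent Rayleigh-faded hops whose instantaneous SNRs are exponentially distributed, with a path in outage when any hop falls below the threshold. It ignores eavesdroppers, energy harvesting, and full-duplex effects, so it is not the paper's expression; all names and numbers are hypothetical.

```python
import math
import random


def outage_closed_form(mean_snrs, threshold):
    """P(any hop SNR < threshold) for independent exponentially distributed hop SNRs."""
    return 1.0 - math.exp(-sum(threshold / g for g in mean_snrs))


def outage_monte_carlo(mean_snrs, threshold, trials=100_000, seed=1):
    """Estimate the same probability by sampling the hop SNRs."""
    rng = random.Random(seed)
    failures = 0
    for _ in range(trials):
        # random.expovariate takes the rate, which is 1 / mean.
        if any(rng.expovariate(1.0 / g) < threshold for g in mean_snrs):
            failures += 1
    return failures / trials


mean_snrs = [10.0, 8.0, 12.0]  # hypothetical average SNR per hop (linear scale)
threshold = 1.0                # hypothetical SNR threshold (linear scale)
theory = outage_closed_form(mean_snrs, threshold)
simulated = outage_monte_carlo(mean_snrs, threshold)
```

Agreement between `theory` and `simulated` within sampling error is the same kind of check the abstract describes for its derived expression.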

Research on the construction and characteristics of multi-subgroup hybrid growth model of e-commerce consumer network in the blockchain environment

YAN Yun-hong, QIAN Xiao-dong

2021, 43(12): 2157-2168. doi:

To explore the complex network characteristics of e-commerce consumer behavior in the blockchain environment, this paper builds on traditional local area networks and BBV growth networks and proposes two optimizations targeted at the characteristics of that behavior. Firstly, a reputation-based limited trust consensus mechanism is established. Secondly, a smart contract based on the interactive selection of consumer communities with decentralized characteristics is built. The consensus mechanism and smart contracts are integrated into the multi-subgroup hybrid growth model. Under certain parameters, the structural characteristics of the complex network formed by blockchain e-commerce consumers are studied. Empirical research shows that, even in a decentralized and trustless blockchain e-commerce environment, consumer networks still exhibit a power-law node degree distribution and small-world characteristics. In the multi-subgroup hybrid growth model, the division of consumer communities is obvious, and the Matthew effect of "the rich get richer" appears among consumers. However, compared with the traditional online e-commerce consumer network, the community structure is looser and shows antitrust characteristics. Experiments show that consumers' consumption behavior is freer and more transparent in this environment.

A binary vulnerability search method based on multi-granularity semantic analysis

LIU Hao, MA Hui-fang, GONG Nan, YAN Cai-rui

2021, 43(12): 2169-2176. doi:

Similarity detection of binary files aims to judge whether two binary files built for different platforms, with different compilers or optimization settings, or even from different software versions, are highly similar. Binary vulnerability search is one of its applications in the field of information security. Binary vulnerabilities cause many problems for modern software applications, such as exposing operating systems to attacks and private information to theft. The main reason is that code is reused in the software development process without strict supervision. Based on this, a binary vulnerability search method named Taurus, based on multi-granularity semantic analysis, is proposed. The method uses semantic features at three granularities to search for potential cross-platform binary vulnerabilities. Given a binary file to be detected and a vulnerability database, each binary vulnerability in the database is compared with the file one by one. Firstly, semantic extraction is performed on the two binary files to obtain their semantic features at three granularities (basic block, function, and module), and the similarity is calculated at each granularity. Secondly, the similarities at the three granularities are integrated to calculate the overall similarity score of the two files. Finally, the similarity scores between the binary file to be detected and all the vulnerabilities in the database are sorted in descending order to produce the search result report for the binary file. Comparative experiments under reasonable configurations show that the proposed Taurus method is better than the baseline method in terms of accuracy.
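
The last two steps (integrating three per-granularity similarities and ranking the database entries) can be sketched as below. The equal weights, the weighted-sum fusion rule, and the entry names are assumptions for illustration; the abstract does not say how Taurus combines the three scores.

```python
LEVELS = ("basic_block", "function", "module")


def overall_similarity(scores, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Combine per-granularity similarities (each in [0, 1]) into one score by weighted sum."""
    return sum(w * scores[level] for w, level in zip(weights, LEVELS))


def rank_vulnerabilities(per_entry_scores, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Return (entry name, overall score) pairs sorted from most to least similar."""
    ranked = [(name, overall_similarity(s, weights))
              for name, s in per_entry_scores.items()]
    return sorted(ranked, key=lambda item: item[1], reverse=True)


# Hypothetical similarities between one target binary and three database entries.
database_scores = {
    "vuln_a": {"basic_block": 0.2, "function": 0.3, "module": 0.1},
    "vuln_b": {"basic_block": 0.9, "function": 0.8, "module": 0.7},
    "vuln_c": {"basic_block": 0.5, "function": 0.6, "module": 0.4},
}
report = rank_vulnerabilities(database_scores)
```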

Hybrid encryption design of LoRa data transmission network

ZHANG Zhi, WEI Jia-xin, WANG Lin

2021, 43(12): 2177-2182. doi:

As LoRa network applications grow in breadth and depth, the demand for information security is increasing, so ensuring the security of the data transmission network is a necessary condition for the development and application of LoRa networks. The traditional LoRa network has its own encryption mechanism for data transmission, which uses the AES-128 algorithm to encrypt data messages. However, AES is a symmetric cipher: the encryption and decryption keys are essentially the same, so if either key is leaked, the other is easily derived. To address this security problem, a hybrid data encryption scheme is proposed: the RSA asymmetric encryption algorithm is introduced into the LoRa network, and the key used by AES to encrypt and decrypt data is itself encrypted with RSA, taking advantage of RSA's high security to reduce the risks of network data transmission. On this basis, the AES and RSA algorithms are optimized to maintain the efficiency of data transmission. Finally, the design is tested. The test results show that the proposed method both maintains the efficiency of data transmission and protects the key against theft, thus greatly improving the security of data transmission in LoRa networks.

Charging path planning of multiple mobile-chargers based on space-time collaboration

YIN Ling, XIE Zhi-jun

2021, 43(12): 2183-2189. doi:

In order to solve the problem of real-time charging of nodes in a large wireless rechargeable sensor network, a network with multiple mobile chargers is studied. Based on the fair division of the network into multiple clusters, a real-time charging algorithm for multiple mobile chargers (STMA) based on space-time cooperation is proposed: the charging path of the mobile charger is planned by jointly considering the spatial location of the node and the charging deadline. The latest charging request is obtained in time during the charging process, and the charging path is adjusted in time according to the urgency of the charging request. The simulation results show that, compared with the algorithm that simply considers the space-time requirements, the STMA algorithm increases the energy utilization rate by about 14% and the node survival rate by about 9%, which is more suitable for the real-time charging requirements of the nodes.

Research progress on deep learning based sketch retrieval

JI Zi-heng, WANG Bin

2021, 43(12): 2190-2205. doi:

Sketch-based image retrieval (SBIR) is an extension of content-based image retrieval (CBIR) and offers a flexible and convenient way to retrieve target images. Minimizing the difference between the sketch domain and the image domain is crucial to SBIR. Traditional methods extract hand-crafted features to achieve an approximate conversion between the sketch domain and the image domain and thereby reduce the domain difference. However, these methods cannot effectively fit the content of the two domains, resulting in low retrieval accuracy. Deep learning methods break through the limitations of traditional methods: they extract high-dimensional features from large amounts of data and have been shown to handle cross-domain modeling problems effectively. This paper focuses on deep learning-based sketch retrieval methods and covers deep feature extraction models, public datasets, coarse-grained and fine-grained retrieval based on deep learning, deep hashing, category generalization, and related topics. Related works are reviewed and commented on. Then, a comparative experiment is conducted. On the one hand, three public SBIR datasets (Sketchy, TU-Berlin, and QuickDraw) are used for suitability evaluation. On the other hand, three recent SBIR deep learning models (GRLZS, SEM-PCYC, and SAKE) are selected for performance analysis and comparison. Finally, current challenges and future research trends of SBIR are summarized.

Overview of human behavior detection methods based on deep learning

LU Wei-zhong, SONG Zheng-wei, WU Hong-jie, CAO Yan, DING Yi-jie, ZHANG Yu

2021, 43(12): 2206-2215. doi:

Behavior detection is a research hotspot in the fields of video understanding and computer vision and has attracted the attention of scholars at home and abroad. It has been widely used in many fields such as intelligent surveillance and human-computer interaction. With the development of technology, deep learning has made great breakthroughs in image classification, and applying deep learning-based recognition methods to human behavior detection has become a hotspot. This paper first introduces several datasets commonly used in behavior detection and the research status of deep learning in this field in recent years. Then, the basic process of behavior detection methods and the recognition methods based on deep learning are analyzed. Finally, future development trends and possible shortcomings are analyzed in terms of method performance and application prospects.

An image raindrop removal method based on self-attention and multi-scale generative adversarial network

LI Ran, ZHOU Zi-hao, ZHANG Yue-fang, LUO Dong-sheng, DENG Hong-xia

2021, 43(12): 2216-2222. doi:

To remove raindrops from images taken on rainy days, this paper addresses three issues: the area covered by raindrops is unknown, most of the background information in the raindrop area is lost, and image clarity and attention to global information need to be improved. A self-attention layer is added to the autoencoder structure, and a multi-scale discriminator is introduced into the discriminative network. Guided by the attention distribution map, the optimization of the self-attention layer, and the evaluation of the multi-scale discriminator, the generative network takes more account of global information while still attending to the raindrop area. The multi-scale discriminator can better distinguish the gap between the raindrop image and the clear image from coarse to fine. The experiments compare the proposed method with other methods, include a self-comparison, and evaluate the results with peak signal-to-noise ratio and structural similarity. They show that the proposed method is effective and that its visual quality and metric values are higher than those of the other methods.
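
As a reminder of how the first of these metrics is defined, the sketch below computes the peak signal-to-noise ratio, PSNR = 10·log10(MAX² / MSE), for two grayscale images stored as nested lists. The 8-bit peak value of 255 and the function name are assumptions; the abstract does not describe the evaluation code.

```python
import math


def psnr(reference, restored, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized grayscale images."""
    flat_ref = [p for row in reference for p in row]
    flat_out = [p for row in restored for p in row]
    if len(flat_ref) != len(flat_out):
        raise ValueError("images must have the same number of pixels")
    mse = sum((a - b) ** 2 for a, b in zip(flat_ref, flat_out)) / len(flat_ref)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


clean = [[100, 100], [100, 100]]      # hypothetical ground-truth pixels
derained = [[110, 90], [100, 100]]    # hypothetical network output
score = psnr(clean, derained)
```

A higher PSNR means the restored image is closer to the clean reference pixel by pixel; structural similarity complements it by comparing local structure rather than raw differences.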

A fault diagnosis model of distributed photovoltaic power stations based on deep residual network

XIE Xiang-ying, LIU Hu, WANG Dong, LENG Biao

2021, 43(12): 2223-2230. doi:

The deployment environment of distributed photovoltaic power plants is relatively complicated, and many kinds of faults inevitably occur during actual operation. To address this problem, this paper proposes a fault diagnosis model for distributed photovoltaic power stations based on a deep residual network. It analyzes and processes the sequence data of equipment operation and achieves rapid and accurate judgment of fault categories. The model applies one-dimensional convolution kernels to perceive the characteristics of time series data. Then, it uses a multi-level convolution structure to increase the diagnostic ability. Finally, the residual network is used to mitigate the vanishing gradient problem caused by increasing model depth and to accelerate the training of the deep model. The experimental results on power station test data show that the residual network model achieves higher fault diagnosis accuracy than several state-of-the-art intelligent models. Applying this model can both greatly reduce the investment in fault inspection of photovoltaic power plants and improve the efficiency of their fault diagnosis.

A weighted generalized multi-granulation rough set model of multi-source covering information system and its applications

LUO Gong-zhi, CHEN Jia-xin

2021, 43(12): 2231-2237. doi:

Considering the complexity of data in multi-source covering information systems and the inequality between individual information systems, this paper introduces the induced covering rough set and assigns weight values to each single information system’s attribute, and, consequently, proposes a weighted generalized multi-granulation rough set of multi-source covering information systems. Firstly, it defines the calculation method of attribute’s weight. Then, it gives complete upper and lower approximations of the model, and acquires the corresponding decision rule. Finally, it verifies the validity of the model through an example analysis. The experimental results show that the model has a better ability to classify the target set, and its fault tolerance can be further improved with appropriate adjustment of the threshold.
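
For orientation, the sketch below computes lower and upper approximations in the classical single-granulation (Pawlak) rough set setting, where objects with identical attribute values are indiscernible. It does not implement the covering-based, weighted multi-granulation model of the paper; the data and names are hypothetical.

```python
from collections import defaultdict


def approximations(objects, attributes, target):
    """Return (lower, upper) approximations of `target` under the given attributes.

    objects: dict mapping object name -> dict of attribute values
    target:  set of object names to approximate
    """
    # Group objects that are indiscernible on the chosen attributes.
    classes = defaultdict(set)
    for name, row in objects.items():
        classes[tuple(row[a] for a in attributes)].add(name)

    lower, upper = set(), set()
    for block in classes.values():
        if block <= target:   # block lies entirely inside the target
            lower |= block
        if block & target:    # block overlaps the target
            upper |= block
    return lower, upper


objects = {
    "x1": {"a": 0, "b": 0},
    "x2": {"a": 0, "b": 0},
    "x3": {"a": 1, "b": 0},
    "x4": {"a": 1, "b": 1},
}
lower, upper = approximations(objects, ["a", "b"], target={"x1", "x3"})
```

Here `x1` cannot be separated from `x2`, so it appears only in the upper approximation; multi-granulation models combine several such views, and weighting lets more reliable sources count for more.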

Thai sentence segmentation based on Siamese recurrent neural network

XIAN Yan-tuan, ZHANG Zhi-ju, WANG Hong-bin, WEN Yong-hua,

2021, 43(12): 2238-2242. doi:

Thai text rarely uses punctuation, and there are no obvious separators between sentences. Sentences must be segmented by semantics, which brings extra difficulties to natural language processing tasks such as lexical analysis, syntactic analysis, and machine translation. This paper proposes a sentence segmentation method based on a Siamese (dual-path) recurrent neural network. Unlike traditional Thai sentence segmentation methods, this method does not require manually defined features; instead, it uses a shared recurrent neural network to encode the word sequences before and after each candidate boundary. The encoding vectors of the two sequences are then used as features to build the Thai sentence segmentation classifier. Experimental results on the Orchid97 Thai corpus show that the proposed method is superior to traditional Thai sentence segmentation methods.

A density clustering algorithm of boundary point division based on local center measure

ZHANG Mei, CHEN Mei, LI Ming

2021, 43(12): 2243-2252. doi:

Aiming at the shortcomings of clustering algorithms in detecting arbitrary clusters, such as low recognition accuracy, large number of iterations, and poor detection effect, this paper proposes a density clustering algorithm of boundary point division based on local center measure (named DBLCM). Under the limitation of the local center measure, data points are divided into core regions or boundary regions. The points of the core region are grouped together to form initial clusters according to the allocation mode of the priority of mutual nearest neighbors, and the points in the boundary region are allocated according to the clusters of the nearest points among their mutual nearest neighbors to obtain the final cluster structure. To verify the algorithm's effectiveness, DBLCM is compared with three classic algorithms and three outstanding algorithms proposed in recent years on two-dimensional datasets containing arbitrary shapes and arbitrary densities as well as on multi-dimensional datasets with arbitrary dimensions. In addition, to verify the sensitivity of the parameter k in the DBLCM algorithm, a correlation test is conducted between the k value and the cluster quality on different types of datasets. The experimental results show that the DBLCM algorithm has advantages such as high recognition accuracy, good ability to detect arbitrary clusters, and no iteration, and that it has better comprehensive performance than the six comparison algorithms.
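
The mutual nearest neighbor relation that the allocation steps rely on can be sketched as follows: two points are mutual neighbors when each appears in the other's k-nearest-neighbor list. This shows only that building block under assumed names and toy data; the local center measure and the full DBLCM allocation rules are not reproduced here.

```python
import math


def k_nearest(points, k):
    """For each point index, return the set of indices of its k nearest other points."""
    neighbours = []
    for i, p in enumerate(points):
        order = sorted((j for j in range(len(points)) if j != i),
                       key=lambda j: math.dist(p, points[j]))
        neighbours.append(set(order[:k]))
    return neighbours


def mutual_neighbours(points, k):
    """Keep only neighbor pairs (i, j) where i is in j's list and j is in i's list."""
    knn = k_nearest(points, k)
    return [{j for j in knn[i] if i in knn[j]} for i in range(len(points))]


# Three nearby points and one distant point (hypothetical 2-D data).
points = [(0, 0), (1, 0), (0, 1), (10, 10)]
mutual = mutual_neighbours(points, k=2)
```

The distant point still has two nearest neighbors, but neither of them lists it in return, so its mutual neighbor set is empty. This asymmetry is what makes the relation useful for separating dense cores from sparse boundary or outlying points.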

Gait planning of quadruped robots based on an improved ant colony algorithm

HU Ping-zhi, LI Ze-tao

2021, 43(12): 2253-2262. doi:

Quadruped robots have many joints and complex motion modes, and gait planning is the basis of their motion control. Traditional algorithms are mostly based on bionic principles and lack wide adaptability. After establishing the kinematics equations, this paper proposes a gait planning algorithm based on an improved ant colony algorithm. The algorithm takes advantage of the linear independence of the robot's four legs and casts gait planning as the problem of finding the longest path in a four-dimensional space. In simulation, the algorithm obtains all gaits that meet the constraints. Finally, a robot prototype is tested to verify the validity and rationality of the results obtained by the algorithm.
