  • Official journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Current Issue

    • Deflection routing based bi-ring network on chip
      QI Xing-yun, DAI Yi, LAI Ming-che, CHANG Jun-sheng, DONG De-zun
      2021, 43(03): 381-388. doi:
      Abstract ( 180 )   PDF (620KB) ( 168 )     
      To reduce the complexity of medium-scale networks on chip and improve network efficiency, a bi-ring network on chip based on deflection routing is proposed. Its collision-resolving mechanism is studied, and a simple and efficient routing algorithm is given. The bi-ring network on chip is implemented in a hardware description language (HDL), and a cycle-accurate simulation of it is constructed. The experimental results show that the bi-ring network achieves performance similar to that of the YARC on-chip network, with 100% throughput, at a far lower hardware cost.
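
      As an illustration of the deflection idea only (not the paper's exact collision-resolving mechanism), the sketch below routes flits on two counter-rotating rings: each flit prefers the ring with the shorter hop count to its destination, and when two flits at a node contend for the same ring output in a cycle, the loser is deflected onto the other ring instead of being buffered. The node count and tie-breaking rule are assumptions.

```python
# Minimal sketch of deflection routing on a bi-ring NoC (illustrative, not the paper's design).
N = 8  # number of nodes on each ring (assumption)

def hops(src, dst, direction):
    """Hop count from src to dst going clockwise (+1) or counter-clockwise (-1)."""
    return (dst - src) % N if direction == +1 else (src - dst) % N

def route(flit_dests, node):
    """Assign each flit at `node` (at most two per cycle here) to a ring output;
    deflect the loser onto the other ring when both prefer the same output."""
    assignment, used = {}, set()
    for dst in flit_dests:
        preferred = +1 if hops(node, dst, +1) <= hops(node, dst, -1) else -1
        chosen = preferred if preferred not in used else -preferred  # deflection
        assignment[dst], _ = chosen, used.add(chosen)
    return assignment

print(route([3, 6], node=5))  # e.g. {3: -1, 6: 1}: one flit per ring, no buffering needed
```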


      A CNN accelerator based on 3D scalable PE array
      SU Zi-pei, YANG Xin, CHEN Di-hu, SU Tao
      2021, 43(03): 389-397. doi:
      Abstract ( 322 )   PDF (1122KB) ( 184 )     
      Convolutional neural networks (CNNs) involve large numbers of parameters and a heavy computational load. When they are deployed on mobile devices, the chip area must be reduced as much as possible while still meeting the required frame rate. Considering the compatibility, performance, and area requirements of current mobile networks, a CNN accelerator based on a 3D scalable PE array is designed. The accelerator is compatible with 3×3 convolution, 3×3 depthwise separable convolution, 1×1 convolution, and fully connected layers, and its PE array can set the optimal parallelism parameters in three dimensions according to the network and hardware constraints of the specific application to achieve better performance. The proposed CNN accelerator runs yolo-v2 on 512 PEs at 76.52 GOPS (74.72% performance efficiency) and runs mobile-net-v1 on 512 PEs at 78.05 GOPS (76.22% performance efficiency). The CNN accelerator is used to build a real-time target detection system on a ZC706 FPGA board. Running yolo-lite on the board achieves a frame rate of 53.65 fps.
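
      The reported performance efficiency is consistent with a simple peak-throughput model: assuming each PE performs one multiply-accumulate (2 ops) per cycle and the array runs at 100 MHz (a clock frequency assumed here for illustration, not stated in the abstract), 512 PEs give a 102.4 GOPS peak, so 76.52 GOPS corresponds to roughly 74.7% utilization.

```python
# Back-of-envelope check of the reported performance efficiency (clock frequency is an assumption).
pes = 512
ops_per_pe_per_cycle = 2      # one MAC = multiply + add
freq_ghz = 0.1                # 100 MHz, assumed for illustration
peak_gops = pes * ops_per_pe_per_cycle * freq_ghz        # 102.4 GOPS peak

for name, achieved_gops in [("yolo-v2", 76.52), ("mobile-net-v1", 78.05)]:
    print(f"{name}: {achieved_gops / peak_gops:.2%} of peak")   # ~74.7% and ~76.2%
```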

      Analyzing SAR imaging application feature and discussing hardware design space
      KONG Xi-chang, WEN Mei, LAN Qiang
      2021, 43(03): 398-406. doi:
      Abstract ( 174 )   PDF (779KB) ( 116 )     
      Synthetic Aperture Radar (SAR) is an active earth observation system. In recent years, SAR has gradually moved toward multi-platform deployment and has appeared on small mobile platforms such as unmanned aerial vehicles and probe vehicles. SAR imaging is the imaging program that runs on these SAR platforms. The new, specialized operating environments impose stricter requirements on energy consumption and computing power, so providing high-performance, low-power application support for a specific platform has become a core issue. This paper analyzes the computation and memory-access characteristics of SAR imaging, optimizes the program accordingly, and tests its performance on the x86 platform to obtain a reliable performance reference. On this basis, oriented to a DSP+FFT accelerator hardware structure, a mathematical model of the computing power ratio is constructed to guide the hardware design.
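
      As a rough illustration of the kind of compute-ratio model that can guide a DSP+FFT accelerator split (the paper's actual model is not reproduced here), the sketch below counts the FFT operations of a range-Doppler-style imaging kernel with the standard 5·N·log2(N) estimate and compares them with the remaining element-wise, DSP-side work. The scene size and the per-pixel DSP cost are assumptions.

```python
import math

# Rough compute-ratio model for an FFT-offload design (sizes and per-pixel cost are assumptions).
Nr, Na = 4096, 4096                              # range / azimuth samples (assumed scene size)
fft_flops = lambda n: 5 * n * math.log2(n)       # standard radix-2 FFT operation estimate

# One range FFT per azimuth line, plus forward and inverse azimuth FFTs per range bin
fft_total = Na * fft_flops(Nr) + 2 * Nr * fft_flops(Na)
dsp_total = 10 * Nr * Na                         # element-wise phase multiplies etc. (~10 flops/pixel, assumed)

print(f"FFT ops  : {fft_total:.3e}")
print(f"DSP ops  : {dsp_total:.3e}")
print(f"FFT share: {fft_total / (fft_total + dsp_total):.1%}")
```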

      Identity calibration of E-commerce big data based on long short-term memory network
      2021, 43(03): 407-415. doi:
      Abstract ( 122 )   PDF (1282KB) ( 108 )     
      Because of the wide variety of products and the lack of a uniform writing format, traditional models mark identical products in the e-commerce big data of government procurement platforms with low accuracy, slow speed, low sample utilization, and insufficient generalization ability. An identity calibration model based on the Long Short-Term Memory network (LSTM) is proposed, which consists of three sub-models in series: word segmentation, importance ranking, and similarity calculation. Firstly, the word segmentation sub-model preprocesses the e-commerce big data to obtain differentiated keyword sequences. Next, the LSTM importance-ranking sub-model screens the keyword sequences that best characterize the product information. Finally, the LSTM similarity-calculation sub-model accurately calibrates identical commodities in the given big data. In addition, binary search, GloVe word vectorization, and word-sequence semantic verification are introduced to improve calibration speed, training-sample utilization, and calibration generalization ability, respectively. The experimental results show that, when dealing with big data from different types of government-procurement e-commerce, the model calibrates the identity of easily confused samples with high accuracy.
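
      A minimal sketch of the similarity-calculation stage, assuming pretrained GloVe-style word vectors for the keyword sequences are already available; the layer sizes, the siamese sharing of the LSTM, and the cosine-similarity head are assumptions for illustration, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

# Minimal siamese-LSTM similarity sketch (dimensions and similarity head are assumptions).
class LSTMSimilarity(nn.Module):
    def __init__(self, embed_dim=100, hidden_dim=64):
        super().__init__()
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)

    def encode(self, seq):                 # seq: (batch, seq_len, embed_dim)
        _, (h, _) = self.lstm(seq)
        return h[-1]                       # last hidden state as the product representation

    def forward(self, seq_a, seq_b):
        return torch.cosine_similarity(self.encode(seq_a), self.encode(seq_b), dim=1)

model = LSTMSimilarity()
a = torch.randn(2, 12, 100)   # two product keyword sequences, 12 GloVe vectors each (placeholder)
b = torch.randn(2, 12, 100)
print(model(a, b))            # similarity in [-1, 1]; a threshold would mark "same product"
```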

      Performance analysis of distributed deep learning communication architecture
      ZHANG Li-zhi, RAN Zhe-jiang, LAI Zhi-quan, LIU Feng
      2021, 43(03): 416-425. doi:
      Abstract ( 255 )   PDF (1381KB) ( 192 )     
      In recent years, advances in deep learning technology have pushed artificial intelligence into a new era of development. However, massive training data and large-scale models bring increasingly serious challenges to deep learning, and distributed deep learning is an effective way to meet them. An efficient synchronization algorithm is the key to ensuring the performance of distributed deep learning. To address the parallel-training problem that traditional model synchronization algorithms face on large numbers of nodes, firstly, the principles and performance of two mainstream parameter communication architectures, the centralized Parameter Server and the decentralized Ring Allreduce, are analyzed. Secondly, a comparative test environment for the two distributed training frameworks is constructed with TensorFlow on the Tianhe high-performance GPU cluster. Finally, using the Parameter Server architecture as the baseline, the performance of the Ring Allreduce architecture for training AlexNet and ResNet-50 in a GPU cluster environment is tested. The experimental results show that, with 32 GPUs, the scaling efficiency of the Ring Allreduce architecture reaches 97%, and it improves distributed training performance by 30% over the Parameter Server architecture, which verifies that the Ring Allreduce architecture has better scalability.
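
      The sketch below simulates the two phases of Ring Allreduce (scatter-reduce followed by all-gather) for gradient averaging across N workers; it illustrates why each worker communicates only 2(N-1)/N of its gradient volume regardless of N, which is the property behind the near-linear scaling measured above. It is an illustration of the algorithm, not the TensorFlow/NCCL implementation used in the experiments.

```python
import numpy as np

def ring_allreduce(worker_grads):
    """Simulated synchronous Ring Allreduce: every worker ends with the element-wise sum."""
    n = len(worker_grads)
    chunks = [np.array_split(g.astype(float), n) for g in worker_grads]
    # Phase 1: scatter-reduce -- at each step worker i sends chunk (i - step) % n to worker i+1
    for step in range(n - 1):
        sends = [(i, (i - step) % n, chunks[i][(i - step) % n].copy()) for i in range(n)]
        for i, idx, data in sends:
            chunks[(i + 1) % n][idx] += data
    # After n-1 steps worker i holds the fully reduced chunk (i + 1) % n.
    # Phase 2: all-gather -- circulate the reduced chunks around the ring
    for step in range(n - 1):
        sends = [(i, (i + 1 - step) % n, chunks[i][(i + 1 - step) % n].copy()) for i in range(n)]
        for i, idx, data in sends:
            chunks[(i + 1) % n][idx] = data
    return [np.concatenate(c) for c in chunks]

np.random.seed(0)
grads = [np.random.randn(12) for _ in range(4)]            # gradients from 4 workers (placeholder)
reduced = ring_allreduce(grads)
assert all(np.allclose(r, sum(grads)) for r in reduced)    # every worker holds the same sum
print(reduced[0] / len(grads))                              # averaged gradient
```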



      A genetic-based multi-site collaborative computation offloading algorithm
      JI Zi-hao, JIANG Ling-yun
      2021, 43(03): 426-434. doi:
      Abstract ( 238 )   PDF (786KB) ( 194 )     
      Edge computing, which extends computing resources and storage capacity for resource-constrained Internet of Things (IoT) devices, can improve the performance of IoT applications. In an IoT environment, most applications are deployed at multiple sites in a distributed architecture, and these sites need to collaborate to complete a service. To solve the cost optimization problem of multi-site collaborative computation in the IoT environment, a genetic-based multi-site collaborative computation offloading algorithm (GAMCCO) is proposed. The algorithm models the application as a task relation graph and analyzes the dependencies among tasks. Afterwards, the multi-site collaborative offloading problem is formulated in terms of execution cost, and a genetic algorithm is used to find the best offloading scheme. Experimental results show that the proposed GAMCCO algorithm can effectively reduce the delay of IoT applications and the energy consumption of terminal devices.
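
      A minimal genetic-algorithm sketch for the offloading decision: each gene assigns one task either to the local device or to one of several edge sites, and fitness is the total execution cost of the plan. The random per-task cost table and GA parameters are assumptions for illustration; the paper's GAMCCO additionally respects the task-dependency graph and a delay/energy cost model.

```python
import random

random.seed(1)
NUM_TASKS, NUM_SITES, POP, GENS = 10, 3, 40, 60   # placement 0 = local device, 1..3 = edge sites
# Assumed per-task cost of running on each placement (illustrative stand-in for delay/energy cost).
COST = [[random.uniform(1, 10) for _ in range(NUM_SITES + 1)] for _ in range(NUM_TASKS)]

def cost(plan):                       # plan[i] = placement of task i
    return sum(COST[i][p] for i, p in enumerate(plan))

def crossover(a, b):
    cut = random.randrange(1, NUM_TASKS)
    return a[:cut] + b[cut:]

def mutate(plan, rate=0.1):
    return [random.randrange(NUM_SITES + 1) if random.random() < rate else p for p in plan]

pop = [[random.randrange(NUM_SITES + 1) for _ in range(NUM_TASKS)] for _ in range(POP)]
for _ in range(GENS):
    pop.sort(key=cost)
    elite = pop[:POP // 2]            # keep the cheaper half, refill with crossover + mutation
    pop = elite + [mutate(crossover(*random.sample(elite, 2))) for _ in range(POP - len(elite))]

best = min(pop, key=cost)
print("best offloading plan:", best, "cost:", round(cost(best), 2))
```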

      Study on edge offloading mechanism of sensing data in Internet of Things
      YUAN Pei-yan, GENG Li-juan, ZHANG Hao
      2021, 43(03): 435-441. doi:
      Abstract ( 121 )   PDF (673KB) ( 127 )     
      The pervasiveness of the Internet of Things and the large amount of data generated by computation-intensive applications such as augmented reality have placed a heavy burden on the current backbone network. To improve the user experience, sensing data should be migrated to edge servers closer to users. Because the resources of an edge server are relatively limited, determining the appropriate proportion of data to offload in advance is a key problem. Firstly, this paper expresses the problem as a sequential quadratic program with two key performance indicators: delay and energy consumption. Secondly, a sequential quadratic programming algorithm is used to solve the problem, and the optimal workload offloading ratio is obtained under a given energy-consumption constraint. Finally, Matlab simulation experiments verify that the proposed scheme achieves the best delay performance when the data arrival rate is far greater than the edge server processing rate.
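
      The optimization can be sketched with SciPy's SLSQP solver: the decision variable is the fraction of the workload offloaded to the edge server, the objective is an M/M/1-style delay term for the local and edge queues plus a transmission delay, and total energy consumption is constrained. All rate, power, and transmission parameters below are assumptions for illustration, not the paper's model or values.

```python
from scipy.optimize import minimize

# Assumed parameters (requests/s, J per request, s per request) for illustration only.
lam = 50.0                          # data arrival rate
mu_local, mu_edge = 40.0, 120.0     # local / edge service rates
e_local, e_tx = 0.5, 0.2            # energy per request: local execution vs. wireless transmission
t_tx, E_max = 0.01, 22.0            # per-request transmission delay and total energy budget

def delay(x):                       # x[0] = offloading ratio in (0, 1)
    a = x[0]
    local_load, edge_load = (1 - a) * lam, a * lam
    return (1 - a) / (mu_local - local_load) + a * (1.0 / (mu_edge - edge_load) + t_tx)

cons = [{"type": "ineq", "fun": lambda x: E_max - lam * ((1 - x[0]) * e_local + x[0] * e_tx)},
        {"type": "ineq", "fun": lambda x: mu_local - (1 - x[0]) * lam - 1e-3},   # keep local queue stable
        {"type": "ineq", "fun": lambda x: mu_edge - x[0] * lam - 1e-3}]          # keep edge queue stable

res = minimize(delay, x0=[0.5], bounds=[(0.0, 1.0)], constraints=cons, method="SLSQP")
print(f"optimal offloading ratio: {res.x[0]:.3f}, mean delay: {res.fun:.4f} s")
```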

      An energy-balanced multi-hop multi-path cognitive hierarchical routing algorithm
      WANG Jun-xi, CHEN Gui-fen
      2021, 43(03): 442-448. doi:
      Abstract ( 112 )   PDF (909KB) ( 102 )     
      To alleviate the current shortage of spectrum resources, improve the energy-consumption balance of cognitive wireless sensor networks, and reduce overall network energy consumption, an energy-balanced multi-hop multi-path cognitive hierarchical (EMMCH) protocol suitable for heterogeneous cognitive wireless sensor networks is proposed. Firstly, the cluster-head election probability is improved based on the remaining energy of the node, the location of the node, and the density of its neighbor nodes. Secondly, the concept of a competition radius is used to balance the energy consumption of the cluster heads. Then, the optimal cluster heads are selected based on channel availability and remaining energy, and the number of cluster heads changes dynamically. Finally, each cluster-head node selects nodes with high residual energy, proximity to the sink node, and an idle channel for multi-hop transmission path planning, and then selects the optimal path by jointly considering the consumption along the path and the degree of imbalance. Simulation results show that the EMMCH algorithm achieves a longer network lifetime, higher stability, a larger data transmission volume, and more balanced network energy consumption.
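
      The weighted election idea can be sketched as follows: each node's probability of becoming a cluster head is a base election probability scaled by its residual energy, its distance to the sink, and its neighbor density. The weights and normalizations below are assumptions for illustration, not the exact EMMCH formula.

```python
import random

random.seed(3)

def election_probability(p_opt, e_res, e_init, d_sink, d_max, density, density_max,
                         w_e=0.5, w_d=0.3, w_n=0.2):
    """Cluster-head election probability weighted by residual energy, location, and
    neighbor density (the weights are illustrative assumptions)."""
    score = (w_e * e_res / e_init +
             w_d * (1 - d_sink / d_max) +
             w_n * density / density_max)
    return min(1.0, p_opt * score / (w_e + w_d + w_n))

# A node with 80% energy left, fairly close to the sink, in a dense neighborhood:
p = election_probability(p_opt=0.1, e_res=0.8, e_init=1.0,
                         d_sink=30, d_max=100, density=8, density_max=10)
print(f"election probability: {p:.4f}")
print("becomes cluster head this round:", random.random() < p)
```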

      Design and implementation of the core network of space-ground integrated network based on network slicing technology in 5G
      GU Ju-juan, ZHANG Ya-sheng, PANG Jin-kun
      2021, 43(03): 449-455. doi:
      Abstract ( 143 )   PDF (940KB) ( 138 )     
      To solve the problems of insufficient flexibility and low resource utilization in the core network of the space-ground integrated network, an architecture design scheme using 5G network slicing technology is proposed, and a prototype system based on the Docker platform is implemented. Three slice services, namely an IMS multimedia communication network, an Internet of Things information collection network, and a web network, are loaded on the prototype system for testing. The results show that different network slices can flexibly allocate network resources and independently provide services. The design scheme can effectively improve the networking flexibility of the core network of the space-ground integrated network.

      An image encryption algorithm based on Strcmp decomposition and hyper-Lorenz chaotic system
      JIN Xu-wen, LI Guo-dong
      2021, 43(03): 456-464. doi:
      Abstract ( 111 )   PDF (3043KB) ( 98 )     
      Because the initial parameters are constant, the performance of chaotic systems is weak and the combination of chaotic source sequences used for encryption is limited. Therefore, a Strcmp sequence decomposition method based on mathematical logic inversion is proposed. It is combined with the hyper-Lorenz chaotic system to set up a ciphertext entropy feedback mechanism and reset the chaos generator several times. The decomposed sequences are used to optimize image block scrambling and to perform add-and-extract mode diffusion encryption. Simulation experiments show that the decomposed sequences have good randomness. For the simulated test image, the NPCR value is 99.62%, the UACI value is 33.48%, and the entropy value is 7.9994, which is better than other combined source-sequence algorithms. Besides, the algorithm also has a good encryption effect on special images and can effectively resist chosen-plaintext attacks.
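
      The NPCR and UACI figures quoted above are standard diffusion metrics; the sketch below computes them for two cipher images that would differ because of a one-pixel change in the plaintext. The random images are placeholders for actual cipher output, and the ideal reference values are the usual ones for 8-bit images.

```python
import numpy as np

def npcr_uaci(c1, c2):
    """NPCR: percentage of differing pixels; UACI: mean absolute intensity change / 255."""
    c1, c2 = c1.astype(float), c2.astype(float)
    npcr = np.mean(c1 != c2) * 100
    uaci = np.mean(np.abs(c1 - c2)) / 255 * 100
    return npcr, uaci

rng = np.random.default_rng(0)
# Placeholders: two cipher images produced from plaintexts differing in a single pixel.
cipher1 = rng.integers(0, 256, (256, 256), dtype=np.uint8)
cipher2 = rng.integers(0, 256, (256, 256), dtype=np.uint8)
npcr, uaci = npcr_uaci(cipher1, cipher2)
print(f"NPCR = {npcr:.2f}%  UACI = {uaci:.2f}%")   # ideal values are about 99.61% and 33.46%
```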



      Identification of visual imagery based on EEG microstate method
      LI Zhao-yang, FU Yun-fa
      2021, 43(03): 465-472. doi:
      Abstract ( 163 )   PDF (761KB) ( 110 )     
      Motor imagery (MI) is a common task in brain-computer interfaces (BCIs), but MI is not easy to acquire and control, and there is a phenomenon of "BCI blindness", which limits the practicality of this type of BCI. This paper addresses the identification of visual imagery (VI) tasks, which are easier to acquire and control, with the aim of building a VI-based BCI (VI-BCI). Fifteen subjects were recruited to participate in two kinds of dynamic-picture VI tasks, and their EEG data were collected. Then, the EEG microstate method is used to study the differences in microstate time parameters between the two VI tasks, and feature vectors are constructed from the microstate time parameters with significant differences. Finally, a support vector machine (SVM) is used to classify the two kinds of VI tasks. The results show that the highest, lowest, and average classification accuracies obtained with microstate features are 90%, 56%, and 80.6±2.58%, respectively. This study shows that the microstate method can effectively extract VI-related EEG features and obtain comparable accuracy. The work is expected to provide ideas for the construction of a new online VI-BCI.
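
      A minimal sketch of the final classification step, assuming the microstate time parameters (e.g. duration, occurrence, and coverage per microstate class) have already been extracted per trial. The feature dimensions and data are synthetic placeholders, not the recorded EEG.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Placeholder features: per-trial microstate time parameters (duration/occurrence/coverage
# for 4 microstate classes = 12 values); labels are the two visual-imagery tasks.
X = rng.normal(size=(120, 12))
y = rng.integers(0, 2, size=120)
X[y == 1] += 0.8                      # inject a separable difference for illustration only

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
scores = cross_val_score(clf, X, y, cv=5)
print(f"mean accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```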

      Image recognition of moldy tobacco leaves based on convolutional neural network
      LI Ya-zhao, YUN Li-jun, YE Zhi-xia, WANG Kun, ZHAI Nai-qi
      2021, 43(03): 473-479. doi:
      Abstract ( 224 )   PDF (597KB) ( 208 )     
      In view of shortcomings such as low efficiency and missed detections when moldy tobacco leaves are selected manually on the production line, this paper proposes a method for screening, classifying, and recognizing moldy tobacco leaf images based on a convolutional neural network model. Firstly, a tobacco leaf dataset is built. Secondly, a convolutional neural network model is built to extract initial features, screen and extract the main features, and then summarize the features of each part, finally realizing image classification and thereby achieving fast and accurate identification of moldy and normal tobacco leaf images. The experimental results show that, compared with manual selection of moldy tobacco leaves and traditional image classification algorithms, the convolutional neural network not only achieves a high recognition accuracy but also removes the complex process of manually extracting image features.

      A video tracking algorithm based on dual-thread LSTM online update
      ZENG Shang-you, JIA Xiao-shuo, LI Wen-hui
      2021, 43(03): 480-485. doi:
      Abstract ( 104 )   PDF (1199KB) ( 105 )     
      Aiming at the inaccurate positioning of siamese-network-based tracking algorithms when the tracked object is occluded or its motion changes suddenly, an online-update video tracking algorithm, TripLT, is designed: a recurrent neural network is used to predict the target position, and a fully convolutional neural network is used to determine the similarity of the target. The TripLT algorithm can predict the target position in the next frame to overcome the influence of occluders, and it uses an online update mechanism to avoid interference from sudden changes in motion. Experiments on the VOT and OTB100 datasets show that the TripLT algorithm performs better than the compared algorithms.


      A plant leaf classification method based on multi feature fusion and extreme learning machine
      2021, 43(03): 486-493. doi:
      Abstract ( 124 )   PDF (748KB) ( 124 )     
      The classification of plants is mostly realized through the classification of plant leaves. To improve the accuracy of plant leaf classification, a plant leaf classification method based on multi-feature fusion and an extreme learning machine is proposed. Firstly, the color image of a plant leaf is preprocessed to obtain a binary image and a grayscale image, removing the color and background of the leaf. Secondly, shape features and invariant-moment features are extracted from the binary image, and gray-level co-occurrence matrix parameters are extracted from the grayscale image as the texture features of the leaf, giving a 28-dimensional feature vector in total. Finally, an extreme learning machine is used to train and test on the feature vectors. Experiments on the open plant leaf dataset Flavia show that the training classification accuracy exceeds 99% and the test accuracy exceeds 98%. The experimental results show that this method can effectively improve the accuracy of plant leaf classification.
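
      The extreme-learning-machine classifier itself is simple enough to sketch directly: a random, untrained hidden layer followed by an output layer solved in closed form with the pseudo-inverse. The 28-dimensional leaf features are replaced by synthetic data here, and the hidden-layer size is an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

class ELM:
    """Extreme learning machine: random hidden layer, output weights solved by pseudo-inverse."""
    def __init__(self, n_hidden=80):
        self.n_hidden = n_hidden

    def _hidden(self, X):
        return np.tanh(X @ self.W + self.b)

    def fit(self, X, y):
        self.W = rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = rng.normal(size=self.n_hidden)
        T = np.eye(int(y.max()) + 1)[y]                    # one-hot targets
        self.beta = np.linalg.pinv(self._hidden(X)) @ T    # closed-form least-squares fit
        return self

    def predict(self, X):
        return np.argmax(self._hidden(X) @ self.beta, axis=1)

# Placeholder for the 28-dimensional shape/moment/texture features of 3 leaf species.
X = rng.normal(size=(300, 28)) + np.repeat(np.arange(3), 100)[:, None]
y = np.repeat(np.arange(3), 100)
model = ELM().fit(X[::2], y[::2])
print("test accuracy:", (model.predict(X[1::2]) == y[1::2]).mean())
```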



      Research on human joint point positioning algorithm under the scene of self-service shelves
      LI Meng-yao, ZHOU Ya-tong, WEI Chuang, LI Min
      2021, 43(03): 494-502. doi:
      Abstract ( 95 )   PDF (1250KB) ( 119 )     
      In the new retail scenario, there is a large variety of products on self-service shelves, which are susceptible to external factors such as lighting. In addition, customers' hands or bodies block the key information of a product while they hold it. Therefore, image recognition algorithms designed for natural scenes alone cannot meet the application requirements of self-service shelves. Aiming at the characteristics of the actual self-service shelf scenario, a handheld-product recognition solution is proposed based on a human joint point positioning algorithm and an image classification algorithm in deep learning. Firstly, the human joint point positioning algorithm accurately locates the joint points of the customer's upper body. Secondly, the image classification algorithm identifies images containing the main features of the products, which are centered on the joint points of the arm. To improve the practicality of the algorithm, L-CPM and EP-L-CPM are designed to improve the Convolutional Pose Machine (CPM) in terms of both the speed and the accuracy of joint point positioning. A public dataset and a human posture dataset from an actual self-service shelf scenario are used to verify the performance of the algorithms. Experimental results show that the proposed algorithm can accurately and efficiently locate human body joints.

      A compressed sensing measurement matrix construction algorithm based on generalized variable parameter Fibonacci chaotic system
      GUO Yuan, WANG Chong, DU Song-ying
      2021, 43(03): 503-510. doi:
      Abstract ( 104 )   PDF (2512KB) ( 94 )     
      Common chaotic maps have low randomness, their sequence elements are strongly correlated, and constructing measurement matrix elements from them requires interval sampling to satisfy statistical independence. To address these problems, a new composite chaotic system is constructed by cascading a quantum Logistic chaotic system with a generalized Fibonacci sequence. In terms of information entropy, spatial characteristics, and correlation coefficients, different chaotic measurement matrices are quantitatively analyzed to verify that the proposed chaotic system is ergodic and highly chaotic. Moreover, its sequence elements have low correlation, which satisfies the statistical independence requirement. At the same time, it is proved that the compressed sensing measurement matrix constructed from the proposed chaotic system satisfies the RIP condition. Simulations on one-dimensional sparse signals and two-dimensional images show that, compared with other measurement matrices at a sampling rate of 1/2, the proposed matrix increases the success rate of reconstructing one-dimensional sparse signals by 4% and increases the SNR of reconstructed two-dimensional images by 0.2 dB. It improves data utilization and overcomes the great waste of data resources caused by the interval sampling of other chaotic measurement matrices.
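
      As a simplified illustration of building a compressed-sensing measurement matrix from a chaotic sequence (using a plain logistic map rather than the paper's quantum-Logistic/Fibonacci composite system), the sketch below fills the matrix directly from consecutive chaotic samples and recovers a sparse signal with orthogonal matching pursuit. All sizes and map parameters are assumptions.

```python
import numpy as np
from sklearn.linear_model import OrthogonalMatchingPursuit

def logistic_sequence(length, x0=0.3, mu=3.99):
    """Plain logistic map; a stand-in for the paper's composite chaotic system."""
    seq, x = [], x0
    for _ in range(length):
        x = mu * x * (1 - x)
        seq.append(x)
    return np.array(seq)

n, m, k = 256, 128, 10                               # signal length, measurements (m/n = 1/2), sparsity
Phi = logistic_sequence(m * n).reshape(m, n)
Phi = (Phi - Phi.mean()) / Phi.std() / np.sqrt(m)    # zero-mean, roughly unit-norm columns

rng = np.random.default_rng(1)
x = np.zeros(n)
x[rng.choice(n, k, replace=False)] = rng.normal(size=k)   # k-sparse test signal
y = Phi @ x                                                # compressed measurements

omp = OrthogonalMatchingPursuit(n_nonzero_coefs=k).fit(Phi, y)
print("relative reconstruction error:", np.linalg.norm(omp.coef_ - x) / np.linalg.norm(x))
```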

      Recommendation based on users’ long- and short-term preference and knowledge graph convolutional network
      GU Jun-hua, SHE Shi-yao, FAN Shuai, ZHANG Su-qi
      2021, 43(03): 511-517. doi:
      Abstract ( 200 )   PDF (552KB) ( 157 )     
      Recommendation based on knowledge graphs can improve the accuracy, diversity, and interpretability of recommendations. In this paper, a recommendation model (LSKGCN) based on a knowledge graph convolutional network and users' long- and short-term preferences is proposed. On top of a knowledge-graph-based recommendation algorithm, a user representation method combining long-term and short-term preferences is proposed. Recent historical items are screened by time, their vector representations are obtained with the knowledge graph convolutional network, and the short-term preference representation is obtained through an attention mechanism. The long-term preference representation is based on the minimum Euclidean distance to all historical items. Finally, the real-world datasets Movielens-20, Amazon Music, and Last.FM are used to test the validity of the algorithm.
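
      A sketch of combining the two preference representations, assuming item embeddings from the knowledge-graph convolution are already available: short-term preference is an attention-weighted sum over the most recent items, and long-term preference is read here as the historical item embedding that minimizes the total Euclidean distance to all historical items (a medoid). The dimensions, the equal weighting, and the dot-product scoring head are assumptions, not the paper's exact formulation.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16
history = rng.normal(size=(30, d))        # KGCN embeddings of the user's historical items (placeholder)
recent = history[-5:]                     # items from the most recent time window
candidate = rng.normal(size=d)            # embedding of an item to score

# Short-term preference: attention over recent items, keyed by the candidate item.
att = np.exp(recent @ candidate)
short_term = (att / att.sum()) @ recent

# Long-term preference: historical item minimizing total Euclidean distance to all others (medoid).
dists = np.linalg.norm(history[:, None, :] - history[None, :, :], axis=-1).sum(axis=1)
long_term = history[np.argmin(dists)]

user_vec = 0.5 * short_term + 0.5 * long_term       # equal weighting is an assumption
print("predicted preference score:", float(user_vec @ candidate))
```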

      Hidden Markov model based multi-truth discovery algorithm
      WANG Hui-ju, LI Meng-xuan, HUANG Wei-wei, ZHOU Qiu-yi
      2021, 43(03): 518-524. doi:
      Abstract ( 156 )   PDF (536KB) ( 135 )     
      The growth of data has made it difficult to obtain accurate information from large amounts of data, and this is a hot research topic. Inspired by the Hidden Markov Model, a graph-model-based multi-truth discovery algorithm (Graph Truth Discovery, GraphTD) is proposed. With the help of a credibility transition matrix described for each data source, the probability that a data value is true is calculated. Meanwhile, an improved method for determining the initial truth values is proposed, which can effectively improve the accuracy of GraphTD and avoid many shortcomings of voting-based multi-truth discovery. Experimental results on the book-author dataset show that GraphTD can effectively improve the accuracy of truth recognition, and CVote significantly improves truth-discovery accuracy through its optimized strategy for selecting initial truth values.
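
      A generic iterative truth-discovery loop (not the paper's GraphTD transition-matrix formulation) illustrates the underlying principle: a claim's confidence is the sum of the trust of the sources asserting it, and a source's trust is the average confidence of its claims; iterating converges to a weighting that outperforms plain majority voting. The toy claims below are placeholders.

```python
# Generic iterative truth discovery (illustrative; GraphTD uses a credibility transition matrix instead).
# claims[source] = set of (object, value) pairs asserted by that source.
claims = {
    "s1": {("book_A", "Alice"), ("book_B", "Bob")},
    "s2": {("book_A", "Alice"), ("book_B", "Bobby")},
    "s3": {("book_A", "Alicia"), ("book_B", "Bob")},
}

trust = {s: 0.5 for s in claims}                     # initial source credibility
for _ in range(20):
    # Claim confidence = sum of the trust of its sources, normalized per object.
    conf = {}
    for s, cs in claims.items():
        for c in cs:
            conf[c] = conf.get(c, 0.0) + trust[s]
    for obj in {o for o, _ in conf}:
        total = sum(v for (o, _), v in conf.items() if o == obj)
        for c in list(conf):
            if c[0] == obj:
                conf[c] /= total
    # Source trust = average confidence of its claims.
    trust = {s: sum(conf[c] for c in cs) / len(cs) for s, cs in claims.items()}

truths = {obj: max((c for c in conf if c[0] == obj), key=conf.get)[1]
          for obj in {o for o, _ in conf}}
print(truths)   # {'book_A': 'Alice', 'book_B': 'Bob'}
```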

      A cost-sensitive imbalanced data classification algorithm based on KPCA-Stacking
      CAO Ting-ting, ZHANG Zhong-lin
      2021, 43(03): 525-533. doi:
      Abstract ( 139 )   PDF (699KB) ( 109 )     
      Cost-sensitive learning is an important strategy for solving the imbalanced data classification problem, and the non-linearity of data characteristics makes classification harder still. To address this, this paper combines cost-sensitive learning with kernel principal component analysis (KPCA) and proposes a cost-sensitive Stacking ensemble algorithm called KPCA-Stacking. Firstly, the original dataset is over-sampled with the adaptive synthetic sampling method (ADASYN) and reduced in dimensionality with KPCA. Secondly, KNN, LDA, SVM, and RF are converted into cost-sensitive algorithms according to the Bayesian risk minimization principle and used as the primary learners in the Stacking ensemble learning framework, with logistic regression as the meta-learner. Comparative experiments against 10 algorithms, such as the J48 decision tree, on 5 public datasets show that the cost-sensitive KPCA-Stacking algorithm improves the recognition rate of minority classes to a certain extent, and its overall classification performance is better than that of the single models.
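
      A condensed sketch of the pipeline with scikit-learn and imbalanced-learn: ADASYN oversampling, KPCA dimensionality reduction, then a stacking ensemble whose base learners are made cost-sensitive through class weights (a simpler stand-in for the Bayesian-risk conversion described above), with logistic regression as the meta-learner. The dataset and all parameters are placeholders.

```python
from imblearn.over_sampling import ADASYN
from sklearn.datasets import make_classification
from sklearn.decomposition import KernelPCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Placeholder imbalanced dataset (about 9:1 class ratio).
X, y = make_classification(n_samples=1000, n_features=20, weights=[0.9], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

X_tr, y_tr = ADASYN(random_state=0).fit_resample(X_tr, y_tr)     # oversample the minority class
kpca = KernelPCA(n_components=10, kernel="rbf").fit(X_tr)
X_tr, X_te = kpca.transform(X_tr), kpca.transform(X_te)

# Class weights stand in for the Bayesian-risk cost-sensitive conversion used in the paper.
stack = StackingClassifier(
    estimators=[("knn", KNeighborsClassifier()),
                ("lda", LinearDiscriminantAnalysis()),
                ("svm", SVC(class_weight="balanced", probability=True)),
                ("rf", RandomForestClassifier(class_weight="balanced", random_state=0))],
    final_estimator=LogisticRegression(max_iter=1000))
stack.fit(X_tr, y_tr)
print("test accuracy:", stack.score(X_te, y_te))
```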


      A multi-objective optimization algorithm of switched reluctance generator based on fuzzy logic NSGA-Ⅲ
      LI Yi-hui, LIU Zuo-jun, LI Jie
      2021, 43(03): 542-550. doi:
      Abstract ( 156 )   PDF (705KB) ( 118 )     
      A multi-objective optimization method based on fuzzy logic NSGA-III is proposed to optimize the torque ripple, system operating efficiency, and output power density of a switched reluctance generator (SRG). A multi-objective optimization design model of a 1 kW four-phase 8/6-pole SRG is built, and the SRG optimization objective models are established using response surface methodology (RSM). A Mamdani fuzzy reasoning system based on fuzzy logic is established to assign intensity values to population individuals, so that the decision maker's preference information is introduced into the algorithm and this value guides the optimization direction. Finally, the optimal Pareto set of SRG designs that satisfies the decision maker's preferences is generated by the improved elitist NSGA-III, and the solution with the largest S value is taken as the optimal solution of the SRG multi-objective optimization. Experiments verify that fuzzy logic NSGA-III is superior to NSGA-III when the decision maker's preference is considered, and the SRG's finite-element simulation results prove the validity and feasibility of the proposed multi-objective optimization design method.