Computer Engineering & Science

A survey on performance optimization of cloud-of-clouds storage system

ZHU Liang-jie, SHEN Jia-jie, ZHOU Yang-fan, WANG Xin

2021, 43(05): 761-772.

Cloud-of-clouds storage systems provide a platform for managing cloud storage resources and are widely deployed in online application scenarios. By encrypting user data and distributing it across multiple clouds, a cloud-of-clouds storage system can ensure the security and reliability of stored data. To manage their various resources efficiently, such systems adopt different data distribution schemes to meet application requirements. From the perspective of storage performance optimization, this paper surveys the main application scenarios, system functions and corresponding implementation schemes of current cloud-of-clouds storage. First, it introduces the background of cloud-of-clouds storage systems and the current mainstream data distribution schemes. Second, it compares the mainstream network transmission and resource management solutions of current inter-cloud storage systems, including the network transmission schemes used in data read, write and repair operations, and the strategies these systems apply to client devices and cloud resources. Finally, it summarizes the main application scenarios of cloud-of-clouds storage and the corresponding system implementations. On this basis, it analyzes the open problems and challenges of current cloud-of-clouds storage systems and proposes possible system solutions.
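
The abstract mentions encrypting user data and distributing it across several clouds. As a minimal sketch of one simple distribution scheme, assuming a single XOR parity fragment rather than the Reed-Solomon codes or secret sharing that production systems typically use, the code below splits an object (left unencrypted here) into k fragments plus one parity fragment, one per cloud, so that the loss of any single cloud is recoverable. All function names are hypothetical.

```python
from functools import reduce
from typing import List, Optional

def xor_bytes(blocks: List[bytes]) -> bytes:
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))

def split_with_parity(data: bytes, k: int) -> List[bytes]:
    """Split data into k zero-padded fragments plus one XOR parity fragment (k + 1 clouds)."""
    frag_len = -(-len(data) // k)  # ceiling division
    padded = data.ljust(frag_len * k, b"\x00")
    fragments = [padded[i * frag_len:(i + 1) * frag_len] for i in range(k)]
    return fragments + [xor_bytes(fragments)]

def recover(fragments: List[Optional[bytes]], original_len: int) -> bytes:
    """Rebuild the object when at most one fragment (one cloud) is unavailable (None)."""
    missing = [i for i, f in enumerate(fragments) if f is None]
    if len(missing) > 1:
        raise ValueError("single XOR parity tolerates only one lost fragment")
    fragments = list(fragments)
    if missing:
        # XOR of all surviving fragments (data + parity) equals the lost one.
        fragments[missing[0]] = xor_bytes([f for f in fragments if f is not None])
    # Drop the parity fragment and the zero padding.
    return b"".join(fragments[:-1])[:original_len]
```

Tolerating more than one failed cloud requires a code with more redundancy, such as Reed-Solomon, which in turn raises the network cost of repair; this is the kind of read, write and repair trade-off the survey compares.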

OPS based automatic parallelization of a computational fluid dynamics software on multiple platforms

WANG Wei, CHE Yong-gang, XU Chuan-fu, WANG Zheng-hua

2021, 43(05): 773-781.

Current High Performance Computing (HPC) systems exhibit diverse architectures, which places great pressure on the development of parallel applications. This paper uses OPS (Oxford parallel library for structured mesh Solvers), a Domain Specific Language (DSL), to parallelize HNSC (High order Navier-Stokes simulator for Compressible flow), a Computational Fluid Dynamics (CFD) software. The code is restructured with the OPS API, and pure MPI, pure OpenMP, MPI+OpenMP and MPI+CUDA executables are generated automatically by the OPS frontend and backends. Performance is evaluated on a server with two Intel Xeon E5-2660 v3 CPUs and one NVIDIA Tesla K80 GPU. The results show that the automatically generated parallel codes perform comparably to, or better than, the manually parallelized versions. Furthermore, the OPS-generated GPU code achieves significant speedup over the OPS-generated CPU codes. These results indicate that using OPS-like DSLs to parallelize CFD applications for multiple platforms is feasible and efficient.
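
The core idea of a structured-mesh DSL such as OPS is that the user writes only a per-point kernel with declared stencil accesses while the framework owns the loop, which lets a code generator emit MPI, OpenMP or CUDA versions from the same source. OPS itself is a C/C++/Fortran library; the Python `par_loop` below is a hypothetical, serial-only illustration of that separation and does not reproduce the OPS API.

```python
from typing import Callable, List, Tuple

Grid = List[List[float]]
Accessor = Callable[[int, int], float]

def par_loop(kernel: Callable[[Accessor], float], src: Grid, dst: Grid,
             interior: Tuple[Tuple[int, int], Tuple[int, int]]) -> None:
    """Apply `kernel` at every interior point (i, j) of a 2D structured grid.

    The kernel only sees a relative accessor, so a backend is free to replace
    this serial loop nest with a threaded, distributed or GPU version.
    """
    (i0, i1), (j0, j1) = interior
    for i in range(i0, i1):
        for j in range(j0, j1):
            # Stencil accessor: read src at an offset relative to (i, j).
            def at(di: int, dj: int, i: int = i, j: int = j) -> float:
                return src[i + di][j + dj]
            dst[i][j] = kernel(at)

def laplace_kernel(at: Accessor) -> float:
    """5-point averaging stencil (one Jacobi step), written independently of the loop."""
    return 0.25 * (at(-1, 0) + at(1, 0) + at(0, -1) + at(0, 1))
```

Because `laplace_kernel` never mentions indices or loop bounds, the same kernel could in principle be handed to a parallel backend unchanged, which is what makes automatic multi-platform generation possible.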

Performance evaluation and optimization of distributed and parallel deep neural network on the Tianhe-3 prototype system

WEI Jia, ZHANG Xing-jun, JI Ze-yu, LI Jing-bo, YUE Ying-ying

2021, 43(05): 782-791.

The Deep Neural Network (DNN) model is an important branch of the Artificial Neural Network (ANN) model and the foundation of deep learning. In recent years, growing computing power and advances in high-performance computing have made it possible to increase DNN depth and model complexity to improve feature extraction and data fitting capabilities. As a result, DNNs have shown advantages in natural language processing, autonomous driving, face recognition and other tasks. However, big data and complex models have greatly increased the training cost of deep neural networks, so accelerating training has become a key task, with techniques ranging from underlying circuit design to distributed algorithm design. The domestic Tianhe-3 supercomputer targets a peak performance of one quintillion (10^18) operations per second, and this huge computing power offers a potential opportunity for DNN training. Based on the ARM architecture of the Tianhe-3 prototype, this paper uses the PyTorch framework and MPI to train a specially designed CNN on a single FT-2000+ computing node, a single MT-2000+ computing node, and multi-node clusters built from them. The distributed neural network training performance of these processors is optimized and evaluated, providing experimental data and a theoretical basis for further improving distributed neural network training on the Tianhe-3 prototype system.
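
In data-parallel distributed training, each node computes gradients on its own shard of the mini-batch, and an allreduce averages them so that every replica applies the same update. The sketch below simulates this in plain Python for a one-parameter model; `allreduce_mean` is a hypothetical stand-in for a real collective such as mpi4py's `comm.allreduce` or `torch.distributed.all_reduce`, and the sketch is not the paper's training setup.

```python
from typing import List, Sequence, Tuple

def local_gradient(w: float, xs: Sequence[float], ys: Sequence[float]) -> float:
    """Gradient of the mean squared error of the model y = w * x on one shard."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

def allreduce_mean(values: List[float]) -> float:
    """Stand-in for an MPI allreduce (sum) followed by division by the world size."""
    return sum(values) / len(values)

def data_parallel_step(w: float, shards: List[Tuple[Sequence[float], Sequence[float]]],
                       lr: float) -> float:
    """One synchronous SGD step: per-node gradients, then averaged update."""
    grads = [local_gradient(w, xs, ys) for xs, ys in shards]  # one entry per node
    return w - lr * allreduce_mean(grads)
```

With equally sized shards, the averaged gradient equals the full-batch gradient, so the distributed step matches single-node training while splitting the computation; the communication cost of the allreduce is what multi-node optimization has to hide or reduce.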

A large-scale InfiniBand interconnection network simulation system based on OMNeT++

WANG Xin, LIN Fang, LIU Yi, QIAN De-pei

2021, 43(05): 792-798.

With the development of multi-core processors and continuously growing computational requirements, the scale of high-performance computer systems keeps increasing. Simulators play an important role in the design and optimization of high-performance computing systems, and interconnection network simulation is an indispensable part of them. This paper designs and implements a large-scale InfiniBand interconnection network simulation system based on OMNeT++. The system is driven by MPI messages recorded from parallel programs: it simulates the working state of the interconnection network during program execution and can be integrated with message-driven high-performance computer simulators. Simulation accuracy is verified by comparison with inter-node communication delays measured on a real cluster, and simulation performance is tested.
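
A trace-driven interconnect simulator replays recorded MPI sends as timestamped events in a discrete-event queue. The sketch below assumes a deliberately simple cost model (fixed latency plus message size divided by link bandwidth, with no routing or contention); the default latency and bandwidth values are illustrative assumptions rather than InfiniBand measurements, and real OMNeT++ models are written in C++, not Python.

```python
import heapq
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass(order=True)
class Event:
    time: float                         # simulated time in seconds
    seq: int                            # tie-breaker keeps ordering deterministic
    kind: str = field(compare=False)    # "send" or "recv"
    src: int = field(compare=False)
    dst: int = field(compare=False)
    size: int = field(compare=False)    # message size in bytes

def simulate(trace: List[Tuple[float, int, int, int]],
             latency_s: float = 1e-6, bandwidth_Bps: float = 12.5e9):
    """Replay (time, src, dst, size) send records; return (src, dst, arrival_time) in order."""
    queue, deliveries = [], []
    for seq, (t, src, dst, size) in enumerate(trace):
        heapq.heappush(queue, Event(t, seq, "send", src, dst, size))
    seq = len(trace)
    while queue:
        ev = heapq.heappop(queue)  # always process the earliest event next
        if ev.kind == "send":
            arrival = ev.time + latency_s + ev.size / bandwidth_Bps
            heapq.heappush(queue, Event(arrival, seq, "recv", ev.src, ev.dst, ev.size))
            seq += 1
        else:
            deliveries.append((ev.src, ev.dst, ev.time))
    return deliveries
```

Comparing such simulated arrival times with delays measured between real nodes is the kind of accuracy check the abstract describes; a faithful model would add switch routing, credit-based flow control and link contention.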

Optimization of Gaussian filtering algorithm on FT-M7002

CHEN Yun, WANG Meng-yuan, CHAI Xiao-nan, SHANG Jian-dong

2021, 43(05): 799-806.

With domestically developed Feiteng series high-performance DSPs being applied in image processing, there is a strong demand for high-performance image processing algorithms on this platform. Gaussian filtering, a basic image processing algorithm, effectively removes Gaussian noise and is widely used in image processing. Based on the architecture of the Feiteng high-performance DSP and the characteristics of the Gaussian filtering algorithm, this paper optimizes Gaussian filtering on this DSP. Manual vectorization, control flow elimination and loop unrolling are adopted to fully exploit data-level and instruction-level parallelism, reducing the number of data accesses and improving instruction efficiency. Based on the DMA hardware and vector memory structure of the FT-MT2 core, ping-pong buffering and DMA array transposition are applied to reduce data transfer time and improve data locality. Test results under various filter kernel sizes and image sizes show that the parallel optimized implementation achieves a speedup of 1.3 to 1.41 over the serial implementation. With the cache enabled, it runs 1.15 to 1.71 times faster than the Gaussian filtering function of the dsplib library on the TMS320C6678 platform.
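
For reference, the computation being optimized is a 2D convolution with a normalized Gaussian kernel. The scalar sketch below shows only that baseline; the clamp-to-edge border handling is an assumption, and the paper's vectorization, loop unrolling and DMA ping-pong buffering on the DSP are not modeled.

```python
import math
from typing import List

Matrix = List[List[float]]

def gaussian_kernel(size: int, sigma: float) -> Matrix:
    """Return a normalized size x size Gaussian kernel (size must be odd)."""
    if size % 2 == 0:
        raise ValueError("kernel size must be odd")
    r = size // 2
    k = [[math.exp(-(x * x + y * y) / (2 * sigma * sigma)) for x in range(-r, r + 1)]
         for y in range(-r, r + 1)]
    total = sum(map(sum, k))
    return [[v / total for v in row] for row in k]  # weights sum to 1

def gaussian_filter(img: Matrix, kernel: Matrix) -> Matrix:
    """Serial 2D convolution with clamp-to-edge borders (the scalar baseline)."""
    h, w, r = len(img), len(img[0]), len(kernel) // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in range(-r, r + 1):
                for dj in range(-r, r + 1):
                    ii = min(max(i + di, 0), h - 1)  # clamp row index at borders
                    jj = min(max(j + dj, 0), w - 1)  # clamp column index at borders
                    acc += kernel[di + r][dj + r] * img[ii][jj]
            out[i][j] = acc
    return out
```

The four nested loops and the data-dependent border clamps are exactly what vectorization, control flow elimination and loop unrolling target on a DSP.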

A data interleaving and reinforcement technology for non-out-of-order storage

WANG Dan-ning, LIU Sheng, LI Zhen-tao

2021, 43(05): 807-813.

Storage reinforcement introduces interleaving to improve storage reliability. Interleaving reorders the original data sequence and weakens the correlation between neighboring data, which reduces the impact of multiple consecutive bit errors on stored data and improves the system's error correction capability. However, because the original data are scrambled, interleaving also leaves the stored data out of order, which hampers data access during hardware debugging and reduces debugging efficiency. To solve this problem, this paper proposes a data interleaving reinforcement technology for non-out-of-order storage. By redesigning the original interleaved encoding and decoding scheme, interleaving is integrated into the encoding and decoding module, so that stored data are no longer out of order. Verification results show that the technology not only retains the advantage of interleaving in correcting consecutive multi-bit errors, but also keeps the stored data sequence identical to the original data sequence.
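
Both the benefit of conventional block interleaving and the ordering problem the paper addresses can be seen in a small sketch: codewords are written as rows and stored column by column, so a burst of consecutive bit errors no longer than the interleaving depth lands as at most one error per codeword after de-interleaving, but the stored bit order no longer matches the original data. This is textbook interleaving under an assumed 4 x 8 layout, not the paper's non-out-of-order scheme.

```python
from typing import List

def interleave(codewords: List[List[int]]) -> List[int]:
    """Block interleaver: write codewords as rows, read the bit matrix column by column."""
    n = len(codewords[0])
    return [cw[col] for col in range(n) for cw in codewords]

def deinterleave(stream: List[int], depth: int) -> List[List[int]]:
    """Inverse of interleave() for `depth` codewords of equal length."""
    n = len(stream) // depth
    return [[stream[col * depth + row] for col in range(n)] for row in range(depth)]
```

A per-codeword single-error-correcting code can then repair a 4-bit storage burst, which is the reliability gain; the price is that a debugger reading raw storage sees bits in column order, which is the disorder the proposed technique removes.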

A radiation emission suppression method of high-performance FPGA

WANG Xia, ZHENG Long-fei, WANG Meng-jun, ZHANG Hong-li, WU Jian-fei

2021, 43(05): 814-819.

With the continuous development of semiconductor technology, the circuit speed, integration density and number of I/O ports of integrated circuits have greatly increased. The miniaturization and high-density integration of FPGAs cause electromagnetic compatibility problems. Electromagnetic shielding is the most effective way to suppress electromagnetic radiation, and efficient shielding materials can achieve good shielding effects. At present, electromagnetic shielding materials are rarely applied to FPGAs, so we select a representative high-performance FPGA as the research object and study its electromagnetic emission under different programmed states through near-field scanning experiments. According to the characteristics of the chip, a composite metal shield and a wave-absorbing conductive sponge are selected as electromagnetic shielding materials to suppress the radiated emission of the FPGA. Further experimental verification and analysis show that the composite metal shield has better shielding effectiveness, reaching 10 dBm, whereas the compressibility and structural stability of the wave-absorbing conductive sponge make it better suited to applying FPGAs in multiple scenarios.

Parallelization and optimization of Saint-Venant solver on Sunway many-core processor

DING Zhe-zhao, CHU Gen-shen, HU Chang-jun, LI Yang

2021, 43(05): 820-829.

The Saint-Venant equations describe the confluence process of unsteady flow in open channels. In large-scale hydrological simulation software, numerically solving these equations is the biggest bottleneck in program running time. This paper analyzes the structure and computational hotspots of the serial solver, and explores the parallelism of the single-step simulation loop and the instruction arrangement in the compute-intensive code. A master-slave asynchronous parallel scheme is designed for the heterogeneous many-core architecture of the Sunway TaihuLight supercomputer. The solver is ported, parallelized and accelerated with the MPI and athread libraries. SIMD is used to vectorize the computation on the slave cores, and double buffering is used to optimize communication. Tests show that the performance of the hot functions improves by more than 3 times on average compared with the unoptimized code. At scales of up to a million units, the speedup of the many-core-optimized parallel program grows linearly, showing strong scalability across multiple Sunway nodes.
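
For context, one common conservative form of the one-dimensional Saint-Venant equations is shown below, where A is the wetted cross-sectional area, Q the discharge, q the lateral inflow per unit length, h the water depth, g the gravitational acceleration, S_0 the bed slope and S_f the friction slope. The abstract does not state which form or discretization the solver uses, so this is background rather than the paper's model; in a typical scheme each time step sweeps over all channel units, which is presumably the single-step simulation loop being parallelized.

```latex
\frac{\partial A}{\partial t} + \frac{\partial Q}{\partial x} = q
\qquad
\frac{\partial Q}{\partial t} + \frac{\partial}{\partial x}\left(\frac{Q^{2}}{A}\right) + gA\,\frac{\partial h}{\partial x} = gA\,(S_{0} - S_{f})
```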

SDN multi-controller deployment and traffic load balancing

CHEN Jun-yan, LI Yue, LIANG Chu-xin, LEI Xiao-chun

2021, 43(05): 830-835.

With the development of networks, a single controller can no longer meet the control needs of a large number of switches, so multiple controllers are needed. This paper models the network topology as an undirected graph and uses an improved k-means++ algorithm, based on shortest paths in this graph, to partition it. The graph is partitioned according to edge weights, which combine link bandwidth and transmission delay. The load balance and cost of the two methods are compared to obtain the multi-controller deployment strategy. Subsequently, a traffic balancing strategy over multiple paths distributes data reasonably across different paths, so that network traffic is spread more evenly and network performance improves. Experiments show that when multiple paths are available for data packets, the transmission path can be selected reasonably to balance the load of each path in the network.
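
A minimal sketch of k-means++-style controller placement on a weighted topology, assuming edge weights that mix delay and inverse bandwidth through hypothetical coefficients `a` and `b` (the paper's exact weighting and its improvements to k-means++ are not given in the abstract). Distances are shortest-path lengths, controllers are seeded with the usual D² sampling, and each switch is assigned to its nearest controller.

```python
import heapq
import random
from typing import Dict, List, Tuple

Graph = Dict[int, Dict[int, float]]  # node -> {neighbor: edge weight}

def dijkstra(graph: Graph, src: int) -> Dict[int, float]:
    """Shortest-path distances from src."""
    dist = {src: 0.0}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale queue entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

def edge_weight(delay_ms: float, bandwidth_gbps: float, a: float = 0.5, b: float = 0.5) -> float:
    """Hypothetical weighting: lower delay and higher bandwidth give a shorter edge."""
    return a * delay_ms + b / bandwidth_gbps

def place_controllers(graph: Graph, k: int, seed: int = 0) -> Tuple[List[int], Dict[int, int]]:
    """k-means++ seeding on shortest-path distances, then nearest-controller assignment."""
    rng = random.Random(seed)
    nodes = sorted(graph)
    dist = {u: dijkstra(graph, u) for u in nodes}  # assumes a connected graph
    controllers = [rng.choice(nodes)]
    while len(controllers) < k:
        # D^2 sampling: switches far from every chosen controller are more likely picks.
        d2 = [min(dist[c][u] for c in controllers) ** 2 for u in nodes]
        controllers.append(rng.choices(nodes, weights=d2)[0])
    assignment = {u: min(controllers, key=lambda c: dist[c][u]) for u in nodes}
    return controllers, assignment
```

A full k-means would then iterate, moving each controller to the switch that minimizes total distance within its domain; load balance between domains could be checked by comparing the number of switches, or the traffic, assigned to each controller.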

A multi-agent Q-learning based selection method for heterogeneous vehicular network

NIE Lei, LIU Bo, LI Peng, HE Heng

2021, 43(05): 836-844.

Selecting an access network in a heterogeneous vehicular network environment is crucial to the service experience of vehicular terminal users. Current Q-learning based network selection methods learn network selection strategies iteratively through the interaction between the agent and the environment, and thereby achieve better network resource allocation. However, such methods usually suffer from inefficient iteration and slow convergence caused by an oversized state space. In addition, overestimation in Q-table updates leads to unreasonable utilization of network resources. To address these problems, a Multi-agent Q-learning based Selection Method (MQSM) is proposed for heterogeneous vehicular networks with 5G communication. MQSM adopts multi-agent cooperative learning and obtains the total return of action selection by alternately updating two Q-tables. It thus achieves a long-term effective, optimal set of network selection decisions in the heterogeneous vehicular network environment. Experimental results show that, compared with similar methods, MQSM performs better in terms of total system handovers, average discount values and network resource utilization.
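
The alternate update of two Q-tables follows the idea of double Q-learning: the table being updated selects the greedy next action and the other table evaluates it, which counters the overestimation caused by taking a maximum over noisy estimates. The single-agent update step below is a sketch only; MQSM's multi-agent cooperation, state design and reward are not reproduced, and the function name is hypothetical.

```python
from collections import defaultdict
from typing import DefaultDict, Hashable, Sequence, Tuple

QTable = DefaultDict[Tuple[Hashable, Hashable], float]

def double_q_update(qa: QTable, qb: QTable, s, a, r: float, s_next,
                    actions: Sequence, alpha: float = 0.1, gamma: float = 0.9,
                    update_a: bool = True) -> None:
    """One double Q-learning step on table A (or on table B if update_a is False)."""
    if not update_a:
        qa, qb = qb, qa  # symmetric case: update B, using A as the evaluator
    # Selection by the table being updated ...
    best = max(actions, key=lambda a2: qa[(s_next, a2)])
    # ... evaluation by the other table.
    target = r + gamma * qb[(s_next, best)]
    qa[(s, a)] += alpha * (target - qa[(s, a)])
```

In training, the caller would typically flip a fair coin each step (for example `update_a = random.random() < 0.5`) so both tables are updated equally often, and choose actions epsilon-greedily with respect to the sum of the two tables.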

Sensor fault diagnosis and data reconstruction based on improved LSTM-RF algorithm

LIN Tao, ZHANG Da, WANG Jian-jun

2021, 43(05): 845-852.

To address sensor fault diagnosis and fault data reconstruction, a hybrid model based on an improved Long Short-Term Memory (LSTM) network and Random Forest (RF) is proposed. First, the improved LSTM predicts the sensor's output sequence, and the residual sequence is obtained as the difference between the predicted and actual values. Second, the RF algorithm classifies the residual sequence to identify the sensor's fault state. When a sensor is diagnosed as faulty, the fault data are reconstructed using the predictions of the improved LSTM. The improved LSTM-RF algorithm can therefore not only diagnose sensor faults but also reconstruct fault data. Experimental results show that the accuracy of the proposed algorithm exceeds 97% on different data sets, and the RMSE of fault data reconstruction is below 4%. Compared with the standard LSTM-RF, the improved LSTM-RF algorithm converges faster and improves the accuracy of fault data reconstruction by 0.4%.
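
The diagnose-then-reconstruct flow described above reduces to a residual test followed by substitution. In the sketch below, the predictions are taken as given (no LSTM is trained), and `threshold_classifier` is a hypothetical stand-in for the random forest, which in the paper classifies the residual sequence rather than individual samples.

```python
from typing import Callable, List, Tuple

def diagnose_and_reconstruct(actual: List[float], predicted: List[float],
                             is_fault: Callable[[float], bool]) -> Tuple[List[bool], List[float]]:
    """Flag faulty readings from residuals and replace them with predictions.

    residual = actual - predicted; any residual classifier can be plugged in.
    """
    flags, repaired = [], []
    for y, y_hat in zip(actual, predicted):
        fault = is_fault(y - y_hat)
        flags.append(fault)
        repaired.append(y_hat if fault else y)  # reconstruct only faulty samples
    return flags, repaired

def threshold_classifier(limit: float) -> Callable[[float], bool]:
    """Hypothetical stand-in for the RF: flag residuals whose magnitude exceeds `limit`."""
    return lambda res: abs(res) > limit
```

Because reconstruction reuses the predictor's output, the quality of the repaired data is bounded by the prediction error, which is why improving the LSTM also improves reconstruction accuracy.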

An indoor positioning method based on CSI and SVM regression

DANG Xiao-chao, RU Chun-rui, HAO Zhan-jun

2021, 43(05): 853-861.

To study indoor positioning in complex environments, an indoor positioning method based on channel state information (CSI) and SVM regression is proposed, using a staircase and a laboratory as experimental scenarios. The method removes signal noise with density-based spatial clustering (DBSCAN) and uses principal component analysis (PCA) to extract the fingerprint features that contribute most while reducing the dimension of the CSI fingerprint. SVM regression then establishes a non-linear mapping from the CSI fingerprint to the target position, so that the target position can be estimated from a measured CSI fingerprint. Experimental results show that the system achieves a positioning accuracy of 1 m with a probability of over 90% in a staircase with strong multipath effects, and a positioning accuracy of 0.8 m with a probability of 82% in a laboratory. These results indicate that the proposed method is efficient and feasible.
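
Figures such as "1 m with a probability of over 90%" read as points on the empirical cumulative distribution of positioning error. A small sketch of how such a number is computed from estimated and ground-truth coordinates, assuming 2D positions in metres; the DBSCAN, PCA and SVM regression stages themselves are not shown.

```python
import math
from typing import List, Sequence

def positioning_errors(estimates: Sequence[Sequence[float]],
                       truths: Sequence[Sequence[float]]) -> List[float]:
    """Euclidean error (in metres) between estimated and true positions."""
    return [math.dist(e, t) for e, t in zip(estimates, truths)]

def success_rate(errors: Sequence[float], threshold_m: float) -> float:
    """Fraction of test points localized within threshold_m (empirical CDF at threshold_m)."""
    return sum(e <= threshold_m for e in errors) / len(errors)
```

Reporting the rate at a fixed distance, rather than only the mean error, shows how often the system is good enough for a given application, which matters in multipath-heavy spaces where a few large errors can dominate the mean.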

Image segmentation of retinal fundus vessels based on ensembled classified deep neural network

JIANG Yun, WANG Fa-lin, ZHANG Hai

2021, 43(05): 862-871.

Key words: deep learning; convolutional neural network; image segmentation; ensemble learning

An age and gender recognition model based on CNN-SE-ELM

CHEN Wen-bing, LI Yu-lin, CHEN Yun-jie

2021, 43(05): 872-882.

Recognizing age and gender from facial images is a current hot spot in artificial intelligence research. This paper proposes a hybrid model that integrates a Convolutional Neural Network (CNN), a Squeeze-and-Excitation Network (SENet) and an Extreme Learning Machine (ELM). The convolutional layers extract facial features from the face image, the SENet layer recalibrates the features extracted by the convolutional layers, and an error-minimized extreme learning machine (EM-ELM) serves as the classifier for age and gender recognition. Compared with existing popular models, the CNN+SENet architecture extracts more representative feature maps from facial images, and the very fast computation of EM-ELM makes the model faster and more efficient. Experimental results on multiple unconstrained face datasets show that the proposed model achieves higher recognition accuracy and speed than other recent deep learning based models.
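
The SENet layer recalibrates channels in three steps: squeeze (global average pooling per channel), excitation (a bottleneck of two fully connected layers ending in a sigmoid) and scale (multiplying each channel by its weight). The pure-Python sketch below follows that standard SE structure with biases omitted for brevity; the weights are passed in rather than trained, and the ELM classifier is not shown.

```python
import math
from typing import List

Channel = List[List[float]]

def se_block(x: List[Channel], w1: List[List[float]], w2: List[List[float]]) -> List[Channel]:
    """Squeeze-and-Excitation recalibration of a C x H x W feature map.

    x  : C channels, each an H x W list of lists
    w1 : (C/r) x C weights of the reduction layer (biases omitted)
    w2 : C x (C/r) weights of the expansion layer (biases omitted)
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    z = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in x]
    # Excitation: FC -> ReLU -> FC -> sigmoid yields channel weights in (0, 1).
    hidden = [max(0.0, sum(w * zc for w, zc in zip(row, z))) for row in w1]
    s = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden)))) for row in w2]
    # Scale: reweight every channel by its excitation.
    return [[[v * sc for v in r] for r in ch] for ch, sc in zip(x, s)]
```

With trained weights, informative channels are pushed toward a weight of 1 and uninformative ones toward 0, which is what "optimizing the features extracted by the convolutional layer" amounts to before they reach the classifier.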

Fall detection of old people based on video and human posture estimation

HUANG Zhan-yuan, LI Bing, LI Geng-hao

2021, 43(05): 883-890.

Elderly care needs brought about by population aging are a serious problem for modern society; in many countries, for example, falls are the leading cause of injury-related death among the elderly. Automatic fall detection for the elderly has therefore become an urgent problem in elderly care services. In indoor fall detection, mainstream methods based on wearable devices and environmental sensors suffer from complex equipment and high cost. In view of this, this paper introduces human posture estimation into fall detection and proposes a fall detection method based on two-dimensional video. First, OpenPose is used to extract the positions of human joints from the raw data. Second, after feature enhancement, these data are used to build static and dynamic classification models. Finally, model training and fall detection are tested on three public fall datasets with good results. This work can serve as a reference for related research on fall detection.
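
As an illustration of how pose keypoints can feed one static and one dynamic feature, the sketch below uses trunk tilt (static) and the downward speed of the mid-hip point (dynamic). The features, thresholds and rule are hypothetical and far simpler than the paper's trained classification models; coordinates are pixel positions with y pointing down, as in image space.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]
Frame = Tuple[Point, Point]  # (neck, mid_hip) keypoints for one frame

def trunk_tilt_deg(neck: Point, mid_hip: Point) -> float:
    """Angle in degrees between the neck-to-mid-hip vector and the image vertical."""
    dx, dy = mid_hip[0] - neck[0], mid_hip[1] - neck[1]
    return math.degrees(math.atan2(abs(dx), abs(dy)))

def looks_like_fall(frames: List[Frame], fps: float,
                    tilt_deg: float = 60.0, drop_px_per_s: float = 150.0) -> bool:
    """Hypothetical rule: trunk ends near horizontal (static) after a fast hip drop (dynamic)."""
    if len(frames) < 2:
        return False  # a dynamic feature needs at least two frames
    (_, h0), (n1, h1) = frames[0], frames[-1]
    duration = (len(frames) - 1) / fps
    drop_speed = (h1[1] - h0[1]) / duration  # positive when the hip moves down the image
    return trunk_tilt_deg(n1, h1) > tilt_deg and drop_speed > drop_px_per_s
```

Combining a posture feature with a motion feature is what separates a fall from slowly lying down, which is tilted but not fast; a learned classifier replaces the hand-set thresholds.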

Color image enhancement technology based on improved Retinex algorithm

LUO Jia-hang, ZHANG Xu

2021, 43(05): 891-896.

To address the haze phenomenon and poor detail expression that the single-scale Retinex image enhancement algorithm exhibits under insufficient light, a color image enhancement algorithm based on an improved single-scale Retinex algorithm is proposed. First, the weighted least squares method is used to enhance the details of the original color image. Second, optimization of the original image is carried out. Gain coefficients are constructed for the processed image layer and the detail image layer, and a new merged image is reconstructed and output. Experimental results show that the proposed algorithm effectively removes the haze phenomenon, makes image details and contrast more prominent, and enhances brightness. Compared with other traditional algorithms, the objective evaluation indices of the processed images are greatly improved, and the image enhancement ability is significantly improved.
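
The single-scale Retinex (SSR) baseline that the paper improves estimates the illumination with a Gaussian surround and removes it in the log domain, separately for each color channel i. The standard formulation is shown below, where * denotes convolution, the scale c controls the surround size and K normalizes F; the paper's weighted-least-squares detail layer and gain coefficients are not part of this formula.

```latex
R_{i}(x, y) = \log I_{i}(x, y) - \log\left[ F(x, y) * I_{i}(x, y) \right],
\qquad
F(x, y) = K \exp\left( -\frac{x^{2} + y^{2}}{c^{2}} \right),
\qquad
\iint F(x, y)\,\mathrm{d}x\,\mathrm{d}y = 1
```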

A singular blending Bézier curve with shape parameters

ZHANG Gui-cang, TUO Ming-xiu, SU Jin-feng, MENG Jian-jun, HAN Gen-liang

2021, 43(05): 897-906.

Weighting and singular blending techniques are used to extend the traditional quasi-Bézier curve, and a singular blending quasi-Bézier curve with shape parameters is constructed. First, the singular blending function is combined with the quasi-cubic Bézier basis functions of the trigonometric polynomial space to define the singular blending quasi-Bézier curve, and the singular blending quasi-Bézier basis functions are derived from this definition. Second, the properties of these basis functions and of the corresponding curves are discussed, together with the influence of singular blending and the shape parameters on them. Finally, examples of singular blending quasi-Bézier curve and surface design are given. The results show that the constructed curve retains the practical properties of the traditional Bézier curve while offering flexible shape adjustability. The new curve can not only exactly represent conics such as elliptical arcs, circular arcs and parabolic arcs, but also achieve G1 and G2 continuity under certain conditions. Extending the curve to surfaces by the tensor product method also allows ellipsoids and spheres to be represented exactly. Extensive analysis and examples show that the constructed curves are very effective in geometric design.
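
For comparison, the traditional Bézier curve that the paper generalizes is a Bernstein-weighted sum of control points. The sketch below evaluates a classical polynomial Bézier curve only; the paper's trigonometric quasi-cubic basis, singular blending function and shape parameters are not reproduced.

```python
from math import comb
from typing import Sequence, Tuple

def bernstein(n: int, i: int, t: float) -> float:
    """Bernstein basis polynomial B_{i,n}(t)."""
    return comb(n, i) * t**i * (1 - t) ** (n - i)

def bezier_point(ctrl: Sequence[Sequence[float]], t: float) -> Tuple[float, ...]:
    """Point on the Bézier curve of degree len(ctrl) - 1 at parameter t in [0, 1]."""
    n = len(ctrl) - 1
    weights = [bernstein(n, i, t) for i in range(n + 1)]
    return tuple(sum(w * p[d] for w, p in zip(weights, ctrl)) for d in range(len(ctrl[0])))
```

Endpoint interpolation and the partition of unity of the basis, both visible in this sketch, are examples of the traditional Bézier properties a generalized curve is expected to keep; polynomial Bézier curves cannot represent circular arcs exactly, which is one motivation for trigonometric bases.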

An industrial smoke image segmentation method based on FCN-LSTM

ZHANG Jun-peng, LIU Hui, LI Qing-rong

2021, 43(05): 906-916.

In industrial production, the pollution level of industrial smoke and dust is often judged with the Ringelmann scale. An effective approach is to monitor industrial smoke with a computer vision system, in which accurate segmentation of smoke targets is the key. Because industrial smoke has variable shapes and resembles clouds, existing algorithms perform poorly in complex scenes, and segmentation accuracy needs improvement. To address this, this paper proposes an industrial smoke image segmentation method based on FCN-LSTM. A fully convolutional network (FCN) extracts spatial features of each image, and a long short-term memory network (LSTM) extracts temporal information from the image sequence. The dynamic features of smoke and dust are used to distinguish moving smoke from the background, enhancing robustness to interference in complex scenes. Experiments show that, compared with the FCN, the proposed model significantly improves robustness to interference in complex scenes: it effectively overcomes interference from clouds and eliminates the spurious interference points in the FCN segmentation results. The IoU is increased by up to 8.04%.
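
IoU, the metric reported above, is the ratio of the intersection to the union of the predicted and ground-truth smoke masks. A minimal sketch on binary masks; returning 1.0 when both masks are empty is an assumed convention.

```python
from typing import Sequence

def iou(pred: Sequence[Sequence[int]], truth: Sequence[Sequence[int]]) -> float:
    """Intersection over Union of two equally sized binary masks (rows of 0/1)."""
    inter = union = 0
    for prow, trow in zip(pred, truth):
        for p, t in zip(prow, trow):
            inter += p and t  # pixel predicted and actually smoke
            union += p or t   # pixel in either mask
    return inter / union if union else 1.0
```

Spurious interference points, such as cloud pixels labeled as smoke, enlarge the union without adding to the intersection, which is why removing them raises IoU.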

A multi-granularity ensemble classification algorithm for imbalanced data

CHEN Li-fang, DAI Qi, ZHAO Jia-liang

2021, 43(05): 917-925.

To address the low accuracy, poor stability and weak generalization of traditional models in imbalanced data classification, a sequential three-way decision based multi-granularity ensemble classification algorithm is proposed. A binary relation is used to divide granular layers dynamically, thresholds are calculated from the cost matrix, and a multi-layer granular structure is constructed. The data in each granular layer are divided into a positive region, a boundary region and a negative region, and the divisions of all granular layers are recombined into positive-negative, positive-boundary and negative-boundary pairs to form new data subsets. A base classifier is built on each data subset to achieve ensemble classification of imbalanced data. Simulation results show that the algorithm effectively reduces the imbalance ratio of the data subsets and increases the diversity of the base classifiers in the ensemble. In terms of G-mean and F-measure, its classification performance is better than, or partly better than, that of other ensemble classification algorithms. The new algorithm effectively improves the classification accuracy and stability of the model and offers new research ideas for ensemble learning on imbalanced data sets.
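
In the decision-theoretic rough set model commonly used for three-way decisions, the thresholds follow from the cost matrix in closed form. The sketch below assumes that standard formula (the abstract does not spell out the paper's computation): objects whose conditional probability is at least α go to the positive region, those at most β to the negative region, and the rest to the boundary region, which a sequential scheme defers to the next granular layer. Function names are hypothetical.

```python
from typing import Tuple

def three_way_thresholds(l_pp: float, l_bp: float, l_np: float,
                         l_pn: float, l_bn: float, l_nn: float) -> Tuple[float, float]:
    """Decision-theoretic rough set thresholds (alpha, beta) from a 3 x 2 cost matrix.

    l_pp, l_bp, l_np: costs of accepting (P), deferring (B) and rejecting (N)
    an object that belongs to the target class; l_pn, l_bn, l_nn: the same
    costs for an object that does not. Assumes the usual cost ordering, so
    neither denominator is zero.
    """
    alpha = (l_pn - l_bn) / ((l_pn - l_bn) + (l_bp - l_pp))
    beta = (l_bn - l_nn) / ((l_bn - l_nn) + (l_np - l_bp))
    return alpha, beta

def three_way_region(prob: float, alpha: float, beta: float) -> str:
    """Assign an object to POS, BND or NEG from its conditional probability Pr(X | [x])."""
    if prob >= alpha:
        return "POS"
    if prob <= beta:
        return "NEG"
    return "BND"
```

Raising the cost of a wrong rejection (misclassifying a minority-class object) lowers β, so fewer minority objects are discarded into the negative region, which is how a cost matrix can counteract class imbalance.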

A fraud group detection algorithm based on behavior and structure features reasoning

ZHANG Yi-rui-chen, LI Yun-feng, GU Xu-yang, JI Shu-juan

2021, 43(05): 926-935.

Online reviews strongly influence users' shopping decisions. As a result, some malicious merchants hire large numbers of review spammers in an organized and strategic way, either to promote target products and increase sales and profits, or to demote other target products and reduce their sales. To detect such organized spammer groups, this paper proposes a detection algorithm that combines behavior and structural feature reasoning. The algorithm consists of two parts. The first part uses frequent itemset mining to generate candidate groups, then uses behavior indicators to calculate the cooperative fraud suspicion of each group member, which is treated as a prior probability. The second part constructs a weighted reviewer-product bipartite graph for each group and uses loopy belief propagation to infer the posterior probability, which is taken as the member's final cooperative fraud suspicion. Finally, the entropy method is used to determine whether a group is a collusion group. Experimental results on real datasets show that the proposed algorithm outperforms the comparison algorithms.
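
Candidate group generation by frequent itemset mining treats each product as a transaction whose items are its reviewers. The sketch below handles only the size-2 case (reviewer pairs) with a simple counter; practical systems would use Apriori or FP-growth to find larger groups, and the suspicion scoring, belief propagation and entropy steps are not shown. The function name is hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Hashable, Set, Tuple

def candidate_pairs(reviews: Dict[Hashable, Set[str]],
                    min_support: int) -> Dict[Tuple[str, str], int]:
    """Reviewer pairs that co-reviewed at least `min_support` products.

    reviews: mapping product -> set of reviewer ids (one "transaction" per product).
    """
    counts: Counter = Counter()
    for reviewers in reviews.values():
        # Sorting makes ("a", "b") and ("b", "a") the same key.
        for pair in combinations(sorted(reviewers), 2):
            counts[pair] += 1
    return {pair: c for pair, c in counts.items() if c >= min_support}
```

Frequent co-occurrence alone also catches genuine fans of the same brand, which is why the algorithm follows candidate generation with behavior indicators and graph-based inference before labeling a group as colluding.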
