  • Sponsored journal of the China Computer Federation
  • China Science and Technology Core Journal
  • Chinese Core Journal

Current Issue

    • Optimization of sort join algorithms on DRAM/NVM-based hybrid memory architecture
      YANG Liu, JIN Pei-quan
      2021, 43(02): 191-198. doi:
      Abstract ( 300 )   PDF (897KB) ( 348 )     
      With the rapid development of computer technology, the scale of data in applications keeps expanding, and all industries place ever higher demands on data access speed. In-memory databases were proposed to meet this demand, but DRAM, the traditional mainstream main memory, is difficult to scale up in capacity because of density and energy consumption constraints. Non-Volatile Memory (NVM) compensates for these limitations with its high performance, high density, and low power consumption. A hybrid memory system combining DRAM and NVM can deliver better performance and greater scalability, while also being more cost-effective. Under this new hybrid memory architecture, traditional algorithms face huge challenges, because they must be redesigned and optimized for the new hardware. Therefore, this paper starts from the sort join algorithm commonly used in database systems and explores its optimization on hybrid memory systems. We propose a new sort join algorithm with key-value separation, and three different C-Join algorithms based on it. Experimental results show that our scheme achieves the expected goal: it not only reduces DRAM usage but also improves the time performance of the algorithm.
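
      A minimal sketch of the key-value separation idea, in Python (the data layout and function names are hypothetical; the paper's C-Join variants are not reproduced here): only compact (key, position) pairs are sorted, standing in for the DRAM-resident part, while the full tuples stay in the larger region that models NVM and are dereferenced only to materialize matches.

      # Illustrative sketch of a sort-merge join with key-value separation.
      # Assumption: full tuples live in a (simulated) NVM-resident list and
      # only compact (key, index) pairs are sorted, reducing DRAM usage.

      def sort_join_kv_separated(relation_r, relation_s, key_r, key_s):
          """Join two lists of dict tuples on the given key fields."""
          r_keys = sorted((t[key_r], i) for i, t in enumerate(relation_r))
          s_keys = sorted((t[key_s], j) for j, t in enumerate(relation_s))

          results, i, j = [], 0, 0
          while i < len(r_keys) and j < len(s_keys):
              kr, ks = r_keys[i][0], s_keys[j][0]
              if kr < ks:
                  i += 1
              elif kr > ks:
                  j += 1
              else:
                  # Materialize matches by dereferencing into the full tuples.
                  j_start = j
                  while i < len(r_keys) and r_keys[i][0] == kr:
                      j = j_start
                      while j < len(s_keys) and s_keys[j][0] == kr:
                          results.append((relation_r[r_keys[i][1]],
                                          relation_s[s_keys[j][1]]))
                          j += 1
                      i += 1
          return results

      r = [{"id": 1, "x": "a"}, {"id": 2, "x": "b"}]
      s = [{"id": 2, "y": "c"}, {"id": 3, "y": "d"}]
      print(sort_join_kv_separated(r, s, "id", "id"))  # matches on id == 2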

      A lightweight processor core performance analysis framework
      LEI Guo-qing, MA Chi-yuan, WANG Yong-wen, ZHENG Zhong
      2021, 43(02): 199-204. doi:
      Abstract ( 232 )   PDF (753KB) ( 187 )     
      To meet the practical need of improving the performance of domestically developed processor cores, and targeting the performance defects that may arise in the RTL design of processor cores, this paper proposes a lightweight processor core performance analysis framework based on RTL simulation. The framework is built around lightweight directed and random tests. By rapidly simulating the RTL designs of the base processor core (Base Core) and the new-generation processor core (New Core) and comparing the simulation results, performance defects that the New Core may introduce during RTL design can be found quickly. Based on the framework and combined with actual application scenarios, the test methods and test results are given. The results show that the proposed framework can promptly detect the performance defects introduced by the new processor core during RTL design and effectively accelerate the development of domestic processor cores.
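
      A minimal sketch of the comparison step, assuming per-test cycle counts collected from the two RTL simulations (the test names and the 5% tolerance below are hypothetical, not from the paper): any test whose New Core cycle count regresses beyond the tolerance is flagged for inspection.

      # Illustrative sketch of the comparison step: flag tests whose New Core
      # cycle count regresses beyond a tolerance. Test names and the 5%
      # threshold are hypothetical, not from the paper.

      def find_regressions(base_cycles, new_cycles, tolerance=0.05):
          """Both arguments map test name -> simulated cycle count."""
          regressions = {}
          for test, base in base_cycles.items():
              new = new_cycles.get(test)
              if new is not None and new > base * (1 + tolerance):
                  regressions[test] = (base, new, (new - base) / base)
          return regressions

      base = {"load_use": 1200, "branch_mix": 3400, "random_0": 9100}
      new = {"load_use": 1210, "branch_mix": 4100, "random_0": 9050}
      for t, (b, n, pct) in find_regressions(base, new).items():
          print(f"{t}: {b} -> {n} cycles (+{pct:.1%})")  # flags branch_mix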



      Power consumption optimization based on standard cell replacement
      JIA Qin, MA Chi-yuan, PENG Shu-tao
      2021, 43(02): 205-210. doi:
      Abstract ( 186 )   PDF (912KB) ( 121 )     
      The continuous improvement of integrated circuit design technology has brought higher performance, but also higher power consumption. How to balance high performance and low power consumption has become a key issue in current high-performance VLSI design, and standard cell replacement is an effective way to reduce power consumption. This paper compares two different standard cell replacement strategies and experimentally analyzes the power consumption optimization effect of each strategy and its corresponding impact on performance. A suitable standard cell replacement strategy is then proposed to optimize power consumption.




      A workload-adaptive data reconstruction strategy in SSD pool
      WEI Deng-ping, CHEN Hao-wen, XIE Xu-chao, YUAN Yuan, GAO Wen-qiang
      2021, 43(02): 211-217. doi:
      Abstract ( 175 )   PDF (660KB) ( 144 )     
      In the big data era, various applications place increasingly high demands on the capacity, performance, and reliability of storage systems. New storage media bring good opportunities for improving storage system performance, and Redundant Arrays of Independent SSDs (RAIS) have been widely used in various storage systems. However, when a disk in a RAIS fails, data reconstruction takes a long time and degrades the array's ability to provide I/O access services to upper-level applications. A storage pool architecture supporting multithreaded concurrent processing is designed and implemented, in which I/O requests evenly distributed across all SSDs can be processed concurrently, improving the access performance of both user I/O and data reconstruction I/O. A workload-adaptive I/O scheduling strategy is proposed, which guarantees the quality of user I/O service while improving the efficiency of data reconstruction. Experimental results show that the multithreaded concurrent I/O processing architecture based on the storage pool can improve data reconstruction performance, and that the workload-adaptive I/O scheduling strategy can dynamically adjust the scheduling ratio of user I/O to data reconstruction I/O, ensuring user I/O service quality and improving data reconstruction efficiency.
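
      One way to read the workload-adaptive idea, as a sketch (the load thresholds and ratios below are hypothetical, not the paper's policy): the share of the I/O budget granted to reconstruction shrinks as user I/O pressure grows, protecting user service quality when the array is busy.

      # Illustrative sketch of workload-adaptive scheduling: the share of the
      # I/O budget granted to reconstruction shrinks as user I/O pressure
      # grows. Thresholds and ratios are hypothetical, not the paper's policy.

      def reconstruction_share(user_iops, capacity_iops):
          """Fraction of I/O slots granted to data reconstruction."""
          load = user_iops / capacity_iops
          if load < 0.3:    # array is mostly idle: reconstruct aggressively
              return 0.8
          if load < 0.7:    # moderate load: split the budget
              return 0.4
          return 0.1        # busy: protect user I/O quality of service

      for iops in (1000, 5000, 9000):
          print(iops, reconstruction_share(iops, capacity_iops=10000))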

      A multi-instance service chain online deployment algorithm for edge environments
      SONG Hu, GAN Rang-xing, XIA Fei, ZOU Hao-dong
      2021, 43(02): 218-227. doi:
      Abstract ( 224 )   PDF (975KB) ( 144 )     
      The limited resources of edge devices make a deep understanding of the resource consumption of network functions necessary for deploying edge services. Deployment experiments with containerized network functions on wireless routers show that, in addition to the computational overhead of processing business flows, communication between network function instances also consumes considerable CPU resources. Based on this observation, network function instances are deployed in a distributed manner on nearby, relatively lightly loaded peer edge devices, and traffic is balanced while satisfying delay constraints, thereby minimizing edge device load. To this end, a fine-grained service chain load model is proposed, and on this basis a multi-instance service chain online deployment algorithm for edge environments is designed and implemented. The algorithm comprises three components: delay-satisfaction path search based on a pruning search strategy, deployment path selection based on a nested Top-K strategy, and network function deployment based on a greedy strategy. Simulation experiments verify the effectiveness of the algorithm. The experimental results show that, compared with deploying network function chains without considering communication overhead, the proposed algorithm reduces the CPU load of edge devices by 10%, which is close to the theoretically optimal deployment result.
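
      A minimal sketch of the first component, the delay-satisfaction path search with pruning (the graph and link delays are hypothetical): a depth-first search abandons any partial path whose accumulated delay already violates the constraint.

      # Illustrative sketch of the delay-satisfaction path search with
      # pruning: a DFS abandons any partial path whose accumulated link
      # delay already exceeds the bound. Graph and delays are hypothetical.

      def delay_feasible_paths(graph, src, dst, max_delay):
          """graph: {node: [(neighbor, link_delay), ...]}; feasible paths."""
          paths = []

          def dfs(node, path, delay):
              if delay > max_delay:    # pruning: bound exceeded, stop here
                  return
              if node == dst:
                  paths.append((list(path), delay))
                  return
              for nxt, d in graph.get(node, []):
                  if nxt not in path:  # simple paths only
                      path.append(nxt)
                      dfs(nxt, path, delay + d)
                      path.pop()

          dfs(src, [src], 0)
          return paths

      g = {"a": [("b", 2), ("c", 5)], "b": [("d", 2)], "c": [("d", 1)], "d": []}
      print(delay_feasible_paths(g, "a", "d", 5))  # only a-b-d (delay 4) survives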

      Research on virtual-physical address translation architectures of multi-GPU systems
      WEI Jin-hui, LI Chen, LU Jian-zhuang
      2021, 43(02): 228-234. doi:
      Abstract ( 193 )   PDF (616KB) ( 167 )     
      In recent years, with the development of big data, the dataset sizes of GPU applications have increased significantly, which raises challenges for current GPUs. As Moore's Law reaches its limit, it is not easy to further improve the performance of a single GPU; instead, multi-GPU systems have been shown to be an effective solution thanks to their GPU-level parallelism. Support for memory virtualization in multi-GPU systems further simplifies programming and improves resource utilization. Memory virtualization requires support for address translation, and the overhead of address translation has an important impact on system performance. This paper studies the two common address translation architectures in multi-GPU systems: the distributed address translation architecture and the centralized address translation architecture. Through simulation experiments, this paper analyzes and compares the advantages and drawbacks of the two architectures in depth, and on this basis proposes optimization suggestions for address translation in multi-GPU systems.


      Design and implementation of a high performance computing user organization management system based on LAMP
      WU Jun-nan, OU Yang, LI Yan
      2021, 43(02): 235-241. doi:
      Abstract ( 163 )   PDF (753KB) ( 116 )     
      Aiming at key issues in existing high performance computing user organization management systems, such as poor user experience, large network overhead, and low access efficiency, a method for implementing a high performance computing user organization management system based on LAMP is proposed. The method adopts the B/S architecture and combines Twig with HTML to reduce the burden on the server and improve the user experience. It uses the REST framework and a cache mechanism to buffer massive temporary data, reducing development difficulty and network overhead. It uses a tree structure to access the hierarchical data, which improves data access efficiency and offers good scalability.




      Survey of cloud-edge collaborative architecture research based on software defined network
      LI Bo, HOU Peng, NIU Li, WU Hao, DING Hong-wei
      2021, 43(02): 242-257. doi:
      Abstract ( 611 )   PDF (1598KB) ( 502 )     
      With the advent of the 5G and IoT era and the gradual growth of cloud computing applications, edge computing and cloud computing, each with its own strengths, are bound to integrate with each other to achieve cloud-edge collaboration, realizing complementary advantages and collaborative linkage between the two. Thanks to its flexible, open, and programmable network architecture, Software Defined Networking (SDN) is considered an effective approach to the current cloud-edge collaboration problem. Starting from the advantages and disadvantages of cloud computing and edge computing, the necessity and concrete content of cloud-edge collaboration are discussed, and the current impact of SDN on cloud computing and edge computing is summarized. Aiming at the cloud-edge collaboration problem, a cloud-edge collaboration network framework based on SDN is proposed to achieve multi-dimensional collaboration between cloud computing and edge computing across network, storage, and computing, and the challenges it faces are also pointed out.




      A hardware cost reduction scheduling algorithm for heterogeneous distributed embedded systems
      XING Hong-xing, WEI Ye-hua, LE Yi
      2021, 43(02): 258-265. doi:
      Abstract ( 139 )   PDF (781KB) ( 97 )     
      With the development of information technology, the functional scale of industrial embedded systems has grown rapidly, greatly increasing hardware cost, which must be reduced to increase profit. At the same time, to meet the functional safety constraints of the system, the overall scheduling problem of tasks and messages also needs to be solved. Taking hardware cost reduction as the goal, this paper establishes hardware reduction cases, defines timing constraints for task-to-processing-unit mapping, between tasks, and between tasks and messages, and proposes an ILP-based hardware cost reduction (IHCR) algorithm. Under the premise of satisfying function response time constraints, the number of processors is reduced as much as possible. Simulation experiments verify the effectiveness of the algorithm in saving hardware cost while preserving task schedulability.
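
      A generic skeleton of how such an ILP might look (the notation is illustrative, not the paper's IHCR formulation): binary variables map tasks to processors, and the objective charges only the processors actually used.

      % Illustrative ILP skeleton (hypothetical notation, not the IHCR model):
      % x_{ij} = 1 iff task i runs on processor j; y_j = 1 iff j is used.
      \begin{align}
        \min \sum_{j} c_j\, y_j \quad \text{s.t.} \quad
        & \sum_{j} x_{ij} = 1 && \forall i,\\
        & x_{ij} \le y_j && \forall i, j,\\
        & \sum_{i} e_{ij}\, x_{ij} \le U_j && \forall j,\\
        & R_f \le D_f && \forall f,
      \end{align}
      % c_j: cost of processor j; e_{ij}: demand of task i on processor j;
      % U_j: capacity of j; R_f, D_f: response time and deadline of function f.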



      A containerization method for reinforcement learning based on RISC-V architecture
      XU Zi-chen, CUI Ao, WANG Yu-hao, LIU Tao
      2021, 43(02): 266-273. doi:
      Abstract ( 239 )   PDF (748KB) ( 131 )     
      As the most popular open-source instruction set architecture in recent years, RISC-V is widely used in a variety of domain-specific microprocessors, especially for modular customization in the field of machine learning. However, existing RISC-V applications require legacy software or models to be recompiled or re-optimized for the RISC-V instruction set, so how to rapidly deploy, run, and test machine learning frameworks on RISC-V architectures is a pressing technical challenge. Virtualization technology can solve the problem of deploying and running models across platforms, but traditional virtualization techniques, such as virtual machines, are often not applicable to RISC-V scenarios because of their high performance requirements on the native system, high resource footprint, and slow operational response. This paper discusses reinforcement learning virtualization on resource-constrained RISC-V architectures. Firstly, by adopting containerization technology, reducing the cost of virtualization for upper-level software stacks, removing redundant middleware, and customizing namespaces to isolate specific processes, we effectively improve resource utilization for learning tasks and achieve fast model training. Secondly, the features of the RISC-V instruction set are used to further optimize the upper-level neural network model and improve reinforcement learning efficiency. Finally, a system prototype of the overall optimization and containerization method is implemented, and its performance is evaluated on multiple benchmark test sets. Compared with the traditional method of cross-compiling deep neural network models for RISC-V architectures, containerization enables the rapid deployment and operation of more complex deep learning software frameworks at a relatively small additional performance cost; compared with the hypervisor-based virtual machine method, it achieves comparable deployment time while avoiding substantial performance losses. Preliminary experimental results demonstrate that containerization, together with the optimizations built on it, is an effective way to rapidly deploy software and learning models on the RISC-V architecture.

      Design and improvement of a self-adaptive value predictor based on real history feedback
      SUI Bing-cai
      2021, 43(02): 274-279. doi:
      Abstract ( 175 )   PDF (784KB) ( 103 )     
      The instruction-level parallelism obtainable by out-of-order superscalar processors is increasingly limited, and extracting more of it requires ever more out-of-order execution and control resources. As processor architectures evolve, value prediction can achieve higher data parallelism with less hardware overhead on top of existing mainstream processor micro-architectures, further improving out-of-order execution performance. This paper proposes a context-based value predictor with real history feedback (RH-VTAGE), which controls the prediction accuracy of RH-VTAGE by setting a failure list and a prediction accuracy table, reducing the pipeline recovery overhead on mispredictions. Meanwhile, a real-history feedback control counter is added at the final stage of the value predictor, and adaptive confidence control logic is designed to dynamically adjust the confidence according to the probabilities of different instruction types. Actual test results show that, compared with other predictors, RH-VTAGE brings no obvious improvement on integer programs, but improves the performance of floating-point programs by up to 31.2%.
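
      A minimal sketch of the feedback-driven confidence idea (the counter width and thresholds are hypothetical): a saturating counter per predictor entry is trained by real outcomes and gates whether a predicted value is consumed, with the threshold adjustable per instruction type.

      # Illustrative sketch of feedback-driven confidence control: a
      # saturating counter, updated by the real outcome, gates whether a
      # predicted value is used. Widths and thresholds are hypothetical.

      class ConfidenceCounter:
          def __init__(self, max_value=15):
              self.value = 0
              self.max_value = max_value

          def update(self, correct):
              if correct:
                  self.value = min(self.value + 1, self.max_value)
              else:
                  self.value = 0   # mispredictions are costly: reset hard

          def confident(self, threshold):
              return self.value >= threshold

      # Floating-point instructions might be given a different threshold
      # than integer ones if their values prove more predictable.
      ctr = ConfidenceCounter()
      for outcome in [True] * 10:
          ctr.update(outcome)
      print(ctr.confident(threshold=8))  # True: safe to consume the prediction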



      Multi-label node classification based on generative adversarial network
      CHEN Wen-qi, WANG Ying, WANG Xin, WANG Hong-ji
      2021, 43(02): 280-287. doi:
      Abstract ( 218 )   PDF (481KB) ( 165 )     
      Node classification is widely used on social networks and other network data. To study node classification, a generative adversarial network (GAN) is used to obtain node representations and thereby achieve a good classification effect. On this basis, a node classification generative adversarial network (NC-GAN) model is proposed. The model uses a GAN to conduct a two-player adversarial game and considers both the connectivity distribution in the network and the similarity between nodes, so as to obtain node representations that better fit the network; these representations are then classified to obtain a good classification effect. To verify the effect, the proposal is compared with DeepWalk, GraphGAN, and other node representation models, as well as a graph convolutional network model, in terms of link prediction and node classification. The model is weaker than GraphGAN only in link prediction, and outperforms the other models in node classification.




      Research and design of 6LoWPAN network multicast communication scheme
      SUN Jia-hao, WANG Cheng-cheng, TANG Dao-xian, LI Yue-hua
      2021, 43(02): 288-294. doi:
      Abstract ( 153 )   PDF (891KB) ( 102 )     
      With the increasing demand for new applications in wireless sensor networks (WSNs), the low-rate wireless personal area network standard 6LoWPAN, which implements IPv6 communication on top of IEEE 802.15.4, is an ideal solution for connecting WSNs to the Internet for full IP communication. This paper proposes a multicast communication scheme for 6LoWPAN networks, which adds multicast support to existing 6LoWPAN networks through self-organized MAC addresses. The scheme reduces the delay with which in-group nodes receive gateway data and the cost of processing irrelevant data at out-of-group nodes. The results show that node delay under the multicast scheme is 15.13% of that under unicast communication, and the data processing efficiency of out-of-group nodes is 39.02% higher than under broadcast communication. The scheme achieves the expected function and performance, and 6LoWPAN nodes can dynamically join and leave a multicast group and receive information within it.
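
      A purely illustrative reading of the self-organized MAC address idea (the encoding and the reserved range are hypothetical, not the paper's scheme): a multicast group ID is folded into a reserved range of 16-bit IEEE 802.15.4 short addresses, so in-group nodes accept a frame while out-of-group nodes can drop it without higher-layer processing.

      # Purely illustrative: encode a multicast group ID into a reserved
      # range of 16-bit 802.15.4 short addresses so in-group nodes accept
      # a frame and out-of-group nodes drop it at the MAC layer.
      # The encoding and reserved range are hypothetical.

      GROUP_BASE = 0xF000   # hypothetical reserved range for group addresses

      def group_to_short_addr(group_id):
          assert 0 <= group_id < 0x0FFF
          return GROUP_BASE | group_id

      def accepts(node_groups, dest_addr):
          """A node accepts a frame if it has joined the encoded group."""
          if dest_addr < GROUP_BASE:
              return False      # not a group address; normal unicast rules
          return (dest_addr & 0x0FFF) in node_groups

      addr = group_to_short_addr(0x012)
      print(hex(addr), accepts({0x012}, addr), accepts({0x034}, addr))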





      Exploration and research on energy saving of wireless sensor networks
      ZHANG Hua-nan, JIN Hong, WANG Feng
      2021, 43(02): 295-303. doi:
      Abstract ( 178 )   PDF (1040KB) ( 172 )     
      Wireless sensor nodes are often powered by batteries, and because batteries can store only limited energy, the network is destined to have a short lifespan. Maximizing the service life of sensor devices is therefore an important research direction. In this exploration of energy saving in wireless sensor networks, energy harvesting and management strategies are analyzed. Energy harvesting mainly collects environmental energy, such as solar energy, and stores it in supercapacitors. Energy management mainly keeps sensor nodes in the energy-neutral region through an energy budget. To reduce the energy consumption of wireless sensor nodes, an ultra-low-power wake-up receiver is adopted to continuously listen to the channel at low power, reducing communication-related power consumption. A star-topology asynchronous MAC protocol and the ultra-low-power wake-up receiver can be used in combination to improve the energy efficiency of sensor networks. Experimental results show that, compared with the traditional scheme, this scheme brings great improvements in energy efficiency, power consumption, and throughput.
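
      A minimal sketch of the energy-neutral budgeting idea (all figures are hypothetical): choose the largest duty cycle whose expected consumption does not exceed the expected harvested power.

      # Illustrative sketch of energy-neutral operation: pick a duty cycle
      # d so that d*active + (1-d)*sleep <= harvest. Assumes active power
      # exceeds sleep power. All numbers are hypothetical.

      def energy_neutral_duty_cycle(harvest_mw, active_mw, sleep_mw):
          """Largest duty cycle that keeps the node energy-neutral."""
          if harvest_mw <= sleep_mw:
              return 0.0              # cannot even afford sleeping
          d = (harvest_mw - sleep_mw) / (active_mw - sleep_mw)
          return min(d, 1.0)

      # 2 mW solar harvest, 20 mW active radio, 0.05 mW sleep power:
      print(energy_neutral_duty_cycle(2.0, 20.0, 0.05))  # ~0.098 duty cycle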




      A master-slave multi-agent communication modeling method based on compound Petri nets
      WANG Shuai-hui, YUAN Jie
      2021, 43(02): 304-311. doi:
      Abstract ( 133 )   PDF (1373KB) ( 176 )     
      Aiming at the problems of high hardware cost and large-scale occupation of computing resources in current master-slave multi-agent system (MAS) communication, a compound Petri net is introduced to model such communication. According to the types of MAS resources, the places of the compound Petri net, each associated with an effective time, are designed as three types: state, behavior, and time delay. A compound Petri net model with message priorities and communication exception handling is established, and a master-slave multi-agent compound Petri net communication model is constructed. A task scenario experiment with master-slave car formation verifies the reliability and effectiveness of the multi-agent compound Petri net model, which relieves the pressure of communication messages on system computing resources and reduces dependence on communication hardware.




      Image copy-move forgery detection algorithm based on superpixel shape features
      WEI Wei-yi, WANG Li-zhao, WANG Wan-ru, ZHAO Yi-fan
      2021, 43(02): 312-321. doi:
      Abstract ( 167 )   PDF (2418KB) ( 111 )     
      Aiming at the problem that the excessively large number of sub-blocks in traditional image copy-move forgery detection methods causes high time complexity and weak robustness against geometric transformations, an image copy-move forgery detection algorithm based on superpixel shape features is proposed. Firstly, an adaptive superpixel division method based on wavelet contrast is proposed to segment the image, and stable feature points are extracted. Secondly, a novel shape coding scheme is proposed to extract superpixel shape features, which are merged with the feature points to estimate suspected forged regions. Finally, the suspected forged regions are segmented into superpixels again and matched to accurately locate the tampered areas. Experimental results show that the proposed method is robust against geometric transformation, noise, blur, and JPEG compression.

      A shape recognition algorithm for traffic sign identification
      DENG Xiang-yu, ZHANG Yi-nan, YANG Ya-han
      2021, 43(02): 322-328. doi:
      Abstract ( 219 )   PDF (988KB) ( 170 )     
      Traffic sign classification is the basic link in a traffic sign recognition system, and traffic sign shape recognition is the core part of traffic sign classification. This paper studies traffic signs and divides them into three categories: prohibition signs, warning signs, and instruction signs. A new algorithm is proposed that uses the statistical features of edge trends to reflect the shape of the target, combined with a BP neural network, to identify the shapes of traffic signs. Firstly, color information is used to segment the traffic signs. Secondly, the edge trends of the traffic signs are recorded and their proportions counted. Finally, a BP neural network is used for classification to identify the shapes of the traffic signs. The method achieves good recognition accuracy and speed for traffic sign images with different tilt and shooting angles.


      Person re-identification based on multi-branch feature fusion
      XIONG Wei, YANG Di-chun, AI Mei-hui, LI Min, LI Li-rong
      2021, 43(02): 329-339. doi:
      Abstract ( 202 )   PDF (764KB) ( 192 )     
      This paper proposes a new person re-identification (ReID) method based on multi-branch feature fusion, in order to solve the problem that current person ReID methods cannot make full use of effective feature information for identification. Firstly, each of the last three convolution blocks is connected to its own branch. Secondly, approaches such as an attention mechanism and batch feature erasing (BFE) are used to process the features of each branch. Finally, the features of the branches are fused to obtain a highly fine-grained representational feature, and the three branches supervise each other during training. Single-domain and cross-domain experiments evaluate the performance of the proposed method on the Market1501, DukeMTMC-reID, CUHK03, and MSMT17 benchmark datasets. Results show that the proposed method outperforms other state-of-the-art techniques: Rank-1 and mAP on CUHK03 reach 76.6% and 72.8%, respectively.
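
      A minimal sketch of the multi-branch fusion idea in Python with PyTorch (the channel sizes and the pooling head are assumptions, not the paper's exact architecture): each of the last three blocks feeds its own embedding branch, and the branch embeddings are concatenated into the final representation.

      # Illustrative multi-branch feature fusion head. Shapes and the
      # backbone are hypothetical; attention/BFE processing is omitted.

      import torch
      import torch.nn as nn

      class MultiBranchHead(nn.Module):
          def __init__(self, channels=(512, 1024, 2048), embed_dim=256):
              super().__init__()
              self.branches = nn.ModuleList(
                  nn.Sequential(
                      nn.AdaptiveAvgPool2d(1),   # global pooling per branch
                      nn.Flatten(),
                      nn.Linear(c, embed_dim),
                      nn.BatchNorm1d(embed_dim),
                  )
                  for c in channels
              )

          def forward(self, feats):
              # feats: feature maps from the last three convolution blocks
              return torch.cat([b(f) for b, f in zip(self.branches, feats)],
                               dim=1)

      head = MultiBranchHead()
      feats = [torch.randn(4, 512, 28, 28),
               torch.randn(4, 1024, 14, 14),
               torch.randn(4, 2048, 7, 7)]
      print(head(feats).shape)  # torch.Size([4, 768]) fused embedding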



      Analysis and research on the pairwise alignment Needleman-Wunsch algorithm based on dynamic programming
      GAN Qiu-yun
      2021, 43(02): 340-346. doi:
      Abstract ( 259 )   PDF (521KB) ( 149 )     
      Sequence alignment is one of the most fundamental research problems in bioinformatics. The pairwise alignment algorithm Needleman-Wunsch, based on dynamic programming, uses iterative score computation and a gap penalty rule to compare gene sequences position by position, calculates their similarity score, and finally obtains the best alignment between sequences through backtracking. Although the algorithm finds the optimal result, it has high time and space complexity. Firstly, the original algorithm is analyzed and improved in terms of score calculation and backtracking. Secondly, two experiments are designed in which Staphylococcus aureus sequences are used as both the target and the counterpart sequences, and five groups of experiments with the same and with different sequence length ranges are conducted. Finally, novel coronavirus and SARS virus sequences are compared to verify the effectiveness of the algorithm. The experimental results show that the improved algorithm reduces sequence alignment time and improves the efficiency of sequence alignment.
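
      For reference, a standard Needleman-Wunsch sketch in Python with a linear gap penalty (the paper's score and backtracking improvements are not reproduced here):

      # Classic Needleman-Wunsch: fill an (n+1) x (m+1) DP table of best
      # alignment scores, then backtrack to recover one optimal alignment.

      def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
          n, m = len(a), len(b)
          score = [[0] * (m + 1) for _ in range(n + 1)]
          for i in range(1, n + 1):
              score[i][0] = i * gap
          for j in range(1, m + 1):
              score[0][j] = j * gap
          for i in range(1, n + 1):
              for j in range(1, m + 1):
                  diag = score[i - 1][j - 1] + (
                      match if a[i - 1] == b[j - 1] else mismatch)
                  score[i][j] = max(diag,
                                    score[i - 1][j] + gap,
                                    score[i][j - 1] + gap)

          # Backtracking recovers one optimal alignment.
          out_a, out_b, i, j = [], [], n, m
          while i > 0 or j > 0:
              if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + (
                      match if a[i - 1] == b[j - 1] else mismatch):
                  out_a.append(a[i - 1]); out_b.append(b[j - 1])
                  i -= 1; j -= 1
              elif i > 0 and score[i][j] == score[i - 1][j] + gap:
                  out_a.append(a[i - 1]); out_b.append("-"); i -= 1
              else:
                  out_a.append("-"); out_b.append(b[j - 1]); j -= 1
          return ("".join(reversed(out_a)), "".join(reversed(out_b)),
                  score[n][m])

      print(needleman_wunsch("GATTACA", "GCATGCU"))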

      Object detection based on multi-scale feature fusion and residual attention mechanism
      LI Ben-gao, WU Cong-zhong, XU Liang-feng, ZHAN Shu
      2021, 43(02): 347-353. doi:
      Abstract ( 465 )   PDF (620KB) ( 265 )     
      As a multi-task learning process, object detection requires better features than a classification task. Detectors that predict objects of different scales from multi-scale features have greatly surpassed detectors based on single-scale features. In addition, the feature pyramid structure is used to build semantically strong feature maps at all scales, further improving detector performance. However, such feature maps do not fully exploit the complementary role of contextual information for semantics. Based on the SSD baseline network, a feature fusion method with a residual attention mechanism is used to make full use of context information: feature fusion enhances the representational capability of high-resolution features, which helps in detecting small-scale objects, while the attention mechanism strengthens the key features required for prediction. The performance of the model is evaluated on the PASCAL VOC benchmark dataset; the mAP of the model with input image sizes of 300×300 and 512×512 is 78.8% and 80.7%, respectively.
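
      A minimal sketch of fusion with a residual attention gate in Python with PyTorch (the channel counts and the SE-style gate are assumptions, not the exact module in the paper): a deeper, lower-resolution map is projected and upsampled into a shallower one, and a learned channel attention is applied residually.

      # Illustrative residual-attention fusion: out = fused + fused * attn.
      # Channel counts and the SE-like gate are hypothetical.

      import torch
      import torch.nn as nn
      import torch.nn.functional as F

      class ResidualAttentionFusion(nn.Module):
          def __init__(self, shallow_c, deep_c):
              super().__init__()
              self.lateral = nn.Conv2d(deep_c, shallow_c, kernel_size=1)
              self.attn = nn.Sequential(        # channel attention gate
                  nn.AdaptiveAvgPool2d(1),
                  nn.Conv2d(shallow_c, shallow_c // 4, kernel_size=1),
                  nn.ReLU(inplace=True),
                  nn.Conv2d(shallow_c // 4, shallow_c, kernel_size=1),
                  nn.Sigmoid(),
              )

          def forward(self, shallow, deep):
              up = F.interpolate(self.lateral(deep),
                                 size=shallow.shape[-2:],
                                 mode="nearest")  # context to high resolution
              fused = shallow + up
              return fused + fused * self.attn(fused)  # residual attention

      m = ResidualAttentionFusion(shallow_c=512, deep_c=1024)
      out = m(torch.randn(1, 512, 38, 38), torch.randn(1, 1024, 19, 19))
      print(out.shape)  # torch.Size([1, 512, 38, 38])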