  • Journal of the China Computer Federation
  • China Science and Technology Core Journal
  • Chinese Core Journal

Current Issue

    • Performance comparison between FT-1500A and Intel XEON
      FANG Jianbin,DU Qi,TANG Tao,CHEN Xuhao,HUANG Chun,YANG Canqun
      2019, 41(01): 1-8.
      We give an indepth performance comparison between FT-1500A and Intel XEON processors. At the micro benchmarking level, we measure the maximum performance (FLOPS, memory access latency, and bandwidth) that the two platforms can achieve. At application level, we select a typical ocean forecasting numerical simulation software, and study how to port an open source code to FT-1500A processor and commercial Intel XEON processor, discuss the singlecore performance and the multi-core performance of the software on the two platforms, analyze the reasons for performance difference, and propose corresponding optimization suggestions. Overall, we conclude that the FT-1500A processor already has a good ecosystem basis including operating system, compiler and the related tools, which facilitates the porting process of classical scientific programs. Although there is a noticeable performance slowdown compared to the commercial Intel XEON processor, we argue that FT-1500A processor is still a promising candidate for future applications especially when power consumption is taken into account.
       
      An instruction mapping optimization method
      based on dataflow architecture
      LI Yi1,CHANG Chengjuan1,LU Shengjian1,JIANG Daozhong2,FAN Dongrui1,YE Xiaochun1
      2019, 41(01): 9-13.
      In the field of high-performance computing, dataflow computation is an important computing architecture and shows good performance and applicability in many practical scenarios. In the dataflow computation mode, a program is represented by a dataflow diagram, and a key problem is how to map dataflow diagrams onto multiple execution units. By analyzing the instruction mapping methods of current dataflow architectures and their shortcomings, we propose a new instruction mapping optimization method based on dataflow architecture, and implement and verify it in code. We optimize the instruction mapping method according to the characteristics of multi-address shared data packets, delay the splitting of multi-address shared data packets, and reduce network congestion.
      Design and simulation of a deep learning SoC architecture
       
      CUI Haoran,LI Han,FENG Yujing,WU Meng,WANG Chao,TAO Guanliang,ZHANG Zhimin
      2019, 41(01): 14-23.
      The explosive growth of information volume in the Internet era and the popularization of deep learning have made traditional general-purpose computing unable to meet large-scale, high-concurrency computing requirements. Heterogeneous computing can release greater computing power for deep learning, satisfy higher performance requirements, and be applied to a wider range of computing scenarios. We design and simulate a complete heterogeneous SoC architecture for deep learning. Firstly, we analyze the computational features of commonly used deep learning algorithms such as GoogLeNet, VGG, and SSD, and summarize them into a limited number of common deep learning operator classes, which are displayed in charts and structure diagrams. At the same time, the pseudo-instruction stream at the minimum operator level is generated. Then, based on the extracted algorithm features, a hardware-accelerated AI IP core for deep learning is designed, and a heterogeneous computing SoC architecture is constructed. Finally, experimental verification on the simulation modeling platform shows that the performance-to-power ratio of the SoC system is greater than 1.5 TOPS/W. Ten channels of 1080p 30 fps video can be processed frame by frame by the GoogLeNet algorithm, and the end-to-end processing time of each frame is no more than 30 ms.
       
      A multi-concurrent programmable
      parser based on FPGA
      YANG Hui,FENG Zhenqian,LI Junnan
      2019, 41(01): 24-30.
      The protocol types and layers of traditional packet parsers are fixed, which lacks support for new network protocols and restricts the programmability of network devices. We formally abstract the parsing process and implement a protocol-independent programmable parser based on FPGA. Supporting a new protocol requires no hardware changes beyond remapping the parse graph. Based on this mechanism, a series of optimization techniques are introduced to overcome the inherent serialization of packet parsing, save storage resources, and provide an effective solution for high-speed programmable packet parsing. We evaluate hardware cost and performance on a platform of general-purpose multi-core processors and a high-performance FPGA. Experimental results show that the programmable parser can greatly improve the performance of packet parsing, quickly parse both common and emerging network protocols, and effectively support the rapid development of customized network protocols.
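The formal abstraction of parsing — a parse graph whose nodes carry a header length and a transition on a key field — can be sketched in software (a hypothetical table-driven model of our own, not the authors' FPGA design; the protocol table entries are illustrative):

```python
# Parse graph: protocol -> (header length, slice of the next-protocol field
# within the header, mapping from field value to next protocol).
PARSE_GRAPH = {
    "ethernet": (14, slice(12, 14), {b"\x08\x00": "ipv4"}),
    "ipv4":     (20, slice(9, 10),  {b"\x06": "tcp", b"\x11": "udp"}),
    "tcp":      (20, None, {}),
    "udp":      (8,  None, {}),
}

def parse(packet: bytes, start="ethernet"):
    """Walk the parse graph, returning (protocol, offset) pairs.
    Supporting a new protocol only requires editing PARSE_GRAPH."""
    offset, proto, layers = 0, start, []
    while proto is not None:
        hdr_len, key_field, transitions = PARSE_GRAPH[proto]
        layers.append((proto, offset))
        nxt = None
        if key_field is not None:
            key = packet[offset:offset + hdr_len][key_field]
            nxt = transitions.get(bytes(key))
        offset += hdr_len
        proto = nxt
    return layers

# Toy Ethernet + IPv4 + TCP packet: EtherType 0x0800 at bytes 12-13,
# IPv4 protocol number 0x06 (TCP) at byte 9 of the IP header.
pkt = bytes(12) + b"\x08\x00" + (b"\x45" + bytes(8) + b"\x06" + bytes(10)) + bytes(20)
print(parse(pkt))   # → [('ethernet', 0), ('ipv4', 14), ('tcp', 34)]
```

In hardware the same table lives in reconfigurable memory, which is why remapping the parse graph suffices to add a protocol.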
       
      A collaborative computation offloading algorithm
      based on fiber-wireless networks
      GUO Jinlin,WU Jigang,CHEN Long,SHI Wenjun
      2019, 41(01): 31-40.
      With the development of passive optical networks, the fiber-wireless (FiWi) network is envisioned as a promising architecture for supporting centralized cloud computing (CCC) and mobile edge computing (MEC) simultaneously. However, existing work on collaborative computation offloading (CCO) in FiWi networks aims at minimizing the energy consumption of mobile devices and ignores the demands of tasks with strict real-time requirements. For such tasks, we present a centralized-cloud and edge-cloud collaborative computation offloading problem, which aims to minimize the total processing time of the computation tasks, together with its formal description. The problem is proven NP-hard by a reduction from the bin packing problem, and the offloading strategy with the shortest time is selected. Two algorithms are proposed as solutions: a heuristic collaborative computation offloading algorithm (HCCOA) and a genetic algorithm for collaborative computation offloading (GA4CCO). HCCOA chooses the strategy with the shortest time by comparing the total processing time of different computation offloading strategies, while GA4CCO searches for an optimal or near-optimal strategy. Experimental results show that HCCOA and GA4CCO reduce the total processing time of the obtained task offloading strategy by 4.34% and 18.41% on average, respectively. In addition, GA4CCO reduces the total processing time by 14.49% in comparison with HCCOA.
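The genetic encoding behind an offloading strategy can be sketched as follows (a minimal GA of our own on a made-up four-task instance, not the authors' GA4CCO implementation): a chromosome assigns each task to the mobile device, the edge cloud, or the central cloud, and fitness is the total processing time.

```python
import random

# Toy instance (illustrative numbers): processing time of each task on
# the mobile device ("local"), the edge cloud, and the central cloud.
TIMES = {"local": [9, 7, 8, 6], "edge": [3, 4, 3, 5], "cloud": [5, 5, 6, 4]}
SITES = list(TIMES)

def total_time(strategy):
    """Makespan-style objective: sites work in parallel, tasks on one
    site run sequentially, so total time is the slowest site's workload."""
    load = {s: 0 for s in SITES}
    for task, site in enumerate(strategy):
        load[site] += TIMES[site][task]
    return max(load.values())

def ga4cco(pop_size=30, gens=100, pm=0.2, seed=1):
    """Minimal GA in the spirit of GA4CCO: elitism, one-point crossover,
    and a move-one-task mutation."""
    rng = random.Random(seed)
    pop = [[rng.choice(SITES) for _ in range(4)] for _ in range(pop_size)]
    for _ in range(gens):
        pop.sort(key=total_time)
        elite = pop[: pop_size // 2]           # keep the better half
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, 4)
            child = a[:cut] + b[cut:]          # one-point crossover
            if rng.random() < pm:              # mutation: relocate one task
                child[rng.randrange(4)] = rng.choice(SITES)
            children.append(child)
        pop = elite + children
    return min(pop, key=total_time)

best = ga4cco()
print(best, total_time(best))
```

The real problem also models FiWi transmission delays; here the per-site times are assumed to already include them.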
       
      Secure checkpointing on non-volatile processors
       
      LIU Zimo,QIU Zongdi,JIA Tianyuan,XU Yuanchao
      2019, 41(01): 41-46.

      Non-volatile processors (NVPs) can be quickly restored in an energy harvesting environment and are ideal for applications such as the Internet of Things. Checkpointing is the core safeguard technology of NVPs. However, existing backup strategies assume that NVPs work in an ideal environment: they only consider factors such as unstable energy input, not the impact of outside malicious attacks on the security of NVPs. For example, an attacker may tamper with the contents of registers during the backup process, causing the system to run out of control, or tamper with the contents written to non-volatile storage, making the data untrustworthy. These attacks hinder the application of NVPs in safety-critical areas such as wearable medical equipment. We survey the latest security threats during the backup process of NVPs with retention state, and propose corresponding coping mechanisms.
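One natural coping mechanism for the tampered-backup attack described above is to authenticate the checkpoint, e.g. with an HMAC. This is our illustrative sketch, not necessarily the paper's mechanism, and secure key storage on a real NVP is itself a design problem assumed away here:

```python
import hashlib
import hmac
import os

KEY = os.urandom(16)        # device secret; secure key storage is assumed

def checkpoint(registers: bytes) -> bytes:
    """Append a MAC so tampering with the saved state is detectable."""
    return registers + hmac.new(KEY, registers, hashlib.sha256).digest()

def restore(blob: bytes) -> bytes:
    """Verify the MAC before trusting the restored register contents."""
    state, tag = blob[:-32], blob[-32:]
    expect = hmac.new(KEY, state, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expect):
        raise ValueError("checkpoint integrity check failed")
    return state

blob = checkpoint(b"\x01\x02\x03\x04")   # register-file snapshot
assert restore(blob) == b"\x01\x02\x03\x04"
```

A single flipped bit anywhere in the stored blob makes `restore` raise instead of resuming from corrupted state.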

      A relevance-aware cloud service trust model
      based on convex evidence theory
      LIU Wei1,ZOU Lukun2,BA Yuanjie2,LI Guangli3,4,ZHANG Zhigang1
      2019, 41(01): 47-55.
      Aiming at distinguishing service credibility in large-scale distributed cloud computing systems, we propose a relevance-aware cloud service trust model based on convex evidence theory. The model formally describes the trust relationship among service providers, services, and users in the cloud computing system. Then, the correlation between different services from the same service provider is revealed. Furthermore, the model uses convex evidence theory to deal with ordered propositions and finally builds a credible service recommendation method for the cloud computing system, which can provide reasonable and reliable cloud services according to user needs. Through comparison with classical evidence theory, experimental results demonstrate the validity and robustness of the proposed model, which makes full use of the correlation information among cloud services and provides reasonable cloud services according to user requests.
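The classical evidence theory used as the comparison baseline combines independent belief assignments with Dempster's rule. A minimal sketch on made-up trust masses (the convex extension for ordered propositions is not reproduced here):

```python
def dempster(m1, m2):
    """Dempster's rule of combination: multiply masses of every pair of
    focal sets, keep non-empty intersections, renormalize by 1 - conflict."""
    combined, conflict = {}, 0.0
    for a, pa in m1.items():
        for b, pb in m2.items():
            c = a & b
            if c:
                combined[c] = combined.get(c, 0.0) + pa * pb
            else:
                conflict += pa * pb
    return {c: v / (1 - conflict) for c, v in combined.items()}

# Two evidence sources rating a service as trusted (T) or untrusted (U);
# "TU" is the ignorance hypothesis. Numbers are illustrative.
T, U, TU = frozenset("T"), frozenset("U"), frozenset("TU")
m1 = {T: 0.7, TU: 0.3}
m2 = {T: 0.6, U: 0.2, TU: 0.2}
fused = dempster(m1, m2)
print(fused)
```

The fused mass on T rises above either source's individual mass, which is the belief-reinforcement effect the trust model builds on.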
       
      A key node identification algorithm in P2P streaming networks
      LONG Jun1,2,WANG Yulou1,2,YUAN Xinpan3,ZHANG Huachao1,2
      2019, 41(01): 56-64.
      There are key nodes in P2P streaming networks which play an important role in network security and network communication, so identifying them is crucial. Traditional methods incur huge time overhead when identifying key nodes in large-scale networks and cannot guarantee real-time performance. We propose a key node identification algorithm for P2P streaming media networks. Combining the network structure characteristics of the hybrid mode, we use a region-based computing model to address the huge time consumption caused by the excessive network scale. The importance of nodes is quantitatively described according to their contribution and propagation capacity. Simulation results show that the proposed algorithm can quickly obtain node importance rankings and effectively identify the key nodes in P2P streaming networks.
       
      Optimizing intrusion detection of BP neural networks
      by a modified harmony search algorithm
       
      DING Hongwei,WAN Liang,DENG Xuankun
      2019, 41(01): 65-72.
      In intrusion detection based on traditional BP networks, the BP algorithm easily falls into local optima and its initial values are random, so the choice of initial values directly affects the training effect of BP networks. We therefore propose an improved harmony search (HS) algorithm to optimize the initial values of BP neural networks, providing the BP neural network with a better set of initial values. Experimental results show that the improved HS algorithm achieves a higher fitness function value, and that the detection rate and convergence rate improve when the BP network optimized by this algorithm is used in intrusion detection.
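The baseline harmony search loop being improved can be sketched as follows (a standard textbook HS of our own, with a sphere function standing in for the BP network's training error; the paper's specific modifications are not reproduced):

```python
import random

def harmony_search(fitness, dim, bounds=(-1.0, 1.0), hms=10, hmcr=0.9,
                   par=0.3, bw=0.05, iters=2000, seed=0):
    """Basic HS: in the paper's setting `fitness` would be the BP network's
    training error and a harmony vector its initial weights."""
    rng = random.Random(seed)
    lo, hi = bounds
    memory = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(hms)]
    for _ in range(iters):
        new = []
        for d in range(dim):
            if rng.random() < hmcr:                 # take from harmony memory
                x = rng.choice(memory)[d]
                if rng.random() < par:              # pitch adjustment
                    x = min(hi, max(lo, x + rng.uniform(-bw, bw)))
            else:                                   # random improvisation
                x = rng.uniform(lo, hi)
            new.append(x)
        worst = max(range(hms), key=lambda i: fitness(memory[i]))
        if fitness(new) < fitness(memory[worst]):   # replace worst if better
            memory[worst] = new
    return min(memory, key=fitness)

sphere = lambda w: sum(x * x for x in w)            # stand-in for training error
best = harmony_search(sphere, dim=5)
```

The best harmony found would then seed the BP network's weights before gradient training begins.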
       
      A new NP-CSMA random multiple
      access protocol based on VANETs
      ZHAN Gang,DING Hongwei,LIU Qianlin,YANG Zhijun,ZHOU Shengjie
      2019, 41(01): 73-79.
      With the emergence of intelligent transportation, VANETs have received more and more attention. The dynamic topology of VANETs changes sharply, which places high demands on network performance such as throughput rate and transmission rate. We propose an adaptive multi-channel dual-clock NP-CSMA random multiple access protocol with an RTS/CTS mechanism. We first define two kinds of P-CSMA protocols. The RTS/CTS mechanism solves the hidden terminal problem, the dual-clock mechanism reduces average idle time, and the multi-channel mechanism increases the number of channels, identifies user priorities, and improves the throughput rate. The adaptive mechanism enables the system to maintain a stable throughput rate under high load. We also analyze the transmission rate, which is proved to be relatively high. The calculation formulas of throughput rate and transmission rate are derived through the average period method, and the simulation results are consistent with the theoretical derivation.
       
      An anonymous heterogeneous aggregate
      signcryption scheme for vehicular networks
      NIU Shufen,LI Zhenbin,WANG Caifen
      2019, 41(01): 80-87.
      Heterogeneous aggregate signcryption can not only achieve the confidentiality and unforgeability of information transmitted between different cryptosystems, but also reduce the cost of data communication. Combining the characteristics of vehicular networks, we present a heterogeneous aggregate signcryption scheme from certificateless cryptography to identity-based cryptography, and adopt different system parameters to enhance system security. The scheme can simultaneously verify multiple messages and keep the sender's identity anonymous, which effectively protects user privacy. Moreover, we prove the indistinguishability and unforgeability of the scheme in the random oracle model. Comparison experiments show that, compared with schemes of the same type, the proposed scheme reduces communication cost and is more suitable for vehicular networks.
       
      Model checking of generalized possibilistic
      computation tree logic with multi-valued decision process
      YUAN Shen,WEI Jielin,LI Yongming
      2019, 41(01): 88-97.
      Model checking is an effective technique for automatically verifying the behavior of hardware and software systems. To formally verify a concurrent system that contains non-deterministic and inconsistent information, based on possibility theory and multi-valued logic, we propose a generalized possibilistic multi-valued model checking algorithm for computation tree logic with a multi-valued decision process (MvDP), and apply it to non-deterministic system verification. Firstly, an MvDP is constructed to specify the behavior of the system as a formal system model. Then a multi-valued computation tree logic (MvCTL) is introduced to describe system properties, and the model checking algorithm of generalized possibilistic MvCTL with MvDP is presented. The algorithm converts the model checking problem into polynomial-time fuzzy matrix calculations. Finally, we give a specific application example to illustrate model checking of non-deterministic multi-valued systems.
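The fuzzy matrix calculation at the heart of such algorithms is max-min composition over possibility degrees; a small sketch on a made-up three-state system (toy transition possibilities, not from the paper):

```python
def maxmin(A, B):
    """Max-min composition of fuzzy matrices: the fuzzy analogue of matrix
    multiplication, with min for * and max for +."""
    n, m, p = len(A), len(B), len(B[0])
    return [[max(min(A[i][k], B[k][j]) for k in range(m)) for j in range(p)]
            for i in range(n)]

# Possibility degrees of one-step transitions in a 3-state system.
T = [[0.0, 0.8, 0.3],
     [0.0, 0.0, 0.9],
     [0.0, 0.0, 1.0]]

T2 = maxmin(T, T)   # possibility of reaching a state in exactly two steps
print(T2)
```

Iterating the composition to a fixed point yields reachability possibilities, which is why the checking problem reduces to polynomial-time matrix calculations.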
       
      An improved OL resolution: SOL resolution
      LIU Qinghua,XU Yang
      2019, 41(01): 98-103.
      In 1973, Chang and Lee combined linear resolution with ordered resolution and proposed ordered linear resolution, namely OL resolution, which greatly improves the efficiency and mechanizability of linear resolution. However, OL resolution is not a complete resolution method. We propose a concept of strong reduction based on the reduction condition of OL resolution. Strong reduction further restricts the reduction condition of ordered informational center clauses and is a special case of the reduction condition. Based on strong reduction, we put forward an improved OL resolution, called SOL resolution, and prove its completeness.


       
      null
      GUO Pengfei,FENG Lin,SUN Siyu
      2019, 41(01): 104-112.
      Combining the redundant discrete wavelet transform (RDWT) with matrix Schur decomposition, we propose a new digital watermarking algorithm for blind detection on gray images. The algorithm first performs a two-level RDWT on the carrier image and extracts the low-frequency part of the approximate subgraph. Then Schur decomposition is performed on the low-frequency coefficients of each block after block processing, and the maximum energy elements on the diagonal of the upper triangular matrix are modified to embed the watermark image, which is encrypted by the Fibonacci transformation. The watermark extraction process can therefore be performed blindly. Experimental results show that the watermark embedding time of the proposed algorithm is only 0.97 s, and the average watermark extraction time is 0.211 s. Compared with similar blind watermarking algorithms, the PSNR value for invisibility is increased by 14.12%, and the resistance to Gaussian noise, salt-and-pepper noise, and shearing geometric attacks is increased by 21.32%, 17.31%, and 27.83%, respectively.
       
      Image retrieval based on CNN feature
      weighting and region integration
      YUAN Hui1,LIAO Kaiyang1,3,ZHENG Yuanlin1,2,CAO Congjun1,3,TANG Ziwei1,DENG Xuan1
      2019, 41(01): 113-121.
      Compared with traditional features, the features extracted by convolutional neural networks have a stronger image description capability, and the convolutional layers are more suitable for image retrieval than the fully connected layers. However, convolution features are high-dimensional, so image matching with them consumes huge amounts of time and memory. We propose a new method that improves and integrates convolution features to construct a single compact feature vector, which is then used for image matching. Firstly, the 3D features extracted from the last convolution layer are re-weighted to highlight the edge and location information of the image. Then, we integrate several regional feature vectors, obtained with sliding windows, into a global feature vector. Finally, we obtain the initial ranking by using the cosine distance to measure the similarity between the query image and the test images, and re-rank with a query expansion method to get the final mean average precision (mAP). We conduct experiments on the Paris6k and Oxford5k databases and on their extensions Paris106k and Oxford105k, each augmented with 100k distractor images. On the Paris databases, the proposed method improves the mAP by about 3% compared with the cross-dimensional weighting (CroW) method; on the Oxford databases, it is about 1% better than CroW. The results show that the global features extracted by our method describe images better.
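The initial ranking step — cosine similarity between the query's global feature vector and each database vector — is straightforward to sketch (illustrative three-dimensional vectors; the feature extraction, re-ranking, and query expansion stages are omitted):

```python
import math

def cosine(u, v):
    """Cosine similarity of two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def rank(query, database):
    """Initial retrieval ranking: sort (name, vector) items by similarity
    of their global feature vector to the query, most similar first."""
    return sorted(database, key=lambda item: cosine(query, item[1]),
                  reverse=True)

db = [("img_a", [1.0, 0.0, 0.2]),
      ("img_b", [0.9, 0.1, 0.1]),
      ("img_c", [0.0, 1.0, 0.0])]
print([name for name, _ in rank([1.0, 0.0, 0.1], db)])
# → ['img_a', 'img_b', 'img_c']
```

Real CNN descriptors are hundreds of dimensions long, which is exactly why the paper compresses them into one global vector before this matching step.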
       
      Vertical multi-layer interaction
      technology in 3D interactive interface
       
      CHEN Miaoyun,YIN Jibin
      2019, 41(01): 122-129.
      In the vertical operation area of the Leap Motion device, we use different hand heights to manipulate a multi-layer discrete target selection task mapped on the screen, in order to determine the number of layers suitable for user operation, and then analyze and discuss the corresponding human factors. Method: first, we determine the user's commonly used vertical operation range through experiments, perform layered experiments within that common range, and carry out linear analysis of task time against the index of difficulty (ID) to test the suitability of the Fitts' law model. Result: we find that the linear fitting degrees (R2) of ID to task time under full visual feedback and partial visual feedback are 0.766 and 0.771, respectively, so the Fitts' law model is not suitable for multi-layered discrete target selection tasks in a layered interface. Through analysis of time and error rate, we also find that the maximum numbers of layers that users can operate under full visual feedback and partial visual feedback are 20 and 18, respectively. Our conclusions can provide a technical design reference for application scenarios based on multi-layer 3D interfaces.
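The linear analysis behind the reported R2 values fits movement time against the index of difficulty, T = a + b·ID, and judges model fit by R2. A self-contained sketch on made-up data (the study's actual measurements are not reproduced):

```python
def linear_fit_r2(xs, ys):
    """Ordinary least-squares fit y = a + b*x and its R^2, the statistic
    used to judge whether Fitts' law holds for a task."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    a = my - b * mx
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return a, b, 1 - ss_res / ss_tot

# Illustrative (made-up) movement times against index of difficulty:
ids   = [1.0, 2.0, 3.0, 4.0]
times = [0.4, 0.9, 1.1, 1.8]
a, b, r2 = linear_fit_r2(ids, times)
print(f"T = {a:.3f} + {b:.3f}*ID, R^2 = {r2:.3f}")
```

An R2 near 1 supports the Fitts' law model; the paper's values of about 0.77 are what lead it to reject the model for the layered task.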
      A data sampling method
      based on double decision tree
      CHEN Li1,FEI Hongxiao2,DING Hailun2,CHENG Lin2,ZHAI Jiyu2
      2019, 41(01): 130-135.
      In data mining, a basic assumption is that the data distribution of the training set is consistent with that of the test set. But as data volumes increase, finding representative data in huge amounts of data becomes particularly difficult. By studying existing data selection methods such as simple random sampling and progressive sampling, we find that their sampling effect is difficult to evaluate because they are not integrated with the data mining tool. Due to contingency factors and uncertainty, it is difficult to guarantee the basic assumption of data mining, which also enlarges the generalization error of the model. To solve these problems, we put forward a structured data sampling method based on a double decision tree. Firstly, we generate a decision tree with the C4.5 algorithm, which is used to select appropriate data and data collection points in the data source. Then, we generate another decision tree to evaluate the quality of the selected data set, achieving data sampling of high efficiency and quality. Experiments show that, compared with random sampling, the accuracy of the model built on our samples improves obviously.
       
      A dealer on-site management architecture
      solution based on container technology
      DAI Jiashun
      2019, 41(01): 136-143.
      With the rapid development of the Internet and cloud computing, new technologies such as containers are emerging. Faced with the shortage of service, management, and operational capabilities in traditional car dealer DMS systems, we propose a new dealer architecture construction solution. Through system decoupling, we use RESTful services to implement the architecture and containerize it to build a new container-based dealer management system, which makes the architecture more flexible and reduces the operating costs of subsequent systems.
       
      An improved bat algorithm based on
      cross-border relocation and Gaussian mutation
      LI Yongheng,ZHAO Zhigang
      2019, 41(01): 144-152.
      Aiming at the problems that individuals cross the boundary of the solution space and that the bat algorithm suffers from premature convergence, we propose an improved bat algorithm based on cross-border relocation and Gaussian mutation. The algorithm pulls individuals that cross the solution boundary back into the solution space, using the cross-border relocation strategy to reposition them. We then use the Gaussian mutation strategy to control the search range of individuals, so that the population searches radially around the current optimal solution, which enhances the local search and global optimization ability of the bat algorithm. Since the loudness and pulse frequency of the bat algorithm become inconsistent as bats approach the target solution, which affects the continuous evolution ability of the algorithm, we introduce a linear gradient strategy to ensure that the updates of loudness and pulse frequency are compatible with the continuous evolution of the algorithm. We compare the optimization ability of the new algorithm with other algorithms under different position relationships in the solution space, and analyze its convergence stability with experimental data. Experimental results show that the proposed algorithm has better convergence speed and accuracy, and its global optimization ability and performance on high-dimensional problems demonstrate good robustness.
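The two improvements can be sketched in isolation (a toy version of ours on a sphere function, not the authors' full algorithm with loudness and pulse-frequency updates): out-of-bounds individuals are relocated back into the solution space, and a Gaussian perturbation around the current best sharpens local search.

```python
import random

def improved_bat(fitness, dim=2, lo=-5.0, hi=5.0, n=20, iters=300, seed=2):
    """Sketch of the two ideas only: cross-border relocation and
    Gaussian mutation around the current best solution."""
    rng = random.Random(seed)
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n)]
    vel = [[0.0] * dim for _ in range(n)]
    best = min(pos, key=fitness)
    for _ in range(iters):
        for i in range(n):
            for d in range(dim):
                vel[i][d] += (pos[i][d] - best[d]) * rng.random()
                x = pos[i][d] - vel[i][d]
                if not lo <= x <= hi:            # cross-border relocation:
                    x = rng.uniform(lo, hi)      # pull back into the space
                pos[i][d] = x
            # Gaussian mutation centred on the current best solution
            cand = [min(hi, max(lo, b + rng.gauss(0, 0.1))) for b in best]
            if fitness(cand) < fitness(pos[i]):
                pos[i] = cand
        best = min(pos + [best], key=fitness)
    return best

sphere = lambda x: sum(v * v for v in x)
print(improved_bat(sphere))
```

Relocating rather than clamping keeps boundary individuals from piling up at the edge of the space, which is the premature-convergence symptom the paper targets.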
       
      An intelligent recommendation system for
      optimization algorithms based on multi-classification
      support vector machine and its empirical analysis
      CUI Jianshuang,CHE Mengran
      2019, 41(01): 153-160.
      Intelligent algorithm recommendation is an important branch of research on hyper-heuristic algorithms. Its goal is to automatically select the most suitable algorithm for the problem to be solved from many "online" algorithms, thereby greatly improving the efficiency of problem solving. We propose and validate an intelligent optimization algorithm recommendation system, whose theoretical basis is the No Free Lunch theorem and Rice's algorithm selection framework. It assumes that there is a potential correlation between problem features and algorithm performance, so the algorithm recommendation problem can be converted into a multi-classification problem. To verify this assumption, the multi-mode resource-constrained project scheduling problem is chosen as the test sample data; a number of meta-heuristic optimization algorithms, such as particle swarm optimization, simulated annealing, tabu search, and artificial bee colony, are used as the recommended algorithms; and the multi-classification strategy of the support vector machine is applied to achieve algorithm classification and recommendation. Cross-validation results show that the recommendation accuracy exceeds 90% and the evaluation indicators perform well.