Computer Engineering & Science

GPU-accelerated pulse Doppler radar signal processing

GONG Hao, LIU Ying, FENG Jian-zhou, ZHAO Ren-liang, LENG Jia-xu,

2021, 43(07): 1141-1149. doi:

Abstract ( 395 )

PDF (1066KB) ( 375 ) 　　

High-performance radar signal processing is a key technique in radar systems. Previous research on high-performance radar signal processing algorithms usually rely on specialized signal processing devices, such as FPGAs or DSPs, which are not only expensive but also difficult to program. GPU is a general-purpose device, especially suitable for processing large-scale data, such as radar signals, etc. At present, most of the existing GPU-accelerated radar signal processing research is on SAR imaging related applications, with little research on target detection radar, such as pulsed Doppler radar. In order to fulfill the requirement of radar echo data on throughput and real-time processing, a novel fine-grained parallelization by grid stride, a coarse-grained parallelization by multi-CUDA streams, and a data preprocessing method based on parallel scan are proposed. The real-time performance and accuracy of the algorithm are evaluated from multiple perspectives such as performance testing and error analysis. Compared with the state-of-the-art implementation on CPU, the hardware platform has achieved a speedup of more than 300 times, and is superior than other existing CUDA accelerated pulse Doppler radar signal processing algorithms.

A fast modeling method of computable models for large-scale cities

WANG Chun, YU Chang-hua, TANG Hao, LI Hai-feng

2021, 43(07): 1150-1159. doi:

Abstract ( 154 )

PDF (2034KB) ( 225 ) 　　

To provide computable models of large-scale cities for numerical simulation, it is necessary to have the ability of fast modeling the computable models for cities with large scale, massive data and frequent updating. However, due to the errors in the data collecting process and the essential simplification operations in model processing, in directly modeled digital city models there exists large amount of CAD model deficiencies such as non-closeness, joint, intersection and isolation, which makes the city models cannot be used in numerical simulation. The cost of repairing these deficiencies using commercial software is extremely high. To solve this problem, this paper proposes a fast modeling method of computable models for large-scale cities. By analyzing the CAD model deficiencies involved in LoD1 city models, we propose the computable conditions that must be satisfied and the corresponding measurements. Our method can directly build the computable city models without CAD model deficiency. To improve the efficiency of computable processing, we also present graphic retrieval acceleration algorithms for large-scale cities. The experiments show that it takes us only about 25 minutes to build a computable city model with 710 km2, 300 thousand buildings and 2.17 million facets, which satisfies the requirements of numerical simulation on large-scale city models.

Scheduling of heterogeneous tasks for distributed training

YANG Jian-wei, MENG Min, HUANG Jia-le, WU Ji-gang

2021, 43(07): 1160-1167. doi:

Abstract ( 218 )

PDF (657KB) ( 279 ) 　　

Workers in distributed machine learning often need to deal with heterogeneous tasks during the training process. However, the task publisher may not be able to determine which workers in the cluster of edge server (ES) are currently in training based on effective prior knowledge. To tackle the problem that the ES cluster cannot fulfill the maximization of the training performance and the quality of service at the same time, a scheduling algorithm of heterogeneous tasks is proposed. Firstly, the factors influencing the convergence performance of distributed training are analyzed under the constraints about cluster’s resources. Secondly, the optimization objective for maximizing training performance is established. Finally, the optimization problem is transformed into a multidimensional multiple-choice knapsack problem. The simulation results show that the proposed scheduling algorithm of heterogeneous tasks can maximize the performance of distributed training and simultaneously ensure the quality of ser- vice.

A dual-port issue queue and its performance optimization

SUI Bing-cai, SUN Cai-xia, WANG Yong-wen, GUO Hui

2021, 43(07): 1168-1172. doi:

Abstract ( 214 )

PDF (697KB) ( 344 ) 　　

Issue queue is the out-of-order control unit of superscalar processors, and it is also a key component in the processors, which plays a decisive role in the performance of the entire processors. The paper proposes a dual-port issue queue structure that can effectively improve the performance of out-of-order superscalar processors. The queue can estimate the issue time of the instructions according to the correlation between the instructions, and allocate the instructions to different queues. The impact of two different launch strategies on performance is compared, and the strategy of marking the execution pipeline at the input end can obtain higher IPC performance, which can reach 10.68% at the maximum. At the same time, when the same launch strategy is adopted, the impact of the number of launch queue entries on performance is compared. Compared with 24 launch queues, 32 launch queues can improve the IPC performance by 2% on average, and 8.59% at most.

Task scheduling optimization of Flink in container environment

HUANG Shan, , FANG Liu-yi, , XU Hao-tong, DUAN Xiao-dong,

2021, 43(07): 1173-1184. doi:

Abstract ( 211 )

PDF (1551KB) ( 269 ) 　　

With the rapid development of Internet technology, human beings are moving towards the era of big data and cloud computing. As the latest generation of big data computing engine, Flink is favored by academia and industry for its advantages such as low latency and high throughput. When Flink is deployed in the cloud environment, its default task scheduling will lead to uneven load distribution due to the inability to obtain container deployment distribution information. To solve this problem, this paper proposes a Flink task scheduling load balancing algorithm in container environment to obtain the performance information of each node and the distribution information of the container on the node, give priority to the container of nodes with more free resources, and avoid the uneven load caused by the frequent selection of containers. The evaluation results show that the proposed algorithm can more evenly allocate tasks and improve resource utilization and computing speed when deployed in container environment

A novel circuit layout and routing algorithm

YUAN Ye, WANG Gang, LIU Xiao-guang, LI Yu-sen

2021, 43(07): 1185-1191. doi:

Abstract ( 541 )

PDF (613KB) ( 487 ) 　　

This paper investigates the research status of automatic circuit layout and routing technology at home and abroad, and designs a layout and routing algorithm for medium-sized circuits. It is mainly used in the module test of large-scale layout design software, and provides users with preliminary wiring and layout results of each module, which is convenient for users to find and correct error points efficiently, and fills the gap in related fields in China. In this paper, a hypergraph model is established and transformed into a graph model. The Stoer-Wagner algorithm is improved, and the algorithm and the Fiduccia-Mattheyses algorithm are used to divide the graph based on the minimum cut theory, and construct a partition tree. Based on this tree, this paper designs a binary relative moving algorithm to determine the location of each circuit component, which greatly reduces the layout congestion and improves the aesthetics. For a circuit with hundreds of components, its layout can be obtained within 0.5 seconds. Based on the A* algorithm, the routing is improved in many aspects, and the routing speed is improved. For a circuit with <1000 wires, the routing results can be obtained within 0.1 s to 60 s, achieving 100% routing rate and uniform layout and routing effect.

Fault-tolerant design for daisy chain of MEDA biochip

ZHANG Ling

2021, 43(07): 1192-1199. doi:

Abstract ( 194 )

PDF (1655KB) ( 228 ) 　　

Micro-electrode-dot-array biochip (MEDA biochip) is a new type of digital microfluidic chip that combines microelectronics and microfluidics. Based on the idea of microarrays, each microcell includes a drive circuit and a detection circuit to realize real-time monitoring of biochemical tests. To reduce external pins, all microcells of MEDA biochip are connected in series by a daisy chain to achieve accurate control of the biochip. As the key data path of the MEDA biochip, even if only one unit fails in the daisy chain, the entire chain will fail. Therefore, an effective fault-tolerant design must be carried out on the daisy chain. In this paper, a daisy chain structure with self-test and fault-tolerant functions is designed for MEDA biochip. The test response triggers the automatic fault tolerance of the daisy chain’s fault unit. When a certain unit of the daisy chain fails, its test response is abnormal, which triggers the automatic repair of the failed unit. If the repair fails, the abnormal test response will trigger the bypass of the unit again, thereby achieving automatic fault tolerance. Experimental results show that the structure can perform effective fault tolerance while testing and diagnosing faults, and bypass it permanently when fault tolerance fails.

A scale-adaptive multi-scale quantum harmonic oscillator algorithm and its parallelization

JIAO Yu-wei, WANG Peng, XIN Gang,

2021, 43(07): 1200-1209. doi:

Abstract ( 189 )

PDF (743KB) ( 318 ) 　　

Multi-scale quantum harmonic oscillator algorithm (MQHOA) is a meta-heuristic algorithm based on the theory of Quantum wave function. In the traditional MQHOA optimization process, the sampling scale of different individuals is not different. This mechanism limits the diversity of solutions. Aiming at the sampled individuals with different fitness levels, a scale adaptive strategy is proposed. This strategy reasonably expands the scale of individuals with poor sampling conditions and increases the diversity of sampling scales used by different individuals in the iterative process. In addition, a parallelization framework is proposed based on the scale difference. Seven groups of test functions are selected to test the improved algorithm (MQHOA-PS) and MQHOA on the Huawei Kunpeng 920 processor and AMD EPYC 7452 processor. The experiments show that the improved algorithm has higher accuracy and success rate and less time.

An improved DV_ Hop location algorithm based on improved wolf colony algorithm

SONG Ling, HUANG Da-sheng

2021, 43(07): 1210-1218. doi:

Abstract ( 165 )

PDF (708KB) ( 230 ) 　　

DV_hop algorithm is one of the classical range-free sensor node location algorithms. However, due to the uneven distribution of nodes, the distance between the unknown nodes and the anchor nodes calculated by the average hop distance is far from the actual distance, resulting in the lack of positioning accuracy. To solve this problem, an improved DV_Hop algorithm based on improved wolf colony algorithm (IWCA) is proposed by virtue of the fact that wolf colony algorithm needs less calculation parameters and has good optimization accuracy. Firstly, the estimated distance of DV_hop algorithm is optimized. For the unknown node with 1 hop from anchor nodes, the distance between itself and anchor nodes is directly calculated by RSSI method, so as to reduce the error of estimated distance. Secondly, because the wolf colony algorithm is easy to fall into the local optimum, an improved wolf colony algorithm (IWCA) is proposed. The idea of simulated annealing is adopted. When the position of N iterations of probe wolf is not changed, it is allowed to swim in the direction of poor effect with a certain probability. The way of swimming is chaos mapping. Finally, The IWCA is applied to the calculation of node location, so as to reduce the error of DV_Hop algorithm when calculating node location. Theoretical analysis and simulation experiments show that compared with similar algorithms, IWCADV_Hop can improve the accuracy of sensor network node location.

Attribute-based encryption supporting conjunctive keyword

CHEN Si-qi, HUANG Ru-wei

2021, 43(07): 1219-1225. doi:

Abstract ( 194 )

PDF (505KB) ( 327 ) 　　

The attribute-based encryption mechanism enables fine-grained access control and supports multi-user data sharing. Aiming at the problems of inefficiency, easy key leakage and only supporting single keyword search in most attribute-based searchable encryption schemes, an attribute-based encryption scheme supporting conjunctive keyword search is proposed. The scheme uses a linear secret sharing scheme to implement access control, and performs secret sharing and recovery operations in a matrix associated wit-h the attributes of the participants, which reduces the amount of calculation through matrix operations. In the trapdoor generation stage, the user key is not directly submitted to the cloud server, thus ensuring the security of the user key. Based on the polynomial equation, conjunctive keyword search is realized to narrow the search scope and improve the user’s search experience. Strict security analysis proves that the scheme can achieve security against keyword attacks.

Coverage probability analysis of three-layer heterogeneous networks with hybrid spectrum allocation

HU Hai-xia, JIA Xiang-dong, YE Pei-wen, JI Peng-shan, JING Le-tian

2021, 43(07): 1226-1235. doi:

Abstract ( 123 )

PDF (981KB) ( 227 ) 　　

Aiming at the problem of traffic surge caused by business intensive deployment in hot areas of urban environment, a three-layer cluster heterogeneous network model is constructed by densely deploying pico base stations and femto base stations on macro base stations to achieve the effect of diversion. To restrict the severe interference from inter-tier and intra-tier transmitters, an improved shared spectrum separation and allocation scheme is proposed based on cluster classification. The Poisson cluster process and the Poisson hole process are used to model the deployment of the base station, a statistical description of the interference experienced at a typical receiver is given, and then the coverage probabilities based on the validity criterion is derived. At the same time, two cascade cases such as ordered FBS association scheme and non-ordered FBS are considered. The simulation results verify the correctness of the theoretical analysis and reveal the influence of network parameters on network performance. It is concluded that, compared with non-ordered FBS association scheme, ordered FBS association scheme’s performance gain greatly depends on the coverage radii of PBS and FBS.

An efficient batch verification algorithm for SM2 signatures

RUAN Ou, CHEN Ji-chen, MAO Hao

2021, 43(07): 1236-1242. doi:

Abstract ( 470 )

PDF (534KB) ( 385 ) 　　

Digital signature has many important applications such as identity authentication and non-repudiation. For example, in the e-cash system, merchants or consumers sign/verify e-cashes with digi- tal signatures to ensure the security and correctness of e-cashes. When a large number of e-cash signatures are simultaneously verified, the efficiency of the system is obviously reduced. Therefore, it is important to use the batch verification algorithms. Although many batch verification algorithms for standard digital signatures have been proposed, such as DSA, RSA and ECDSA, there isn’t a batch verification algorithm for SM2 digital signature. An efficient SM2 batch verification algorithm is firstly proposed. It reduces the most time-consuming point multiplication operation from n times to 1 time (n is the number of signatures) and remarkably shortens the running time for verification. The correctness and security of the SM2 batch verification algorithm are proved and verified by experiments. Experimental results show that the proposed batch verification algorithm is more efficient than the single verification algorithm. If the number of signatures is 1 million(220), the single verification algorithm takes about an hour, while the batch verification algorithm only needs 2 seconds.

A color image robust watermarking algorithm based on Schur decomposition and chaotic scrambling

LI Wei-an, XIONG Xiang-guang, XIA Dao-xun

2021, 43(07): 1243-1249. doi:

Abstract ( 135 )

PDF (1068KB) ( 271 ) 　　

Aiming at the problem that the traditional color image watermarking algorithm usually only embeds a random signal or binary image into the color carrier image, by combining the advantages of chaotic system and Schur decomposition, a robust watermarking algorithm is proposed, which embeds a color watermarking image into a color cover image. Firstly, the Arnold scrambling and chaotic encryption are performed on the color watermarking image to improve the security performance of the color watermarking image. Secondly, the color cover image is separated into three channels, each channel is divided into 4×4 non-overlapping blocks, and Schur decomposition is done on the selected block. Finally, the pre-processed watermarking signal is embedded into the maximum energy element after Schur decomposition. The experimental results show that the proposed algorithm has better imperceptibility, and has better robustness to filter attacks, noise attacks, JPEG compression attacks, and multiple combined attacks. Compared with similar watermarking algorithms based on Schur decomposition, this algorithm has better anti-attack performance for most attacks.

A deep Q-Learning based resource allocation algorithm in indoor wireless networks

Lv Ya-ping, JIA Xiang-dong, LU Yi, JING Le-tian

2021, 43(07): 1250-1255. doi:

Abstract ( 215 )

PDF (728KB) ( 205 ) 　　

To overcome the serve energy consumption problem in indoor wireless networks, a deep Q-Learning based transmit power allocation algorithm for home base station is proposed. Firstly, a deep learning network (DLN) is built to optimize the energy efficiency of indoor wireless networks. Then, the energy consumption rating is regarded as the rewards, the batch gradient descent method is used to continuously train the weights of DLN. Finally, the simulation results show that the proposed algorithm can dynamically adjust the transmit power, and is significantly superior to Q-Learning and water-filling algorithms in terms of convergence speed and EC optimization, which can effectively reduce the energy consumption of indoor wireless networks.

An edge detection algorithm based on anisotropy and edge strength correction factor

LI Kai, ZHANG Yong-sheng, TONG Xiao-chong, LI Feng

2021, 43(07): 1256-1263. doi:

Abstract ( 193 )

PDF (1025KB) ( 279 ) 　　

The edge detection algorithm IAGK combining isotropic and anisotropic Gaussian filters has edge stretching effect, which causes false edges at complex edges. The introduction of isotropic Gaussian derivative filters also declines the robustness of the algorithm. At the same time, the anisotropic factor of IAGK algorithm is non-optimal value, and theoretically does not have optimal SNR and positioning performance. In this paper, the optimal anisotropic factor is selected based on the automatic anisotropic Gaussian kernel, and the edge strength correction factor is used to modify the edge strength formula of the IAGK algorithm to suppress the generation of false edges and the influence of noise. The proposed edge detection algorithm is tested on classical edge detection datasets. The experimental results show that, compared with Canny, AAGK and IAGK algorithms, the proposed algorithm has better noise robustness and better weak edge detection capability. The pseudo edge effect can also be further reduced. In the case of noise, the figure of merit (FOM) of the proposed algorithm is 3%, 4% and 7% higher than the Canny, AAGK and IAGK algorithms, respectively.

A segmentation method of high myopia atrophy lesions based on multi-scale deep supervision

ZENG Zeng-feng, HUAN Yu-xiang, ZOU Zhuo, ZHENG Li-rong

2021, 43(07): 1264-1272. doi:

Abstract ( 193 )

PDF (1446KB) ( 258 ) 　　

In order to improve the segmentation accuracy of atrophic lesions of high myopia in fundus images, aiming at the problems of poor quality of the fundus images of different individuals and the difficulty of segmentation due to the blurred border between atrophic lesions and adjacent tissues, a segmentation algorithm of high myopia atrophy lesions based on multi-scale depth supervision is proposed. Firstly, an optimization algorithm is developed to make the fundus image organization structure clear and uniform in style, reducing the difficulty of distinguishing complex features. Because V-Net can only obtain lower segmentation accuracy, the MS-V-Net that combines multi-level and low-level features to form multi-scale feature learning can extract semantic information in images of different scales. More importantly, the deep supervision of each multi-scale module of MS-V-Net eventually forms a closely supervised MSS-V-Net. Compared with the original V-Net segmentation method, it improves the discrimination of important semantic information by the network and generalization ability. The experimental results show that the Dice box-plot of the proposed method exhibits a trend of fewer outliers, larger median, shorter box length, smaller upper and lower interval, and shorter two lines outside the box, effectively improving the segmentation of atrophic lesion images of high myopia.

SAG-Net: A new skip attention guided network for joint disc and cup segmentation

JIANG Yun, GAO Jing, WANG Fa-lin

2021, 43(07): 1273-1282. doi:

Abstract ( 199 )

PDF (1070KB) ( 246 ) 　　

Learning the semantic information and location information of feature map is essential to produce ideal results in retinal image segmentation. Recently, convolutional neural networks have shown strong capabilities in extracting valid information from feature maps. However, convolution and pooling operations filter out some useful information. This paper proposes a new skip attention guided network (SAG-Net) to save feature map's semantic and location information and guide the expansion work. In SAG-Net, the Skip Attention Gate (SAtt) module is first introduced, which is used as a sensitive extension path to pass the semantic information and location information of previous feature maps, not only helps eliminate noise, but further reduces the negative effects of the background. Secondly, the SAG-Net model is further optimized by merging image pyramids to preserve contextual features. On the Drishti-GS1 dataset, the joint disc and cup segmentation task proves the effectiveness of our proposed method. Comprehensive results show that the proposed method is superior to the original U-Net method and other recent methods for disc and cup segmentation.

No-reference quality assessment of night-time images based on global and local features

ZHAO Yue, WANG Lai-hua, QI Su-min, WANG Wei-sheng, LIU Chen

2021, 43(07): 1283-1290. doi:

Abstract ( 177 )

PDF (1380KB) ( 233 ) 　　

To solve the problems of dark light and difficult feature extraction of night-time images, a no-reference night-time image quality evaluation method based on global and local features is proposed. Firstly, the contour principle is used to divide the image into light region and dark region, and the proportion of the bright region is taken as feature1. Secondly, the global brightness information of the night-time image is extracted and used as feature2. Then, the differential operator method is adopted to obtain the edge of the image as feature3. Finally, the night-time image is converted from RGB to HSI, and the hue, saturation and brightness components are extracted as feature4, feature5 and feature6. Combining the above features, an evaluation model is established by BP neural network to evaluate the quality of night-time images. The test results on the public database show that the proposed method is more consistent with the subjective score and better than the existing image quality evaluation methods.

A microblog friend recommendation algorithm based on SSD and timing model

MA Han-da, JING Di

2021, 43(07): 1291-1298. doi:

Abstract ( 161 )

PDF (612KB) ( 268 ) 　　

The exponential growth of social network users makes it difficult for users to find suitable friends in the network. Therefore, this paper proposes a microblog friend recommendation algorithm based on SSD and timing model. Firstly, the multi-target detection algorithm SSD is used to extract the information of the collected users’ images. Secondly, the timing model is used to further process the extracted information in time dimension. Then, JS divergence formula is used to calculate the similarity between users. Finally, the weighted fusion is carried out with the similarity based on the user’s personal information to obtain the comprehensive user similarity, and the Top-K idea is used for user re- commendation. The verification on Sina microblog users’ data set shows that the weight value of refe- rence factors will affect the recommendation results and the precision of this method is higher than that of the algorithm only considering users’ attributes or users’ pictures.

A time series prediction model based on temporal point process

GUO Quan-sheng, LI Dong, ZHANG Lei, WEI Chu-yuan

2021, 43(07): 1299-1307. doi:

Abstract ( 296 )

PDF (920KB) ( 274 ) 　　

The effective prediction of the urban events can provide the decision support for the government to avoid, control or mitigate the relevant social risks. Firstly, a conditional strength function based on integral derivative method is proposed to improve the precision of sequence prediction. Secondly, a temporal point process model based on recurrent neural network and cumulative hazard function is constructed. The nonlinear dependence of historical events is captured by recurrent neural network, and the cumulative hazard function is obtained by fully connected network. Finally, the representative synthetic data sets and real data sets are selected to compare the performance of several models. The experimental results show that the proposed method can better predict the time series of urban events, and is superior to the traditional temporal point processes in the aspects of mean absolute error and mean negative log-likelihood, which exhibits the superiority of the model.

Current Issue

Author center

Review center

Online journal