Computer Engineering & Science

Analysis and optimization of the connected component algorithm for large-scale graph computing

BAI Hao, GAN Xin-biao, YANG Wen-xiang, JIA Meng-han, TU Xu-ping, ZHANG Yi-ming, GUO Min, LAI Le, ZHANG Yi, ZHU Chun-ping

2022, 44(02): 191-198. doi:

Abstract ( 292 )

PDF (803KB) ( 375 ) 　　

In recent years, graph computing has played an increasingly important role in many fields. The connected component algorithm is an important basic algorithm for graph computing, which can be applied to multiple scenarios such as reachability query. This paper is oriented to the large-scale graph traversal Graph500 standard test, and optimizes the connected component algorithm and data structure. The main innovations are as follows: (1) A shortcutting-vector algorithm is proposed for the union-find set, and the degree of coordination between the algorithm and the data structure is tested. (2) The multi-threaded iterative rotation algorithm is used to achieve the parallel acceleration of the algorithm. (3) The advantages and disadvantages of different implementation methods are compared from multiple dimensions.
Based on the optimization method, the performance is evaluated and analyzed. When scale =25 (including 225 vertices), the speedup ratio of the shortcutting-vector algorithm to the rank-merging algorithm based on two-dimensional vectors and linked lists is 1.38 times and 1.40 times, respectively. The speedup ratios of BFS and DFS are 4.76 times and 4.70 times respectively, and the space occupation is 4.1%~4.6% of the two algorithms. In addition, the speedup ratio of parallel to serial is 1.57 times.

Design and analysis of high-bandwidth memory channel on silicon interposer

LI Chuan, ZHENG Hao, WANG Yan-hui

2022, 44(02): 199-206. doi:

Abstract ( 285 )

PDF (2150KB) ( 389 ) 　　

High-Bandwidth Memory (HBM) memory has broad application prospects in big data, intelligent computing and other fields due to its ultra-high storage bandwidth. The silicon interposer that supports ultra-fine-pitch technology is the main carrier to realize the HBM signal interconnection between the memory and the chip. From HBM1.0 to HBM2E, the signal rate reaches 3.2 Gbps, and the signal integrity problem cannot be ignored. Starting from HBM ball map, the signal line distribution, line-width constraint and line-gap constraint are analyzed. A two-layer signal line transmission model is established. The frequency domain impedance analysis method and the total crosstalk calculation method are constructed. The influence of structural parameters on the transmission parameters of electrical properties is analyzed from the frequency domain and verified in the time domain. The result shows that the sum of the HBM signal line width and gap should be less than 6.8 μm. In the application frequency range, the impedance of the signal line at the far silicon layer is 6~8 Ω higher than the impedance of the signal line near the silicon layer, and the impedance value of the 3 μm line width is closer to 50 Ω. Line width and line length are sensitive parameters for insertion loss, and gap and wiring layers have little effect on loss. For the inherent loss in the low frequency area, the line width influence is dominant. For the line loss in the high frequency area, the line length has a greater influence. Line gap is a sensitive factor for crosstalk. However, due to space constraints, spacing optimization is limited. The introduction of ground shielding wire is an effective solution.

A super high-radix router based on Chiplet integration technology

LIANG Chong-shan, DAI Yi, XU Wei-xia

2022, 44(02): 207-213. doi:

Abstract ( 254 )

PDF (1034KB) ( 289 ) 　　

High-radix router with low latency and high bandwidth plays an important role in constructing large-scale interconnection networks. However，limited by the continuous increasingly design complexity and the slowdown and stagnation of Moores law and Dennard scaling, it will become more and more difficult to expand a higher number of ports on a single routing chip. Chiplet integrates multiple dies in a high-level package to form a large chip with specific functions, so as to solve the problems of scale, cost and period in chip design. According to the idea of Chiplet integration technology and using existing routing chips, this paper proposes a 128-port high-radix router.The internal structure of this high-radix router is a a two-level fat-tree network composed of multiple switch dies. The actual RTL-level code simulation test shows that, compared with the single-chip high-radix router, the designed high-radix router achieves better performance while expanding more ports.

A sliding window timing analysis method of CDC signals

MA Chi-yuan, LEI Guo-qing

2022, 44(02): 214-219. doi:

Abstract ( 159 )

PDF (875KB) ( 179 ) 　　

Timing analysis and closure of CDC signals in asynchronous clock domains is an important guarantee to the function correctness of high frequency large scale digital circuits. In order to reduce the design area, a sliding window timing analysis method of CDC signals is proposed, which can execute timing analysis and closure by setting appropriate timing constraint window for each CDC path under each corner. This method can avoid the increase of cell area caused by fake timing violations and unnecessary timing fixing under over-constraint, which often happened in the fixed constraint method. The proposed method can also reduce the backend workload of CDC circuits. The experiment under 16 nm process shows that the proposed method can significantly save the design area compared with the fixed constraint method.

Design of aggregation-based FlowRadar acceleration model for network data collection

Lv Gao-feng, WANG Yu-peng, YANG Rong-jia, TANG Zhu

2022, 44(02): 220-226. doi:

Abstract ( 206 )

PDF (918KB) ( 240 ) 　　

Network big data collection is the basis of network behavior research, which can provide real and effective data basis for understanding the characteristics of network behavior. The existing FlowRadar measurement methods can encode and decode traffic for all streams, with less memory and constant count update time. However, each packet needs to be hashed, which has the problems of high memory access times and high computation cost. To solve this problem, this paper designs and implements the packet aggregation module and the flow evict module with P4 language based on Protocol Independent Switch Arch (PISA). Simulation results verify the optimization performance of the acceleration model in terms of throughput and network delay.

A non-uniform clustering QoS routing algorithm based on software-defined wireless sensor networks

GOU Ping-zhang, YUAN Chen, ZHANG Fen

2022, 44(02): 227-236. doi:

Abstract ( 154 )

PDF (846KB) ( 255 ) 　　

Aiming at the problem of limited node resources and inability to dynamically manage the non-uniform clustered QoS routing in traditional wireless sensor networks, a software-defined wireless sensor network non-uniform clustered QoS routing algorithm (SDNUCQS) is proposed. The controller considers node energy, distance between nodes and QoS indicators, uses entropy weight method to elect high-quality cluster heads, and performs non-uniform clustering of the network. The cross classification method is used to divide the data to be transmitted into different types through delay and loss rate. In the inter-cluster routing, the controller uses the link QoS index and node load as parameters, and uses a centralized method to calculate the best path for QoS data and ordinary data transmission. Simulation experiment results show that SDNUCQS algorithm can significantly reduce network delay and loss rate. Compared with LEACH, EEUC, CRIPSO and tPSOEB algorithms, it can reduce cluster head energy consumption and prolong network life cycle.

A practical Byzantine fault tolerant consensus algorithm based on role management

LI Teng, CHENG Zhe, JIA Dong-li, JIA Yao-qing

2022, 44(02): 237-243. doi:

Abstract ( 219 )

PDF (540KB) ( 268 ) 　　

In view of the problem that the existing practical Byzantine Fault Tolerance (PBFT) consensus algorithm applied to the alliance chain has poor scalability, high energy consumption, low efficiency, and random selection of master nodes, a practical Byzantine Fault Tolerant (RPBFT) consensus algorithm based on role management is proposed. Firstly, the nodes in the system are divided into three types of role nodes with different responsibilities: managers, candidates and ordinary nodes. Secondly, the candidate node has the right to vote, and the corresponding candidate node is elected as the manager by voting. Ordinary nodes can be transformed into candidate nodes after meeting the conditions. Finally, the conversion between different types of role nodes is managed by the reward mechanism. Different role nodes can be dynamically adjusted when the total number of network nodes changes, so that the algorithm can adapt to the dynamic network. The experimental results show that the RPBFT algorithm has high reliability, low latency, low energy consumption and good scalability.

GPS trajectory de-anonymization based on deep learning

BU Guan-hua, , ZHOU Li-liang, LI Hao, ZHANG Min

2022, 44(02): 244-250. doi:

Abstract ( 189 )

PDF (624KB) ( 292 ) 　　

The rapid development of mobile Internet and LBS technology allows location service providers to easily collect an ocean of user location trajectory data. Recent studies have shown that deep learning methods can extract user privacy such as user identity from trajectory datasets. However, the existing work mainly focuses on the check-in trajectories collected by social networks, and the de- anonymization research of GPS trajectories is relatively lacking. Therefore, the research on the de- anonymization technology of GPS trajectory based on deep learning is carried out. Firstly, a pre-training method of GPS trajectory data is proposed. After sub-trajectory division, location conversion and location embedding, the spatial distance and context information of GPS coordinates in original trajectories are embedded into fixed-length vectors, so that the GPS trajectory data can be used as the input of neural network. Secondly, a GPS trajectory de-anonymization method based on deep neural network training is proposed. Based on the pre-trained vector sequences obtained in data pre-training, neural networks such as LSTM and GRU are used as encoders to train and fit user identification to achieve trajectory-user link from anonymous trajectories. Finally, the above methods are verified on Geolife trajectory dataset. In the experiment, the accuracy and Top5-accuracy of trajectory de-anonymization reach 56.73% and 7348%. The experimental results demonstrate that the GPS trajectory de-anonymization method based on deep learning can obtain more accurate user identification from the anonymous trajectory data.

A differential privacy protection algorithm in social network based on spectral clustering

YUAN Quan, YAN Fei-yang, WEN Zhi-yun, ZHANG Zhen-kang,

2022, 44(02): 251-256. doi:

Abstract ( 191 )

PDF (1165KB) ( 250 ) 　　

Aiming at the problems of excessive noise addition and unbalanced privacy protection in the differential privacy protection algorithm of weighted social networks, a privacy protection algorithm combining spectral clustering algorithm and differential privacy protection model is proposed. Firstly, to solve the problem of excessive noise addition caused by the way of directly adding noise to the side weights of social networks by traditional differential privacy protection algorithms, combined with the spectral clustering algorithm, the weighted social networks are clustered into different clusters, and different clusters are randomly selected. The method of adding noise reduces the amount of noise added and improves the availability of data. Secondly, new privacy budget parameters are designed, and the amount of noise added is determined according to the weight of the social network side, so as to achieve a more balanced privacy protection. Finally, theoretical derivation and experiments prove that the data processed by the proposed algorithm have higher availability.

A surrogate-assisted multi-objective firefly algorithm for software defect prediction

CAO Liang-lin, BEN Ke-rong, ZHANG Xian

2022, 44(02): 257-265. doi:

Abstract ( 152 )

PDF (697KB) ( 226 ) 　　

Aiming at the complexity and imbalance of data dimensions in software defect prediction (SDP), a software defect prediction method based on the surrogate-assisted multi-objective firefly algorithm (SMO-MSFFA) is proposed. The proposed method employs the multi-group strategy firefly algorithm (MSFFA) to take minimizing the feature selection ratio and maximizing the the model evaluation AUC value as the two objective functions. Random forest (RF), support vector machine (SVM), and K-nearest neighbor classification algorithm (KNN) are used as the classifiers to construct the software defect prediction models. Considering the computational complexity of the evolutionary algorithm, the embedded surrogate-assisted model completes the calculation of partial individual evaluation function offline to reduce the computational cost. Experiments on PC1, KC1, and MC1 of NASA datasets verify that, compared with NSGA-II, our method increases the model evaluation AUC value by 0.17 on PC1, decreases it by 0.01 on KC1, and increases it by 0.09 on MC1, decreases the average feature selection ratio by 0.08, 0.17, and 0.05 on PC1, KC1, and MC1 respectively, and increases the computational time by 131 seconds, and decreases the time by 199 seconds and 431 seconds on KC1 and MC1 respectively. Experimental results show that the proposed method has obvious advantages in improving the model performance, reducing the feature selection ratio and reducing the computational time.

Model checking of fuzzy computation tree logic based on fuzzy decision process

LI Zhao-kai, MA Zhan-you, LI Jian-xiang, GUO Hao

2022, 44(02): 266-275. doi:

Abstract ( 188 )

PDF (689KB) ( 272 ) 　　

Aiming at the model checking problem of uncertain fuzzy systems generated by data representation, a model checking algorithm of fuzzy computational tree logic is given.First of all, the fuzzy decision-making process is introduced as the model of this type of system. Its biggest feature is the uncertain choice of actions and the ambiguity of state expression during the migration process. Then, based on the fuzzy decision-making process, the grammar and semantics of fuzzy computation tree logic are given. Finally, a model detection algorithm of fuzzy computational tree logic is given. The algorithm is a synthetic operation that converts the model detection problem of fuzzy computational tree logic into a fuzzy matrix. Its advantages are low time complexity and a relatively simple calculation process.

An object tracking algorithm based on mixed attention mechanism

FENG Qi-yao, ZHANG Jing-lei,

2022, 44(02): 276-282. doi:

Abstract ( 269 )

PDF (854KB) ( 339 ) 　　

To solve the tracking failure problem of fully-convolutional siamese networks algorithm (SiamFC) in complex scenes such as objects deformation, occlusion, and fast motion, a novel method (SiamMA) that uses the mixed attention mechanism to enhance the network identification ability is proposed. Firstly, in order to simulate the complex scenes and enhances the generalization performance of networks, an image stacking and cropping method is adopted in the network training stage to build the self-adversarial training sample pairs. Secondly, a mixed attention mechanism algorithm is proposed, which fuses spatial attention and channel attention modules in different branches of the network, so the background interference in the feature map can effectively be suppressed and the robustness of the algorithm is improved. 4 open test datasets such as Got-10k and UAV123, etc., are adopted to evaluate the algorithm performance. The experimental results show that our method outperforms 6 traditional algorithms such as SiamFC, KCF, etc., on the main performance indexes such as tracking success rate and precision. The average speed of the algorithm can reach 60 frames per second.

A large parallax image stitching algorithm based on feature clustering

XU Guang-yu, DING Jian

2022, 44(02): 283-290. doi:

Abstract ( 253 )

PDF (1162KB) ( 304 ) 　　

Aiming at the problems of dislocation and ghosting in the process of large parallax image stitching, an image stitching algorithm based on feature clustering is proposed. Firstly, the Tyson polygon is constructed in the overlapping area of the target image based on the distribution of matched feature points. Secondly, the improved AGNES hierarchical clustering algorithm is used to cluster the feature points, and merge the Tyson polygons represented by the feature points in the corresponding group, so as to get each sub-plane of the overlapping region of the target image. Finally, the homography matrix of the corresponding subplane is solved, and the homography matrix of the non-overlapping region is assigned according to the nearest principle, then the target image is transformed by projection to get the mosaic image. The experimental results show that the proposed algorithm has high registration accuracy and can effectively improve the misregistration and local distortion in large parallax image stitching.

Face recognition based on improved label consistent KSVD dictionary learning

YAN Chun-man, ZHANG Yu-yao, ZHANG Di

2022, 44(02): 291-297. doi:

Abstract ( 176 )

PDF (611KB) ( 253 )

PDF（mobile） (611KB) ( 6 ) 　　

Aiming at the problem of face recognition under complicated conditions such as noise pollution and illumination changes,
a face recognition algorithm based on improved label consistent KSVD dictionary learning is proposed. The algorithm changes the dictionary update way of the label consistent KSVD algorithm, uses the principal component analysis algorithm to decompose the error term, and the dictionary atom is modified by the eigenvector corresponding to the maximum eigenvalue. Then, through the dictionary learning process, the discriminative dictionary corresponding to atoms and category labels is obtained. The objective function combines reconstruction error, sparse coding error and classification error. Finally, in the classification stage, the learned dictionary and classifier parameters are used to classify the test samples. The average recognition rates of 99.01% and 97.94% are obtained on extended Yale B database with illumination conditions, and AR database with various facial expressions and occlusion effect, respectively. At the same time, the algorithm is more robust in the presence of noise.

Gesture recognition based on CGRU-ELM hybrid model in WiFi environment

ZHANG Xin, FENG Xiu-fang

2022, 44(02): 298-305. doi:

Abstract ( 183 )

PDF (1669KB) ( 281 ) 　　

Aiming at the problems of high energy consumption and difficult deployment in traditional gesture recognition methods , a WiFi-based gesture recognition method is proposed. By extracting the Doppler frequency shift components from the fine-grained channel state information collected from WiFi signal, the problem of unclear mapping relationship between the statistical features extracted by wireless gesture recognition method and specific gestures is solved. Meanwhile, a deep hybrid model of CGRu-ELM is proposed to extract and classify the extracted Doppler frequency shift components, and to recognize six commonly used human-computer interaction gestures. The experimental results show that the average
accuracy of this method for gesture recognition with WiFi signal as input parameter is 93.4%.

A dynamic graph partitioning method based on GN algorithm

LUO Xiao-xia, WANG Jia, LUO Xiang-yu, LI Jia-nan

2022, 44(02): 306-311. doi:

Abstract ( 185 )

PDF (593KB) ( 285 ) 　　

With the rapid growth of graph scale, the demand for real-time processing of dynamic graphs is increasing. Most of the existing algorithms are effective for static graph partitioning, but directly using them to process dynamic graphs will bring greater communication overheads. To solve this problem, a dynamic graph partitioning method based on GN algorithm is proposed. Firstly, the vertices to be inserted in the dynamic graph over a period of time are collected. Then, the GN algorithm is used to pre-partition these newly inserted vertices to generate several internally connected communities. Finally, the community results generated by the pre-partitioning are inserted into the current graph that has been partitioned. Through experiments, the proposed method is compared with the traditional streaming partitioning method in terms of crossed edges and load balance. The results show that, on the public datasets, The proposed method can reduce the number of crossed edges by 13%, and the load balance by 42.3%. It can be seen that, compared with the streaming partitioning method, the proposed method can significantly improve the quality of dynamic graph partitioning.

YOLOv3 improvement and its compression algorithm for human body detection

ZHANG Yu-jie, DONG Rui

2022, 44(02): 312-320. doi:

Abstract ( 162 )

PDF (961KB) ( 309 ) 　　

Aiming at the problems of low accuracy, large amount of parameters, large amount of calculation, and large model volume, difficulty of being implemented on embedded platforms with limited resources when YOLOv3 algorithm is applied to human body detection, an improved YOLOv3 and its model compression algorithm are proposed. By introducing dense connections and multi-branch structure in YOLOv3, increasing the network width and multi-scale receptive field, and strengthening feature reuse, the accuracy of the model is improved. For the improved YOLOv3, channel pruning is performed by jointly optimizing the weight loss function and the L1 regular term of the BN layer scaling factor, thereby reducing the amount of parameters and calculations, and the volume of the model is greatly compressed. Experimental results show that the accuracy of the improved YOLOv3 algorithm is increased by 6.01%, and the model volume is reduced by 38.46%. After compression, although the accuracy of the model is reduced by 3.16%, the model volume is only 3.31% of the original, only 4.77 MB. Therefore, the improved and compressed YOLOv3 still maintains a high accuracy rate, and the model volume is greatly reduced, which provides support for the YOLOv3 model to realize human body detection in embedded deployment.

A survey of fusion methods for multi-source image

ZHANG Li-xia, ZENG Guang-ping, XUAN Zhao-cheng

2022, 44(02): 321-334. doi:

Abstract ( 767 )

PDF (1038KB) ( 1066 ) 　　

Due to the different imaging mechanisms, the multi-source images are essentially diffe- rent, which makes the fusion process different. Based on a large number of Chinese and foreign literature, this paper classifies fusion methods, and focuses on the fusion processes and typical algorithms of various fusion methods, and elaborates its key techniques in detail. At the same time, an in-depth review of the current evaluation indicators and classifications is carried out. Finally, combining the influencing factors of key techniques and the technology development status, the future development trend of the fusion image field is prospected from the five aspects such as data characteristics, time efficiency, information extraction, evaluation angles and universal applicability of the method.

A review of graph auto-encoder recommendation

LI Fang, WU Guo-dong, TU Li-jing, LIU Yu-liang, ZHA Zhi-kang, LI Jing-xia

2022, 44(02): 335-344. doi:

Abstract ( 638 )

PDF (805KB) ( 621 ) 　　

Graph Auto-Encoder (GAE) is a learning framework derived from graph neural network. It introduces the idea of clustering neighborhood nodes into the encoder, and the decoder decodes the graph structure data and reconstructs the graph structure data. The introduction of supervisory module in the model can improve the integrity of graph structure data embedded in the model and the accuracy of data generation. Encoder and decoder can use different neural networks to take advantage of the advantages different neural networks. In recent years, GAE recommendation has become a hot topic in recommendation system research. The progress of GAE recommendation research is analyzed from the aspects of unsupervised learning and semi-supervised learning. The deficiencies of the existing GAE recommendations, such as cold startup for users, uninterpretability, high model complexity, and multi-source heterogeneity of data, are discussed. In addition, it looks into the future of GAE recommendation from the perspectives of cross-field recommendation, combined with traditional recommendation methods, introduction of attention mechanism, and integration of various scenarios.

Interval-valued q-Rung orthopair fuzzy Frank operator and its group decision method

WAN Ben-ting, FANG Di-chang

2022, 44(02): 345-354. doi:

Abstract ( 144 )

PDF (548KB) ( 272 ) 　　

Based on the interval-valued q-Rung orthopair fuzzy environment and the Frank operator, some operation rules of the interval-valued q-Rung orthopair fuzzy Frank operator are defined, the interval-valued q-Rung orthopair fuzzy Frank weighted average operator (IVq-ROFFWA) and the weighted geometric operator (IVq-ROFFWG) are proposed based on the research of their idempotency, boundedness and monotonicity. Then, a multi-attribute group decision method (MAGDM) based on the IVq-ROFFWA operator is proposed, which selects the value of q satisfying requirements and obtains the target interval-valued fuzzy numbers by the IVq-ROFFWA, compares their scores to obtain the optimal solution, and also proposes that different q values do not affect the optimal solution ranking. Finally, the IVq-ROFFWA operator based multi-attribute group decision making method is verified to be feasible and effective. It is also verified that different q values do not affect the optimal solution ordering. The comparative analysis shows that the group decision methods based on IVq-ROFFWA and IVq-ROFFWG operators are consistent with the group decision methods based on other operators.

Current Issue

Author center

Review center

Online journal