Computer Engineering & Science

ParM: A heterogeneous programming model for domestic processors

ZHU Wen-long, JIANG Jia-zhi, HUANG Dan, XIAO Nong

2023, 45(09): 1521-1531. doi:

With the growing demand for computing power, a variety of domestically produced heterogeneous computing devices have emerged. Each of these devices has its own specialized programming model, and developers must write code against that dedicated model according to the architectural characteristics of each device. As a result, the code is not portable across devices. In recent years, unified heterogeneous parallel programming models that support multiple computing devices have appeared overseas, but there is still relatively little research on, or implementation of, heterogeneous programming models for domestically produced devices. To address this issue, a performance-portable heterogeneous programming model called ParM is developed. ParM is provided as a C++ library and hides many low-level implementation details, reducing the difficulty of parallel programming. The backend devices currently supported include x86 CPUs, NVIDIA GPUs, Huawei Kunpeng processors, and Huawei Ascend AI processors. Performance optimizations have been carried out for these backends, and performance tests on various devices show that ParM achieves over 90% of the performance of native code.

Performance optimization of RISC-V basic math library

LI Fei, GUO Shao-zhong, ZHOU Bei, SONG Guang-hui, HAO Jiang-wei, XU Jin-chen

2023, 45(09): 1532-1543. doi:

The basic mathematical library is one of the fundamental software libraries in a computer system, and its performance is a key factor in the efficiency of higher-level applications. The existing RISC-V basic mathematical library computes correct results, but its source code contains a large number of memory access instructions and redundant instructions, which leads to suboptimal function performance. In addition, the assembly code of RISC-V mathematical functions is large and involves complex branch conditions, which makes direct optimization difficult. To address these issues, this paper follows a local-to-global optimization approach and proposes a method for automatically detecting critical paths in RISC-V mathematical functions. The method focuses on the problem that registers used by other branches are easily modified when a critical branch is optimized. Using a queue-based register allocation strategy, the registers within the same path are reallocated, which improves register utilization and reduces the number of memory access instructions. Furthermore, redundant instructions are merged and functionally refactored. Experimental results show that the average execution time of 67 RISC-V mathematical functions is reduced from 144 cycles to 85 cycles, an average performance improvement of 29.61%.

A 6H parallel computing architecture for edge computing

LI Lei, ZHENG Li-ming, WANG Hong-yi, CHAI Yong-yi, LIU Pei-guo

2023, 45(09): 1544-1552. doi:

The current centralized cloud computing model has shortcomings in latency, security, and the use of environmental information. In recent years, industry and academia have proposed various edge computing concepts, such as fog computing, mobile edge computing, and mobile cloud computing, to address these issues. The main idea is to move computing, storage, I/O, and other resources to the network edge in order to improve the service quality of applications. However, existing edge computing architectures often adopt cloud computing architectures directly, leading to problems such as poor interoperability, low resource utilization, coarse-grained resource management, and a lack of dynamism. This paper analyzes the characteristics of edge computing in depth and proposes a 6H parallel computing architecture suited to edge computing environments, based on lightweight virtualization, software-defined networking, parallel computing, and other basic concepts. The 6H architecture aims for high performance, high availability, high scalability, high modularity, high extensibility, and high usability. The paper then implements a 6H computing framework using a Python/C++ hybrid programming model and tests it on typical edge computing hardware with typical IoT use cases. The results show that computation time decreases nearly linearly as the number of computing processes and computing nodes increases, indicating good scalability and extensibility. Under high concurrency the framework performs well, demonstrating high performance. When abnormal conditions occur on the edge servers, the framework recovers quickly, indicating good availability. In addition, the framework adopts the CMD-Worker-Handler programming model, which is highly modular and allows easy secondary development, showing good usability.

A novel flexible gate control mechanism for time-sensitive networking

LIN Jia-shuo, LI Wei-chao, CHENG Jian, ZHAN Shuang-ping, FENG Jing-bin, WANG Tao, HUANG Qian-yi, TANG Bo, WANG Yi

2023, 45(09): 1553-1562. doi:

Flow scheduling algorithms for time-sensitive networking often generate a large number of gate control events, exceeding the capabilities of network devices and making the algorithms difficult to deploy in practical networks. To address this issue, a novel flow scheduling strategy based on flexible gate control is proposed. It relaxes the strict isolation constraints between real-time traffic flows and best-effort traffic flows, so that some nodes need not enable the gate control mechanism for real-time flows. The strategy can selectively enable gate control for real-time traffic at individual network device ports, reducing the gate control events required for scheduling real-time traffic flows by up to 91.6%. It even allows network devices that do not support gate control scheduling to sit on the transmission path of real-time traffic, enabling mixed deployment with ordinary networks.

Usability-enhanced thumbnail-preserving encryption

YE Xi, ZHANG Yu-shu, ZHAO Ruo-yu, XIAO Xiang-li, WEN Wen-ying

2023, 45(09): 1563-1571. doi:

Traditional encryption protects the privacy of images stored in the cloud but completely deprives them of visual usability. Although thumbnail-preserving encryption (TPE) can balance image privacy and visual usability, none of the existing schemes considers the preservation of non-visual usability. In view of this, a usability-enhanced TPE scheme is proposed. The scheme takes advantage of the reversibility and flexibility of data hiding, freeing up part of the original image space for pixel adjustment so that the ciphertext thumbnail stays similar to the original one. Before pixel adjustment, the image is encrypted with a traditional cipher for security. After pixel adjustment, the remaining capacity is used to embed additional information that provides non-visual usability to users. The proposed TPE scheme therefore not only balances privacy and visual usability but also preserves non-visual usability to a certain extent. Experimental results confirm the effectiveness and advantages of the proposed scheme.

A dual search strategy for modifying the longest matching rule in domain name service

ZHOU Cong, TAO Jing, ZHAO Bao-kang, LI An-yi

2023, 45(09): 1572-1577. doi:

Domain name service (DNS) plays an important role in the Internet. The combination of ENUM technology and DNS has been widely used in triple-network convergence, and the BIND system accounts for a large share of DNS deployments. This paper compares how the RFC and the BIND system differ on the longest matching rule. Based on a comparison of the results returned by the two types of DNS, a dual search strategy is proposed to modify the BIND system so that the DNS conforms to the RFC standard. This offers ideas for more standards-compliant service customization, and the strategy has been deployed in an application on a real network.

A survey of target tracking algorithms based on Siamese network

MA Yu-min, QIAN Yu-rong, ZHOU Wei-hang, GONG Wei-jun, Palladium Turson

2023, 45(09): 1578-1592. doi:

A Siamese network is a coupled framework built from two or more artificial neural networks. It turns the regression problem into a similarity matching problem and has attracted much attention from researchers in computer vision. With the rapid development of deep learning theory, target tracking technology has been widely used in daily life. Siamese network-based target tracking algorithms have gradually replaced traditional target tracking algorithms thanks to their relatively superior accuracy and real-time performance, becoming the mainstream approach to target tracking. Firstly, the challenges faced by target tracking tasks and the traditional methods are introduced. Then, the basic structure and development of Siamese networks are introduced, and the design principles of recent Siamese network-based target tracking algorithms are summarized. In addition, the performance of Siamese network-based target tracking algorithms is compared on multiple mainstream target tracking datasets. Finally, the open problems and prospects of Siamese network-based target tracking algorithms are discussed.

A Siamese attention-gated fusion encoding-decoding network for remote sensing image change detection

CHEN Hai-yong, L Cheng-jie, DU Chun, CHEN Peng

2023, 45(09): 1593-1601. doi:

Reduced feature map resolution in deep convolutional neural networks leads to poor performance in detecting small changes in remote sensing images, and false changes caused by external interference are difficult to distinguish effectively. To address these problems, a Siamese attention-gated fusion encoding-decoding network for remote sensing image change detection is proposed. A triple attention network module is introduced in the encoding part to further reduce false changes in the change detection image. An attention-gated fusion module is proposed to selectively fuse features from multiple levels. A deep supervision strategy is introduced directly in the decoding part to enhance the feature extraction capability of the change detection network. The effectiveness of the proposed network is verified through experiments.

An unsupervised video summarization algorithm based on deep and shallow feature fusion

ZENG Fan-feng, WANG Chun-zhen, LI Chen

2023, 45(09): 1602-1610. doi:

To solve the problem that existing unsupervised video summarization algorithms do not accurately judge the importance of video frames, an unsupervised video summarization algorithm based on deep and shallow feature fusion is proposed. The deep features of video frames are extracted by a Convolutional Neural Network (CNN), while the shallow features are first extracted by the Speeded Up Robust Features (SURF) operator and then encoded using the Bag-of-Words (BOW) model. The deep and shallow features are fused to enrich the information in the feature descriptors, which serve as the input of the network model. A Bidirectional Long Short-Term Memory network (BiLSTM) is used to model the temporal information and output frame importance scores. The model is optimized using reinforcement learning. For generating static video summaries, a keyframe selection method based on local maxima is designed, which follows the temporal structure of the original video and avoids redundancy. Experiments against several unsupervised video summarization algorithms on the SumMe and TVSum datasets show that the proposed algorithm judges video content more accurately and generates higher-quality summaries.
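
The idea of selecting keyframes at local maxima of the importance scores can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the helper name `select_keyframes`, the strict comparison against the two neighboring frames, and the sample scores are all hypothetical. In the paper the scores would be produced by the BiLSTM.

```python
from typing import List, Sequence


def select_keyframes(scores: Sequence[float]) -> List[int]:
    """Return indices of frames whose score is a strict local maximum.

    Hypothetical helper: a frame is kept when its importance score is
    higher than the scores of both neighboring frames. Indices are
    returned in temporal order, so the summary follows the video.
    """
    keyframes = []
    for i in range(1, len(scores) - 1):
        if scores[i] > scores[i - 1] and scores[i] > scores[i + 1]:
            keyframes.append(i)
    return keyframes


# Made-up importance scores for eight consecutive frames.
example_scores = [0.1, 0.4, 0.2, 0.3, 0.9, 0.5, 0.6, 0.2]
example_keyframes = select_keyframes(example_scores)
```

Because only one frame per score peak is kept, neighboring frames with similar content are not selected together, which is one plausible reading of how the method avoids redundancy.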

Face liveness detection based on multi-adversarial discrimination network

REN Tuo, YAN Wei, KUANG Li-qun, XIE Jian-bin, CHEN Zhong-yu, GAO Feng, GUO Rui, SHU Wei, XIE Chang-yi

2023, 45(09): 1611-1620. doi:

Face liveness detection is a key factor in ensuring the security of face recognition systems. In particular, disentangled learning methods can effectively address the problem of generalization across datasets in face liveness detection. However, existing disentangled learning methods often take the entire face image as input and parse out forgery trace elements, ignoring the local details of the forgery traces. To address this issue, this paper improves the existing forgery trace disentanglement network and proposes a multi-adversarial discriminative network model. The discriminator consists of a primary discriminator and a regional discriminator. A facial mask module is introduced to generate facial skin and feature masks. Local facial information is integrated so that the generated images more closely resemble the distribution of face images in the dataset, while an enhanced version of the forgery trace is disentangled. The proposed multi-adversarial discriminative network effectively strengthens the forgery traces extracted from forged face images and improves the accuracy of face liveness detection. Specifically, the detection error rates of the model on the OULU-NPU dataset in two experiments are only 0.8% and 1.4%, significantly lower than those of the STDN. Good detection results are also achieved on the Idiap Replay-Attack dataset. To verify the transferability of the method, cross-domain experiments on the NUAA dataset and the Idiap Replay-Attack dataset are conducted and also achieve good results.

An image dehazing algorithm using weighted fusion with the Sigmoid function

HUO Yuan-lian, ZHANG Qiao-sen, ZHANG Jin-shi, FAN Hong-dong

2023, 45(09): 1621-1628. doi:

To address the problems of uneven transition between foreground and sky regions, inaccurate transmission rate fusion, and unnatural color restoration in image dehazing, an image dehazing algorithm using weighted fusion with the Sigmoid function is proposed. The algorithm uses different attenuation constants in the three RGB channels, and uses the Sigmoid function as the weight function to fuse the transmission rates of the sky region, estimated from the color attenuation prior, with the transmission rates of the foreground region, estimated from the dark channel prior. Finally, the haze-free image is restored through the atmospheric scattering model. Experimental results show that the algorithm dehazes hazy images containing sky regions more effectively, and the restored images are clear and natural in color, improving the subjective visual effect. Objective comparisons also confirm the effectiveness and applicability of the algorithm.
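
A rough sketch of Sigmoid-weighted fusion of two transmission estimates, followed by restoration with the standard atmospheric scattering model I = J·t + A·(1 − t), is given below. It is an illustration under assumptions, not the paper's algorithm: the abstract does not say what drives the weight, so the use of pixel brightness, the `threshold` and `steepness` values, the lower bound `t_min`, and all function names are hypothetical. Pixels are represented as flat lists of floats in [0, 1].

```python
import math
from typing import List


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def fuse_transmission(t_sky: List[float], t_fg: List[float],
                      brightness: List[float],
                      threshold: float = 0.7,
                      steepness: float = 20.0) -> List[float]:
    """Blend sky and foreground transmission estimates per pixel.

    Assumption: brighter pixels are more likely to be sky, so the
    Sigmoid weight w grows with brightness and favors t_sky there.
    """
    fused = []
    for ts, tf, b in zip(t_sky, t_fg, brightness):
        w = sigmoid(steepness * (b - threshold))
        fused.append(w * ts + (1.0 - w) * tf)
    return fused


def recover_radiance(hazy: float, airlight: float, t: float,
                     t_min: float = 0.1) -> float:
    """Invert I = J*t + A*(1 - t) for one channel value J.

    t is clamped from below by t_min to avoid amplifying noise.
    """
    return (hazy - airlight) / max(t, t_min) + airlight
```

The smooth Sigmoid weight replaces a hard sky/foreground mask, which is the kind of gradual transition the abstract describes between the two regions.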

A particle swarm optimization algorithm with centroid opposition-based learning and simplex search

ZHANG Wen-ning, ZHOU Qing-lei, JIAO Chong-yang, MEI Liang

2023, 45(09): 1629-1638. doi:

The particle swarm optimization (PSO) algorithm often suffers from problems such as low population diversity and becoming trapped in local optima. To address these issues, a particle swarm optimization algorithm with centroid opposition-based learning and simplex search (COLS-PSO) is proposed. During initialization, the search space is constructed based on a chaos strategy. During evolution, the particles that undergo centroid opposition-based learning are selected using the Spearman coefficient, which helps the algorithm escape from regions of local extrema. Furthermore, a simplex search method with strong local search ability is introduced to strengthen exploitation of the neighborhood of the optimal particle and improve search accuracy. The algorithm is tested on several standard test functions and then applied to test data generation in software testing. The experimental results show that the COLS-PSO algorithm performs well in terms of solution accuracy, convergence speed, and effectiveness, and can effectively balance the tension between population diversity and convergence.
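
For readers unfamiliar with centroid opposition, the usual definition reflects a particle through the centroid M of the population, giving the opposite point 2M − x. The sketch below illustrates only that step and is not the COLS-PSO implementation: the function names are hypothetical, clipping to the search bounds is an assumed boundary rule, and the Spearman-based selection of which particles to reflect is not reproduced.

```python
from typing import List


def centroid(population: List[List[float]]) -> List[float]:
    """Component-wise mean of all particle positions."""
    n = len(population)
    dim = len(population[0])
    return [sum(p[d] for p in population) / n for d in range(dim)]


def centroid_opposite(particle: List[float], center: List[float],
                      lower: List[float], upper: List[float]) -> List[float]:
    """Reflect a particle through the centroid: x_opp = 2*M - x.

    Assumption: coordinates that leave the search space are clipped
    to the bounds; other boundary rules are possible.
    """
    opposite = []
    for x, m, lo, hi in zip(particle, center, lower, upper):
        o = 2.0 * m - x
        opposite.append(min(max(o, lo), hi))
    return opposite


# Made-up two-dimensional swarm of three particles.
swarm = [[0.0, 0.0], [2.0, 4.0], [4.0, 2.0]]
center = centroid(swarm)
reflected = centroid_opposite(swarm[0], center, [0.0, 0.0], [5.0, 5.0])
```

In an opposition-based scheme the reflected point is typically evaluated and kept only if its fitness is better than that of the original particle.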

Anomaly detection of intelligent trading behavior based on mixed model

ZHANG Nai, ZHANG Chen-liang, LIU Yong-xiang, CHEN Cong, HUANG Yan-ting

2023, 45(09): 1639-1647. doi:

As one of the important embodiments of intelligent finance, intelligent trading based on trading software is booming in the domestic financial market, significantly improving the efficiency of financial transactions. However, there are many types of intelligent trading software, and the design ideas and algorithmic complexity of their trading strategies vary widely, leading to risks of abnormal trading and non-compliance. Anomaly detection for intelligent trading behavior has not yet been fully developed. Therefore, in view of the complexity and specialized nature of the trading scenario, a method is proposed that combines implicit representation learning by a deep learning model with explicit rule learning by a rule tree model, in order to model the timeliness and the compliance of the trading data respectively. To verify its effectiveness, the proposed method is compared with representative benchmark methods in multiple scenarios such as stocks and futures, and it achieves the best performance. In addition, the model is analyzed further to verify the impact of different features on the effectiveness of anomaly detection.

A multi-strategy improved chaotic Harris hawk optimization algorithm

HU Chun-an, XIONG Yu-ran

2023, 45(09): 1648-1660. doi:

The Harris Hawk Optimization algorithm (HHO) is a recently proposed meta-heuristic algorithm that simulates the predation behavior of a biological population. A Multi-strategy improved Harris Hawk Optimization algorithm (MHHO) is proposed to address the shortcomings of HHO, such as insufficient exploitation capability, decreasing population diversity, and a tendency to fall into local optima. Firstly, a chaotic local search strategy is introduced into HHO to improve its exploitation ability. The advantages of chaotic mapping are exploited to find better individuals by performing a local search around the current individual. Secondly, to enhance population diversity, an elite alternative pooling strategy is proposed. In addition, a distribution estimation strategy is used to improve convergence efficiency by sampling information from the dominant population to better guide the direction of population evolution. Experimental tests on CEC2017 demonstrate that the improved algorithm achieves a balance between convergence speed and global search ability. Finally, the practicality of the improved algorithm is demonstrated by applying it to constrained engineering problems.

An optimized A* algorithm based on local obstacle rate pre-acquisition and bidirectional parent node change

ZHANG Zhi-yuan, CHEN Hai-jin, ZHANG Yi-ming

2023, 45(09): 1661-1669. doi:

To address the poor path optimization, low search efficiency, and low flexibility caused by the traditional A* algorithm's failure to identify environmental information effectively, an improved A* algorithm based on local obstacle rate pre-acquisition and bidirectional parent node change is proposed. Firstly, the local obstacle rate of each part of the grid map is obtained using the drift matrix algorithm. Then, the pre-acquired local obstacle information is integrated into the improved A* algorithm's evaluation function, and the search space is adjusted adaptively according to the complexity of each region of the map. Finally, the improved parent node change method is used to further optimize the path and reduce the redundant points and inflection points in the generated path. Simulation results show that the algorithm significantly improves path length, the number of inflection points, search efficiency, running time, and other indicators.
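
To make the notion of a local obstacle rate concrete, the sketch below computes the fraction of blocked cells in each fixed-size tile of a grid map. It is only an illustration under assumptions: it does not reproduce the drift matrix algorithm named in the abstract, and the function name `local_obstacle_rates`, the non-overlapping square tiles, and the sample map are hypothetical. Cells are 1 for an obstacle and 0 for free space.

```python
from typing import List


def local_obstacle_rates(grid: List[List[int]], block: int) -> List[List[float]]:
    """Return the obstacle fraction of each block-by-block tile.

    Tiles at the right and bottom edges may be smaller when the map
    size is not a multiple of the block size.
    """
    rows, cols = len(grid), len(grid[0])
    rates = []
    for r0 in range(0, rows, block):
        row_rates = []
        for c0 in range(0, cols, block):
            cells = [grid[r][c]
                     for r in range(r0, min(r0 + block, rows))
                     for c in range(c0, min(c0 + block, cols))]
            row_rates.append(sum(cells) / len(cells))
        rates.append(row_rates)
    return rates


# Made-up 4x4 grid map split into 2x2 tiles.
example_grid = [
    [0, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]
example_rates = local_obstacle_rates(example_grid, 2)
```

A planner could consult such a table before expanding nodes, for example to search more widely in cluttered tiles, although how the paper feeds the rates into the evaluation function is not stated in the abstract.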

Blended MOOC video viewing pattern mining based on an improved self-adaptive DBSCAN

WANG Ruo-bin, GENG Fang-dong, ZHANG Yong-mei, SONG Wei, WANG Wei-feng, XU Lin

2023, 45(09): 1670-1678. doi:

The density-based clustering algorithm DBSCAN (Density-Based Spatial Clustering of Applications with Noise) can automatically group data according to its features and is often used for clustering analysis of complex, noisy data sets. However, its parameters are difficult to determine and require a high degree of human involvement, which limits its use in automatic, high-precision mining. To overcome these defects, an adaptive DBSCAN algorithm based on the slope of the k-dist graph (KSSA-DBSCAN) is proposed. Based on the slope of the k-dist graph, the algorithm automatically selects the appropriate inflection point of the graph as the optimal neighborhood radius, and it automatically determines the optimal density threshold during the clustering iterations according to the change in the number of clusters. KSSA-DBSCAN is compared with DBSCAN and KANN-DBSCAN on six data sets. The experimental results show that its accuracy is better than that of the other algorithms on four of the data sets, with an improvement of up to 25% over DBSCAN. When applied to viewing behavior data for blended MOOC videos, the algorithm effectively and automatically mines video viewing patterns, further verifying its effectiveness.
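
As background, a k-dist graph lists each point's distance to its k-th nearest neighbor, sorted in descending order; a sharp drop in the curve separates noise from clustered points. The sketch below computes the graph and picks a neighborhood radius at the steepest drop. It is a simplified stand-in, not the KSSA-DBSCAN rule: the function names and the "largest single-step drop" criterion are assumptions, and the brute-force distance computation is only suitable for small data sets.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def k_dist(points: Sequence[Point], k: int) -> List[float]:
    """Distance from each point to its k-th nearest neighbor, descending."""
    dists = []
    for i, p in enumerate(points):
        others = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        dists.append(others[k - 1])
    return sorted(dists, reverse=True)


def pick_eps(sorted_kdist: Sequence[float]) -> float:
    """Return the value just after the steepest drop in the k-dist curve.

    Assumed heuristic: the largest difference between consecutive
    values marks the boundary between noise and cluster points.
    Requires at least two values.
    """
    drops = [sorted_kdist[i] - sorted_kdist[i + 1]
             for i in range(len(sorted_kdist) - 1)]
    steepest = drops.index(max(drops))
    return sorted_kdist[steepest + 1]


# Made-up data: a unit-square cluster plus one distant outlier.
sample_points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (10.0, 10.0)]
sample_kdist = k_dist(sample_points, k=2)
sample_eps = pick_eps(sample_kdist)
```

The chosen radius could then be passed to any DBSCAN implementation as its neighborhood parameter, with the density threshold tuned separately.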

An improved opposition-based learning equilibrium optimizer algorithm based on neighborhood searching

LI An-dong, LIU Sheng, GOU Ru-ru

2023, 45(09): 1679-1690. doi:

To address the low convergence accuracy of the standard Equilibrium Optimizer (EO) algorithm and its tendency to become trapped in local optima, this paper proposes an Improved Opposition-based learning Equilibrium Optimizer Algorithm based on Neighborhood Searching (IOLEONS), which incorporates neighborhood topology search. Firstly, a hyperbolic tangent adaptive operator is used to modify the average concentration value in the equilibrium pool to improve the convergence accuracy of the algorithm. Then, the Euclidean distance between particles is calculated, and a neighborhood search mechanism is introduced to further enhance the algorithm's local exploitation ability and better balance its exploitation and exploration stages. Finally, a dynamic symmetric opposition-based learning strategy with Chebyshev mapping is used to strengthen the perturbation of the population, improve population diversity, and help the population escape from local optima. The convergence of the improved algorithm is analyzed, and eight benchmark test functions are selected for the simulation experiments. The results of the Wilcoxon signed-rank test and the Friedman rank test show that the improved algorithm has better optimization performance.

A knowledge tracing model fusing GA-CART and Deep-IRT

GUO Yi, HE Ting-nian, LI Ai-bin, MAO Jun-yu

2023, 45(09): 1691-1700. doi:

With the rapid development of deep neural networks, the advantages of knowledge tracing models based on deep learning are gradually emerging. Deep-IRT combines item response theory with dynamic key-value memory networks (DKVMN), which strengthens the connection between students and exercises but ignores the influence of learning features. DKVMN-DT adds behavior feature preprocessing based on the CART decision tree to DKVMN, but the decision tree is still a greedy algorithm. To mitigate the local optimum problem caused by CART and strengthen the connection between student ability and exercise difficulty, a model combining genetic algorithm-based CART with Deep-IRT is proposed. Firstly, CART is optimized twice using a genetic algorithm, and the learning behavior features of learners are preprocessed. Then, the cross features are calculated and integrated into the underlying DKVMN model. Finally, item response theory is introduced to predict the completion probability from students' ability and exercise difficulty. The experimental results show that the DKVMN-GACART-IRT model achieves higher AUC values than the original model and better prediction performance.
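
The final prediction step can be illustrated with the classical one-parameter (Rasch) item response function, in which the probability of a correct answer is the logistic function of ability minus difficulty. This is a textbook form shown for orientation only: the function name is hypothetical, and the way Deep-IRT or the proposed model derives and scales ability and difficulty from network outputs is not reproduced here.

```python
import math


def predict_correct(ability: float, difficulty: float) -> float:
    """One-parameter IRT: P(correct) = 1 / (1 + exp(-(ability - difficulty)))."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


# A student whose ability equals the exercise difficulty has an even chance.
even_chance = predict_correct(ability=1.2, difficulty=1.2)
```

The probability rises as ability exceeds difficulty and falls when the exercise is harder than the student's estimated ability, which is what makes the two quantities interpretable.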

Clinical assisted diagnosis based on heterogeneous graph medical record attention network

LI Yong, FENG Li, WANG Xia

2023, 45(09): 1701-1710. doi:

Automatically extracting useful information from electronic medical records (EMRs) to assist in disease diagnosis has important theoretical and practical significance for clinical decision support and smart hospital construction. However, symptom data in EMRs is unevenly distributed, so the data available for some diseases is insufficient for assisted diagnosis. Moreover, traditional methods ignore the heterogeneity and multi-source contextual information of medical records, which can lead to poor disease prediction accuracy. This paper proposes HCAD, a clinical assisted diagnosis prediction model based on a heterogeneous graph medical record attention network. Firstly, the imbalance in EMR data is addressed by constructing an external medical knowledge graph. Secondly, patient condition descriptions and physiological records are integrated, and node-level and semantic relationship-level attention mechanisms are designed to identify the importance of nodes and of different semantic relationships. Finally, highly representative patient node vectors are obtained through hierarchical aggregation and used to predict diseases accurately. Experiments on a real EMR dataset show that the proposed model has high feasibility, effectiveness, and interpretability, with an average F1 improvement of 7.45% over the baseline.
