  • A journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Current Issue

    • High Performance Computing
      DRM: A GPU-parallel SpMV storage format based on iterative merge strategy
      WANG Yu-hua, HE Jun-fei, ZHANG Yu-qi, XU Yue-zhu, CUI Huan-yu
      2024, 46(03): 381-394. doi:
      Sparse matrix-vector multiplication (SpMV) is of great significance in the solution of linear systems, and is one of the core problems in scientific computing and engineering practice. Its performance depends heavily on the non-zero distribution of the sparse matrix. Sparse diagonal matrices are a special type of sparse matrix whose non-zero elements are densely arranged along diagonals. For sparse diagonal matrices, scholars have proposed various storage formats on the GPU platform that improve SpMV performance, but these still suffer from zero padding and load imbalance. To address these issues, a DRM (Divide-Rearrange & Merge) storage format is proposed. This format uses a matrix partitioning strategy based on fixed thresholds and a matrix reconstruction strategy based on iterative merging to reduce zero padding and achieve load balancing between blocks. Experimental results show that on the NVIDIA Tesla V100 platform, compared to the DIA, HDC, HDIA, and DIA-Adaptive formats, time performance is accelerated by 20.76, 1.94, 1.13, and 2.26 times, respectively, and floating-point performance is improved by 1.54, 5.28, 1.13, and 1.94 times, respectively.
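The zero-padding problem that DRM targets is easiest to see in the baseline DIA format it is compared against. Below is a minimal NumPy sketch of DIA-format SpMV (textbook layout, not the paper's DRM format; the matrix values are made up): the explicit zeros stored in the padded diagonals are exactly the waste that DRM's partitioning and merging aim to reduce.

```python
import numpy as np

def dia_spmv(offsets, data, x):
    """y = A @ x for a matrix stored in DIA (diagonal) format.

    offsets[k] is the diagonal offset of row k of `data`; data[k][i]
    stores A[i, i + offsets[k]]. Elements that fall outside the matrix
    are kept as explicit zeros -- the padding DRM tries to reduce.
    """
    n = len(x)
    y = np.zeros(n)
    for off, diag in zip(offsets, data):
        for i in range(n):
            j = i + off
            if 0 <= j < n:
                y[i] += diag[i] * x[j]
    return y

# A 4x4 tridiagonal matrix stored as three padded diagonals.
offsets = [-1, 0, 1]
data = np.array([
    [0.0, 1.0, 1.0, 1.0],   # sub-diagonal; leading zero is padding
    [2.0, 2.0, 2.0, 2.0],   # main diagonal
    [3.0, 3.0, 3.0, 0.0],   # super-diagonal; trailing zero is padding
])
x = np.ones(4)
y = dia_spmv(offsets, data, x)
```

A GPU kernel would assign rows (or row blocks) to threads; load imbalance appears when blocks carry very different numbers of diagonals, which is what DRM's iterative merge rebalances.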

      Machine learning prediction of timing violation under unknown corners
      HUANG Peng-cheng, FENG Chao-chao, MA Chi-yuan
      2024, 46(03): 395-399. doi:
      The increase in IC design complexity and the continuous reduction of process feature size pose severe new challenges to static timing analysis (STA) and the chip design cycle. To improve the efficiency of STA and shorten the chip design cycle, this paper fully considers FinFET process characteristics and the principles of STA, and introduces machine learning methods that predict the timing characteristics of the remaining corners from the timing characteristics of a subset of corners. The experiment is based on an industrial design, and the results show that the proposed method uses 5 corners to predict the timing of the other 31 corners with an average absolute error of less than 2 ps, far better than the 21 process corners required by traditional methods. Thus, the proposed method significantly improves prediction accuracy and greatly reduces the workload of static timing analysis.
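The abstract does not name the learning model, so the sketch below stands in with ordinary least squares on synthetic path delays, purely to illustrate the idea of predicting an unmeasured corner from a few measured ones (the scale factors, noise levels, and path counts are all invented):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic path delays (ps): rows = timing paths, columns = 5 measured corners.
paths = 200
base = rng.uniform(100, 500, size=(paths, 1))
known = base * np.array([[1.00, 1.08, 0.93, 1.15, 0.88]]) \
        + rng.normal(0, 1.0, (paths, 5))

# An "unknown" corner that is (approximately) a linear mix of the known ones.
unknown = known @ np.array([0.3, 0.2, 0.2, 0.2, 0.1]) + rng.normal(0, 1.0, paths)

# Fit weights mapping the 5 measured corners to the unknown corner,
# then check the mean absolute prediction error.
w, *_ = np.linalg.lstsq(known, unknown, rcond=None)
pred = known @ w
mae = np.abs(pred - unknown).mean()
```

With corner delays this strongly correlated, even a linear fit lands well under the 2 ps error band the paper reports; the paper's method presumably handles corners that are less linearly related.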

      A survey of satisfiability modulo theories
      TANG Ao, WANG Xiao-feng, HE Fei
      2024, 46(03): 400-415. doi:
      Satisfiability modulo theories (SMT) refers to the decidability problem of first-order logic formulas under specific background theories. SMT, being based on first-order logic, has stronger expressive capability than SAT, with higher abstraction ability to handle more complex problems. SMT solvers find applications in various domains and have become essential engines for formal verification. Currently, SMT is widely used in fields such as artificial intelligence, hardware RTL verification, automated reasoning, and software engineering. Based on recent developments in SMT, this paper first expounds the fundamental knowledge of SMT and lists common background theories. It then analyzes and summarizes the implementation processes of the Eager, Lazy, and DPLL(T) methods, providing further insight into the implementations of the mainstream solvers Z3, CVC5, and MathSAT5. Subsequently, the paper introduces extension problems of SMT such as #SMT, the SMTlayer approach applied to deep neural networks (DNNs), and quantum SMT solvers. Finally, the paper offers a perspective on the development of SMT and discusses the challenges it faces.
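As a concrete anchor for the Lazy/DPLL(T) discussion, here is a minimal pure-Python DPLL satisfiability check over CNF clauses. It implements only the Boolean core; a real DPLL(T) loop would additionally ask a theory solver whether each partial assignment is T-consistent, which is not shown here.

```python
def dpll(clauses, assignment=None):
    """Minimal DPLL satisfiability check for CNF clauses.

    Clauses are lists of nonzero ints; -v means "not v". Returns a
    satisfying {var: bool} assignment, or None if unsatisfiable.
    """
    if assignment is None:
        assignment = {}
    simplified = []
    for clause in clauses:
        if any(assignment.get(abs(l)) == (l > 0) for l in clause):
            continue  # clause already satisfied under the assignment
        rest = [l for l in clause if abs(l) not in assignment]
        if not rest:
            return None  # clause falsified -> backtrack
        simplified.append(rest)
    if not simplified:
        return assignment  # every clause satisfied
    v = abs(simplified[0][0])  # branch on the first unassigned variable
    for value in (True, False):
        result = dpll(simplified, {**assignment, v: value})
        if result is not None:
            return result
    return None

# (x1 or x2) and (not x1 or x3) and (not x2 or not x3)
model = dpll([[1, 2], [-1, 3], [-2, -3]])
```

In the Lazy approach, theory atoms are abstracted to Boolean variables, a loop like this proposes Boolean models, and the theory solver refutes T-inconsistent ones by adding blocking clauses.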

      A joint optimization strategy for compute offloading and resource allocation in mobile edge computing
      LIU Xiang-ju, LI Jin-he, FANG Xian-jin, WANG Yu
      2024, 46(03): 416-426. doi:
      To minimize the processing latency and energy consumption of user tasks in mobile edge computing (MEC) and enhance user experience, this paper focuses on the computation offloading problem in a multi-user, multi-MEC-server scenario under computational resource constraints. With the objective of minimizing the weighted sum of user completion time and energy consumption, the problem is first decoupled into two sub-problems: offloading decision and computation resource allocation. The whale optimization algorithm is employed to solve the offloading decision problem, with convergence speed enhanced by introducing a nonlinear convergence factor and an inertia weight. A feedback mechanism is introduced to prevent local optima, yielding offloading decisions with a higher probability of feasibility. The resource allocation problem is addressed using the Lagrange multiplier method to obtain the optimal computation resource allocation for each offloading decision. Finally, stable converged solutions are obtained through multiple iterations. Simulation results demonstrate that, compared to other benchmark solutions, the proposed approach reduces the system overhead by up to 44.6%.
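The abstract does not give the exact modified-WOA formulas, so the snippet below shows the standard linear convergence factor alongside one commonly used nonlinear variant (cosine decay) and a linearly decreasing inertia weight, purely to illustrate the kind of modification described:

```python
import math

def linear_a(t, T):
    """Standard WOA convergence factor: decreases linearly from 2 to 0."""
    return 2 - 2 * t / T

def nonlinear_a(t, T):
    """A common nonlinear variant (cosine decay) -- illustrative only;
    the paper's exact formula is not given in the abstract."""
    return 1 + math.cos(math.pi * t / T)

def inertia_w(t, T, w_max=0.9, w_min=0.4):
    """Linearly decreasing inertia weight, as in PSO-style hybrids."""
    return w_max - (w_max - w_min) * t / T

T = 100
early = nonlinear_a(10, T)   # stays near 2 early -> broad exploration
late = nonlinear_a(90, T)    # drops fast late -> focused exploitation
```

The nonlinear factor keeps exploration strong for longer before collapsing toward exploitation, which is the usual rationale for replacing the linear schedule.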

      Computer Network and Information Security
      An anomaly multi-classification model based on capsule network
      YANG Yu-jin, WANG Kun, CHEN Zhi-gang, XU Yue, LI Bin
      2024, 46(03): 427-439. doi:
      The increasingly large server clusters of the State Grid Corporation generate a large amount of production operation data, and real-time analysis of the massive monitoring data generated by various devices and systems has become a new challenge in power IT operation and maintenance. As a key technology of intelligent grid information operation and maintenance, anomaly detection can effectively detect operation and maintenance faults and raise timely alarms to avoid damage to sensitive equipment. Currently, some traditional anomaly detection methods cover few anomaly types and achieve low precision, resulting in delayed fault detection. To address this challenge, this article proposes a multi-dimensional time series anomaly detection method based on capsule networks, NNCapsNet. Firstly, an unsupervised algorithm is applied in combination with expert knowledge to preprocess and label the performance monitoring data of grid marketing business application servers. Secondly, the capsule network is introduced for classification and anomaly detection. Experimental results obtained through five-fold cross-validation show that NNCapsNet achieves an average classification accuracy of 91.21% on a dataset containing 15 types of anomalies. Moreover, compared with four benchmark models on the dataset containing 20 000 monitoring records, NNCapsNet achieves good results on key evaluation indicators.

      Cache side-channel attack detection combining decision tree and AdaBoost
      LI Yang, YIN Da-peng, MA Zi-qiang, YAO Zi-hao, WEI Liang-gen
      2024, 46(03): 440-452. doi:
      Cache side-channel attacks pose a serious threat to the security of various systems, and detecting such attacks early can effectively block them. Therefore, an AD detection model based on decision tree and AdaBoost is proposed to quickly and effectively identify cache side-channel attacks by matching features of system hardware event information. Firstly, the characteristics of cache side-channel attacks are analyzed, and feature patterns of attack hardware events are extracted. Secondly, the decision tree's rapid response capability is combined with AdaBoost's weighted iterative learning over data samples to train the model under different load conditions, improving overall detection accuracy across loads. Experimental results show that the detection accuracy of this model under different system load conditions is no less than 98.8%, and it can quickly and accurately detect cache side-channel attacks.
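The abstract does not specify the model's features or hyperparameters, so the following is an illustrative from-scratch AdaBoost over decision stumps on made-up "hardware event" features (cache-miss-like counts), not the paper's detector:

```python
import numpy as np

def fit_stump(X, y, w):
    """Best threshold stump (feature, thresh, polarity) under weights w."""
    best = (None, None, 1, np.inf)
    for f in range(X.shape[1]):
        for t in np.unique(X[:, f]):
            for pol in (1, -1):
                pred = np.where(pol * (X[:, f] - t) >= 0, 1, -1)
                err = w[pred != y].sum()
                if err < best[3]:
                    best = (f, t, pol, err)
    return best

def adaboost(X, y, rounds=10):
    """Labels y in {-1, +1}; returns a list of weighted stumps."""
    n = len(y)
    w = np.full(n, 1 / n)
    ensemble = []
    for _ in range(rounds):
        f, t, pol, err = fit_stump(X, y, w)
        err = max(err, 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)      # stump weight
        pred = np.where(pol * (X[:, f] - t) >= 0, 1, -1)
        w *= np.exp(-alpha * y * pred)             # up-weight mistakes
        w /= w.sum()
        ensemble.append((alpha, f, t, pol))
    return ensemble

def predict(ensemble, X):
    score = sum(a * np.where(p * (X[:, f] - t) >= 0, 1, -1)
                for a, f, t, p in ensemble)
    return np.where(score >= 0, 1, -1)

# Toy data: two "hardware event" features (e.g. cache-miss and branch-miss
# rates); attack samples (+1) cluster at high cache-miss rates.
rng = np.random.default_rng(1)
benign = rng.normal([10, 5], 2, size=(50, 2))
attack = rng.normal([30, 6], 2, size=(50, 2))
X = np.vstack([benign, attack])
y = np.array([-1] * 50 + [1] * 50)
clf = adaboost(X, y, rounds=5)
acc = (predict(clf, X) == y).mean()
```

In practice the features would come from performance-counter interfaces such as `perf`, and evaluation would use held-out data under varying system loads, as the paper describes.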

      An identity-encryption multi-cloud multi-copy integrity auditing protocol
      ZHANG Feng, WEN Bin, YAN Yi-fei, ZENG Zhao-wu, ZHOU Wei
      2024, 46(03): 453-462. doi:
      To solve the problems that existing provable data possession (PDP) protocols are only applicable to single cloud storage servers and rely heavily on public key infrastructure, a new identity-based multi-cloud multi-replica PDP protocol is proposed. This protocol adopts identity-based encryption to simplify certificate management, and designs a double-layer Merkle hash tree as a new secure data structure to maintain the freshness and consistency of the replicas. Security analysis and experimental results verify the security and efficiency of the protocol, which achieves multi-replica integrity auditing across multiple cloud storage servers and significantly outperforms comparison algorithms in the three stages of tag generation, evidence generation, and evidence verification.
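The paper's double-layer Merkle hash tree is not specified beyond the abstract, but the underlying idea is easy to sketch with `hashlib`: build an ordinary binary Merkle tree per replica, then a second tree over the per-replica roots (block contents and the two-layer arrangement here are assumptions for illustration):

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(leaves):
    """Root of a binary Merkle tree over leaves (odd last node duplicated)."""
    level = [h(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

# Two replicas of a 4-block file: an inner tree per replica, then an
# outer tree over the replica roots -- a sketch of the "double layer" idea.
blocks = [b"blk0", b"blk1", b"blk2", b"blk3"]
replica_roots = [merkle_root(blocks) for _ in range(2)]
outer_root = merkle_root(replica_roots)

# Any tampered block changes the inner root, hence the outer root too.
tampered = merkle_root([b"blk0", b"blkX", b"blk2", b"blk3"])
```

An auditor holding only the outer root can verify any block via a logarithmic-size path of sibling hashes, which keeps evidence generation and verification cheap across multiple clouds.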

      Graphics and Images
      An interactive separation method for confusable defects in industrial defect classification
      LUO Yue-tong, LI Chao, ZHOU Bo, ZHANG Yan-kong
      2024, 46(03): 463-470. doi:
      In industrial production, defects are treated differently based on their severity, so it is necessary to classify them. However, in actual production, classification accuracy is often insufficient due to the presence of a few easily confused defects, forcing conservative treatment of all defects in practice and incurring significant human and economic costs. To solve this problem, this paper proposes a method for interactively separating easily confused defects. The method separates the few easily confused defects from the others, ensuring that the classification results for the remaining majority of defects can be used directly. It selects easily confused defects from the training data and groups them into one or more new defect categories, called virtual defects, so that the trained network can distinguish between virtual defects and the others. This paper designs a visual interface to assist users in interactively selecting easily confused defects to construct virtual categories. CMOS defect data from actual industrial sites are adopted for verification, and the results show that the proposed method can quickly separate the few confusable defects while keeping the classification accuracy of the remaining defects at a level that meets the requirements of industrial applications.

      A metal artifact correction algorithm for cone beam CT based on biharmonic equation interpolation
      WANG Zhong-hao, XIA Jing, LI Shi-jie, CAI Zhi-ping
      2024, 46(03): 471-478. doi:
      In computed tomography (CT), metal implants introduce severe artifacts, leading to degraded image quality and impacting diagnostic value. To correct metal artifacts in cone beam CT, a metal artifact correction algorithm based on the biharmonic equation is proposed. Firstly, the reconstructed image with metal artifacts is filtered using bilateral filtering and segmented using a metal threshold, obtaining metal and non-metal images. Secondly, forward projection is applied to both images, generating metal projection regions and prior projection images. Thirdly, the original projection is normalized using the prior projection image, and the metal regions are repaired using biharmonic equation interpolation, resulting in the repaired projection data. The repaired projection data is then denormalized and reconstructed using the FDK algorithm. Finally, the reconstructed image is fused with the metal image to obtain the final corrected image. To validate the performance of this algorithm, experiments on metal artifact correction were conducted using real acquired data. The results show that compared to commonly used linear interpolation and normalization correction algorithms, the root mean square errors within the region of interest (ROI) are reduced by 22% and 8% respectively. This algorithm effectively suppresses metal artifacts and outperforms commonly used metal artifact removal methods. 
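The linear-interpolation baseline that the proposed algorithm is compared against is easy to sketch. Below, metal-trace samples in one projection row are filled by `np.interp` from the nearest non-metal samples (a 1-D stand-in; the paper replaces this repair step with biharmonic-equation interpolation over the 2-D normalized sinogram, which is not shown):

```python
import numpy as np

def inpaint_row(row, metal_mask):
    """Replace metal-trace samples in one projection row by linear
    interpolation from neighboring non-metal samples (the classic LI
    baseline for metal artifact reduction)."""
    idx = np.arange(len(row))
    good = ~metal_mask
    return np.where(metal_mask, np.interp(idx, idx[good], row[good]), row)

# Toy projection row: the two large values are the metal trace.
row = np.array([1.0, 2.0, 3.0, 50.0, 60.0, 6.0, 7.0])
mask = np.array([False, False, False, True, True, False, False])
fixed = inpaint_row(row, mask)
```

In the full pipeline this repair happens on projections normalized by the prior image, after which the data are denormalized and reconstructed with FDK, exactly as the abstract's steps describe.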

      Self-supervised few-shot medical image segmentation with multi-attention mechanism
      YAO Yuan-yuan, LIU Yu-hang, CHENG Yu-jing, PENG Meng-xiao, ZHENG Wen
      2024, 46(03): 479-487. doi:
      Mainstream fully supervised deep learning segmentation models achieve good results when trained on abundant labeled data, but image segmentation in the medical field faces high annotation costs and diverse segmentation targets, and often lacks sufficient labeled data. The model proposed in this paper incorporates the idea of extracting labels from the data itself through self-supervision, using superpixels to represent image characteristics for segmentation under small-sample annotation. The introduction of multiple attention mechanisms allows the model to focus more on spatial features of the image: the position attention module and channel attention module fuse multi-scale features within a single image, while the external attention module highlights the connections between different samples. Experiments were conducted on the CHAOS healthy abdominal organ dataset. In the extreme 1-shot case, the DSC reached 0.76, about 3% higher than the baseline result. In addition, this paper explores the significance of few-shot learning by adjusting the number of N-way K-shot tasks. Under the 7-shot setting, the DSC improves significantly, coming within an acceptable margin of fully supervised deep learning segmentation.

      A texture image classification method based on adaptive texture feature fusion
      LV Fu, HAN Xiao-tian, FENG Yong-an, XIANG Liang
      2024, 46(03): 488-498. doi:
      Existing image classification methods based on deep learning generally lack features targeted at texture and suffer low classification accuracy, making them difficult to apply to both simple and complex textures. A deep learning model based on adaptive texture feature fusion is proposed, which can make classification decisions based on the texture features that differ most between classes. Firstly, a texture feature image is constructed according to the largest inter-class differences in texture features. Secondly, an improved bilinear model is trained in parallel on the original image and the distinctive texture feature image to obtain dual-channel features. Finally, an adaptive classification module is constructed based on decision fusion: channel weights are extracted from the average-pooled feature maps of the original image and the texture map, and the optimal fused classification result is obtained by combining the classification vectors of the two parallel neural network models according to the channel weights. The classification performance of the algorithm was evaluated on four common texture datasets, KTH-TIPS, KTH-TIPS-2b, UIUC, and DTD, with accuracy rates of 99.98%, 99.95%, 99.99%, and 67.09%, respectively, indicating that the proposed recognition method delivers efficient recognition performance overall.

      Autonomous localization of quadruped robot in woodland environment
      XIA Wen-qiang, WANG Shu-han, ZENG Li-zhan, LUO Xin
      2024, 46(03): 499-507. doi:
      Woodland is a typical field scenario for quadruped robots, where trees are numerous and closely spaced, placing higher demands on positioning frequency and accuracy for rapid navigation. Leg odometry offers a high update rate, but soft and uneven woodland terrain causes slippage at the foot ends, lowering its accuracy. Although woodland environments are rich in features for LiDAR positioning, matching errors and low update rates make LiDAR alone insufficient for rapid navigation as well. To address this, an autonomous positioning method suited to woodland environments is proposed: leg odometry is used to remove LiDAR point cloud distortion, and woodland ground and trunk features are extracted for matching to improve LiDAR positioning accuracy. Between two consecutive LiDAR position updates, median and window filtering are used to fuse interpolated data from the leg odometry, increasing the positioning frequency. In woodland experiments, the quadruped robot walked 110 meters with a final deviation of 0.09 m. Under navigation along a set route, the final position differed from the expected value by 0.2 m, at a positioning frequency of 500 Hz. The quadruped robot can accurately and reliably complete the navigation task.
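A 1-D toy version of the leg-odometry/LiDAR fusion conveys the idea: high-rate odometry drifts, low-rate fixes re-anchor it, and a sliding median smooths the result. The drift rate, update frequencies, and re-anchoring scheme below are all invented for illustration and are much simpler than the paper's interpolation-based fusion.

```python
import numpy as np

def fuse(leg_pos, lidar_times, lidar_pos, window=5):
    """Anchor high-rate leg-odometry positions to low-rate LiDAR fixes,
    then apply a sliding median filter (1-D sketch of the fusion idea)."""
    fused = leg_pos.copy()
    for t_fix, p_fix in zip(lidar_times, lidar_pos):
        # Re-anchor: shift the rest of the trajectory to match the fix.
        fused[t_fix:] += p_fix - fused[t_fix]
    pad = window // 2
    padded = np.pad(fused, pad, mode="edge")
    return np.array([np.median(padded[i:i + window])
                     for i in range(len(fused))])

# 100 leg-odometry samples drifting +0.01 per step beyond the true track;
# a LiDAR fix arrives every 10th sample.
true = np.linspace(0, 1, 100)
leg = true + np.cumsum(np.full(100, 0.01))
fused = fuse(leg, lidar_times=list(range(0, 100, 10)), lidar_pos=true[::10])
err_leg = np.abs(leg - true).max()
err_fused = np.abs(fused - true).max()
```

The fused track inherits the odometry's rate (here, every sample) while the periodic fixes bound the drift, mirroring how the paper reaches a 500 Hz positioning output from slower LiDAR updates.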

      Artificial Intelligence and Data Mining
      Zero-shot cross-lingual event argument role classification with enhanced dependency structure representation
      ZHANG Yuan-yang, GONG Zheng-xian, KONG Fang
      2024, 46(03): 508-517. doi:
      Event argument role classification is a subtask of event extraction that aims to assign corresponding roles to candidate arguments in an event. Annotating event corpora is complicated and labor-intensive, and many languages lack labeled texts. Zero-shot cross-lingual event argument role classification can build a model from a richly annotated source-side corpus and apply it directly to the counterpart task on a target side where labeled corpora are scarce. Focusing on the commonalities of the dependency structure of event texts across languages, this paper proposes a method that uses a BiGRU network to encode the dependency paths connecting trigger words to candidate arguments. The proposed encoder can be flexibly integrated into several mainstream deep learning models for event argument role classification. The experimental results demonstrate that the proposed method is effective in completing cross-lingual transfer and improving the classification performance of multiple baselines.


      Chinese-Urdu neural machine translation incorporating Urdu POS sequence prediction
      CHEN Huan-huan, WANG Jian, Muhammad Naeem Ul Hassan
      2024, 46(03): 518-524. doi:
      At present, many research teams have conducted in-depth research on machine translation for minority languages of South and Southeast Asia. However, Urdu, the official language of Pakistan, has limited data resources and differs significantly from Chinese, and targeted research on Chinese-Urdu machine translation methods is lacking. To address this, this paper proposes a Chinese-Urdu neural machine translation model based on Transformer that incorporates Urdu part-of-speech (POS) sequence prediction. Firstly, a Transformer is used to predict the POS sequence of the target-language Urdu. Then, the translation model's predictions are combined with the POS sequence prediction model's results to jointly predict the final translation, thereby integrating linguistic knowledge into the translation model. Experimental results on a small-scale Chinese-Urdu dataset show that the proposed method improves BLEU by 0.13 over the baseline model, a significant improvement.

      Moving trajectory destination prediction based on long short-term memory network
      JIN Guang-yin, ZHAO Xu-jun, GONG Yi-xuan
      2024, 46(03): 525-534. doi:
      Destination prediction for moving trajectories is an important part of location-based services. Existing prediction methods have two problems: first, historical trajectories cannot completely cover all possible query trajectories (the data sparsity problem); second, the differing influence of prefix trajectory points on the prediction result is not taken into account (the long-term dependency problem). To address these, a distributed trajectory representation method is proposed. Firstly, the trajectory sequence is divided into grids, and the high-dimensional one-hot vectors representing locations are reduced to low-dimensional embedding vectors that encode geographical topological relationships. Secondly, destinations are clustered, and the cluster centers are used as labels for the trajectories in each cluster, which reduces differences among similar trajectories, highlights the characteristics of dissimilar trajectories, and effectively overcomes data sparsity. For destination prediction, a self-attention mechanism is introduced into the LSTM network, and a destination prediction model based on the LSTM network (SATN-LSTM) is proposed. It mines key points from the sequence and assigns weights according to their importance, which better addresses the long-term dependency problem. Finally, several experiments are carried out on real trajectory datasets to verify the effectiveness of the model. Compared with existing models, it achieves higher accuracy.
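The grid-and-embed preprocessing step can be sketched in a few lines. Here the cell size, grid shape, and a random (rather than learned) embedding table are placeholders; in the paper the low-dimensional embeddings would be trained so that nearby cells get similar vectors.

```python
import numpy as np

def grid_id(x, y, x0=0.0, y0=0.0, cell=0.5, ncols=10):
    """Map a 2-D point to its grid-cell index (row-major)."""
    col = int((x - x0) // cell)
    row = int((y - y0) // cell)
    return row * ncols + col

# Toy trajectory -> sequence of cell ids -> dense embedding lookup.
traj = [(0.1, 0.1), (0.6, 0.1), (1.2, 0.7), (1.4, 1.3)]
cells = [grid_id(x, y) for x, y in traj]

rng = np.random.default_rng(0)
n_cells, dim = 100, 8   # 10x10 grid; 8-d embeddings replace 100-d one-hots
embedding = rng.normal(size=(n_cells, dim))
seq = embedding[cells]  # (len(traj), 8) matrix fed to the LSTM
```

Replacing one-hot cell indicators with such dense vectors is what lets similar prefixes share statistical strength, which is how the representation mitigates sparsity.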

      A reliable response representation enhanced knowledge tracing method
      ZHAO Yan, MA Hui-fang, WANG Wen-tao, TONG Hai-bin, HE Xiang-chun
      2024, 46(03): 535-544. doi:
      Knowledge tracing (KT) is a key task in educational data mining, aiming at modeling students' changing knowledge states over time to infer their proficiency in concepts. However, most existing knowledge tracing methods ignore the reliability and high-dimensional sparsity of the student-concept space derived from the student-exercise-concept relationship, and do not combine the student's response to the exercise to generate a reliable response representation. To address these issues, a reliable response representation enhanced knowledge tracing method is proposed. Specifically, firstly, the student-exercise space is divided into fine-grained student-exercise spaces based on the student's response records, and different student-concept spaces are obtained based on the exercise-concept space; secondly, the reliability of the student-concept space is obtained from both its relative and absolute reliability, and a reliable, low-dimensional student-concept space is obtained using dimensionality reduction methods; thirdly, the reliable response representation of the exercise is obtained by combining the student's response to the exercise with the exercise representation under the two response conditions; finally, the student's knowledge state at different timesteps is evaluated using a long short-term memory network and the obtained reliable response representation. Experimental results on four real datasets demonstrate the effectiveness and interpretability of the proposed method.

      Rumor source localization based on deep learning for node representation
      LIU Wei, YANG Jie, LUO Jia-li, WANG Sai-wei, CHEN Ling
      2024, 46(03): 545-559. doi:
      With the popularity of the Internet, information on the web spreads to the public at an astonishing speed. However, false information and rumors also spread rapidly through cascade effects, causing great harm to society. Finding the source of rumor spread on social networks plays a crucial role in suppressing rumors. Most traditional rumor source localization methods fail to integrate multi-source features, and their localization accuracy needs further improvement. Therefore, this paper proposes a deep learning-based rumor source localization method that identifies the rumor source from multi-source features observed at nodes affected by rumors. The method first obtains the influence vector of each node based on the similarity of influence between the node and observed nodes. It then uses autoencoder networks to encode the influence vectors, obtaining new embedding representations that contain node information, diffusion paths, and propagation time information. Finally, it computes the probability of each node being the source from its new influence vector to locate the rumor source. Experimental results on two simulated datasets and four real datasets show that, compared with other methods, the proposed method locates the rumor source faster and improves localization accuracy by more than 25%.
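Stripping away the autoencoder, the final scoring step — compare each candidate's influence vector with the pattern seen at the observers and pick the best match — can be sketched with cosine similarity (the vectors below are toy values; the real method scores encoded embeddings, not raw vectors):

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two influence vectors."""
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Rows: candidate nodes; columns: influence on 4 observed nodes
# (e.g. inversely related to infection delay). The true source
# influences all observers strongly and early.
candidates = np.array([
    [0.9, 0.8, 0.7, 0.9],   # node A -- the true source in this toy setup
    [0.2, 0.9, 0.1, 0.2],   # node B
    [0.1, 0.1, 0.8, 0.1],   # node C
])
observed = np.array([1.0, 0.9, 0.8, 1.0])  # pattern seen at the observers

scores = np.array([cosine(c, observed) for c in candidates])
source = int(np.argmax(scores))
```

The autoencoder's role in the paper is to compress such vectors into embeddings that also capture diffusion paths and timing before this comparison is made.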

      Sequential recommendation based on dual-channel light graph convolution
      LUO Xu, WANG Hai-tao, HE Jian-feng
      2024, 46(03): 560-570. doi:
      Traditional sequential recommendation algorithms based on graph neural networks ignore the transition relationships of items in other users' sequences during the graph construction stage. To solve this problem, a sequential recommendation algorithm based on dual-channel light graph convolution is proposed. Firstly, neighbor user sequences are found for the target user, and the target user sequence and the obtained neighbor sequences are combined into a directed sequence graph, making full use of potential collaborative information between users. Then, the information of the two sequences is propagated through dual-channel light graph convolution; each channel combines the information of each layer with exponential-denominator weights, and the embeddings obtained from the two channels are fused to generate the final item embeddings. Finally, the short-term preference is extracted by averaging the last several item embeddings, the long-term preference is extracted by introducing the multi-head self-attention mechanism with squeeze-and-excitation networks, and the final user preference is obtained by integrating the long-term and short-term preferences. Extensive experiments on two public datasets, Beauty and MovieLens-20M, demonstrate the effectiveness of the proposed algorithm.
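The final preference-fusion step can be sketched in a few lines. Here a plain mean over all items stands in for the attention-based long-term vector, and the mixing weight `alpha` is an assumption; only the "average the last k item embeddings" part matches the abstract directly.

```python
import numpy as np

def short_term_pref(item_embs, k=3):
    """Short-term preference = mean of the last k item embeddings."""
    return item_embs[-k:].mean(axis=0)

rng = np.random.default_rng(0)
seq = rng.normal(size=(10, 4))   # 10 interacted items, 4-d embeddings
short = short_term_pref(seq, k=3)

# Final preference: weighted mix of long- and short-term signals. The
# long-term vector would come from multi-head self-attention in the paper;
# a mean over the whole sequence stands in for it here.
long_term = seq.mean(axis=0)
alpha = 0.5
pref = alpha * long_term + (1 - alpha) * short
```

Scoring candidates is then a dot product between `pref` and each candidate item's embedding, with the top scores recommended.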