  • A journal of the China Computer Federation
  • China Science and Technology Core Journal
  • Chinese Core Journal

Current Issue

    • High Performance Computing
      Truthful auction mechanisms for multi-resource allocation based on NUMA architecture of cloud computing
      XU Jia, ZHANG Ji-xian, WANG Zhe-min, LIU Lin-jie
      2024, 46(05): 761-775. doi:
      Abstract ( 58 )   PDF (1902KB) ( 92 )     
      As the internet continues to evolve, technologies such as cloud computing and virtualization are widely deployed. Designing truthful auction mechanisms that let cloud service providers maximize social welfare through virtual resource allocation is one of the current research priorities in the field of cloud computing. Meanwhile, as server scales expand, many mainstream data center servers are transitioning to a Non-Uniform Memory Access (NUMA) architecture. Its primary feature is that each server can consist of multiple computing nodes, each containing several processors and storage units, which can function as independent computing units or communicate with each other. However, current research primarily focuses on traditional Uniform Memory Access architectures and cannot adapt to NUMA application scenarios. Therefore, a multi-resource truthful auction mechanism applicable to NUMA architectures is proposed, which allocates resources in the form of virtual machines. Specifically, for the resource allocation problem, a monotonic heuristic algorithm is proposed that considers deployment constraints and user request density under this architecture, effectively enhancing social welfare. For the payment problem, a binary search method is used to design a payment algorithm that conforms to critical-price theory, thereby ensuring the mechanism's truthfulness. Experimental tests show that the overall performance of this mechanism in social welfare, user payments, resource utilization, etc., achieves approximately 96% of the optimal solution.
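
      A binary search for the critical price, as used in payment algorithms of this kind, can be sketched as follows (an illustrative sketch under a monotone allocation rule; the names and the toy allocation are hypothetical, not the paper's algorithm):

```python
# Illustrative sketch: with a monotone allocation rule, each winner's
# payment is the critical (lowest) bid at which they would still win.
# A binary search over the bid value locates it.

def critical_payment(wins, user, bid, lo=0.0, eps=1e-3):
    """wins(user, bid) -> True if `user` is allocated when bidding `bid`.

    Assumes monotonicity: if a bid wins, any higher bid also wins.
    Returns the approximate critical price in [lo, bid].
    """
    assert wins(user, bid), "only winners pay"
    hi = bid
    while hi - lo > eps:
        mid = (lo + hi) / 2
        if wins(user, mid):
            hi = mid          # still wins: critical price is <= mid
        else:
            lo = mid          # loses: critical price is > mid
    return hi

# Toy allocation rule: a user wins whenever their bid is at least 5.0.
price = critical_payment(lambda u, b: b >= 5.0, "u1", bid=9.0)
```

      Charging this critical price rather than the bid itself is what makes truthful bidding a dominant strategy under a monotone allocation rule.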

      EMRI-Tree: A hierarchical data structure for multi-resolution visualization
      ZHONG Quan, CHEN Zhi-guang, GAO Lan-guang
      2024, 46(05): 776-784. doi:
      Abstract ( 30 )   PDF (1110KB) ( 93 )     
      Visualizing large-scale scientific data requires high data transmission bandwidth and a large amount of memory. Efficient processing of visualization data poses a significant challenge. To improve the efficiency of scientific visualization, the most common and direct method is to reduce the amount of data that needs to be processed. A novel visualization scheme for large-scale volume data is proposed, built on a new data structure called EMRI-Tree and a flexible rendering workflow. The characteristics of the scheme can be summarized as follows. Firstly, the proposed EMRI-Tree supports memory-efficient data queries and ROI (region of interest) data fetching on large 3D models, thus reducing the memory footprint significantly. Secondly, data blocks at different resolution levels in the EMRI-Tree are stored in a key-value (KV) storage system with variable-length indices, which improves the scalability of storage and the concurrency of reading. Lastly, a prefetching scheme is proposed, which supports progressive rendering based on ray marching to render a more accurate model as users interact. By combining the above optimizations, the proposed scheme facilitates the visualization of large volumes of high-resolution data with limited memory overhead. The approach is evaluated using 80 GB of synthetic data in 10 simulated read tests. The experimental results demonstrate that the scheme sustains 2000+ QPS (queries per second) with linear growth in memory consumption, making it a robust and memory-efficient solution.
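
      The idea of keeping multi-resolution blocks in a KV store under variable-length keys can be illustrated roughly as follows (the key layout and names are invented for illustration, not the EMRI-Tree format):

```python
# Each block key encodes its resolution level and grid coordinates, so an
# ROI query touches only the keys it needs instead of loading the volume.

kv = {}  # stand-in for a real KV storage system

def put_block(level, x, y, z, data):
    kv[f"{level}/{x}.{y}.{z}"] = data

def roi_blocks(level, x0, x1, y0, y1, z0, z1):
    """Fetch only the blocks overlapping an ROI at one resolution level."""
    return {k: kv[k]
            for x in range(x0, x1) for y in range(y0, y1)
            for z in range(z0, z1)
            if (k := f"{level}/{x}.{y}.{z}") in kv}

put_block(0, 0, 0, 0, b"coarse")   # level 0: whole volume, low resolution
put_block(1, 1, 0, 0, b"fine")     # level 1: a finer sub-block
hits = roi_blocks(1, 0, 2, 0, 1, 0, 1)
```

      Because coarse and fine blocks live under distinct key prefixes, a renderer can fetch the coarse level first and progressively replace it with finer blocks as they arrive.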

      A multistage dynamic branch predictor based on Hummingbird E203
      WEI Yi, YANG Zhi-jie, TIE Jun-bo, SHI Wei, ZHOU Li, WANG Yao, WANG Lei, XU Wei-xia
      2024, 46(05): 785-793. doi:
      Abstract ( 56 )   PDF (1390KB) ( 100 )     
      In recent years, open-source RISC-V microprocessors represented by Hummingbird E203 have received widespread attention and application in both academia and industry due to their low power consumption and good performance. To improve microprocessor performance and reduce pipeline stalls caused by branch instructions, branch prediction has become one of the important techniques widely used in modern microprocessors. However, the branch predictor currently used in the Hummingbird E203 is a lightweight static one, which suffers from low prediction accuracy. Since a dynamic branch predictor with higher prediction accuracy can further reduce the fetch-redirection overhead caused by mispredictions, various implementations of dynamic branch predictors are explored on the original microarchitecture to improve branch prediction accuracy while keeping resource overhead in check. Experimental results show that among the dynamic branch predictors evaluated, the best-performing one is the adaptive dynamic branch predictor combining static branch prediction with a Branch History Register (BHR). On the Dhrystone benchmark, its branch prediction accuracy increases from the original 84.6% to 94.8%, and the score from 1.296463 to 1.314418. On the Coremark benchmark, its branch prediction accuracy increases from the original 67% to 78.7%, and the score from 2.120000 to 2.138008.
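
      The BHR-based dynamic prediction the abstract refers to can be sketched in its textbook form (a generic pattern table of 2-bit saturating counters indexed by a global history register, not the E203 RTL):

```python
# Global-history dynamic branch predictor sketch: the branch history
# register (BHR) indexes a pattern history table (PHT) of 2-bit
# saturating counters; a counter value >= 2 predicts "taken".

class BhrPredictor:
    def __init__(self, history_bits=4):
        self.mask = (1 << history_bits) - 1
        self.bhr = 0                          # global branch history register
        self.pht = [1] * (1 << history_bits)  # counters start weakly not-taken

    def predict(self):
        return self.pht[self.bhr] >= 2

    def update(self, taken):
        c = self.pht[self.bhr]
        self.pht[self.bhr] = min(c + 1, 3) if taken else max(c - 1, 0)
        self.bhr = ((self.bhr << 1) | taken) & self.mask

p = BhrPredictor()
hits = 0
for _ in range(100):          # an always-taken loop branch
    hits += p.predict() is True
    p.update(1)
```

      After a short warm-up (the first few history patterns each miss once), the predictor locks onto the always-taken branch, which is why dynamic predictors outperform static ones on loop-heavy benchmarks like Dhrystone.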

      Preparation and high-precision assembly technique of CCGA devices
      LI Liu-hui, WANG Liang, YANG Chun-yan, CHEN Yi-long, CHEN Peng
      2024, 46(05): 794-800. doi:
      Abstract ( 25 )   PDF (1671KB) ( 51 )     
      CCGA devices have been widely used in many high-reliability fields due to their excellent ability to absorb thermal expansion mismatch. To resolve the problems of preparation and high-precision assembly of CCGA devices, this paper studies the preparation of ceramic shells, the design of high-precision column planting tools, solder paste deposition, and column planting of CCGA devices. Using the high-temperature co-fired ceramic technique, two types of CCGA ceramic shells with daisy chains are prepared successfully, which can be used to design complex electronic circuits flexibly. High-precision column planting tools with 1.6 mm height and 0.54 mm diameter holes are designed and prepared. Solder printing and solder paste jetting methods are compared, and the latter is adopted to obtain more accurate paste volumes. The relative deviations of paste volumes are less than 10%. Under these conditions, high-precision column-planted CCGA devices, with column tilt < 1°, coplanarity < 0.1 mm, and position tolerance better than ±0.02 mm, are obtained. This method can effectively improve the column planting accuracy and solder column symmetry of CCGA devices, and helps ensure the accuracy and reliability of subsequent board-level soldering.

      Computer Network and Information Security
      Weakly-supervised IDS with abnormal-preserving transformation learning
      TAN Yu-song, WANG Wei, JIAN Song-lei, YI Chao-xiong
      2024, 46(05): 801-809. doi:
      Abstract ( 31 )   PDF (1556KB) ( 86 )     
      Network intrusion detection systems are crucial for maintaining network security, yet there is currently limited research on intrusion detection scenarios where only a few anomalies in the network data are labeled. This paper designs a weakly-supervised intrusion detection model, called WIDS-APL, based on abnormal-preserving transformation learning. The detection model consists of four parts: a data transformation layer, a representation learning layer, a transformation classification layer, and an anomaly discrimination layer. A set of learnable encoders maps samples to different regions and compresses them into a hypersphere; the label information of abnormal samples is then used to learn the classification boundaries between normal and abnormal samples and to obtain each sample's anomaly score. Testing WIDS-APL on four datasets demonstrates the effectiveness and robustness of the system, with improvements in AUC-ROC values of 4.80%, 5.96%, 1.58%, and 1.73% respectively compared with other mainstream methods, and gains of 15.03%, 2.95%, 4.71%, and 9.23% in AUC-PR performance.

      Push after delay: A delayed push synchronization strategy for low-power mobile end-to-end systems
      ZHAO Yue, ZHOU Tong-qing, ZENG Hui, CAI Zhi-ping, XIAO Nong
      2024, 46(05): 810-817. doi:
      Abstract ( 24 )   PDF (684KB) ( 59 )     
      The rapid development of Internet technology has enabled mobile devices to play a more significant role in people's lives. A user may possess multiple devices to meet various needs such as office work, socializing, and entertainment. In practical applications, multiple devices owned by the same user also face many data synchronization requirements to support distributed applications across devices (e.g., cross-device video playback continuation). However, there is currently limited research on end-to-end data synchronization across multiple devices. Therefore, this paper proposes a data push strategy, called Push After Delay (PAD), suitable for application synchronization between a user's multiple mobile terminals. The strategy treats active and sleeping devices differently and flexibly delays transmission to sleeping devices. The delay decision is based on the dynamic adaptive adjustment method of AIMD (Additive Increase Multiplicative Decrease) and a scheduling enhancement mechanism oriented to application access frequency, which adapts push synchronization to user usage habits. Experimental results show that, compared with an undifferentiated push synchronization scheme, the PAD push strategy can significantly reduce the number of data synchronization wake-ups on mobile devices while ensuring a low data access error rate, achieving a balance between consistency and synchronization overhead.
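
      The AIMD-style delay adjustment underlying such a strategy can be sketched as follows (parameter values are invented for illustration, not PAD's actual configuration):

```python
# AIMD sketch: the push delay for a sleeping device grows additively
# while delaying causes no stale access, and shrinks multiplicatively
# as soon as the user hits stale data.

def aimd_step(delay, miss, add=1.0, mult=0.5, max_delay=60.0):
    """Return the next push delay in seconds."""
    if miss:                                # user hit stale data: back off
        return max(delay * mult, 1.0)
    return min(delay + add, max_delay)      # no miss: delay a bit longer

d = 4.0
d = aimd_step(d, miss=False)   # 5.0
d = aimd_step(d, miss=False)   # 6.0
d = aimd_step(d, miss=True)    # 3.0
```

      As in TCP congestion control, the additive growth probes for how long pushes can safely be deferred, while the multiplicative cut reacts quickly once deferral starts hurting data freshness.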

      SRv4: Design and implementation of segment routing data plane for IPv4
      YUAN Yu-lei, ZHAO Bao-kang, Lv Gao-feng
      2024, 46(05): 818-825. doi:
      Abstract ( 29 )   PDF (1046KB) ( 67 )     
      Aiming to address the drawbacks of SRv6, such as the low network carrying efficiency caused by its long packet header and its inability to be deployed in IPv4 networks, a segment routing technology for IPv4 networks, SRv4, is proposed. The SRv4 packet header format, the encapsulation of SRv4 packets in IPv4, the processing instructions, and the forwarding process of SRv4 packets are designed. SRv4 is compatible with the IPv4 protocol and existing IPv4 network devices, and can be incrementally deployed in IPv4 networks. Compared with SRv6, SRv4 reduces the SID length by 75%, resulting in higher network carrying efficiency. An SRv4 module is developed in the Linux kernel using XDP technology to verify the feasibility of SRv4. The function, performance, and stability of the SRv4 module are tested in a network environment built on a server. The results show that the SRv4 module performs segment routing correctly and runs stably.
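
      The 75% SID-length reduction follows directly from using 4-byte IPv4 addresses as SIDs instead of 16-byte IPv6 addresses. A rough illustration (the packing here is illustrative, not the SRv4 wire format):

```python
# Compare the size of a segment list encoded with IPv4-sized SIDs
# (4 bytes each, as in SRv4) against IPv6-sized SIDs (16 bytes each,
# as in SRv6's segment routing header).
import socket

def pack_ipv4_segments(addrs):
    """Pack an IPv4 segment list: 4 bytes per SID."""
    return b"".join(socket.inet_aton(a) for a in addrs)

segments = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
srv4_len = len(pack_ipv4_segments(segments))   # 3 SIDs * 4 B = 12 bytes
srv6_len = 16 * len(segments)                  # same list with IPv6 SIDs
reduction = 1 - srv4_len / srv6_len            # 0.75
```

      For a given MTU, the shorter segment list leaves proportionally more room for payload, which is the carrying-efficiency gain the abstract quantifies.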

      An anomaly detection model of time series based on dual attention and deep autoencoder
      YIN Chun-yong, ZHAO Feng
      2024, 46(05): 826-835. doi:
      Abstract ( 49 )   PDF (1203KB) ( 96 )     
      Currently, time series data often exhibit weak periodicity and highly complex correlation features, making it challenging for traditional time series anomaly detection methods to detect such anomalies. To address this issue, a novel unsupervised time series anomaly detection model (DA-CBG-AE) is proposed. Firstly, a novel sliding window approach is used to set the window size for time series periodicity. Secondly, convolutional neural networks are employed to extract high-dimensional spatial features from the time series. Then, a bidirectional gated recurrent unit network with stacked Dropout is proposed as the basic architecture of the autoencoder to capture the correlation features of the time series. Finally, a dual-layer attention mechanism is introduced to further extract features and select more critical time series, thereby improving the accuracy of anomaly detection. To validate the effectiveness of the model, DA-CBG-AE is compared with six benchmark models on eight datasets. The experimental results show that DA-CBG-AE achieves the optimal F1 value (0.863) and outperforms the latest benchmark model Tad-GAN by 25.25% in terms of detection performance.
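
      Windowing a series and scoring each window by reconstruction error, the general pattern behind autoencoder-based detectors of this kind, can be sketched as follows (a generic illustration with a toy "reconstruction" function, not DA-CBG-AE itself):

```python
# Reconstruction-based anomaly scoring sketch: slide a window over the
# series, reconstruct each window, and score it by squared error.
# A well-trained autoencoder reconstructs normal windows accurately,
# so anomalous windows stand out with high error.

def sliding_windows(series, size, step=1):
    return [series[i:i + size] for i in range(0, len(series) - size + 1, step)]

def anomaly_scores(series, size, reconstruct):
    """Score each window by the reconstruction error of `reconstruct`."""
    return [sum((a - b) ** 2 for a, b in zip(w, reconstruct(w)))
            for w in sliding_windows(series, size)]

# Toy stand-in: "reconstruct" every window as the dominant pattern [0, 1].
scores = anomaly_scores([0, 1, 0, 1, 0, 9, 0, 1], size=2,
                        reconstruct=lambda w: [0, 1])
```

      The windows containing the outlier value 9 receive by far the largest scores, which is exactly the signal an anomaly discriminator thresholds on.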


      Graphics and Images
      Multi-target domain facial expression recognition based on class-wise feature constraint
      FAN Qi, WANG Shan-min, LIU Cheng-guang, LIU Qing-shan
      2024, 46(05): 836-845. doi:
      Abstract ( 37 )   PDF (825KB) ( 78 )     
      Facial Expression Recognition (FER) is usually affected by the collection environment, region, race, and other factors. To improve the generalization of FER methods, Unsupervised Domain Adaptation Facial Expression Recognition (UDA-FER) algorithms have attracted more and more attention. Existing UDA-FER algorithms generally suffer from two issues: (1) they care more about performance in the target domain, resulting in a sharp drop in source-domain performance after transferring from the source to the target domain; (2) they are only suitable for a single target domain, and perform poorly when applied directly to multiple target domains. To solve these issues, a Multi-Target Domain Facial Expression Recognition method based on class-wise feature constraint (MTD-FER) is proposed, which supports transferring to multiple target domains in succession and ensures the method retains a good recognition rate on each domain. To this end, MTD-FER designs a Class-Adaptive Pseudo Label (CAPL) method and a Class-Wise Feature Constraint (CWFC) method, which learn high-quality pseudo labels for samples in target domains and align each class of features across disparate domains, so as to alleviate the catastrophic forgetting caused by domain transfer. Extensive experiments using RAF-DB as the source domain and FER-2013 and ExpW as the target domains demonstrate the effectiveness of MTD-FER. Experimental results show that, compared with the baseline method, MTD-FER improves source-domain performance by 6.36%, which is on par with the method before transferring to target domains, and improves performance by 27.33% and 3.03% in the two target domains, respectively.

      ELPVO: An ultra-low-power visual odometry based on I/O optimization
      ZHAO Qian-he, WANG Rui
      2024, 46(05): 846-851. doi:
      Abstract ( 32 )   PDF (1127KB) ( 50 )     
      Visual odometry endows robots with the ability of autonomous positioning and building environmental maps, and is widely used in various unmanned devices. Visual odometry involves a large amount of image processing and calculation, but most of its deployment platforms only have extremely limited computational resources, limiting its application scope. In response to the I/O bottleneck of existing low-power visual odometry, this paper proposes a high-speed low-power visual odometry, named ELPVO, based on RGB-D cameras for the STM32F7 embedded platform. ELPVO fully considers the hardware resources of the STM32F7 platform, improves the processor utilization efficiency through DMA transmission, and further enhances the processing speed without changing the algorithm accuracy. On the STM32F767 embedded platform equipped with a 216 MHz ARM Cortex-M7 processor, with the TUM RGB-D dataset as the testing benchmark, ELPVO can achieve a processing speed of 26 frames per second for images with a resolution of 320×240, with an overall run speed improved by 84% and a run power consumption maintained at 0.7 watts.

      A multi-person pose estimation correction algorithm based on improved YOLOv5
      ZHAO Jin-yuan, JIA Di
      2024, 46(05): 852-860. doi:
      Abstract ( 42 )   PDF (2782KB) ( 70 )     
      Since multi-person pose estimation in crowded scenes is still hampered by small detection objects, resulting in low pose estimation accuracy, this paper proposes a multi-person pose estimation correction algorithm based on improved YOLOv5. Firstly, in the backbone network of YOLOv5, a jump attention module is integrated to help the network find the region of interest in the image. Secondly, in the neck network, a weighted bidirectional feature pyramid is used to improve the feature fusion ability between feature maps of different scales, and the jump attention module and a Transformer encoder are used jointly to enable the network to obtain global information and rich context information. Thirdly, a detection head is added to the detection part to make the network more sensitive to tiny objects. Finally, the keypoint object information obtained by network prediction is used to correct the pose object information and obtain the final multi-person pose estimation result. Experimental results show that the proposed algorithm improves YOLOv5's AP50 by 2.2% and AP75 by 3.3% on the COCO dataset, validating the accuracy and robustness of the algorithm.

      A single image reflection removal cascaded algorithm using non-local correlation and contrast constraint
      LUO Chao, MIAO Jun, ZHENG Yi-lin, HUA Feng, Chu Jun
      2024, 46(05): 861-871. doi:
      Abstract ( 25 )   PDF (1671KB) ( 51 )     
      Reflection in an image not only significantly reduces image quality, but also seriously affects subsequent computer vision tasks. Therefore, a cascaded single-image reflection removal algorithm using non-local correlation and contrast constraint is proposed. The algorithm uses a dual-branch approach for LSTM-based information propagation across cascades, employing reflection and background features to complement each other and iteratively refine prediction accuracy, so that the two branches' prediction results mutually enhance each other. To facilitate training over multiple cascade steps, a positive-negative contrastive regularization loss is introduced, which treats background image features and original image features as positive and negative samples, respectively. This brings the target image closer to the background image while moving it away from the original image in the representation space, narrowing the prediction range and effectively alleviating the ill-posed problem. Additionally, an efficient, low-computational-cost non-local correlation prediction module is proposed, capable of capturing contextual information for all pixels along cross paths. Through further cascade operations, each pixel captures long-distance dependencies across the entire image, enabling the use of surrounding-point information to predict background content obscured by strong reflections. Experimental results demonstrate that, compared with current algorithms, the proposed algorithm achieves superior results and exhibits robust performance.

      Point cloud classification and segmentation based on adaptive graph convolution and attention pooling
      LIU Yu-zhen, ZHANG Dong-xia, TAO Zhi-yong
      2024, 46(05): 872-880. doi:
      Abstract ( 29 )   PDF (809KB) ( 69 )     
      In response to the limitation of existing point cloud classification and segmentation methods that use max pooling to aggregate local neighborhood features, which leads to the loss of important information beyond the maximum value, this paper proposes a point cloud classification and segmentation network that combines Adaptive Graph Convolution (AGConv) and Attention Pooling (AP). Firstly, a local graph structure of the point cloud is constructed using K-nearest neighbors algorithm, and adaptive convolution kernels are generated based on the features of the points, enabling flexible and accurate capturing of local neighborhood features. Secondly, to effectively enhance feature aggregation, attention pooling is utilized to define an energy function and obtain weight values, which are used to weight and aggregate more representative local features of the point cloud. Finally, adaptive graph convolution and attention pooling are stacked to extract global features layer by layer, thereby improving the accuracy of classification and segmentation. Experimental results demonstrate that compared with the benchmark network, the average class accuracy of point cloud classification is improved by 0.9%, and the average intersection over union of part segmentation and semantic segmentation is improved by 0.8% and 0.3% respectively. This demonstrates that the algorithm can effectively improve the accuracy of point cloud classification and segmentation, and has high robustness.
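
      Attention pooling as a softmax-weighted alternative to max pooling can be sketched as follows (the energy function here is a simple parameter-free stand-in, not the paper's learned formulation):

```python
# Attention pooling sketch: instead of keeping only the per-dimension
# maximum of a local neighborhood (max pooling), weight every neighbor
# by softmax(energy) so information beyond the maximum is retained.
import math

def attention_pool(features, energy):
    """Aggregate neighbor feature vectors by softmax(energy) weights."""
    e = [energy(f) for f in features]
    m = max(e)
    w = [math.exp(x - m) for x in e]        # numerically stable softmax
    s = sum(w)
    w = [x / s for x in w]
    dim = len(features[0])
    return [sum(w[i] * features[i][d] for i in range(len(features)))
            for d in range(dim)]

neigh = [[1.0, 0.0], [3.0, 2.0], [2.0, 1.0]]
pooled = attention_pool(neigh, energy=lambda f: sum(f))
```

      The result leans toward the highest-energy neighbor but still mixes in the others, whereas max pooling would discard everything except the per-dimension maxima.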

      Artificial Intelligence and Data Mining
      Research and application of whale optimization algorithm
      WANG Ying-chao
      2024, 46(05): 881-896. doi:
      Abstract ( 102 )   PDF (901KB) ( 118 )     
      The Whale Optimization Algorithm (WOA) is a novel swarm intelligence optimization algorithm that converges based on probability. It features simple and easily implementable algorithm principles, a small number of easily adjustable parameters, and a balance between global and local search control. This paper systematically analyzes the basic principles of WOA and factors influencing algorithm performance. It focuses on discussing the advantages and limitations of existing algorithm improvement strategies and hybrid strategies. Additionally, the paper elaborates on the applications and developments of WOA in support vector machines, artificial neural networks, combinatorial optimization, complex function optimization, and other areas. Finally, considering the characteristics of WOA and its research achievements in applications, the paper provides a prospective outlook on the research and development directions of WOA.
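
      WOA's three position-update rules (encircling prey, spiral bubble-net attack, random search) can be condensed into a toy one-dimensional version (the coefficients follow the usual formulation, but many practical details are omitted):

```python
# Compact WOA sketch: each whale either encircles the best solution
# (|A| < 1), explores around a random whale (|A| >= 1), or spirals
# toward the best solution, with coefficient `a` shrinking from 2 to 0
# to shift the swarm from exploration to exploitation.
import math, random

def woa_minimize(f, lo, hi, whales=20, iters=200, seed=0):
    rng = random.Random(seed)
    pos = [rng.uniform(lo, hi) for _ in range(whales)]
    best = min(pos, key=f)
    for t in range(iters):
        a = 2 - 2 * t / iters                 # linearly decreases 2 -> 0
        for i, x in enumerate(pos):
            p = rng.random()
            A = 2 * a * rng.random() - a
            C = 2 * rng.random()
            if p < 0.5:
                if abs(A) < 1:                # encircle the best whale
                    x = best - A * abs(C * best - x)
                else:                         # search around a random whale
                    ref = pos[rng.randrange(whales)]
                    x = ref - A * abs(C * ref - x)
            else:                             # logarithmic spiral toward best
                l = rng.uniform(-1, 1)
                x = abs(best - x) * math.exp(l) * math.cos(2 * math.pi * l) + best
            pos[i] = min(max(x, lo), hi)      # clamp to the search bounds
        best = min(best, min(pos, key=f), key=f)
    return best

best = woa_minimize(lambda x: (x - 3) ** 2, -10, 10)
```

      On a smooth unimodal function such as this quadratic, the swarm contracts around the optimum; the improvement and hybrid strategies the survey discusses mostly target the harder multimodal case.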


      A backtracking algorithm with reduction for the threshold-minimum dominating set problem
      CHU Xu, NING Ai-bing, HU Kai-yuan, DAI Su-yu, ZHANG Hui-zhen
      2024, 46(05): 897-906. doi:
      Abstract ( 13 )   PDF (820KB) ( 50 )     
      The threshold-minimum dominating set problem in graph theory is an NP-hard problem in combinatorial optimization, and is essentially an extension of the minimum dominating set problem. This paper studies the threshold-minimum dominating set problem for a given undirected graph G=(V, E) and threshold value r. Firstly, some mathematical properties and their proofs are presented; these properties can be used to reduce the size of a given instance, thereby reducing the difficulty of solving it. Secondly, upper- and lower-bound sub-algorithms and a reduction sub-algorithm are designed. Based on these sub-algorithms, a backtracking algorithm with reduction (BAR) is proposed, which can reduce the problem size and obtain the optimal solution. Finally, an example analysis and some random instances verify that the algorithm can effectively reduce the difficulty of problem solving.
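
      The branch-and-bound flavor of such a backtracking search can be illustrated with a toy version (assuming "threshold r" means every vertex must be covered by at least r chosen vertices, counting itself; the paper's reduction rules and bound sub-algorithms are omitted):

```python
# Toy backtracking search: branch on taking/skipping each vertex, and
# prune any partial solution already as large as the best one found.

def threshold_dominating_set(adj, r):
    n = len(adj)
    best = [list(range(n))]                # trivial solution: all vertices

    def coverage(chosen):
        cov = [0] * n
        for v in chosen:
            cov[v] += 1                    # a chosen vertex covers itself...
            for u in adj[v]:
                cov[u] += 1                # ...and each of its neighbors
        return cov

    def backtrack(v, chosen):
        if len(chosen) >= len(best[0]):
            return                         # bound: cannot improve the best
        if v == n:
            if all(c >= r for c in coverage(chosen)):
                best[0] = chosen[:]
            return
        backtrack(v + 1, chosen + [v])     # branch: take vertex v
        backtrack(v + 1, chosen)           # branch: skip vertex v

    backtrack(0, [])
    return best[0]

# 4-cycle 0-1-2-3; with r=1, two opposite vertices dominate every vertex.
adj = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}
sol = threshold_dominating_set(adj, 1)
```

      The reduction rules in BAR play the role of shrinking `adj` before this search even starts, which is where most of the practical speed-up comes from.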

      Drug-drug interaction prediction based on neighborhood relation-aware graph neural network
      LEI Zhi-chao, JIANG Jia-jun, MA Chi-zhuo, ZHOU Wen-jing, WANG Chu-zheng
      2024, 46(05): 907-915. doi:
      Abstract ( 26 )   PDF (1026KB) ( 88 )     
      Research on drug-drug interaction (DDI) is conducive to clinical medication and new drug development. Existing research technologies do not fully consider the topological structure formed by drug entities and other entities such as drugs, targets, and genes in the drug knowledge graph, nor the semantic importance of the different relationships between entities. To solve these problems, this paper proposes a model based on a neighborhood relation-aware graph neural network (NRAGNN) to predict DDI. Firstly, a graph attention network is utilized to learn the weights and feature representations of different relationship edges, which enhances the semantic features of drug entities. Secondly, neighborhood representations for different layers around the drug entity are generated to capture the topological structure features of drug entities. Finally, the drug-drug interaction score is obtained by element-wise multiplication of the two drug feature representation vectors. Experimental results show that the proposed NRAGNN model achieves 0.8994, 0.9444, 0.9567, and 0.9023 in the ACC, AUPR, AUC-ROC, and F1 indicators on the KEGG-DRUG dataset, respectively, outperforming other current models.


      Entity relation extraction based on prejudgment and multi-round classification for span
      TONG Yuan, YAO Nian-min
      2024, 46(05): 916-928. doi:
      Abstract ( 20 )   PDF (1643KB) ( 57 )     
      Aiming at the entity recognition and relation extraction tasks in natural language processing, a model named Smrc is proposed, which makes predictions at the token-sequence (span) level. The model uses the BERT pre-trained model as an encoder and includes three modules: entity pre-judgment (Pej), entity multi-round classification (Emr), and relation multi-round classification (Rmr). The Smrc model performs entity recognition through the preliminary judgment of the Pej module and the multi-round entity classification of the Emr module, and then uses the Rmr module's multi-round relation classification to determine the relationships between entities, thus completing the relation extraction task. On the CoNLL04, SciERC, and ADE datasets, the F1 values of entity recognition reach 89.67%, 70.62%, and 89.56%, respectively, and the F1 values of relation extraction reach 73.11%, 51.03%, and 79.89%, respectively. Compared with the previous best model SpERT on the three datasets, the Smrc model achieves improvements of 0.73%, 0.29%, and 0.61% in entity recognition and 1.64%, 0.19%, and 1.05% in relation extraction through entity pre-judgment and multi-round classification of entities and relations, demonstrating the effectiveness and advantages of the model.

      A context-aware feature representation method in fine-grained entity typing
      LIU Pan, GUO Yan-ming, LEI Jun, WANG Hao-ran, LAO Song-yang, LI Guo-hui
      2024, 46(05): 929-936. doi:
      Abstract ( 29 )   PDF (770KB) ( 67 )     
      Fine-grained entity typing assigns fine-grained types to entities in text; type information provides entities with rich semantic information and plays an important role in downstream tasks such as relation extraction, entity linking, and question answering systems. Since the lengths and positions of entities in sentences are not uniform, the representation of entities in context cannot be computed directly. Existing fine-grained entity typing models process entity mentions and their contexts separately into individual feature representations, which severs the semantic relationship between them. This paper proposes a context-aware feature representation method for fine-grained entity typing, which places entities back into their contexts and solves the problem of computing entity feature representations when entity lengths and positions are not uniform. Experimental results demonstrate that this method can extract the feature representation of entities in their contexts and significantly improve the performance of fine-grained entity typing. The Macro-F1 value of this method on the Chinese fine-grained entity typing dataset CFET is improved by more than 10%.


      An unsupervised phoneme segmentation method for Lao language with multi-feature interaction fusion
      LI Xin-jie, WANG Wen-jun, DONG Ling, LAI Hua, YU Zheng-tao, GAO Sheng-xiang
      2024, 46(05): 937-944. doi:
      Abstract ( 20 )   PDF (1366KB) ( 60 )     
      Aiming at the inaccurate phoneme segmentation problem caused by the lack of consideration of Lao language tone changes and audio diversity in existing methods, this paper proposes an unsupervised phoneme segmentation method for Lao language with multi-feature interaction fusion. Firstly, self-supervised features, spectral features and pitch features are independently coded to avoid the insufficiency of a single feature. Secondly, multiple independent features are gradually fused based on the attention mechanism, so that the model can more comprehensively capture the information of Lao language tone changes and phoneme boundaries. Finally, a learnable framework is adopted to optimize the phoneme segmentation model. The experimental results show that the proposed method improves the R-value by 27.88% on the Lao phoneme segmentation task compared with the baseline methods.


      Distantly supervised relation extraction based on entity knowledge
      MA Chang-lin, SUN Zhuang
      2024, 46(05): 945-950. doi:
      Abstract ( 23 )   PDF (681KB) ( 69 )     
      To reduce the noise of labeled data in distantly supervised relation extraction, a distant supervision relation extraction model integrating entity descriptions and a self-attention mechanism is proposed. Based on multi-instance learning, the combined effects of entity knowledge and positional relations are considered, and the concatenated vectors of words, entities, entity descriptions, and relative positions are adopted as the model input. A piecewise convolutional neural network is employed as the sentence encoder, combined with an improved structured self-attention mechanism to capture the internal correlation of features. The difference vector between the tail entity and the head entity is constructed as the supervision information of the attention mechanism to assign a weight to each sentence. Experimental results on the New York Times dataset show that the model outperforms state-of-the-art models on the evaluated performance metrics.