A journal of the China Computer Federation (CCF)
Chinese Science and Technology Core Journal
Chinese Core Journal
Current Issue
2026, No. 3  Published: 25 March 2026
Columns in this issue:
High Performance Computing
Graphics and Images
High Performance Computing
Design and implementation of an event-based Monte Carlo particle transport algorithm
LI Tiejun, ZHANG Jianmin, LI Yuhan, YANG Bo
2026, 48(3): 381-388. doi:
Abstract | PDF (846KB)
The Monte Carlo (MC) particle transport program is a typical high-performance computing (HPC) application. There are two parallel methods for MC particle transport programs: the history-based method and the event-based method. Proxy programs serve as a crucial foundation for optimizing the performance of large-scale practical programs on specific architectures, and an event-based MC proxy program is of great importance for research on performance optimization for many-core architectures. However, no event-based proxy programs are publicly available. Based on the open-source project OpenMC, an event-based Monte Carlo particle transport algorithm is designed, and a new event-based MC proxy program is implemented. Experimental results show that this proxy program effectively simulates the branching, memory access, and computational characteristics of OpenMC, with a code size of less than 5% of OpenMC's. Moreover, its runtime is merely 7.5% of OpenMC's, providing an efficient and user-friendly platform for optimization research based on event-based algorithms.
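The contrast between the two parallelization styles can be sketched in a toy absorption model. This is a minimal illustration, not OpenMC's actual event set: a single "collision" event and a fixed absorption probability are assumed. The point is that the event-based variant applies one event type to the whole particle bank per pass, yielding uniform, branch-light work per pass.

```python
import random

random.seed(0)

def history_based(n_particles, absorb_p=0.3):
    """Follow each particle history to completion: one long, branchy loop
    per particle (the traditional parallelization unit)."""
    collisions = 0
    for _ in range(n_particles):
        while random.random() > absorb_p:   # particle survives this collision
            collisions += 1
    return collisions

def event_based(n_particles, absorb_p=0.3):
    """Process one event type over the whole particle bank per pass, so each
    pass does uniform work (amenable to SIMD / many-core execution)."""
    alive = list(range(n_particles))        # bank of in-flight particles
    collisions = 0
    while alive:
        # the "collision" event applied to every banked particle at once
        survivors = [p for p in alive if random.random() > absorb_p]
        collisions += len(survivors)
        alive = survivors
    return collisions
```

Both variants compute the same quantity; only the loop structure (and hence the hardware-friendliness of each pass) differs.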
A GPU-sharing-based scheduling framework for accelerating deep learning training tasks
LIN Chenxi, LI Jialun, MO Xuan, ZHOU Jieying, WU Weigang
2026, 48(3): 389-397. doi:
Abstract | PDF (2247KB)
Deep learning (DL) is increasingly applied across a wide range of business scenarios. Efficiently utilizing GPU cluster resources for DL training and reducing task completion time have garnered sustained attention from both industry and academia. A single DL training task often fails to fully leverage all the computational resources of a GPU, and the exclusive GPU allocation of traditional schedulers leads to low resource utilization. This paper proposes a GPU-sharing-based task scheduling framework, G-Share, which allows multiple DL tasks to be trained on the same GPU simultaneously, enabling co-location scheduling. Task scheduling and resource allocation are performed with awareness of the interference between co-located tasks, aiming to enhance GPU utilization and thereby accelerate task execution. Specifically, G-Share first characterizes the mutual interference between tasks through offline modeling and online updates, and models the GPU-sharing-based scheduling problem as a minimum-weight matching problem on a weighted bipartite graph. Solving this problem yields the resource allocation, and a dynamic task scheduling mechanism combined with time-slicing tracks changes in the optimal co-location combinations of tasks in online scenarios. Experiments conducted on DL task workload data from SenseTime demonstrate that G-Share achieves a 20.6% reduction in average task completion time compared to benchmark methods.
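The minimum-weight bipartite matching at the core of this formulation can be sketched on a toy cost matrix. The interference costs below are invented for illustration, and the brute-force search over permutations stands in for a real solver (a production scheduler would use the O(n³) Hungarian algorithm, e.g. `scipy.optimize.linear_sum_assignment`):

```python
from itertools import permutations

def min_weight_matching(cost):
    """Brute-force minimum-weight perfect matching on a square cost matrix.
    cost[i][j]: predicted slowdown if task i is co-located on GPU slot j."""
    n = len(cost)
    best_total, best_assign = float("inf"), None
    for perm in permutations(range(n)):
        total = sum(cost[i][perm[i]] for i in range(n))
        if total < best_total:
            best_total, best_assign = total, perm
    return best_total, best_assign

# Hypothetical interference costs for 3 tasks on 3 shared-GPU slots
cost = [[4, 1, 3],
        [2, 0, 5],
        [3, 2, 2]]
total, assign = min_weight_matching(cost)   # assign[i] = slot for task i
```

Re-solving this matching each scheduling round, with costs updated online, is what lets the framework chase the best co-location combinations as workloads change.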
A computing offloading and resource allocation strategy under edge-cloud collaboration
ZHANG Wenzhu, SHI Yakun, GAO Dumei
2026, 48(3): 398-410. doi:
Abstract | PDF (2445KB)
In the Internet of Vehicles (IoV), the limited computational capabilities of vehicles, the dynamic computational resources of edge servers, and the remote deployment of cloud servers pose significant challenges for designing computation offloading and resource allocation schemes. This paper proposes a deep reinforcement learning-based joint computation offloading and resource allocation algorithm that aims to minimize the weighted sum of latency and energy consumption for processing computational tasks. Specifically, to enable collaborative processing of computational tasks between edge and cloud servers, a software-defined networking (SDN) based edge-cloud collaborative network architecture is first designed, along with a metric for task priority. Subsequently, computational models are established separately for cloud, edge, and end-device tasks. Then, an objective function is designed to optimize system latency and energy consumption, and it is transformed into a system utility function. Finally, a deep reinforcement learning algorithm determines computation offloading and resource allocation strategies based on the system utility. Experimental results demonstrate that the proposed algorithm significantly outperforms existing benchmark algorithms in reducing system latency and energy consumption and in improving the success rate of task computation.
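The weighted latency-energy objective can be made concrete with a small sketch. The function names, weights, and (latency, energy) values below are illustrative assumptions; the paper derives these quantities from its cloud, edge, and end-device task models and tunes the weights per task priority:

```python
def task_cost(latency_s, energy_j, w_latency=0.5, w_energy=0.5):
    """Weighted sum of delay and energy for one offloading decision."""
    return w_latency * latency_s + w_energy * energy_j

def best_offload(options, **weights):
    """Pick the offloading target (local / edge / cloud) minimizing the cost.
    `options` maps target name -> (latency_s, energy_j); values hypothetical."""
    return min(options, key=lambda k: task_cost(*options[k], **weights))

options = {"local": (2.0, 8.0),   # slow on-board CPU, high device energy
           "edge":  (0.8, 3.0),   # nearby server: transmit + compute
           "cloud": (1.5, 2.5)}   # powerful but remote
```

With equal weights the edge wins here; shifting all weight onto energy flips the choice to the cloud, which is exactly the trade-off the utility function exposes to the learning agent.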
A single-layer rectilinear obstacle-avoiding Steiner minimal tree generation algorithm based on net partitioning
WEN Hao, LI Zhensong
2026, 48(3): 411-421. doi:
Abstract | PDF (1090KB)
In the routing phase of very large-scale integrated circuits (VLSI), the rapid and efficient creation of a rectilinear obstacle-avoiding Steiner minimal tree (ROASMT) is crucial for successful routing. Therefore, this paper proposes a single-layer rectilinear obstacle-avoiding Steiner minimal tree generation algorithm based on net partitioning, which combines partitioning and legalization techniques. By dividing the scanning point regions, an obstacle-avoiding spanning graph (OASG) is generated. Obstacle-avoiding spanning trees are then selected from the OASG and transformed into pin spanning trees (PSTs), thereby partitioning the original net into multiple sub-nets. Subsequently, the rectilinear Steiner minimal tree (RSMT) algorithm is applied to create rectilinear minimal Steiner trees for each obstacle-free sub-net, which are then legalized to obtain valid initial solutions. Additionally, this paper introduces a global optimization method based on “multi-segment edges” and a local optimization method based on “V-like structures”. Algorithm validation results show that the single-layer rectilinear obstacle-avoiding Steiner minimal tree (SL-ROASMT) algorithm based on net partitioning reduces the total wire length by an average of approximately 3.6% compared to graph-based and edge-based algorithms, with all test cases completing routing within 1 second.
Highly scalable three-dimensional marine controlled-source electromagnetics numerical simulation using high-order time-domain finite-difference method
PENG Hua, WU Zhiqiang, XIAO Tiaojie, LI Shijie, GONG Chunye, YANG Bo, WANG Haodong, CHEN Xingyou
2026, 48(3): 422-433. doi:
Abstract | PDF (2190KB)
Marine controlled-source electromagnetics (MCSEM) is widely applied in fields such as electromagnetic detection of underwater targets, marine electromagnetic communication, and exploration of marine oil and gas resources. However, current MCSEM numerical simulations face challenges such as insufficient computational accuracy, low parallel communication efficiency, and limited scalability, making it difficult to meet the computational demands of large-scale, three-dimensional complex models. To address these issues, a multi-level parallel numerical simulation algorithm based on the fourth-order finite-difference time-domain (FDTD) method is designed and implemented in this paper. This algorithm employs parallel computation across transmitter sources and parallel solution strategies for sub-regions, fully exploiting parallel granularity. Additionally, it effectively reduces communication overhead through remote memory access technology, significantly enhancing parallel efficiency. The correctness and efficiency of the algorithm are then validated through multiple typical case studies. The results demonstrate that, for a deep-sea model without considering the air layer, with 8 transmitter sources, a regional scale of 20 km × 20 km × 12 km, and a grid size of 245 × 245 × 512, the computational time is reduced from 57.05 hours in serial computation to 72.96 seconds when using 8 process groups with a total of 2 048 processes, achieving a super-linear speedup of 2 815.04 and a parallel efficiency of 137.45%. For a shallow-sea model considering the air layer, with 8 process groups and a total of 256 processes, the computational time is reduced from 64.78 hours in serial computation to 59.75 minutes, achieving a speedup of 65.05 and a parallel efficiency of 25.41%. This algorithm exhibits good scalability and computational accuracy, providing an efficient solution for marine electromagnetic numerical simulations.
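The reported speedup and parallel-efficiency figures follow directly from the timings via the standard definitions, which a few lines verify:

```python
def speedup(serial_s, parallel_s):
    """Classic strong-scaling speedup: serial time over parallel time."""
    return serial_s / parallel_s

def parallel_efficiency(serial_s, parallel_s, n_procs):
    """Speedup per process; > 1.0 indicates super-linear scaling."""
    return speedup(serial_s, parallel_s) / n_procs

# Deep-sea case: 57.05 h serial vs 72.96 s on 2 048 processes
deep_speedup = speedup(57.05 * 3600, 72.96)
deep_eff = parallel_efficiency(57.05 * 3600, 72.96, 2048)

# Shallow-sea case: 64.78 h serial vs 59.75 min on 256 processes
shallow_speedup = speedup(64.78 * 3600, 59.75 * 60)
shallow_eff = parallel_efficiency(64.78 * 3600, 59.75 * 60, 256)
```

The deep-sea efficiency exceeding 1.0 (about 137%) is the super-linear regime the abstract cites, typically attributable to the per-process working set fitting in cache once the domain is decomposed.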
Graphics and Images
An improved YOLOv8-based model for crop and pigweed detection: MES-YOLO
WEN Tao, WANG Tianyi, HUANG Shirui, ZHOU Jianglong
2026, 48(3): 434-443. doi:
Abstract | PDF (1436KB)
With the rapid development of modern agricultural technology, the precise management of crops and the effective control of weeds have become particularly important. Aiming at pigweed, a common weed that affects crop growth, an improved lightweight crop and pigweed detection algorithm based on YOLOv8, called MES-YOLO, is proposed. Firstly, the MS-Block module and the C2f module are fused via heterogeneous convolution and applied to the backbone network, improving the accuracy and efficiency of overall target detection. Secondly, the feature pyramid structure HSFPN is improved to ELA-HSFPN and applied to the feature fusion network to enhance the model's ability to express target features. Finally, the Inner-SIoU loss function is used to accelerate convergence. Experimental results demonstrate that, compared to YOLOv8n, the MES-YOLO detection algorithm achieves a 2.1-percentage-point improvement in the mAP@0.5 metric, reduces computational complexity from 8.2×10⁹ to 6.5×10⁹, and has a parameter count that is only 62% of YOLOv8n's. The improved model is better suited to low-computational-power environments while meeting high-precision deployment requirements.
Edge and semantic collaborative dual-branch decoding network for agricultural parcel extraction
YANG Mei, LIU Sinan, PAN Zhen, GAO Lei, MIN Fan
2026, 48(3): 444-455. doi:
Abstract | PDF (4244KB)
Accurate agricultural parcel extraction from remote sensing images for agricultural resource monitoring is a critical technology for intelligent management of cultivated land resources. To address the insufficient segmentation accuracy of existing deep learning methods caused by blurred boundaries, diverse textures, and morphological heterogeneity in complex farmland scenarios, this paper proposes ESDNet, a multi-task neural network featuring collaborative edge-semantic optimization. The model achieves performance improvements through three mechanisms. Firstly, a coordinate attention (CA) module is embedded between the encoder and main decoder to enhance the discriminative capability for ambiguous boundaries through coordinate-sensitive attention weighting. Secondly, a feature enhancement (FE) module with multi-level receptive fields is designed, employing pyramid dilated convolutions and adaptive feature fusion strategies to improve the model's resolution of heterogeneous textures. Thirdly, a multi-task collaborative optimization framework integrating boundary mapping, distance mapping, and mask mapping is constructed, reinforcing spatial cognition of morphologically complex parcels via a joint learning strategy combining geometric constraints and semantic guidance. To validate the model's generalizability, experiments were conducted on multi-source remote sensing datasets (Gaofen-2 and Sentinel-2 imagery) covering the Shandong and Sichuan regions of China and the Netherlands. Results demonstrate that ESDNet achieves superior performance, surpassing state-of-the-art models in intersection over union (IoU) by 0.77, 2.17, and 2.28 percentage points across the three regions, respectively. The model's strong generalization capability and high-precision segmentation provide reliable technical support for dynamic monitoring of cultivated land resources in smart agriculture.
An end-to-end visual multi-task learning model for task prompt fusion
GENG Huantong, FAN Zichen, JIANG Jun, LIU Zhenyu, LI Jiaxing
2026, 48(3): 456-466. doi:
Abstract | PDF (1269KB)
To address the issues of separated network structures and inter-task interference in existing visual multi-task learning models, an end-to-end visual multi-task learning model based on triple feature embedding and task prompt fusion is proposed. During the image embedding and encoding phase, three distinct encoding modules are employed to capture the original three types of features from the image, fully preserving global, local, and contour features. This enriches the structure and semantic information of the embedding vectors, enabling the model to access image information across different feature dimensions. In the feature extraction phase, to achieve unified end-to-end learning for general tasks, task-specific learning, and cross-task interactions, spatial-channel prompt learning modules and prompt fusion modules are utilized to extract salient features, trends, and raw information from both the image and task prompts. This enhances the expressiveness and guiding capabilities of the task prompts, allowing for more comprehensive extraction of global and local features from both the image and task prompts. Experimental results show that, compared to single-task state-of-the-art (SOTA) models, the evaluation metrics for mDS and RMSE improve by 3.36 percentage points and 2.41 percentage points, respectively. Compared to multi-task SOTA models, these metrics improve by 1.69 percentage points and 0.32 percentage points, respectively, with mIOU improving by 0.99 percentage points. This provides a novel solution for multi-task learning.
An improved error compensation image magnification algorithm based on FPGA implementation
WAN Zirong, ZHANG Caizhen
2026, 48(3): 467-475. doi:
Abstract | PDF (2073KB)
To preserve edge information as much as possible and improve the visual quality of enlarged images, an improved error compensation image magnification algorithm based on FPGA implementation is proposed. The algorithm builds on the theory of error compensation and adds a guided filter based on per-pixel error compensation correction. The error-compensated image is adopted as both the guide image and the input image of the guided filter, adjusting pixel weights, with the parameter ε set to 0.1 to preserve more edge information. Experimental evaluations were conducted using peak signal-to-noise ratio (PSNR), mean gradient, and gradient standard deviation to compare the improved algorithm with the bilinear error compensation algorithm and to analyze the results against those reported in the literature. The findings indicate that, compared to other algorithms, the improved algorithm increases PSNR by 3 dB to 9 dB, reduces the mean gradient by 9 to 36, and decreases the gradient standard deviation by 8 to 26, effectively mitigating edge distortion in images.
A weighted least squares image smoothing method based on directional anisotropy
LIANG Haohan, LIU Tingting, CUI Peng, WANG Zhiqiang
2026, 48(3): 476-487. doi:
Abstract | PDF (4600KB)
Excessive reliance on parameter settings in the weighted least squares image smoothing method can lead to blurring of weak-gradient structures, preservation of strong-gradient textures, and color shifts during multi-scale image decomposition. To address these issues, a weighted least squares image smoothing method based on directional anisotropy is proposed. Firstly, a multi-directional anisotropy structure measurement method is proposed, which enhances the ability to capture texture/structure information by using the directional derivatives of gradient information along each direction; it also combines the gradient amplitude of the original image to attenuate the structure measurement amplitude, thereby refining the structure. Secondly, an adaptive Sobel operator with a variable template replaces the one-dimensional difference operator for computing the first-order partial derivatives and gradient weights of the regularization term, allowing better perception of gradient changes within the neighborhood and thus preserving the integrity of edges. Lastly, the structure measurement amplitude is integrated into the gradient weights, enabling small smoothing weights for structure preservation in structural regions and large smoothing weights for smoothing out texture details in texture regions. Additionally, a multi-channel fusion of smoothing results is employed to resolve color shift and color distortion issues. Visually, the new method effectively eliminates textures while retaining delicate structures; quantitatively, it achieves a balance between texture suppression and structure preservation, outperforming mainstream texture smoothing methods.
Integrating multi-scale information and feature mapping relationships for hierarchical multi-granularity image classification
TENG Shangzhi, MEI Changwang, YOU Xindong, LV Xueqiang
2026, 48(3): 488-499. doi:
Abstract | PDF (1459KB)
To exploit the detailed texture information of fine-grained images at different granularity levels and to capture the relationships between hierarchical features, a hierarchical multi-granularity image classification method that integrates multi-scale information and feature mapping relationships is proposed. Firstly, mid-level semantic features extracted from the backbone network are used as local detail features of images at different category granularities and fused with the corresponding high-level semantic features at the same granularities. Then, a feature mapping algorithm is employed to represent the mapping relationships between category hierarchies, enabling the fusion of multi-granularity features across levels. Finally, a reordering classification loss (RCL) is introduced to enhance classification accuracy across hierarchical categories, while a triplet center loss (TCL) minimizes the distance between objects and their class centers in the fine-grained feature space and maximizes the distance to other class centers. Evaluation results on three hierarchical multi-granularity datasets (CUB-200-2011, FGVC-Aircraft, and Stanford Cars) demonstrate that the proposed method achieves fine-grained image classification performance of 88.8%, 94.2%, and 95.1%, respectively, with weighted average precision (wAP) values of 90.4%, 95.1%, and 95.1%. These results validate the effectiveness and superiority of the proposed method for hierarchical multi-granularity image classification tasks.
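The triplet center loss mentioned above has a compact standard form: pull a feature toward its own class center while pushing it at least a margin away from the nearest other center. A minimal sketch, with 2D class centers and a margin chosen purely for illustration (the paper's feature dimensionality and hyperparameters are not specified here):

```python
def triplet_center_loss(feat, label, centers, margin=1.0):
    """TCL for one sample: squared distance to own center minus distance to
    the nearest other center, clamped at zero after adding the margin."""
    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    d_own = sqdist(feat, centers[label])
    d_other = min(sqdist(feat, c) for j, c in enumerate(centers) if j != label)
    return max(0.0, d_own - d_other + margin)

centers = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)]   # hypothetical class centers
```

A sample close to its own center incurs zero loss; one equidistant between two centers is penalized by the margin, which is what drives the inter-class separation described in the abstract.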
An adversarial examples defense method for image reconstruction based on SCViT
ZHANG Xinjun, GUO Jifa
2026, 48(3): 500-511. doi:
Abstract | PDF (1682KB)
The rapid development of artificial intelligence (AI) has brought great convenience to people's lives, but it has also raised growing concerns about its security. Image classification is a crucial research task in computer vision; however, the vulnerability of deep neural networks makes them susceptible to attacks from adversarial examples. Adversarial examples are a significant research direction within AI security, with many techniques emerging for both generating and defending against them. This paper introduces modifications to the vision Transformer (ViT) and proposes a novel model, the similarity comparison vision Transformer (SCViT), for comparing the similarity of image patches. In SCViT, image patches are processed through a linear projection layer and a Transformer encoder to obtain corresponding representation vectors, and the cosine similarity between these vectors determines the degree of similarity between patches. To mitigate the influence of positional encoding on the similarity computation, a small coefficient α is introduced before the positional encoding. Using SCViT for image patch similarity comparison, clean sample patches replace adversarial sample patches one by one, and the replaced patches are then concatenated to form a new image for classification. Experimental results on the CIFAR-10 dataset demonstrate that selecting an appropriate value of α enhances the defensive performance of the proposed method. Experiments on the Inception_v3 and Inception_v4 classification models further indicate good transferability across classification networks. Compared with several commonly used image reconstruction defense methods, the proposed method not only achieves superior defensive performance but also demonstrates greater robustness, with image classification accuracy exceeding 80% against 4 types of attack methods. Additionally, experiments on the CIFAR-100 and ImageNet datasets show that the classification accuracy for adversarial examples improves by over 54 percentage points and 46 percentage points, respectively, highlighting the versatility of the proposed method.
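The two numeric ingredients of the patch comparison, cosine similarity between representation vectors and α-damped positional encoding, can be sketched in a few lines. The vector values and the default α are illustrative assumptions, not the paper's actual embeddings:

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two representation vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def embed(patch_vec, pos_enc, alpha=0.1):
    """Add positional encoding scaled by a small alpha, as SCViT does, so
    position contributes little to the similarity comparison."""
    return [p + alpha * e for p, e in zip(patch_vec, pos_enc)]
```

Because cosine similarity depends only on direction, scaling the positional term by a small α keeps two patches with near-identical content close to similarity 1 even when they sit at different positions in the image.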
A KCF-TLD fusion target tracking algorithm based on LAB and HOG features
WU Xiaolong, LI Xuesong, DING Yan, LUO Zijuan, ZHANG Bozhi
2026, 48(3): 512-520. doi:
Abstract | PDF (2818KB)
To address the issues of the kernelized correlation filter (KCF) algorithm being susceptible to environmental illumination changes, target deformations, and target occlusions, as well as the slow solution speed of the tracking-learning-detection (TLD) algorithm, a KCF-TLD fusion target tracking algorithm based on LAB and HOG (histogram of oriented gradients) features is proposed. This algorithm utilizes LAB and HOG features instead of image samples for correlation filter operations, enhancing the KCF algorithm’s adaptability to changes in environmental illumination and target shape. By replacing the tracker component of the TLD algorithm with an improved KCF algorithm, computationally intensive optical flow calculations with high time complexity can be avoided, thereby improving the computational efficiency of the TLD algorithm. Meanwhile, the detector in the TLD algorithm can provide initialization samples for the correlation filter when the target is occluded, enabling the re-tracking of occluded targets. Comparative validation was conducted using the OTB-100 open-source dataset. Compared to the original KCF algorithm, the proposed algorithm improves tracking accuracy by 14.6%, 12.1%, and 17.5% under conditions of environmental illumination changes, target deformations, and target occlusions, respectively. Furthermore, compared to the original TLD algorithm, the proposed algorithm significantly increases the video processing frame rate.
A 3D human pose estimation method integrating semantic graph convolutional network and self-attention mechanism
TONG Lijing, YING Yizhuo, CAO Nan
2026, 48(3): 521-530. doi:
Abstract | PDF (1323KB)
To address the difficulty of capturing the global characteristics of human joint sequences and the resulting low estimation accuracy, a 3D human pose estimation method combining a semantic graph convolutional network and a self-attention mechanism is proposed. Firstly, to improve feature extraction when mapping 2D human pose sequences to 3D human pose sequences, a self-attention mechanism is integrated into the semantic graph convolutional network to extract spatial features that fuse local and global information. Secondly, the channel-mixing module of the MLP-Mixer network is improved by introducing a semantic graph convolutional network and a U-shaped MLP structure for temporal feature extraction. Finally, 3D human pose estimation is performed based on the fused features from 2D human images and the extracted temporal features. Experimental evaluations on the Human3.6M dataset demonstrate that, compared with the suboptimal mainstream method, the proposed method reduces the average error metrics MPJPE and PA-MPJPE by approximately 4.5 mm and 0.2 mm, respectively, validating its effectiveness.
Human pose estimation combining mixed attention and multi-scale features
GU Xuejing, LI Yanru, YANG Lanxiao
2026, 48(3): 531-539. doi:
Abstract | PDF (904KB)
To address the low accuracy of multi-person pose estimation in occlusion scenes, a human pose estimation model named DAW-YOLOPose, which combines a mixed attention mechanism and multi-scale sequence features, is proposed. Firstly, the mixed local channel attention (MLCA) mechanism is used to improve the backbone network of YOLOv8Pose, effectively capturing and transmitting spatial and channel information without increasing the number of model parameters, thereby improving the network's feature expression. Secondly, a new multi-scale sequence feature fusion network is proposed to enhance the extraction of multi-scale feature information and integrate feature maps at different scales. Finally, the gradient gain allocation strategy of the Wise-IoU v3 loss function is used to improve the ability to distinguish high-quality anchor boxes and reduce the negative impact of low-quality samples on model training. Experimental results on the MS COCO dataset show that, compared with YOLOv8Pose, DAW-YOLOPose improves mAP@0.5, mAP@0.5:0.95, and recall by 2.7, 1.4, and 1.9 percentage points, respectively, achieving a better estimation effect.
An improved low-light pedestrian detection algorithm based on YOLOv8
XU Guangping, XU Huiying, ZHU Xinzhong, HUANG Xiao, WANG Shumeng, SONG Jie
2026, 48(3): 540-550. doi:
Abstract | PDF (3910KB)
To address the poor performance of current mainstream low-light pedestrian detection frameworks caused by insufficient image brightness and contrast, this paper proposes the RetinaHA-YOLOv8 algorithm. The algorithm uses RetinexFormer as a pre-processing module to restore degraded images, ensuring that subsequent stages can extract clearer and more useful features from the enhanced image. It also uses the hybrid attention transformation (HAT) attention mechanism to retain key information in the initial stage and promote deep fusion after feature fusion. Finally, to offset the additional computational burden and meet real-time detection requirements, online re-parameterized convolution is introduced to improve inference speed and frames per second while maintaining detection accuracy. Experimental results verify the effectiveness of the RetinaHA-YOLOv8 algorithm on public low-light pedestrian detection datasets, with AP increased by 5.4%, 11.7%, and 9.5%, respectively, while meeting real-time requirements in practical applications.
RCL-YOLO: A lightweight dense crowd detection algorithm
LI Mengxin, CHEN Jiaming, L Fan, ZHENG Kunyan, ZHAO Jingwen
2026, 48(3): 551-560. doi:
Abstract | PDF (1952KB)
To effectively address occlusion and missed detection in crowded scenes and to further improve both accuracy and detection speed, a lightweight dense crowd detection algorithm that improves upon YOLOv8 is proposed. Firstly, receptive field attention convolution (RFAConv) is employed to replace some of the 3×3 convolutional blocks in the YOLOv8 backbone network, strengthening the network's ability to extract features and capture detailed feature information. Secondly, the cross-scale feature fusion module (CCFM) is utilized to aggregate information across scales, enhancing the model's adaptability to scale variations and enabling it to precisely locate objects of different sizes simultaneously. Additionally, a lightweight detection head (LGD) is adopted, replacing batch normalization (BN) with group normalization (GN) to improve the detection head's localization and classification performance. Experimental results demonstrate that, compared to the original YOLOv8 algorithm, the improved algorithm achieves a 0.4-percentage-point increase in mAP@0.5 and a 0.5-percentage-point increase in mAP@0.5:0.95 on the WiderPerson dataset, while reducing the parameter count by 1.6×10⁶ and the computational load by 2.4 GFLOPs. Ablation and comparative experiments validate the effectiveness and generalization capability of the proposed algorithm: it alleviates occlusion and missed detection in dense crowds while meeting both lightweight-design and accuracy requirements.
A road obstacle detection model based on improved YOLOv8
JIANG Jianwei, JIA Xiaoyun, DUAN Kepan, GUO Yu, SHENG Lianghao, WEI Lianting
2026, 48(3): 561-570. doi:
Abstract | PDF (1821KB)
Road obstacle detection is a significant component of intelligent driving technology. In response to the low accuracy of small-obstacle detection on roads, poor detection performance in adverse environmental scenes, and the scarcity of road obstacle datasets, a suitable obstacle dataset for road scenes is organized and constructed. Based on the YOLOv8 model, a new model with high detection accuracy, YOLOv8-J, is proposed. Firstly, a lightweight backbone network called LskViT, based on RepViT, is designed to enhance the model's ability to extract multi-scale features. Secondly, the SPD-Conv convolutional module is introduced to strengthen the model's learning capability for low-resolution images. Finally, an additional small-object detection layer is added to help the model capture more shallow features, thereby improving its detection of small obstacles. Experimental results demonstrate that, compared to the baseline YOLOv8 model, the improved YOLOv8-J model achieves increases of 5.9 and 6.1 percentage points in mAP@0.5 and mAP@0.5:0.95, respectively. The improved model is well suited to road obstacle detection tasks and further enhances detection performance for small obstacles in adverse environments.