Computer Engineering & Science ›› 2025, Vol. 47 ›› Issue (2): 298-307.

• Graphics and Images •

Focusing paradigm prompt learning of segment anything for unsupervised video object segmentation

SHEN Yonghui1,2,3, BU Dongxu1,2,3, ZHANG Shengyu1,2,3, SONG Huihui1,2,3

  (1. School of Automation, Nanjing University of Information Science & Technology, Nanjing 210044, China;
   2. Jiangsu Key Laboratory of Big Data Analysis Technology, Nanjing 210044, China;
   3. Collaborative Innovation Center on Atmospheric Environment and Equipment Technology, Nanjing 210044, China)

Abstract: Unsupervised video object segmentation aims to automatically locate and segment the primary objects in video frames at test time. Most current models rely on appearance cues extracted from RGB images and motion cues extracted from optical flow maps to segment objects. However, object occlusion, rapid motion, or stillness can leave the optical flow incomplete, and the limited information available from the appearance branch alone is then insufficient for good segmentation. To address this problem, this paper proposes a focused learning network (FPLNet), which introduces an additional dual-branch structure to capture the position and contour of the primary objects, compensating for the missing optical flow information. First, the model uses the backbone of the segment anything model (SAM) to extract appearance and motion features, improving its generalization. Two additional segmentation branches, coarse-grained and fine-grained, are then introduced as the prompt part of the focused learning network. In the decoder, RGB appearance features, optical flow motion features, coarse-grained features, and fine-grained features are progressively fused, mimicking how the human visual system focuses on target features. Extensive experiments on three standard datasets show that the proposed model outperforms existing models.
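To make the dual-stream extraction concrete, the sketch below applies the SAM image encoder to both a video frame and its optical-flow map rendered as a 3-channel image, using the public segment_anything package. This is only an illustration of the idea stated in the abstract: the checkpoint path, the flow-as-RGB preprocessing, and the decision to freeze the encoder are all assumptions, not details from the paper.

```python
# Hypothetical sketch: shared SAM backbone for appearance and motion cues.
# Assumes the public `segment_anything` package and a local ViT-B checkpoint.
import torch
from segment_anything import sam_model_registry

sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b.pth")  # path is an assumption
encoder = sam.image_encoder.eval()
for p in encoder.parameters():
    p.requires_grad_(False)  # freezing is an assumption, made to preserve SAM's generalization

rgb = torch.randn(1, 3, 1024, 1024)   # preprocessed video frame (SAM expects 1024x1024)
flow = torch.randn(1, 3, 1024, 1024)  # optical flow visualized as a 3-channel image

with torch.no_grad():
    appearance_feat = encoder(rgb)    # (1, 256, 64, 64) for ViT-B
    motion_feat = encoder(flow)       # same encoder reused on the motion stream
```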
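The progressive fusion in the decoder can likewise be sketched as a chain of stages, each merging one new cue (motion, then coarse location, then fine contour) into the running appearance feature. Module names, the fusion order, and the simple concatenate-and-convolve design are hypothetical; the paper's actual decoder may differ.

```python
# Minimal PyTorch sketch of progressive four-cue fusion, per the abstract's
# description. All names and the fusion mechanism are illustrative assumptions.
import torch
import torch.nn as nn

class ProgressiveFusionDecoder(nn.Module):
    def __init__(self, dim: int = 256):
        super().__init__()
        # One 3x3 conv per stage: each takes the running fused feature
        # concatenated with one new cue and projects back to `dim` channels.
        self.fuse_motion = nn.Conv2d(2 * dim, dim, 3, padding=1)
        self.fuse_coarse = nn.Conv2d(2 * dim, dim, 3, padding=1)
        self.fuse_fine = nn.Conv2d(2 * dim, dim, 3, padding=1)
        self.head = nn.Conv2d(dim, 1, 1)  # per-pixel mask logits

    def forward(self, rgb, flow, coarse, fine):
        # Fuse cues step by step, mimicking focused learning:
        # appearance + motion, then coarse position, then fine contour.
        x = self.fuse_motion(torch.cat([rgb, flow], dim=1))
        x = self.fuse_coarse(torch.cat([x, coarse], dim=1))
        x = self.fuse_fine(torch.cat([x, fine], dim=1))
        return self.head(x)  # segmentation logits

# Usage with dummy 256-channel, 64x64 feature maps:
if __name__ == "__main__":
    feats = [torch.randn(1, 256, 64, 64) for _ in range(4)]
    logits = ProgressiveFusionDecoder()(*feats)
    print(logits.shape)  # torch.Size([1, 1, 64, 64])
```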
• Received: 2023-11-01  Revised: 2024-01-03  Online: 2025-02-25  Published: 2025-02-24

Key words: unsupervised video object segmentation; focusing learning; segment anything model