• Journal of the China Computer Federation (CCF)
• Chinese Science and Technology Core Journal
• Chinese Core Journal

Computer Engineering & Science, 2023, Vol. 45, Issue (09): 1602-1610.

• Graphics and Images

An unsupervised video summarization algorithm based on deep and shallow feature fusion

ZENG Fan-feng, WANG Chun-zhen, LI Chen

  1. (School of Information, North China University of Technology, Beijing 100144, China)
  • Received: 2022-06-09; Revised: 2022-10-18; Accepted: 2023-09-25; Online: 2023-09-25; Published: 2023-09-12

Abstract: Existing unsupervised video summarization algorithms do not judge the importance of video frames accurately. To address this problem, an unsupervised video summarization algorithm based on deep and shallow feature fusion is proposed. Deep features of video frames are extracted by a Convolutional Neural Network (CNN), while shallow features are first extracted with the Speeded Up Robust Features (SURF) operator and then encoded with the Bag-of-Words (BOW) model. The deep and shallow features are fused into a more informative feature descriptor, which serves as the input to the network model. A Bidirectional Long Short-Term Memory network (BiLSTM) models the temporal information and outputs an importance score for each frame, and the model is optimized with reinforcement learning. To generate static video summaries, a keyframe selection method based on local maxima is designed; it preserves the temporal structure of the original video and avoids redundancy. In comparisons with several unsupervised video summarization algorithms on the SumMe and TVSum datasets, experimental results show that the proposed algorithm judges video content more accurately and generates higher-quality summaries.
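The abstract gives no implementation details, so the following is only a minimal Python sketch of three steps it names, under stated assumptions. BOW encoding assumes the per-frame SURF descriptors have already been extracted and that a hypothetical codebook was learned beforehand (for example by k-means). Feature fusion is assumed here to be L2-normalized concatenation, which the abstract does not specify. Keyframe selection picks local maxima of the frame-importance curve; the plateau and boundary handling are choices made for this sketch. The CNN, the BiLSTM, and the reinforcement-learning training are omitted, and all function names are hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]


def bow_encode(descriptors: Sequence[Sequence[float]],
               codebook: Sequence[Sequence[float]]) -> Vector:
    """L1-normalized bag-of-words histogram of one frame's local descriptors.

    `codebook` is a hypothetical pre-learned visual vocabulary (e.g. k-means
    centroids); each descriptor is hard-assigned to its nearest word.
    """
    hist = [0.0] * len(codebook)
    for d in descriptors:
        # Nearest visual word by squared Euclidean distance.
        nearest = min(
            range(len(codebook)),
            key=lambda k: sum((a - b) ** 2 for a, b in zip(d, codebook[k])),
        )
        hist[nearest] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total > 0 else hist


def l2_normalize(v: Sequence[float]) -> Vector:
    """Scale a vector to unit L2 norm (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def fuse_features(deep: Sequence[float], shallow: Sequence[float]) -> Vector:
    """Assumed fusion: L2-normalize each part so neither dominates, then concatenate."""
    return l2_normalize(deep) + l2_normalize(shallow)


def select_keyframes(scores: Sequence[float]) -> List[int]:
    """Indices of local maxima of the frame-importance curve, in temporal order.

    A run of equal scores (a plateau) counts as one peak if the frames on both
    sides are lower; the first frame of the run is returned. Boundary frames
    only need to beat their single neighbour.
    """
    keys: List[int] = []
    n = len(scores)
    i = 0
    while i < n:
        j = i
        while j + 1 < n and scores[j + 1] == scores[i]:
            j += 1  # extend over a plateau of equal scores
        left_lower = i == 0 or scores[i - 1] < scores[i]
        right_lower = j == n - 1 or scores[j + 1] < scores[i]
        if left_lower and right_lower:
            keys.append(i)
        i = j + 1
    return keys


if __name__ == "__main__":
    codebook = [[0.0, 0.0], [10.0, 10.0]]
    shallow = bow_encode([[1.0, 1.0], [0.0, 1.0], [9.0, 9.0]], codebook)
    fused = fuse_features([3.0, 4.0], shallow)
    print(len(fused))  # 4
    print(select_keyframes([0.1, 0.5, 0.3, 0.3, 0.7, 0.7, 0.2, 0.9]))  # [1, 4, 7]
```

Taking keyframes at the peaks of the score curve, and keeping them in their original order, is one simple way to stay faithful to the video's temporal structure while skipping adjacent near-duplicate frames; the paper's exact selection criterion may differ.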

Key words: video summarization, feature fusion, bi-directional long short-term memory (BiLSTM) network, reinforcement learning, local maximum