• A journal of the China Computer Federation (CCF)
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science ›› 2022, Vol. 44 ›› Issue (01): 84-91.


Image caption generation based on residual dense hierarchical information

WANG Xi, ZHANG Kai, LI Jun-hui, KONG Fang

  (School of Computer Science and Technology, Soochow University, Suzhou 215006, China)

  • Received: 2020-08-26  Revised: 2020-11-09  Accepted: 2022-01-25  Online: 2022-01-25  Published: 2022-01-13

Abstract: Current mainstream image caption generation methods are based on deep neural networks, in particular self-attention models. However, conventional deep networks stack their layers linearly, so information captured by low-level layers cannot be reflected in high-level layers and is not fully exploited. This paper therefore proposes a method based on a dense residual network that obtains hierarchical semantic information to generate high-quality image captions. First, to make full use of per-layer information and extract the local features of each layer in the deep network, LayerRDense (Layer Residual Dense) is proposed, which adds dense residual connections between layers. Second, SubRDense (Sublayer Residual Dense) is proposed, which applies a dense residual network within the sub-layers of each layer on the decoder side, in order to better fuse image features and caption information. Experimental results on the MSCOCO 2014 dataset show that the proposed LayerRDense and SubRDense networks further improve the performance of image caption generation.
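The core idea behind LayerRDense can be illustrated with a minimal sketch: each layer in the stack consumes a fusion of all previous layers' outputs instead of only its immediate predecessor, so low-level features stay visible at every depth. Note that this is a hypothetical illustration of layer-wise dense residual connections, not the authors' published implementation; the class and parameter names (`LayerRDense`, `fuse`) are assumptions for this example.

```python
import torch
import torch.nn as nn

class LayerRDense(nn.Module):
    """Sketch of layer-wise dense residual connections (hypothetical,
    not the paper's code): every layer receives a learned fusion of
    ALL earlier layer outputs, plus a residual back to the input, so
    low-level information remains available to high-level layers."""

    def __init__(self, d_model=512, n_layers=6, n_heads=8):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
            for _ in range(n_layers)
        )
        # One fusion projection per layer: concat of all prior
        # outputs (i + 1 tensors of width d_model) back to d_model.
        self.fuse = nn.ModuleList(
            nn.Linear(d_model * (i + 1), d_model) for i in range(n_layers)
        )

    def forward(self, x):
        outputs = [x]  # running list of every layer's output (dense state)
        for layer, fuse in zip(self.layers, self.fuse):
            fused = fuse(torch.cat(outputs, dim=-1))  # dense connection
            outputs.append(layer(fused) + x)          # residual to the input
        return outputs[-1]
```

SubRDense would apply the same pattern one level down, across the self-attention, cross-attention, and feed-forward sub-layers inside each decoder layer, rather than across whole layers.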

Key words: image caption, self-attention mechanism, residual dense network