Computer Engineering & Science, 2022, Vol. 44, Issue (01): 84-91.
Image caption generation based on residual dense hierarchical information
WANG Xi, ZHANG Kai, LI Jun-hui, KONG Fang
Abstract: The current mainstream approach to image caption generation relies on deep neural networks, in particular models built on the self-attention mechanism. However, the layers of a traditional deep neural network are stacked linearly, so the information captured by low-level layers cannot propagate to the high-level layers and is therefore not fully utilized. This paper proposes a method that acquires hierarchical semantic information through a dense residual network in order to generate high-quality image captions. First, to make full use of the per-layer information and to extract the local features of each layer in a deep network, LayerRDense (Layer Residual Dense) is proposed, which establishes dense residual connections between layers. Second, SubRDense (Sublayer Residual Dense) is proposed, which applies a dense residual network within the sub-layers of each decoder layer so as to better fuse image features with caption information. Experimental results on the MSCOCO 2014 dataset show that the proposed LayerRDense and SubRDense networks further improve the performance of image caption generation.
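As an illustration of the dense residual connection idea described in the abstract, below is a minimal Python/PyTorch sketch (not the authors' code): each self-attention layer receives a learned fusion of the outputs of all earlier layers, so that low-level features remain visible to high-level layers. The module name, the concatenation-plus-linear fusion, and all hyperparameters are illustrative assumptions; the exact LayerRDense/SubRDense formulation in the paper may differ.

# Sketch only: dense residual connections across stacked self-attention layers,
# in the spirit of LayerRDense. Fusion by concatenation + linear projection is
# an assumption; the paper may combine layer outputs differently.
import torch
import torch.nn as nn

class DenselyConnectedSelfAttention(nn.Module):
    """Stack of self-attention layers where layer i receives a fusion of the
    outputs of all earlier layers, so low-level features reach high layers."""

    def __init__(self, d_model=512, n_heads=8, n_layers=6):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.TransformerEncoderLayer(d_model, n_heads,
                                       dim_feedforward=2048,
                                       batch_first=True)
            for _ in range(n_layers)
        )
        # One fusion per layer: projects the concatenation of all previous
        # outputs (input included) back to d_model.
        self.fusions = nn.ModuleList(
            nn.Linear(d_model * (i + 1), d_model) for i in range(n_layers)
        )

    def forward(self, x):
        outputs = [x]  # outputs[0] holds the input image features
        for layer, fuse in zip(self.layers, self.fusions):
            dense_in = fuse(torch.cat(outputs, dim=-1))  # fuse all earlier levels
            outputs.append(layer(dense_in))
        return outputs[-1]

if __name__ == "__main__":
    feats = torch.randn(2, 49, 512)  # e.g. 7x7 image regions, d_model=512
    print(DenselyConnectedSelfAttention()(feats).shape)  # torch.Size([2, 49, 512])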
Key words: image caption, self-attention mechanism, residual dense network
WANG Xi, ZHANG Kai, LI Jun-hui, KONG Fang. Image caption generation based on residual dense hierarchical information[J]. Computer Engineering & Science, 2022, 44(01): 84-91.
URL: http://joces.nudt.edu.cn/EN/Y2022/V44/I01/84