• A journal of the China Computer Federation (CCF)
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science ›› 2022, Vol. 44 ›› Issue (01): 84-91.


Image caption generation based on residual dense hierarchical information

WANG Xi, ZHANG Kai, LI Jun-hui, KONG Fang

  (School of Computer Science and Technology, Soochow University, Suzhou 215006, China)

  • Received: 2020-08-26  Revised: 2020-11-09  Accepted: 2022-01-25  Online: 2022-01-25  Published: 2022-01-13

Abstract: Current mainstream image caption generation methods are based on deep neural networks, in particular self-attention models. However, conventional deep networks stack their layers linearly, so information captured by low-level layers cannot be reflected in high-level layers and is not fully exploited. This paper therefore proposes a method based on a dense residual network that obtains hierarchical semantic information to generate high-quality image captions. First, to make full use of per-layer information and extract the local features of each layer in the deep network, LayerRDense (Layer Residual Dense) is proposed, which adds dense residual connections between layers. Second, SubRDense (Sublayer Residual Dense) is proposed, which applies a dense residual network within the sub-layers of each layer on the decoder side, in order to better fuse image features and caption information. Experimental results on the MSCOCO 2014 dataset show that the proposed LayerRDense and SubRDense networks further improve the performance of image caption generation.
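The core idea behind LayerRDense can be illustrated with a minimal sketch: each layer in the stack consumes a fusion of all previous layers' outputs instead of only its immediate predecessor, so low-level features stay visible at every depth. Note that this is a hypothetical illustration of layer-wise dense residual connections, not the authors' published implementation; the class and parameter names (`LayerRDense`, `fuse`) are assumptions for this example.

```python
import torch
import torch.nn as nn

class LayerRDense(nn.Module):
    """Sketch of layer-wise dense residual connections (hypothetical,
    not the paper's code): every layer receives a learned fusion of
    ALL earlier layer outputs, plus a residual back to the input, so
    low-level information remains available to high-level layers."""

    def __init__(self, d_model=512, n_layers=6, n_heads=8):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
            for _ in range(n_layers)
        )
        # One fusion projection per layer: concat of all prior
        # outputs (i + 1 tensors of width d_model) back to d_model.
        self.fuse = nn.ModuleList(
            nn.Linear(d_model * (i + 1), d_model) for i in range(n_layers)
        )

    def forward(self, x):
        outputs = [x]  # running list of every layer's output (dense state)
        for layer, fuse in zip(self.layers, self.fuse):
            fused = fuse(torch.cat(outputs, dim=-1))  # dense connection
            outputs.append(layer(fused) + x)          # residual to the input
        return outputs[-1]
```

SubRDense would apply the same pattern one level down, across the self-attention, cross-attention, and feed-forward sub-layers inside each decoder layer, rather than across whole layers.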

Key words: image caption, self-attention mechanism, residual dense network