Image caption generation based on residual dense hierarchical information

Computer Engineering & Science, 2022, Vol. 44, Issue (1): 84-91.

WANG Xi, ZHANG Kai, LI Jun-hui, KONG Fang

(School of Computer Science and Technology, Soochow University, Suzhou 215006, China)

Received: 2020-08-26 Revised: 2020-11-09 Online: 2022-01-25 Published: 2022-01-13
Supported by:
National Natural Science Foundation of China (61876120)

Abstract

Abstract: Current mainstream approaches to image caption generation are based on deep neural networks, especially models built on the self-attention mechanism. However, the layers of a conventional deep neural network are stacked linearly, so information captured by the lower layers is not reflected in the higher layers and is therefore not fully utilized. This paper proposes a method based on residual dense networks that obtains hierarchical semantic information in order to generate high-quality image captions. First, to make full use of the network's hierarchical information and to extract the local features of each layer in the deep network, LayerRDense (Layer Residual Dense) is proposed, which applies residual dense connections between layers. Second, SubRDense (Sublayer Residual Dense) is proposed, which applies a residual dense network to the sublayers within each decoder layer in order to better fuse image features with the descriptive information of the image. Experimental results on the MSCOCO 2014 dataset show that both LayerRDense and SubRDense further improve the performance of image caption generation.

Key words: image caption, self-attention mechanism, residual dense network
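
The abstract does not give the exact formulation of LayerRDense, so the following is only a minimal sketch of the general residual dense pattern it refers to, under stated assumptions: each layer receives the concatenation of the input and all earlier layer outputs (dense connection), the collected features are fused by a linear projection, and the original input is added back (residual connection). The function names (`residual_dense_stack`, `build`), the ReLU layers, and the random weights are hypothetical and stand in for the self-attention layers of the actual model.

```python
import random


def linear(x, w):
    """Multiply vector x (length n) by weight matrix w (n rows, m columns)."""
    return [sum(xi * w[i][j] for i, xi in enumerate(x)) for j in range(len(w[0]))]


def relu(x):
    """Element-wise ReLU, used here as a placeholder for a real network layer."""
    return [max(0.0, v) for v in x]


def make_weights(rng, n_in, n_out):
    """Small random weight matrix with n_in rows and n_out columns."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(n_out)] for _ in range(n_in)]


def build(rng, d, num_layers):
    """Create weights: layer k takes d * (k + 1) inputs because of dense connections."""
    layer_weights = [make_weights(rng, d * (k + 1), d) for k in range(num_layers)]
    fusion_weights = make_weights(rng, d * (num_layers + 1), d)
    return layer_weights, fusion_weights


def residual_dense_stack(x, layer_weights, fusion_weights):
    """Assumed residual dense stack: dense concatenation, fusion, residual add."""
    features = [x]
    for w in layer_weights:
        # Dense connection: concatenate the input and every earlier layer output.
        dense_in = [v for f in features for v in f]
        features.append(relu(linear(dense_in, w)))
    # Fuse the features of all layers back to the model dimension.
    fused = linear([v for f in features for v in f], fusion_weights)
    # Residual connection: add the original input to the fused features.
    return [a + b for a, b in zip(x, fused)]


rng = random.Random(0)
layer_weights, fusion_weights = build(rng, d=4, num_layers=3)
x = [1.0, -2.0, 0.5, 3.0]
y = residual_dense_stack(x, layer_weights, fusion_weights)
```

In this sketch the output keeps the input dimension, so such a stack could in principle replace a linearly stacked one without changing the surrounding shapes; whether the paper fuses layers in exactly this way is not stated here.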

WANG Xi, ZHANG Kai, LI Jun-hui, KONG Fang. Image caption generation based on residual dense hierarchical information[J]. Computer Engineering & Science, 2022, 44(1): 84-91.
