• Journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science ›› 2024, Vol. 46 ›› Issue (09): 1625-1634.

• Graphics and Images •

A military image set captioning method based on image and text relevance and context guidance

MEI Yun-hong¹,², LIU Mao-fu¹,²

  1. School of Computer Science and Technology, Wuhan University of Science and Technology, Wuhan 430065, China
  2. Hubei Province Key Laboratory of Intelligent Information Processing and Real-Time Industrial System, Wuhan 430065, China
  • Received: 2023-03-31 Revised: 2023-09-14 Accepted: 2024-09-25 Online: 2024-09-25 Published: 2024-09-23

Abstract: Traditional image captioning methods lack prior knowledge of the real world, so the captions they generate are not explanatory, and their accuracy is low in some specialized domains. To address these problems, the task of military news image set captioning is proposed, and a military news image set dataset is constructed. The task poses two key challenges: the information needed for a caption is spread across the whole image set and the corresponding news text, and the semantics learned by the model are insufficient. A military news image set captioning method based on image and text relevance and context guidance (ITRCG) is therefore proposed. ITRCG enables cross-modal information interaction, guides the model to learn more complete semantics, and uses label cleaning to assist named entity generation. Experiments on the constructed military news image set dataset show that ITRCG effectively improves the quality of the generated captions and achieves improvements on all evaluation metrics.

Keywords: image captioning, image and text relevance attention, context guidance attention, image set, news text