• A publication of the China Computer Federation (CCF)
  • Chinese science and technology core journal
  • Chinese core journal

Computer Engineering & Science ›› 2024, Vol. 46 ›› Issue (11): 2071-2080.

• Artificial Intelligence and Data Mining •

Dual-level interactive adaptive fusion for multimodal neural machine translation

DU Lian-cheng 1,2, GUO Jun-jun 1,2, YE Jun-jie 1,2, YU Zheng-tao 1,2

  (1. Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650504, China;
   2. Key Laboratory of Artificial Intelligence in Yunnan Province, Kunming 650500, China)

Abstract: The objective of multimodal neural machine translation (MNMT) is to enhance the quality of text-only neural machine translation by incorporating additional modalities. Images encompass rich semantic information, including entity relationships, attributes, and spatial positioning. However, most existing fusion methods focus only on partial visual information and neglect intra-modal relationships, so visual information is under-utilized and the semantic richness of images is not fully exploited. This paper proposes a dual-level interactive adaptive fusion method for multimodal neural machine translation that considers diverse aspects of visual features to maximize the utilization of visual information. Experimental results demonstrate the effectiveness of the proposed method in harnessing visual information, with significant improvements over state-of-the-art MNMT methods on the English-to-German (EN→DE) and English-to-French (EN→FR) translation tasks of the Multi30K dataset.
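The paper's exact architecture is not reproduced in this abstract. As an illustrative sketch only, the kind of image-to-text cross-modal adaptive fusion named in the keywords can be pictured as text tokens attending over image-region features, with a learned sigmoid gate deciding per token how much visual context to absorb. All function names, shapes, and the gate parameterization below are hypothetical, not taken from the paper:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def adaptive_cross_modal_fusion(text, image, w_gate):
    """Illustrative gated fusion (hypothetical, not the paper's model):
    text queries attend over image regions, then a per-token sigmoid
    gate adaptively controls how much visual context is injected."""
    d = text.shape[-1]
    # text-to-image cross-modal attention over image regions
    attn = softmax(text @ image.T / np.sqrt(d), axis=-1)   # (n_tok, n_reg)
    visual_ctx = attn @ image                              # (n_tok, d)
    # scalar gate per token, computed from [text; visual context]
    gate = 1.0 / (1.0 + np.exp(-np.concatenate([text, visual_ctx], axis=-1) @ w_gate))
    return text + gate * visual_ctx                        # residual fusion

rng = np.random.default_rng(0)
text = rng.standard_normal((5, 16))     # 5 text tokens, dim 16
image = rng.standard_normal((7, 16))    # 7 image regions, dim 16
w_gate = rng.standard_normal((32, 1)) * 0.1
fused = adaptive_cross_modal_fusion(text, image, w_gate)
print(fused.shape)  # (5, 16)
```

The residual form keeps the text representation dominant when the gate saturates near zero, which is one common way such adaptive fusion avoids drowning the translation model in irrelevant visual noise.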

  • Received:2023-09-14 Revised:2023-12-25 Accepted:2024-11-25 Online:2024-11-25 Published:2024-11-27

Key words: multimodal neural machine translation; dual visual feature interaction; image-to-text cross-modal adaptive fusion