Dual-level interactive adaptive fusion for multimodal neural machine translation
DU Lian-cheng1,2, GUO Jun-jun1,2, YE Jun-jie1,2, YU Zheng-tao1,2
(1. Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650504;
2. Key Laboratory of Artificial Intelligence in Yunnan Province, Kunming 650500, China)

Abstract: The objective of multimodal neural machine translation (MNMT) is to improve the quality of text-only neural machine translation by incorporating additional modalities. Images carry rich semantic information, including entity relationships, attributes, and spatial positioning. However, most existing fusion methods focus on only part of the visual information in an image and neglect intra-modal relationships, so the semantic richness of images is not fully exploited. This paper proposes a dual-level interactive adaptive fusion method for multimodal neural machine translation that considers diverse aspects of visual features in order to make fuller use of visual information. Experimental results demonstrate that the proposed method exploits visual information effectively, achieving significant improvements over state-of-the-art MNMT methods on the English-to-German (EN→DE) and English-to-French (EN→FR) translation tasks of the Multi30K dataset.
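The abstract does not spell out the fusion mechanism itself, so the following is only a minimal, hedged sketch of one common form of adaptive fusion between textual and visual features: a learned gate that decides, per token, how much visual context to mix into the text representation. All class and parameter names (AdaptiveGatedFusion, d_model, visual_ctx) are illustrative assumptions, not the paper's dual-level interactive architecture.

```python
import torch
import torch.nn as nn


class AdaptiveGatedFusion(nn.Module):
    """Illustrative adaptive fusion of textual and visual features.

    A learned sigmoid gate controls, per token and per dimension, how much
    visual context is injected into the textual representation. This is a
    generic sketch, not the paper's exact dual-level method.
    """

    def __init__(self, d_model: int = 512):
        super().__init__()
        self.gate = nn.Linear(2 * d_model, d_model)     # gate computed from concatenated features
        self.visual_proj = nn.Linear(d_model, d_model)  # projection of the visual context

    def forward(self, text_feats: torch.Tensor, visual_ctx: torch.Tensor) -> torch.Tensor:
        # text_feats:  (batch, seq_len, d_model) token representations from the text encoder
        # visual_ctx:  (batch, seq_len, d_model) visual context aligned to tokens,
        #              e.g. obtained via text-to-image attention (assumed here)
        v = self.visual_proj(visual_ctx)
        g = torch.sigmoid(self.gate(torch.cat([text_feats, v], dim=-1)))  # gate values in [0, 1]
        return text_feats + g * v  # adaptively mix visual information into the text stream


if __name__ == "__main__":
    fusion = AdaptiveGatedFusion(d_model=512)
    text = torch.randn(2, 10, 512)
    visual = torch.randn(2, 10, 512)
    print(fusion(text, visual).shape)  # torch.Size([2, 10, 512])
```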