• Journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science ›› 2022, Vol. 44 ›› Issue (08): 1481-1487.

• Artificial Intelligence and Data Mining •

Neural machine translation based on dictionary model fusion

WANG Xu (王煦), JIA Hao (贾浩), JI Bai-jun (季佰军), DUAN Xiang-yu (段湘煜)

  1. (Natural Language Processing Laboratory, Soochow University, Suzhou, Jiangsu 215006, China)
  • Received: 2020-11-20 Revised: 2021-01-29 Accepted: 2022-08-25 Online: 2022-08-25 Published: 2022-08-25
  • Supported by:
    National Natural Science Foundation of China (61673289)

Abstract: Unsupervised neural machine translation trains models on large amounts of monolingual data alone, with no parallel data, but it struggles to establish a connection between two languages from distant language families. To address this problem, this paper proposes a new neural machine translation training method that uses no parallel sentence pairs. A bilingual dictionary is used to replace words in the monolingual data, which establishes a connection between the two languages. In addition, two techniques, word embedding fusion initialization and dual-encoder fusion training, strengthen the alignment of the two languages in a shared semantic space and thereby improve the performance of the machine translation system. Experiments show that the proposed method improves BLEU by 2.39 and 1.29 over the baseline unsupervised translation system on the Chinese-English and English-Chinese tasks, respectively, and that translation quality also improves significantly in experiments on English-Russian and English-Arabic using monolingual data.

Key words: neural network, neural machine translation, dictionary, unsupervised
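
The abstract states only that a bilingual dictionary is used to replace words in monolingual data; it does not give the replacement procedure. The following is a minimal sketch of one way such a step could look, not the authors' implementation. The function name `replace_with_dictionary`, the `replace_prob` parameter, the uniform random choice among candidate translations, and the toy dictionary entries are all assumptions made for illustration.

```python
import random
from typing import Dict, List, Optional


def replace_with_dictionary(
    tokens: List[str],
    dictionary: Dict[str, List[str]],
    replace_prob: float = 1.0,
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Return a copy of `tokens` with dictionary words swapped for a translation.

    Hypothetical helper: the replacement probability and the uniform choice
    among candidate translations are assumptions, not details from the paper.
    """
    rng = rng or random.Random()
    mixed = []
    for token in tokens:
        candidates = dictionary.get(token)
        # Replace only when the word is in the dictionary and the draw succeeds.
        if candidates and rng.random() < replace_prob:
            mixed.append(rng.choice(candidates))
        else:
            mixed.append(token)
    return mixed


# Toy English-to-Chinese dictionary (illustrative entries only).
toy_dictionary = {"cat": ["猫"], "fish": ["鱼"]}
sentence = "the cat eats fish".split()
mixed_sentence = replace_with_dictionary(sentence, toy_dictionary, replace_prob=1.0)
print(" ".join(mixed_sentence))  # prints "the 猫 eats 鱼"
```

In this sketch, words missing from the dictionary pass through unchanged, so the output is a code-mixed sentence in which the replaced words act as anchors between the two languages.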