
Computer Engineering & Science ›› 2022, Vol. 44 ›› Issue (04): 730-736.

• Artificial Intelligence and Data Mining •


Unsupervised neural machine translation model based on pre-training

XUE Qing-tian, LI Jun-hui, GONG Zheng-xian, XU Dong-qin

  1. (Natural Language Processing Laboratory, Soochow University, Suzhou 215006, China)
  • Received: 2020-08-26  Revised: 2020-11-30  Accepted: 2022-04-25  Online: 2022-04-25  Published: 2022-04-20
  • Supported by: National Natural Science Foundation of China (61876120)


Abstract: Relying on large-scale parallel corpora, neural machine translation has achieved great success on some language pairs. Unsupervised neural machine translation (UNMT) has since partly solved the problem that high-quality parallel corpora are difficult to obtain. Recent studies show that cross-lingual language model pre-training can significantly improve the translation performance of UNMT: it uses large-scale monolingual corpora to model deep contextual information in cross-lingual scenarios and achieves notable results. This paper further explores UNMT based on cross-lingual pre-training, proposes several methods to improve model training, and compares their performance with a baseline system on different language pairs. To address the unbalanced initialization of UNMT model parameters after pre-training, this paper proposes a secondary pre-training stage that continues pre-training the language model, and proposes initializing the cross-attention sub-layers of the UNMT model with the self-attention sub-layers of the pre-trained model. Meanwhile, since back-translation in UNMT lacks guidance, this paper incorporates a Teacher-Student framework into the UNMT task. Experimental results show that, compared with the baseline system, these methods improve BLEU by up to 0.8 to 2.08 percentage points.
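
As an illustration of the cross-attention initialization described in the abstract, the following is a minimal PyTorch sketch, not the authors' implementation: it assumes a pre-trained cross-lingual encoder and a standard Transformer decoder built from PyTorch's stock layer classes, and copies each pre-trained self-attention sub-layer into the corresponding decoder cross-attention sub-layer so that it does not start from random initialization. The function names (build_layers, init_cross_attention_from_self_attention) and the layer structure are assumptions made for this sketch.

import torch.nn as nn

def build_layers(num_layers: int, d_model: int, n_heads: int):
    """Toy stand-ins for a pre-trained encoder and a freshly built decoder."""
    encoder = nn.ModuleList([
        nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        for _ in range(num_layers)
    ])
    decoder = nn.ModuleList([
        nn.TransformerDecoderLayer(d_model, n_heads, batch_first=True)
        for _ in range(num_layers)
    ])
    return encoder, decoder

def init_cross_attention_from_self_attention(encoder, decoder):
    """Copy each (pre-trained) encoder self-attention sub-layer into the
    matching decoder cross-attention sub-layer (multihead_attn), so that
    cross-attention starts from pre-trained weights instead of random ones."""
    for enc_layer, dec_layer in zip(encoder, decoder):
        pretrained_self_attn = enc_layer.self_attn.state_dict()
        dec_layer.multihead_attn.load_state_dict(pretrained_self_attn)

# Example: a 6-layer model with 1024-dimensional hidden states and 8 heads.
encoder, decoder = build_layers(num_layers=6, d_model=1024, n_heads=8)
init_cross_attention_from_self_attention(encoder, decoder)

The copy is possible because both sub-layers are attention modules of identical shape; in the setting described by the abstract, the encoder weights would come from the cross-lingual pre-trained language model, with the secondary pre-training stage and the Teacher-Student guided back-translation applied on top of this initialization.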



Key words: neural network, neural machine translation, unsupervised, pre-training