[1] Bahdanau D,Cho K,Bengio Y.Neural machine translation by jointly learning to align and translate[C]∥Proc of the 3rd International Conference on Learning Representations,2015:1-15.
[2] Sennrich R,Haddow B,Birch A.Improving neural machine translation models with monolingual data[C]∥Proc of the 54th Annual Meeting of the Association for Computational Linguistics,2016:86-96.
[3] Brown P F,Della Pietra S A,Della Pietra V J,et al.The mathematics of statistical machine translation:Parameter estimation[J].Computational Linguistics,1993,19(2):263-311.
[4] Yee K,Ng N,Dauphin Y,et al.Simple and effective noisy channel modeling for neural machine translation[C]∥Proc of the Conference on Empirical Methods in Natural Language Processing and the International Joint Conference on Natural Language Processing,2019:5700-5705.
[5] Shannon C E.Communication theory of secrecy systems[J].The Bell System Technical Journal,1949,28(4):656-715.
[6] Gulcehre C,Firat O,Xu K,et al.On using monolingual corpora in neural machine translation[J].arXiv:1503.03535,2015.
[7] Stahlberg F,Cross J,Stoyanov V.Simple fusion:Return of the language model[C]∥Proc of the 3rd Conference on Machine Translation:Research Papers,2018:204-211.
[8] Miao M,Meng F,Liu Y,et al.Prevent the language model from being overconfident in neural machine translation[C]∥Proc of the 59th Annual Meeting of the Association for Computational Linguistics,2021:3456-3468.
[9] Baziotis C,Haddow B,Birch A.Language model prior for low-resource neural machine translation[C]∥Proc of the Conference on Empirical Methods in Natural Language Processing,2020:7622-7634.
[10] Hinton G,Vinyals O,Dean J.Distilling the knowledge in a neural network[J].arXiv:1503.02531,2015.
[11] Brown P F,Cocke J,Della Pietra S A,et al.A statistical approach to machine translation[J].Computational Linguistics,1990,16(2):79-85.
[12] Bengio Y,Ducharme R,Vincent P,et al.A neural probabilistic language model[J].Journal of Machine Learning Research,2003,3:1137-1155.
[13] 宗成庆.统计自然语言处理[M].第二版.北京:清华大学出版社,2013.
Zong Cheng-qing.Statistical natural language processing[M].2nd ed.Beijing:Tsinghua University Press,2013.
[14] Devlin J,Chang M W,Lee K,et al.BERT:Pre-training of deep bidirectional transformers for language understanding[C]∥Proc of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics:Human Language Technologies,2019:4171-4186.
[15] Yang Z,Xu Z,Cui Y,et al.CINO:A Chinese minority pre-trained language model[C]∥Proc of the 29th International Conference on Computational Linguistics,2022:3937-3949.
[16] 侯宏旭,孙硕,乌尼尔.蒙汉神经机器翻译研究综述[J].计算机科学,2022,49(1):31-40.
Hou Hong-xu,Sun Shuo,Wu Ni-er.Survey of Mongolian-Chinese neural machine translation[J].Computer Science,2022,49(1):31-40.
[17] 慈祯嘉措,桑杰端珠,孙茂松,等.融合单语语言模型的藏汉机器翻译方法研究[J].中文信息学报,2019,33(12):61-66.
Cizhen Jia-cuo,Sangjie Duan-zhu,Sun Mao-song,et al.Tibetan-Chinese machine translation based on Tibetan language model enhanced transformer[J].Journal of Chinese Information Processing,2019,33(12):61-66.
[18] 陈玺,杨雅婷,董瑞.面向汉维机器翻译的BERT嵌入研究[J].计算机工程,2021,47(12):112-117.
Chen Xi,Yang Ya-ting,Dong Rui.Research on BERT embedding for Chinese-Uyghur machine translation[J].Computer Engineering,2021,47(12):112-117.
[19] 李飞雨,赵亚慧,崔荣一,等.基于强化学习和机器翻译质量评估的中朝机器翻译研究[J].计算机应用研究,2021,38(8):2288-2292.
Li Fei-yu,Zhao Ya-hui,Cui Rong-yi,et al.Research on Chinese-Korean machine translation based on reinforcement learning and machine translation quality estimation[J].Application Research of Computers,2021,38(8):2288-2292.
[20] 王可超,郭军军,张亚飞,等.基于回译和比例抽取孪生网络筛选的汉越平行语料扩充方法[J].计算机工程与科学,2022,44(10):1861-1868.
Wang Ke-chao,Guo Jun-jun,Zhang Ya-fei,et al.A Chinese-Vietnamese parallel corpus expansion method based on back translation and proportional extraction siamese network screening[J].Computer Engineering & Science,2022,44(10):1861-1868.
[21] 龙从军,安波.中国少数民族语言文字信息处理的进展[J].暨南学报(哲学社会科学版),2022,44(9):12-23.
Long Cong-jun,An Bo.Review of information processing of ethnic languages in China[J].Jinan Journal(Philosophy & Social Science Edition),2022,44(9):12-23.
[22] Kocmi T,Bojar O.Trivial transfer learning for low-resource neural machine translation[C]∥Proc of the 3rd Conference on Machine Translation:Research Papers,2018:244-252.
[23] Gu J,Wang Y,Chen Y,et al.Meta-learning for low-resource neural machine translation[C]∥Proc of the Conference on Empirical Methods in Natural Language Processing,2018:3622-3631.
[24] Liu P,Yuan W,Fu J,et al.Pre-train,prompt,and predict:A systematic survey of prompting methods in natural language processing[J].arXiv:2107.13586,2021.
[25] Szegedy C,Vanhoucke V,Ioffe S,et al.Rethinking the inception architecture for computer vision[C]∥Proc of 2016 IEEE Conference on Computer Vision and Pattern Recognition,2016:2818-2826.
[26] Müller R,Kornblith S,Hinton G.When does label smoothing help?[C]∥Proc of the 33rd International Conference on Neural Information Processing Systems,2019:4694-4703.
[27] Sennrich R,Haddow B,Birch A.Neural machine translation of rare words with subword units[C]∥Proc of the 54th Annual Meeting of the Association for Computational Linguistics,2016:1715-1725.
[28] Vaswani A,Shazeer N,Parmar N,et al.Attention is all you need[C]∥Proc of the 31st International Conference on Neural Information Processing Systems,2017:6000-6010.
[29] Papineni K,Roukos S,Ward T,et al.BLEU:A method for automatic evaluation of machine translation[C]∥Proc of the 40th Annual Meeting of the Association for Computational Linguistics,2002:311-318.