
Computer Engineering & Science ›› 2024, Vol. 46 ›› Issue (04): 743-751.

• Artificial Intelligence and Data Mining •

A neural machine translation method based on language model distillation

SHEN Ying-li1,2, ZHAO Xiao-bing2,3

  (1. School of Chinese Ethnic Minority Languages and Literatures, Minzu University of China, Beijing 100081;
   2. National Language Resource Monitoring & Research Center of Minority Languages, Beijing 100081;
   3. School of Information Engineering, Minzu University of China, Beijing 100081, China)
  • Received: 2023-02-12  Revised: 2023-04-20  Accepted: 2024-04-25  Online: 2024-04-25  Published: 2024-04-18

Abstract: The scarcity of large parallel corpora is one of the key problems in low-resource neural machine translation. This paper proposes a neural machine translation method based on language model distillation, which regularizes neural machine translation training with a monolingual language model and thereby introduces the prior knowledge contained in that language model to improve translation quality. Specifically, drawing on the idea of knowledge distillation, a target-side language model (teacher model) trained on abundant monolingual data is used to construct a regularization term for the low-resource neural machine translation model (student model), allowing the translation model to learn highly generalized prior knowledge from the language model. Unlike traditional language model fusion, in which the monolingual language model participates in decoding, the language model in this method is used only during training and plays no part in inference, so decoding speed is effectively improved. Experimental results on the Uyghur-Chinese and Tibetan-Chinese low-resource translation datasets from the 17th China Conference on Machine Translation (CCMT 2021) show that, compared with the current state-of-the-art language model fusion baseline system, BLEU improves by 1.42 points (Tibetan-Chinese) to 2.11 points (Chinese-Uyghur).
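
The abstract does not give the exact form of the regularization term; the sketch below illustrates one plausible reading of the idea in PyTorch, combining the standard translation cross-entropy with a KL term that pulls the student's output distribution toward a frozen target-side language model. The function name, the `alpha` and `temperature` parameters, and the tensor shapes are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def lm_distillation_loss(nmt_logits, lm_logits, target_ids, pad_id,
                         alpha=0.5, temperature=1.0):
    """Illustrative loss (assumed form, not the paper's exact objective):
    translation cross-entropy plus a KL regularizer toward the frozen
    target-side language model (the teacher).

    nmt_logits: [batch, tgt_len, vocab]  student (NMT) logits
    lm_logits:  [batch, tgt_len, vocab]  teacher LM logits on the same target prefixes
    target_ids: [batch, tgt_len]         reference target tokens
    """
    # Standard translation (student) loss against the reference tokens.
    ce_loss = F.cross_entropy(
        nmt_logits.view(-1, nmt_logits.size(-1)),
        target_ids.view(-1),
        ignore_index=pad_id,
    )

    # Distillation regularizer: KL divergence between the student's
    # distribution and the teacher LM's distribution over the vocabulary.
    student_log_probs = F.log_softmax(nmt_logits / temperature, dim=-1)
    with torch.no_grad():  # the LM is only a teacher; no gradients flow into it
        teacher_probs = F.softmax(lm_logits / temperature, dim=-1)
    kl = F.kl_div(student_log_probs, teacher_probs, reduction="none").sum(-1)

    # Average the regularizer over non-padding positions only.
    mask = target_ids.ne(pad_id).float()
    kl_loss = (kl * mask).sum() / mask.sum()

    # alpha trades off reference supervision against the LM prior.
    return (1.0 - alpha) * ce_loss + alpha * kl_loss
```

Consistent with the abstract's claim about decoding speed, in such a setup the teacher LM is evaluated only when computing the training loss; at inference time only the translation model runs, so the decoder is no slower than a model trained without the regularizer.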

Key words: language model, knowledge distillation, regularization, low-resource neural machine translation