Unsupervised domain-adapted machine translation based on improving the quality of pseudo-parallel sentence pairs

Computer Engineering & Science ›› 2022, Vol. 44 ›› Issue (12): 2230-2237.

• Artificial Intelligence and Data Mining •

XIAO Ni-ni, JIN Chang, DUAN Xiang-yu

(Natural Language Processing Laboratory, School of Computer Science and Technology, Soochow University, Suzhou 215006, China)

Received: 2021-04-26 Revised: 2021-09-13 Accepted: 2022-12-25 Online: 2022-12-25 Published: 2023-01-05

Abstract

Neural machine translation systems perform well only when a large amount of in-domain bilingual parallel data is available. Domain adaptation is an effective solution when data for a specific domain is sparse or non-existent. Unsupervised domain adaptation strategies generate a pseudo-parallel corpus and use it to fine-tune a pre-trained translation model. However, existing methods do not sufficiently consider the semantic and sentiment characteristics of the languages, so the target-domain translations contain many errors and much noise, which degrades the cross-domain performance of the model. To alleviate this problem, this paper improves the quality of pseudo-parallel sentence pairs from both the model side and the data side, thereby improving the model's domain adaptation ability. First, a more reasonable pre-training strategy is proposed to learn more general textual representations from out-of-domain data, which enhances the generalization capability of the model and improves the accuracy of the generated in-domain pseudo-parallel corpus. Then, sentence sentiment features are used for a posteriori filtering to further improve the quality of the pseudo-parallel corpus. Experimental results show that, compared with a strong back-translation baseline, the method increases the BLEU score by 1.25 and 1.38 in the Chinese-English and English-Chinese translation experiments, respectively, effectively improving translation performance.
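
The a posteriori filtering step can be illustrated with a minimal sketch. The abstract does not specify the sentiment model or the filtering criterion, so everything below is an assumption made for illustration: the helper names (`filter_by_sentiment`, `en_polarity`, `zh_polarity`), the toy word lists, and the rule of keeping a pair only when both sides receive the same polarity label. A real system would likely use a trained sentiment classifier rather than a lexicon.

```python
from typing import Callable, List, Tuple

# A pseudo-parallel pair: (Chinese sentence, English sentence).
Pair = Tuple[str, str]

# Toy sentiment lexicons (hypothetical, for illustration only).
EN_POS = {"good", "great", "excellent"}
EN_NEG = {"bad", "terrible", "poor"}
ZH_POS = {"好", "很棒"}
ZH_NEG = {"差", "糟糕"}


def en_polarity(sentence: str) -> int:
    """Return +1, 0, or -1 from counts of lexicon words (whitespace tokens)."""
    tokens = sentence.lower().split()
    score = sum(t in EN_POS for t in tokens) - sum(t in EN_NEG for t in tokens)
    return (score > 0) - (score < 0)


def zh_polarity(sentence: str) -> int:
    """Return +1, 0, or -1 from substring counts of lexicon words."""
    pos = sum(sentence.count(w) for w in ZH_POS)
    neg = sum(sentence.count(w) for w in ZH_NEG)
    return (pos > neg) - (pos < neg)


def filter_by_sentiment(
    pairs: List[Pair],
    left_polarity: Callable[[str], int],
    right_polarity: Callable[[str], int],
) -> List[Pair]:
    """Keep only pairs whose two sides are assigned the same polarity."""
    return [(a, b) for a, b in pairs if left_polarity(a) == right_polarity(b)]


# Example pseudo-parallel pairs, as might come out of back-translation.
pairs = [
    ("这家 酒店 很棒", "this hotel is great"),        # both positive: kept
    ("服务 很 糟糕", "the service is excellent"),     # polarity flipped: dropped
    ("房间 在 二楼", "the room is on the second floor"),  # both neutral: kept
]

kept = filter_by_sentiment(pairs, zh_polarity, en_polarity)
```

Under these assumptions, the second pair is discarded because the generated English side reverses the sentiment of the Chinese side, which is the kind of noise that sentiment-based filtering is meant to remove before fine-tuning.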

Key words: neural network, neural machine translation, domain adaptation, model optimization, sentiment information

XIAO Ni-ni, JIN Chang, DUAN Xiang-yu. Unsupervised domain-adapted machine translation based on improving the quality of pseudo-parallel sentence pairs[J]. Computer Engineering & Science, 2022, 44(12): 2230-2237.

