• Official journal of the China Computer Federation (CCF)
• Chinese Science and Technology Core Journal
• Chinese Core Journal

Computer Engineering & Science, 2022, Vol. 44, Issue (12): 2230-2237.

• Artificial Intelligence and Data Mining

Unsupervised domain-adapted machine translation based on improving the quality of pseudo-parallel sentence pairs

XIAO Ni-ni, JIN Chang, DUAN Xiang-yu

  1. (Natural Language Processing Laboratory, School of Computer Science and Technology, Soochow University, Suzhou 215006, China)
  • Received: 2021-04-26; Revised: 2021-09-13; Accepted: 2022-12-25; Online: 2022-12-25; Published: 2023-01-05

Abstract: Good performance of neural machine translation (NMT) systems depends on large amounts of in-domain bilingual parallel data. When data for a specific domain is scarce or unavailable, domain adaptation is an effective solution. Unsupervised domain adaptation strategies generate a pseudo-parallel corpus and use it to fine-tune a pre-trained translation model. However, existing methods do not sufficiently account for the semantic and sentiment characteristics of the language, which introduces many errors and much noise into target-domain translations and degrades the model's cross-domain performance. To alleviate this problem, this paper improves the quality of pseudo-parallel sentence pairs from both the model side and the data side, thereby strengthening the model's domain adaptation ability. First, a more appropriate pre-training strategy is proposed to learn more general text representations from out-of-domain data, which enhances the model's generalization ability and improves the accuracy of the generated in-domain pseudo-parallel corpus. Then, sentence-level sentiment features are used for a posteriori filtering to further improve the quality of the pseudo-parallel corpus. Experimental results show that, compared with a strong back-translation baseline, the method improves BLEU by 1.25 points on Chinese-English translation and 1.38 points on English-Chinese translation, effectively improving translation performance.

Key words: neural network, neural machine translation, domain adaptation, model optimization, sentiment information
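The abstract only outlines the sentiment-based a posteriori filtering step. The sketch below illustrates one plausible form of it under stated assumptions: each side of a pseudo-parallel pair (for example, a back-translated source sentence and its in-domain target sentence) is given a polarity score, and the pair is kept only when the two sides agree in sentiment. The names `lexicon_scorer`, `polarity` and `sentiment_filter`, the thresholds `neutral_band` and `max_gap`, and the toy lexicons are all hypothetical; the paper's actual sentiment features and filtering criterion may differ, and a real system would use a trained sentiment classifier per language rather than a word list.

```python
from typing import Callable, Iterable

# Hypothetical scorer type: maps a sentence to a polarity score in [-1, 1].
Scorer = Callable[[str], float]


def lexicon_scorer(lexicon: dict[str, int]) -> Scorer:
    """Toy polarity scorer (an illustrative assumption, not the paper's model).

    Returns the mean lexicon polarity of whitespace-separated tokens found in
    `lexicon`, or 0.0 when no token matches.
    """
    def score(sentence: str) -> float:
        hits = [lexicon[tok] for tok in sentence.split() if tok in lexicon]
        return sum(hits) / len(hits) if hits else 0.0
    return score


def polarity(score: float, neutral_band: float = 0.1) -> int:
    """Map a score to -1 (negative), 0 (neutral) or +1 (positive)."""
    if score > neutral_band:
        return 1
    if score < -neutral_band:
        return -1
    return 0


def sentiment_filter(
    pairs: Iterable[tuple[str, str]],
    src_scorer: Scorer,
    tgt_scorer: Scorer,
    max_gap: float = 0.5,
) -> list[tuple[str, str]]:
    """Keep pseudo-parallel pairs whose two sides agree in sentiment.

    A pair survives only if both sides have the same polarity label and
    their raw scores differ by at most `max_gap`.
    """
    kept = []
    for src, tgt in pairs:
        s, t = src_scorer(src), tgt_scorer(tgt)
        if polarity(s) == polarity(t) and abs(s - t) <= max_gap:
            kept.append((src, tgt))
    return kept


if __name__ == "__main__":
    # Toy lexicons; the Chinese input is pre-segmented with spaces
    # because the toy scorer splits on whitespace.
    zh = lexicon_scorer({"好": 1, "满意": 1, "差": -1, "失望": -1})
    en = lexicon_scorer({"good": 1, "satisfied": 1, "bad": -1, "disappointed": -1})
    pseudo_pairs = [
        ("服务 很 好", "the service is good"),   # both positive: kept
        ("服务 很 差", "the service is good"),   # polarity flipped: dropped
        ("我们 明天 开会", "we meet tomorrow"),  # both neutral: kept
    ]
    for src, tgt in sentiment_filter(pseudo_pairs, zh, en):
        print(src, "|||", tgt)
```

Requiring both label agreement and a bounded score gap is just one simple design choice: the label check rejects pairs whose sentiment was flipped during generation, while the gap check also drops pairs with the same label but very different intensity. Both thresholds would need tuning on held-out in-domain data.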