• Journal of the China Computer Federation
  • China Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science ›› 2021, Vol. 43 ›› Issue (08): 1497-1502.


A Chinese-Vietnamese neural machine translation method based on synonym data augmentation

YOU Cong-cong1,2, GAO Sheng-xiang1,2, YU Zheng-tao1,2, MAO Cun-li1,2, PAN Run-hai1,2

  (1. Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500;
   2. Yunnan Key Laboratory of Artificial Intelligence, Kunming University of Science and Technology, Kunming 650500, China)

  • Received: 2020-02-18 Revised: 2020-07-12 Accepted: 2021-08-25 Online: 2021-08-25 Published: 2021-08-24

Abstract: The scarcity of Chinese-Vietnamese parallel corpora severely limits the quality of Chinese-Vietnamese machine translation, and data augmentation is an effective way to mitigate this. Vocabulary replacement based on bilingual dictionaries is currently a popular augmentation method. However, because Chinese-Vietnamese is a low-resource language pair, bilingual dictionaries are difficult to obtain, whereas synonyms of low-frequency words can be obtained more easily from monolingual word vectors. We therefore propose a data augmentation method based on synonym replacement of low-frequency words, which requires only a small-scale parallel corpus. First, monolingual word vectors are learned to build a synonym list for the low-frequency words on one side of the corpus. Second, low-frequency words are replaced with their synonyms. Third, a language model is used to filter the resulting sentences. Finally, each retained sentence is paired with the corresponding sentence on the other side to obtain an extended parallel corpus. Experiments on Chinese-Vietnamese translation show that the proposed method improves the BLEU score by 1.8 and 1.1 over the baseline and the back-translation method, respectively.
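
The four steps in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the frequency cutoff, the cosine-similarity threshold, and the add-one-smoothed bigram language model are all assumptions chosen to keep the example self-contained, and the toy English tokens and placeholder target strings stand in for real Chinese-Vietnamese data.

```python
import math
from collections import Counter


def low_frequency_words(sentences, max_count=1):
    """Words seen at most max_count times (the cutoff is an assumption)."""
    counts = Counter(word for sent in sentences for word in sent)
    return {word for word, n in counts.items() if n <= max_count}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest_synonyms(word, vectors, top_k=1, min_sim=0.8):
    """Treat the nearest neighbours in embedding space as synonyms."""
    if word not in vectors:
        return []
    scored = [(cosine(vectors[word], vec), other)
              for other, vec in vectors.items() if other != word]
    scored = [item for item in scored if item[0] >= min_sim]
    scored.sort(reverse=True)
    return [other for _, other in scored[:top_k]]


class BigramLM:
    """Add-one-smoothed bigram model, a stand-in for the filtering LM."""

    def __init__(self, sentences):
        self.context = Counter()
        self.bigrams = Counter()
        vocab = {"<s>", "</s>"}
        for sent in sentences:
            tokens = ["<s>"] + list(sent) + ["</s>"]
            vocab.update(tokens)
            self.context.update(tokens[:-1])
            self.bigrams.update(zip(tokens, tokens[1:]))
        self.vocab_size = len(vocab)

    def avg_logprob(self, sentence):
        """Average log-probability per predicted token."""
        tokens = ["<s>"] + list(sentence) + ["</s>"]
        total = sum(
            math.log((self.bigrams[(p, c)] + 1)
                     / (self.context[p] + self.vocab_size))
            for p, c in zip(tokens, tokens[1:])
        )
        return total / (len(tokens) - 1)


def augment(pairs, vectors, lm, max_count=1, min_sim=0.8, min_avg_logprob=-3.0):
    """Replace rare source words, filter with the LM, keep the original target."""
    rare = low_frequency_words([src for src, _ in pairs], max_count)
    new_pairs = []
    for src, tgt in pairs:
        for i, word in enumerate(src):
            if word not in rare:
                continue
            for syn in nearest_synonyms(word, vectors, min_sim=min_sim):
                candidate = src[:i] + [syn] + src[i + 1:]
                if lm.avg_logprob(candidate) >= min_avg_logprob:
                    new_pairs.append((candidate, tgt))
    return new_pairs


# Toy data: tokenized source sentences paired with placeholder targets.
pairs = [
    (["the", "film", "was", "great"], "target sentence 1"),
    (["the", "weather", "was", "great"], "target sentence 2"),
]
vectors = {"film": [1.0, 0.1], "movie": [0.9, 0.2], "weather": [0.0, 1.0]}
monolingual = [["the", "movie", "was", "great"], ["the", "movie", "was", "long"]]
lm = BigramLM(monolingual)
augmented = augment(pairs, vectors, lm)
```

In this toy run, "film" and "weather" are the low-frequency words; only "film" has a sufficiently close neighbour ("movie"), so one new pair is produced and it reuses the target of the original sentence. In practice the vectors would come from a trained embedding model and the filter from a stronger language model; the abstract does not name specific tools or thresholds.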


Key words: Chinese-Vietnamese, data augmentation, synonym substitution, neural machine translation