Computer Engineering & Science, 2021, Vol. 43, Issue (08): 1497-1502.
YOU Cong-cong (1, 2), GAO Sheng-xiang (1, 2), YU Zheng-tao (1, 2), MAO Cun-li (1, 2), PAN Run-hai (1, 2)
Abstract: The scarcity of Chinese-Vietnamese parallel corpora severely limits the performance of Chinese-Vietnamese machine translation, and data augmentation is an effective way to improve it. Vocabulary replacement based on bilingual dictionaries is currently a popular data augmentation approach. However, because Chinese-Vietnamese is a low-resource language pair, bilingual dictionaries are difficult to obtain, whereas synonyms for low-frequency words are easier to obtain from monolingual word vectors. We therefore propose a data augmentation method based on synonym replacement of low-frequency words. The method starts from a small-scale parallel corpus. First, monolingual word vectors are learned to obtain a synonym list for the low-frequency words on one side of the corpus. Then, those low-frequency words are replaced with their synonyms. Next, a language model is used to filter the sentences produced by replacement. Finally, each filtered sentence is paired with the corresponding sentence in the other language to obtain an expanded parallel corpus. Chinese-Vietnamese translation experiments show that the proposed method is effective: it improves BLEU by 1.8 and 1.1 points over the baseline and the back-translation method, respectively.
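The four steps in the abstract can be illustrated with a minimal sketch. It rests on assumptions the abstract does not specify: toy 2-D vectors stand in for learned monolingual word vectors, "low-frequency" means a corpus count at or below a hypothetical `max_freq`, synonyms are cosine nearest neighbours above a hypothetical `min_sim`, and an add-one-smoothed bigram model with a relative score `margin` stands in for the paper's language-model filter. All names, thresholds and data below are illustrative, not the authors' implementation.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def low_freq_synonyms(corpus, vectors, max_freq=1, min_sim=0.8, top_k=2):
    """Step 1 (sketch): map each low-frequency word to its nearest neighbours.

    Assumptions: 'low-frequency' means count <= max_freq in `corpus`, and a
    synonym is any of the top_k cosine neighbours with similarity >= min_sim.
    """
    freq = Counter(tok for sent in corpus for tok in sent)
    table = {}
    for word, count in freq.items():
        if count > max_freq or word not in vectors:
            continue
        scored = sorted(
            ((cosine(vectors[word], vec), other)
             for other, vec in vectors.items() if other != word),
            reverse=True,
        )
        syns = [other for sim, other in scored[:top_k] if sim >= min_sim]
        if syns:
            table[word] = syns
    return table


class BigramLM:
    """Add-one-smoothed bigram LM: a hypothetical stand-in for the paper's LM."""

    def __init__(self, sentences):
        self.unigrams = Counter()
        self.bigrams = Counter()
        for sent in sentences:
            toks = ["<s>"] + sent + ["</s>"]
            self.unigrams.update(toks[:-1])  # counts of each history token
            self.bigrams.update(zip(toks[:-1], toks[1:]))
        # Possible next tokens: every training word plus the end marker.
        self.vocab_size = len({t for s in sentences for t in s} | {"</s>"})

    def avg_logprob(self, sent):
        """Mean log-probability per predicted token (higher = more fluent)."""
        toks = ["<s>"] + sent + ["</s>"]
        total = 0.0
        for prev, cur in zip(toks[:-1], toks[1:]):
            num = self.bigrams[(prev, cur)] + 1
            den = self.unigrams[prev] + self.vocab_size
            total += math.log(num / den)
        return total / (len(toks) - 1)


def augment(pairs, vectors, lm, margin=0.5, **syn_kwargs):
    """Steps 2-4 (sketch): replace, filter with the LM, re-pair with the target.

    Assumption: a candidate is kept if its LM score is at most `margin`
    below the score of the sentence it was derived from.
    """
    table = low_freq_synonyms([src for src, _ in pairs], vectors, **syn_kwargs)
    new_pairs = []
    for src, tgt in pairs:
        base = lm.avg_logprob(src)
        for i, tok in enumerate(src):
            for syn in table.get(tok, []):
                cand = src[:i] + [syn] + src[i + 1:]
                if lm.avg_logprob(cand) >= base - margin:
                    new_pairs.append((cand, tgt))  # target side is unchanged
    return pairs + new_pairs


if __name__ == "__main__":
    # Toy data: token lists on the edited side, placeholder targets on the other.
    pairs = [
        (["the", "cat", "sat"], ["tgt1"]),
        (["the", "cat", "ran"], ["tgt2"]),
        (["a", "kitten", "sat"], ["tgt3"]),
    ]
    # Hypothetical 2-D vectors; real ones would be learned from monolingual text.
    vectors = {"cat": [1.0, 0.0], "kitten": [0.9, 0.1], "dog": [0.0, 1.0]}
    lm = BigramLM([src for src, _ in pairs])
    augmented = augment(pairs, vectors, lm, margin=0.5)
    print(augmented[-1])  # (['a', 'cat', 'sat'], ['tgt3'])
```

With `margin=0.5` the toy run adds one pair, `(['a', 'cat', 'sat'], ['tgt3'])`; tightening it to `margin=0.1` rejects that candidate, mainly because the bigram "a cat" never occurs in the toy LM data. Only the edited side changes and the other-language sentence is reused as-is, which is why no bilingual dictionary is needed to create the new pair.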
Key words: Chinese-Vietnamese, data augmentation, synonym substitution, neural machine translation
YOU Cong-cong, GAO Sheng-xiang, YU Zheng-tao, MAO Cun-li, PAN Run-hai. A Chinese-Vietnamese neural machine translation method based on synonym data augmentation[J]. Computer Engineering & Science, 2021, 43(08): 1497-1502.
URL: http://joces.nudt.edu.cn/EN/Y2021/V43/I08/1497