• Journal of the China Computer Federation (CCF)
• Chinese Science and Technology Core Journal
• Chinese Core Journal

Computer Engineering & Science, 2024, Vol. 46, Issue (03): 518-524.

• Artificial Intelligence and Data Mining

  • Funding:
    National Natural Science Foundation of China (62166022, 62266028)

Chinese-Urdu neural machine translation incorporating Urdu part-of-speech sequence prediction

CHEN Huan-huan (1, 2), WANG Jian (1, 2), Muhammad Naeem Ul Hassan (1, 2)

  1. Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, Yunnan 650500, China;
  2. Key Laboratory of Artificial Intelligence in Yunnan Province, Kunming University of Science and Technology, Kunming, Yunnan 650500, China
  • Received: 2023-01-11; Revised: 2023-03-22; Accepted: 2024-03-25; Online: 2024-03-25; Published: 2024-03-18


Abstract: Many research teams have studied machine translation for the less commonly used languages of South and Southeast Asia in depth. Urdu, the official language of Pakistan, is an exception: its data resources are scarce and it differs greatly from Chinese, so research on methods tailored to Chinese-Urdu machine translation remains very limited. To address this gap, this paper proposes a Transformer-based Chinese-Urdu neural machine translation model that incorporates Urdu part-of-speech (POS) sequence prediction. First, a Transformer predicts the POS sequence of the target-language (Urdu) sentence. Then, the predictions of the translation model and of the POS sequence model are combined to jointly predict the final translation, thereby injecting linguistic knowledge into the translation model. Experiments on an existing small-scale Chinese-Urdu dataset show that the proposed method improves the BLEU score by 0.13 over the baseline model.

Key words: Transformer, neural machine translation, Urdu, part-of-speech sequence
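The joint prediction step described in the abstract can be illustrated with a minimal sketch. The abstract does not state how the two models' outputs are combined, so the code below assumes a log-linear fusion at a single decoding step: the translation model's probability for each candidate Urdu word is combined with the POS model's probability for that word's tag, scaled by a weight. The function `joint_next_token`, the `weight` parameter, the one-tag-per-word lexicon, the romanized Urdu tokens and all probabilities are hypothetical, and the selection is greedy rather than beam search.

```python
import math


def joint_next_token(trans_probs, pos_probs, word_to_pos, weight=0.5):
    """Greedily choose the next target word by fusing two models' outputs.

    ASSUMED log-linear fusion (illustrative, not the paper's stated formula):
        score(w) = log P_trans(w) + weight * log P_pos(tag(w))

    trans_probs: dict mapping candidate word -> probability from the translation model
    pos_probs:   dict mapping POS tag -> probability from the POS-sequence model
    word_to_pos: hypothetical lexicon mapping each word to a single POS tag
    weight:      hypothetical interpolation weight; 0.0 ignores the POS model's scores
    """
    best_word, best_score = None, -math.inf
    for word, p_word in trans_probs.items():
        tag = word_to_pos.get(word)        # None if the word is not in the lexicon
        p_tag = pos_probs.get(tag, 0.0)
        if p_word <= 0.0 or p_tag <= 0.0:
            continue                       # log(0) is undefined: drop the candidate
        score = math.log(p_word) + weight * math.log(p_tag)
        if score > best_score:
            best_word, best_score = word, score
    return best_word, best_score


if __name__ == "__main__":
    # Toy distributions for one decoding step (romanized Urdu, made-up numbers).
    trans = {"ghar": 0.40, "gaya": 0.45, "acha": 0.15}  # house / went / good
    pos = {"NOUN": 0.7, "VERB": 0.2, "ADJ": 0.1}
    lexicon = {"ghar": "NOUN", "gaya": "VERB", "acha": "ADJ"}

    print(joint_next_token(trans, pos, lexicon, weight=0.0))  # translation scores only
    print(joint_next_token(trans, pos, lexicon, weight=1.0))  # POS model shifts the choice
```

With `weight=0.0` the step picks `gaya`, the translation model's top word; with `weight=1.0` the POS model's strong preference for a noun at this position moves the choice to `ghar`. Mapping each word to a single tag keeps the sketch short but ignores POS-ambiguous words.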