• Official journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science ›› 2025, Vol. 47 ›› Issue (01): 180-190.

• Artificial Intelligence and Data Mining •

Document-level neural machine translation based on rhetorical structure

JIANG Yunzhuo, GONG Zhengxian

  1. (School of Computer Science & Technology, Soochow University, Suzhou, Jiangsu 215008, China)

  • Received: 2023-10-30 Revised: 2024-03-18 Accepted: 2025-01-25 Online: 2025-01-25 Published: 2025-01-18

Abstract: Document-level neural machine translation has made significant progress over years of development, but most work approaches the problem from the model side, using contextual word-level information to build effective network architectures while neglecting cross-sentence discourse structure and rhetorical information as guidance for the model. To address this issue, a method guided by Rhetorical Structure Theory is proposed that encodes discourse units and rhetorical structure tree features separately. Experimental results show that the proposed method strengthens the encoder's ability to represent discourse structure and rhetorical information. Models improved with this method achieve clear gains in translation performance on multiple datasets and outperform several strong baseline models. Translation quality also improves clearly under the proposed quantitative evaluation method and in human analysis.

Key words: neural machine translation, discourse analysis, document-level translation, rhetorical structure theory