• Journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science ›› 2025, Vol. 47 ›› Issue (5): 931-939.

• Artificial Intelligence and Data Mining •

  • Funding:
    National Natural Science Foundation of China (62062055); Inner Mongolia Universities Young Science and Technology Talent Program (NJYT24061); Basic Scientific Research Funds for Universities Directly under the Inner Mongolia Autonomous Region (JY20220249)

Research on Chinese—traditional Mongolian cross-lingual summarization methods in low-resource scenarios

BAN Qi1,2, YUN Jing1,2, DENG Lei1,2

  1. College of Data Science and Application (College of Cyber Security), Inner Mongolia University of Technology, Hohhot 010080, China;
  2. Inner Mongolia Autonomous Region Engineering & Technology Research Center of Big Data-based Software Service, Hohhot 010080, China
  • Received: 2024-08-15  Revised: 2024-08-29  Online: 2025-05-25  Published: 2025-05-27


Abstract: Cross-lingual summarization aims to generate a summary in a target language (such as traditional Mongolian) given a source document in another language (such as Chinese). Traditional multi-task frameworks typically employ sequence-to-sequence networks with multiple decoders, each dedicated to a specific task. However, when distilling a document in one language into a summary in another language with different morphological and structural characteristics, such frameworks cannot effectively capture and understand the relationships and differences between the two languages. This is particularly evident for traditional Mongolian, whose complex morphological changes and diverse word-formation patterns make learning and processing its linguistic features under low-resource conditions especially challenging. To address this problem, we propose a cross-lingual summarization model that embeds consistency learning into a multi-task framework. Consistency is modeled by computing a distance metric over the difference between the probability distributions of the source-language summary and the generated target-language summary, and the model is optimized under the joint constraints of a cross-entropy loss and a consistency loss. Furthermore, we build a Chinese-Mongolian cross-lingual summarization dataset. The competitive ROUGE scores obtained on this dataset demonstrate the effectiveness of the proposed model in resource-poor conditions.
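As a minimal sketch of the training objective described above (not the authors' implementation), the joint loss can be illustrated with token-level KL divergence as the distance metric. This assumes, for simplicity, that both decoders share a vocabulary and emit probability distributions of equal length; the function names and the weight `lam` are illustrative, not taken from the paper:

```python
import numpy as np

def cross_entropy(probs, target_ids):
    # Mean negative log-likelihood of the reference tokens;
    # probs has shape (length, vocab), target_ids shape (length,).
    return -np.mean(np.log(probs[np.arange(len(target_ids)), target_ids]))

def kl_divergence(p, q, eps=1e-12):
    # Token-level KL(p || q), averaged over positions; eps avoids log(0).
    p = np.clip(p, eps, 1.0)
    q = np.clip(q, eps, 1.0)
    return np.mean(np.sum(p * np.log(p / q), axis=-1))

def total_loss(src_probs, tgt_probs, src_ids, tgt_ids, lam=1.0):
    # Cross-entropy on both summarization tasks, plus a consistency term
    # that penalizes divergence between the two decoders' distributions.
    ce = cross_entropy(src_probs, src_ids) + cross_entropy(tgt_probs, tgt_ids)
    consistency = kl_divergence(src_probs, tgt_probs)
    return ce + lam * consistency
```

When the two decoders agree exactly, the consistency term vanishes and only the cross-entropy terms remain, which matches the intuition that the penalty is driven purely by distributional disagreement.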


Key words: Chinese—Mongolian cross-lingual summarization, consistency learning, low-resource