• A journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science ›› 2026, Vol. 48 ›› Issue (2): 309-318.

• Artificial Intelligence and Data Mining •


A two-stage text summarization method based on an improved PEGASUS model and adaptive error correction mechanism

ZHANG Hang,WU Jun   

  1. (School of Information and Artificial Intelligence(College of Industrial Software),Yangzhou University,Yangzhou 225000,China)
  • Received:2024-10-25 Revised:2025-02-20 Online:2026-02-25 Published:2026-03-10


Abstract: To address the issues of word redundancy and poor readability in extractive summarization, as well as semantic confusion, logical inconsistency, and exposure bias in abstractive summarization, this paper proposes a two-stage text summarization method based on an improved PEGASUS model and an adaptive error correction mechanism, employing a hybrid summarization technique. In the extraction stage, text vectors are obtained with the BERT model and combined with a Bi-GRU and a graph structure; an improved MMR algorithm is then used to effectively reduce redundancy among candidate summaries, enhancing summary precision. In the generation stage, the extracted sentences are processed by the PEGASUS model, incorporating hierarchical clustering and introducing an adaptive error correction mechanism to solve the out-of-vocabulary (OOV) problem. Additionally, a contrastive learning framework is adopted to significantly mitigate exposure bias. Experimental results demonstrate that the proposed model achieves significant improvements in ROUGE scores on the NLPCC dataset, with average gains of 2.66, 0.84, and 1.81 percentage points across the ROUGE metrics compared with models built by existing hybrid methods. The method not only improves summary quality but also exhibits superior performance in resolving the OOV and exposure-bias problems.
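The redundancy reduction in the extraction stage builds on MMR-style selection. As a minimal sketch of classic MMR over precomputed sentence vectors (the paper's improved variant and its BERT/Bi-GRU/graph features are not reproduced here; `mmr_select`, `lam`, and the toy vectors are illustrative assumptions):

```python
import numpy as np

def mmr_select(sent_vecs, doc_vec, k=3, lam=0.7):
    """Classic Maximal Marginal Relevance: greedily pick k sentences,
    trading off relevance to the document against redundancy with
    sentences already selected."""
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    candidates = list(range(len(sent_vecs)))
    selected = []
    while candidates and len(selected) < k:
        def score(i):
            relevance = cos(sent_vecs[i], doc_vec)
            # Redundancy = max similarity to anything already chosen.
            redundancy = max(
                (cos(sent_vecs[i], sent_vecs[j]) for j in selected),
                default=0.0,
            )
            return lam * relevance - (1 - lam) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected
```

With `lam` near 1 the selection favors relevance; lowering it penalizes near-duplicate candidates more strongly, which is the behavior the abstract attributes to the improved MMR step.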

Key words: hybrid summarization, BERT model, PEGASUS model, hierarchical clustering, adaptive error correction mechanism, contrastive learning framework