• Official journal of the China Computer Federation (CCF)
  • China Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science ›› 2026, Vol. 48 ›› Issue (2): 309-318.

• Artificial Intelligence and Data Mining

A two-stage text summarization method based on an improved PEGASUS model and adaptive error correction mechanism

ZHANG Hang, WU Jun

  1. (School of Information and Artificial Intelligence (College of Industrial Software), Yangzhou University, Yangzhou 225000, China)
  • Received: 2024-10-25  Revised: 2025-02-20  Online: 2026-02-25  Published: 2026-03-10

Abstract: To address word redundancy and poor readability in extractive summarization, as well as semantic confusion, logical inconsistency, and exposure bias in abstractive summarization, this paper proposes a two-stage hybrid text summarization method based on an improved PEGASUS model and an adaptive error correction mechanism. In the extraction stage, text vectors are obtained with the BERT model and combined with a Bi-GRU and a graph structure, and an improved MMR (maximal marginal relevance) algorithm reduces redundancy among candidate summary sentences, which improves summary precision. In the generation stage, the extracted sentences are processed by the PEGASUS model, which incorporates hierarchical clustering and an adaptive error correction mechanism to address the out-of-vocabulary (OOV) problem. In addition, a contrastive learning framework is adopted to significantly mitigate exposure bias. Experimental results on the NLPCC dataset show that, compared with models built by existing hybrid methods, the proposed model improves ROUGE scores by an average of 2.66, 0.84, and 1.81 percentage points on the respective ROUGE metrics. The method not only improves summary quality but also performs better at resolving the OOV problem and exposure bias.
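The abstract does not say how the improved MMR algorithm departs from the original, so what follows is only a sketch of classic Maximal Marginal Relevance, the baseline such variants build on. At each greedy step it picks the sentence s maximising λ·sim(s, D) − (1 − λ)·max sim(s, s′) over the already selected sentences s′, trading relevance to the document D against redundancy with what has been chosen. The toy 2-D vectors stand in for the BERT/Bi-GRU sentence representations; the function names, the document vector, and the value of λ are hypothetical illustration choices, not the authors' settings.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two dense vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def mmr_select(sent_vecs: List[List[float]], doc_vec: List[float],
               k: int, lam: float = 0.7) -> List[int]:
    """Hypothetical sketch of classic greedy MMR (not the paper's improved variant).

    Returns the indices of up to k sentences, in selection order.
    """
    selected: List[int] = []
    candidates = list(range(len(sent_vecs)))
    while candidates and len(selected) < k:
        def score(i: int) -> float:
            # Relevance: similarity of the sentence to the whole document.
            relevance = cosine(sent_vecs[i], doc_vec)
            # Redundancy: highest similarity to any sentence already chosen.
            redundancy = max((cosine(sent_vecs[i], sent_vecs[j]) for j in selected),
                             default=0.0)
            return lam * relevance - (1 - lam) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


if __name__ == "__main__":
    # Sentence 1 is a near-duplicate of sentence 0; sentence 2 covers other content.
    sents = [[1.0, 0.0], [0.99, 0.1], [0.0, 1.0]]
    doc = [1.0, 0.5]  # hypothetical document vector
    print(mmr_select(sents, doc, k=2, lam=0.5))  # [1, 2]
    print(mmr_select(sents, doc, k=2, lam=1.0))  # [1, 0]
```

With λ = 0.5 the second pick skips sentence 0 because it nearly repeats sentence 1, whereas λ = 1.0 (pure relevance, no redundancy penalty) keeps both near-duplicates. This is exactly the redundancy that an MMR-style step is meant to remove from candidate summaries.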

Key words: hybrid summarization, BERT model, PEGASUS model, hierarchical clustering, adaptive error correction mechanism, contrastive learning framework
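The reported gains are absolute differences between ROUGE scores on a 0 to 100 scale, so an improvement of 2.66 percentage points corresponds to 0.0266 on a 0 to 1 scale. To make the metric concrete, here is a minimal sketch of ROUGE-N and ROUGE-L F1 built on clipped n-gram overlap and the longest common subsequence. It is not the official ROUGE toolkit (no stemming, stopword handling, or multi-reference support), and the function names are hypothetical. For Chinese data such as NLPCC, scores are often computed over characters, which here would mean passing `list(text)` as the token sequence; that is an assumption about typical evaluation setups, not a detail taken from the paper.

```python
from collections import Counter
from typing import Sequence


def ngrams(tokens: Sequence[str], n: int) -> Counter:
    """Multiset of contiguous n-grams (empty if the sequence is shorter than n)."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def f1(overlap: int, cand_total: int, ref_total: int) -> float:
    """Harmonic mean of precision (vs. candidate) and recall (vs. reference)."""
    if overlap == 0 or cand_total == 0 or ref_total == 0:
        return 0.0
    p = overlap / cand_total
    r = overlap / ref_total
    return 2 * p * r / (p + r)


def rouge_n(candidate: Sequence[str], reference: Sequence[str], n: int) -> float:
    """ROUGE-N F1; Counter '&' keeps the minimum count, i.e. clipped overlap."""
    c, r = ngrams(candidate, n), ngrams(reference, n)
    overlap = sum((c & r).values())
    return f1(overlap, sum(c.values()), sum(r.values()))


def lcs_len(a: Sequence[str], b: Sequence[str]) -> int:
    """Longest common subsequence length via row-by-row dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def rouge_l(candidate: Sequence[str], reference: Sequence[str]) -> float:
    """ROUGE-L F1 based on the LCS of candidate and reference."""
    return f1(lcs_len(candidate, reference), len(candidate), len(reference))


if __name__ == "__main__":
    cand = "the cat sat on the mat".split()
    ref = "the cat is on the mat".split()
    print(round(rouge_n(cand, ref, 1), 4))  # 0.8333
    print(round(rouge_n(cand, ref, 2), 4))  # 0.6
    print(round(rouge_l(cand, ref), 4))     # 0.8333
```

ROUGE-2 drops furthest here because one substituted word breaks two bigrams, which is one reason bigram scores tend to be the lowest and hardest to move of the three.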