• Official Journal of the China Computer Federation
  • China Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science, 2023, Vol. 45, Issue (11): 2060-2069.

• Artificial Intelligence and Data Mining

A TextRank automatic summarization generation algorithm based on co-occurrence keywords

YAN Hong-can1,2, LI Bo-chu1, GU Jian-tao1,2

  1. College of Science, North China University of Science and Technology, Tangshan 063210;
  2. Key Laboratory of Data Science and Application of Hebei Province, Tangshan 063000, China
  • Received: 2021-11-09  Revised: 2022-07-29  Accepted: 2023-11-25  Online: 2023-11-25  Published: 2023-11-16

Abstract: When generating summaries, the traditional TextRank algorithm considers only the similarity between sentences and ignores the similarity among the articles themselves, and the summaries it produces often repeat the same information. To address this, a TextRank algorithm based on co-occurrence keywords is proposed. The sentences of an article are represented as sentence vectors using the word2vec model. Taking the article's category into account, the co-occurrence keywords of that category of article are used as parameters in the iterative computation of sentence weights. The sentence weights obtained by iteration are then corrected using sentence length, the number of keywords, and other information. Experimental results show that the proposed algorithm improves the comprehensiveness and accuracy of the generated summaries. In addition, the algorithm uses Maximal Marginal Relevance (MMR) to remove redundancy from the summaries, which reduces repeated information.

Key words: automatic summary generation, TextRank, co-occurrence keyword, MMR algorithm, word2vec model
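The abstract names the pipeline steps but not their exact formulas. The Python sketch below shows one plausible way to wire them together, under loudly stated assumptions: sentence vectors are taken as given (for example, averaged word2vec embeddings, which are not computed here); the co-occurrence keywords enter the TextRank iteration by biasing its (1 - d) jump term toward sentences that contain them, a personalized-PageRank-style choice that is an assumption, not the paper's formula; the length correction in `adjust_by_length` is purely illustrative; and `mmr_select` is a standard greedy MMR. All function names and defaults (`d = 0.85`, `lam = 0.7`, `target_len = 15`) are hypothetical.

```python
import math
from typing import List, Sequence, Set


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def keyword_textrank(vectors: List[List[float]], tokens: List[List[str]],
                     cooc_keywords: Set[str], d: float = 0.85,
                     iters: int = 200, tol: float = 1e-9) -> List[float]:
    """TextRank on a sentence-similarity graph. ASSUMPTION: co-occurrence
    keywords bias the (1 - d) jump term toward sentences containing them."""
    n = len(vectors)
    # Edge weights: non-negative cosine similarity, no self-loops.
    sim = [[max(0.0, cosine(vectors[i], vectors[j])) if i != j else 0.0
            for j in range(n)] for i in range(n)]
    out_sum = [sum(row) for row in sim]
    # Jump distribution: 1 + number of category keywords in each sentence.
    hits = [1 + sum(t in cooc_keywords for t in toks) for toks in tokens]
    total = sum(hits)
    bias = [h / total for h in hits]
    score = [1.0 / n] * n
    for _ in range(iters):
        new = [(1 - d) * bias[i]
               + d * sum(sim[j][i] / out_sum[j] * score[j]
                         for j in range(n) if out_sum[j] > 0)
               for i in range(n)]
        done = max(abs(a - b) for a, b in zip(new, score)) < tol
        score = new
        if done:
            break
    return score


def adjust_by_length(scores: List[float], tokens: List[List[str]],
                     target_len: int = 15) -> List[float]:
    """HYPOTHETICAL correction: scale down sentences shorter than target_len."""
    return [s * min(1.0, len(t) / target_len) for s, t in zip(scores, tokens)]


def mmr_select(scores: List[float], vectors: List[List[float]],
               k: int, lam: float = 0.7) -> List[int]:
    """Greedy Maximal Marginal Relevance: each step picks the sentence that
    maximises lam * relevance - (1 - lam) * max similarity to those already
    chosen. Returns indices in original sentence order."""
    top = max(scores) or 1.0
    rel = [s / top for s in scores]  # best sentence gets relevance 1.0
    selected: List[int] = []
    candidates = list(range(len(scores)))

    def marginal(i: int) -> float:
        redundancy = max((cosine(vectors[i], vectors[j]) for j in selected),
                         default=0.0)
        return lam * rel[i] - (1 - lam) * redundancy

    while candidates and len(selected) < k:
        best = max(candidates, key=marginal)
        selected.append(best)
        candidates.remove(best)
    return sorted(selected)
```

A typical call chain would be `keyword_textrank(vecs, toks, kws)`, then `adjust_by_length(...)` on the result, then `mmr_select(adjusted, vecs, k)` to pick `k` summary sentences. Lowering `lam` weights redundancy removal more heavily; `lam = 1.0` reduces MMR to plain top-k selection by score.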