• Journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science (计算机工程与科学)

• Artificial Intelligence and Data Mining

Multi-document summary extraction with a sentence graph model fusing multiple kinds of information

蒋亚芳1,2,严馨1,2,徐广义3,周枫1,2,邓忠莹1,2   

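  (1. Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, Yunnan 650500;
    2. Yunnan Key Laboratory of Artificial Intelligence, Kunming University of Science and Technology, Kunming, Yunnan 650500;
    3. Yunnan Nantian Electronic Information Industry Co., Ltd., Kunming, Yunnan 650041)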
     
  • Received: 2019-07-01 Revised: 2019-09-11 Publication date: 2020-03-25 Release date: 2020-03-25
  • Funding:

    National Natural Science Foundation of China (61462055, 61562049)

Multi-document summarization extraction based on a multi-information sentence graph model

JIANG Ya-fang1,2,YAN Xin1,2,XU Guang-yi3,ZHOU Feng1,2,DENG Zhong-ying1,2   

  (1. Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500;
    2. Yunnan Key Laboratory of Artificial Intelligence, Kunming University of Science and Technology, Kunming 650500;
    3. Yunnan Nantian Electronic Information Industry Co., Ltd., Kunming 650041, China)
     
  • Received: 2019-07-01 Revised: 2019-09-11 Online: 2020-03-25 Published: 2020-03-25

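Abstract:

Existing multi-document extraction methods make poor use of sentence topic information and semantic information. To address this, a multi-document summary extraction method built on a sentence graph model that fuses multiple kinds of information is proposed. First, a sentence graph model is built with sentences as nodes. Then, the sentence topic probability distributions produced by a sentence-based Bayesian topic model and the sentence semantic similarities produced by a word vector model are fused to give the final relevance between sentences, and this combination of topic and semantic information is used as the edge weights of the sentence graph model. Finally, the multi-document summary is produced with a summarization method based on the minimum dominating set of the sentence graph. Through the multi-information sentence graph model, the method combines the topic information, semantic information, and relational information among sentences. Experimental results show that the method effectively improves the overall performance of extractive summarization.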
 
 

Keywords: multi-document summarization, sentence-based Bayesian topic model, word vectors, sentence graph model, minimum dominating set
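
The edge-weighting step described in the abstract can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say how the topic distributions are compared or how the two scores are fused, so the sketch assumes that topic similarity is one minus the Jensen-Shannon divergence, that semantic similarity is the cosine of sentence vectors (for example, averaged word vectors), and that the two are combined with a hypothetical interpolation weight `alpha`. The function names are likewise illustrative.

```python
import numpy as np


def js_divergence(p, q, eps=1e-12):
    """Jensen-Shannon divergence in base 2, so the value lies in [0, 1]."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p, q = p / p.sum(), q / q.sum()
    m = 0.5 * (p + q)

    def kl(a, b):
        return float(np.sum(a * np.log2(a / b)))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom > 0 else 0.0


def edge_weights(topic_dists, sent_vecs, alpha=0.5):
    """Symmetric matrix of fused sentence relevance scores.

    topic_dists: one topic probability distribution per sentence.
    sent_vecs:   one embedding per sentence (assumed: averaged word vectors).
    alpha:       assumed mixing weight between topic and semantic similarity.
    """
    n = len(topic_dists)
    weights = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            topic_sim = 1.0 - js_divergence(topic_dists[i], topic_dists[j])
            sem_sim = cosine(sent_vecs[i], sent_vecs[j])
            weights[i, j] = weights[j, i] = alpha * topic_sim + (1 - alpha) * sem_sim
    return weights


# Toy usage: three sentences, two topics, two-dimensional sentence vectors.
toy_topics = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9]]
toy_vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
print(np.round(edge_weights(toy_topics, toy_vectors), 3))
```

In the toy data, sentences 0 and 1 share both a dominant topic and a similar vector, so their edge receives a higher weight than the edge between sentences 0 and 2.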

Abstract:

To address the problem that existing multi-document summary extraction methods make poor use of sentence topic information and semantic information, this paper proposes a multi-document summary extraction method based on a multi-information sentence graph model. First, a sentence graph model is constructed with sentences as nodes. Second, the sentence topic probability distributions obtained from a sentence-based Bayesian topic model are fused with the sentence semantic similarities obtained from a word vector model to give the final relevance between sentences; this combination of topic and semantic information serves as the edge weights of the sentence graph model. Finally, the multi-document summary is produced by a summarization method based on the minimum dominating set of the sentence graph. By integrating multiple kinds of information in the sentence graph model, the method combines the topic information, semantic information, and relational information between sentences. Experimental results show that the method effectively improves the overall performance of extractive summarization.
 

Key words: multi-document summarization, sentence-based Bayesian topic model, word vector, sentence graph model, minimum dominating set
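
The final selection step can be illustrated with a greedy approximation to the minimum dominating set. This is a sketch under assumptions, not the authors' procedure: finding an exact minimum dominating set is NP-hard, so the code uses the standard greedy heuristic, and the `threshold` that decides which weighted edges count as connections is a hypothetical parameter, as is the function name.

```python
def greedy_dominating_set(weights, threshold=0.5):
    """Greedily pick sentences until every sentence is chosen or adjacent to a chosen one.

    weights:   square matrix (list of lists) of pairwise sentence relevance.
    threshold: assumed cutoff; an edge exists when the weight is >= threshold.
    Returns the indices of the selected sentences in selection order.
    """
    n = len(weights)
    neighbors = [
        {j for j in range(n) if j != i and weights[i][j] >= threshold}
        for i in range(n)
    ]
    uncovered = set(range(n))
    chosen = []
    while uncovered:
        # Take the sentence that covers the most still-uncovered sentences;
        # ties are broken in favour of the lowest index.
        best = max(
            range(n),
            key=lambda i: (len((neighbors[i] | {i}) & uncovered), -i),
        )
        chosen.append(best)
        uncovered -= neighbors[best] | {best}
    return chosen


# Toy usage: sentence 0 is strongly related to 1 and 2; sentence 3 is isolated.
toy_weights = [
    [0.0, 0.9, 0.8, 0.1],
    [0.9, 0.0, 0.2, 0.1],
    [0.8, 0.2, 0.0, 0.1],
    [0.1, 0.1, 0.1, 0.0],
]
print(greedy_dominating_set(toy_weights))  # [0, 3]
```

Sentence 0 covers itself and its two neighbors in one pick, and the isolated sentence 3 must be selected on its own, so two sentences dominate the whole toy graph.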