• Official journal of the China Computer Federation (CCF)
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science (计算机工程与科学)

• Artificial Intelligence and Data Mining

A new automatic summarization method based on paragraph vector (一种新的基于段向量的文本自动摘要方法)

SHEN Qiangqiang (申强强), XIONG Zeyu (熊泽宇), XIONG Yueshan (熊岳山)

  1. (School of Computer, National University of Defense Technology, Changsha 410073, China)
  • Received: 2018-08-24 Revised: 2018-10-17 Online: 2019-06-25 Published: 2019-06-25
  • Funding:

    National Natural Science Foundation of China (61379103)

Abstract:

Automatic text summarization has broad application prospects in many fields, such as web search and web content recommendation. Classic text summarization algorithms use statistical methods to extract an article's keywords and, from them, its topic sentences. This approach ignores the semantic and grammatical information of the text to some extent. In recent years, distributed word vector embedding technology has been applied to text retrieval. Building on this technology, we propose an automatic text summarization method based on word vectors. The method consists of four steps: word vector generation, paragraph vector generation based on the word vectors, keyword extraction, and topic sentence extraction, which together produce an automatic summary of a text passage. Experimental results show that the improved automatic text summarization method can extract topic sentences effectively.

Key words: automatic text summarization, word vector, paragraph vector, topic sentence
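
The four steps named in the abstract can be illustrated with a minimal Python sketch. The abstract does not say how each step is computed, so everything below is an assumption made for illustration and should not be read as the authors' algorithm: the word vectors are a hand-made toy table (`TOY_VECTORS`, hypothetical) standing in for trained embeddings, the paragraph vector is taken to be the mean of its word vectors, keywords are the words closest to the paragraph vector by cosine similarity, and the topic sentence is the sentence whose averaged vector is closest to the paragraph vector. The function names `mean_vector`, `cosine`, `tokenize`, and `summarize` are likewise hypothetical.

```python
import math

# Step 1 (assumed): word vectors. A real system would train these on a corpus;
# this hand-made 3-dimensional table is purely illustrative.
TOY_VECTORS = {
    "summary": [0.9, 0.1, 0.0],
    "text": [0.8, 0.2, 0.1],
    "vector": [0.7, 0.3, 0.0],
    "weather": [0.0, 0.1, 0.9],
    "rain": [0.1, 0.0, 0.8],
}


def mean_vector(words, vectors):
    """Average the vectors of the known words; return None if none are known."""
    known = [vectors[w] for w in words if w in vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def tokenize(sentence):
    """Lowercase whitespace tokenizer that strips surrounding punctuation."""
    return [w.strip(".,;:!?").lower() for w in sentence.split()]


def summarize(sentences, vectors, n_keywords=2):
    """Return (keywords, topic_sentence) for a paragraph given as sentences."""
    tokens = [tokenize(s) for s in sentences]
    all_words = [w for sent in tokens for w in sent]

    # Step 2 (assumed): paragraph vector as the mean of its word vectors.
    paragraph_vec = mean_vector(all_words, vectors)
    if paragraph_vec is None:
        return [], None

    # Step 3 (assumed): keywords are the words most similar to the paragraph vector.
    vocab = sorted({w for w in all_words if w in vectors})
    keywords = sorted(
        vocab, key=lambda w: cosine(vectors[w], paragraph_vec), reverse=True
    )[:n_keywords]

    # Step 4 (assumed): topic sentence is the sentence closest to the paragraph vector.
    best_sentence, best_score = None, -1.0
    for sentence, sent_tokens in zip(sentences, tokens):
        sent_vec = mean_vector(sent_tokens, vectors)
        if sent_vec is None:
            continue
        score = cosine(sent_vec, paragraph_vec)
        if score > best_score:
            best_sentence, best_score = sentence, score
    return keywords, best_sentence


if __name__ == "__main__":
    paragraph = [
        "A text summary uses a vector.",
        "Rain changes the weather.",
        "The summary text is short.",
    ]
    print(summarize(paragraph, TOY_VECTORS))
```

In this toy paragraph two of the three sentences are about summarization, so the averaged paragraph vector leans toward that topic and the off-topic weather sentence scores lowest. Averaging is only the simplest way to build a paragraph vector; a trained paragraph-embedding model could be substituted at step 2 without changing the rest of the sketch.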