• Journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

J4 ›› 2016, Vol. 38 ›› Issue (06): 1257-1261.

• Articles •

Automatic summarization of Chinese microblogs based on a hybrid method

GAO Yongbing (高永兵), ZHONG Zhenhua (钟振华), WANG Yu (王宇), MA Zhanfei (马占飞)

  1. (College of Information Engineering, Inner Mongolia University of Science and Technology, Baotou 014000, Inner Mongolia, China)
  • Received: 2015-06-03 Revised: 2015-08-01 Online: 2016-06-25 Published: 2016-06-25
  • Supported by:

    National Natural Science Foundation of China (61163025); Natural Science Foundation of Inner Mongolia Autonomous Region (2015MS0621)

Abstract:

Microblog content is heterogeneous and information-sparse. To address these problems, we study traditional automatic summarization techniques in depth and, taking the characteristics of microblog data into account, propose a hybrid summarization method that combines statistics with semantic understanding and builds on microblog event extraction. First, an initial summary is obtained statistically from text features such as word frequency and sentence position. Then, using a semantic dictionary, we compute sentence similarity and determine the event subject, and apply readability processing based on semantic understanding so that the final summary is more readable. Finally, the resulting summary is assessed with a suitable summary evaluation method. Experimental results show that the method produces summaries of stable quality and good readability at different compression ratios.

Key words: microblog event; event value; readability; automatic summarization
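
The statistical first stage described in the abstract (scoring sentences by word frequency and sentence position, then selecting up to a compression ratio) can be illustrated roughly as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the weights `position_weight` and `max_similarity`, and the scoring formula are all hypothetical. Whitespace tokenization stands in for a Chinese word segmenter, and plain Jaccard token overlap stands in for the semantic-dictionary sentence similarity used in the paper. Event-subject determination is not covered.

```python
from collections import Counter


def tokenize(sentence):
    # Assumption: whitespace tokenization stands in for a real Chinese word segmenter.
    return sentence.lower().split()


def jaccard(a, b):
    # Assumption: token overlap stands in for semantic-dictionary similarity.
    sa, sb = set(a), set(b)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def summarize(sentences, ratio=0.5, position_weight=0.3, max_similarity=0.6):
    """Pick up to ratio * len(sentences) sentences (at least one if any exist)."""
    tokens = [tokenize(s) for s in sentences]
    freq = Counter(t for toks in tokens for t in toks)
    n = len(sentences)
    budget = max(1, int(n * ratio))

    scores = []
    for i, toks in enumerate(tokens):
        # Average collection-wide frequency of the words in this sentence.
        freq_score = sum(freq[t] for t in toks) / len(toks) if toks else 0.0
        # Earlier sentences receive a larger position bonus (1.0 down to 1/n).
        pos_score = (n - i) / n
        scores.append(freq_score + position_weight * pos_score)

    # Greedy selection by score, skipping near-duplicates of chosen sentences.
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
    chosen = []
    for i in ranked:
        if len(chosen) == budget:
            break
        if all(jaccard(tokens[i], tokens[j]) <= max_similarity for j in chosen):
            chosen.append(i)

    # Restore original order so the summary reads naturally.
    return [sentences[i] for i in sorted(chosen)]
```

Calling `summarize(posts, ratio=0.25)` on a list of posts keeps roughly a quarter of them. Varying `ratio` corresponds to the different compression ratios mentioned in the abstract.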