• Official journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science

• Artificial Intelligence and Data Mining

Topic detection based on graph analytical method and cosine similarity

MA Changlin,CHENG Mengli,WANG Tao   

  1. (School of Computer, Central China Normal University, Wuhan 430079, China)
  • Received: 2018-08-19  Revised: 2018-10-13  Online: 2019-04-25  Published: 2019-04-25

Abstract:

Automatically extracting valuable topic information from massive text collections has become an important technical challenge. Most existing methods assume that topics are independent of one another, yet in practice topics are linked by complex inherent relationships. To address this problem, we combine correlation theory with an improved graph analytical method and model topic detection on the basis of topic correlation and term co-occurrence. High-precision semantic information and latent co-occurrence relationships are used together to discover important and meaningful topics and trends. Simulation experiments verify the effectiveness of the proposed model.

Key words: topic detection, graph analytical method, cosine similarity
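
The abstract names two ingredients, term co-occurrence and cosine similarity, without giving the model's details. The following is only a minimal sketch of those two generic building blocks, not the authors' model: the function names `cosine_similarity` and `cooccurrence_graph`, the toy documents, and the term weights are all assumptions made for illustration.

```python
import math
from collections import Counter
from itertools import combinations


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse term-weight vectors (dicts)."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an empty vector is treated as unrelated to everything
    return dot / (norm_u * norm_v)


def cooccurrence_graph(docs):
    """Edge weights: number of documents in which two distinct terms co-occur."""
    edges = Counter()
    for doc in docs:
        # Sort the unique terms so each undirected edge has one canonical key.
        for a, b in combinations(sorted(set(doc)), 2):
            edges[(a, b)] += 1
    return edges


# Hypothetical toy corpus: each document is a list of extracted terms.
docs = [["graph", "topic", "term"], ["topic", "term"], ["graph", "cosine"]]
edges = cooccurrence_graph(docs)

# Hypothetical topics represented as term-weight vectors.
topic_a = {"topic": 2.0, "term": 1.0}
topic_b = {"topic": 1.0, "graph": 1.0}
sim = cosine_similarity(topic_a, topic_b)
```

Here `edges` records how often term pairs appear together, and `sim` measures how correlated two topic vectors are; a similarity near 1 would suggest the two topics are closely related rather than independent.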