• Journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science


A Twitter hotspot mining method based on semantic clustering of word vectors
 

LIU Pei-lei, TANG Jin-tao, WANG Ting, XIE Song-xian, YUE Da-peng, LIU Hai-chi

  1. (College of Computer, National University of Defense Technology, Changsha 410073, China)
  • Received: 2016-03-29 Revised: 2016-06-07 Online: 2018-02-25 Published: 2018-02-25

Abstract:

With the rapid development of social media, information overload has become a challenge, and automatically mining hotspots from large volumes of short, noisy messages is an important problem. Social data are real-time and geographic in nature and usually carry plenty of meta-information. Based on these characteristics, this paper proposes a hotspot mining method that combines users' behavior patterns with text content analysis. In the content analysis step, text is clustered at the word level rather than the message level, and semantic clustering of word vectors is used to improve keyword extraction. Experimental results on real datasets show that the keywords extracted by this method have strong semantic relevance and good topic segmentation, and that the method outperforms traditional hotspot mining methods on the main evaluation metrics.
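
To make the idea of word-level clustering concrete, the following is a minimal sketch and not the authors' algorithm. It assumes each candidate keyword already has an embedding (the three-dimensional `TOY_VECTORS` below are made up for illustration) and groups words with a simple greedy cosine-similarity rule. The function names and the `threshold` parameter are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def cluster_words(vectors, threshold=0.8):
    """Greedy single-pass clustering (an illustrative stand-in, not the
    paper's method): a word joins the first cluster whose centroid is at
    least `threshold` similar; otherwise it starts a new cluster."""
    clusters = []  # each entry: {"words": [...], "centroid": [...]}
    for word, vec in vectors.items():
        for c in clusters:
            if cosine(vec, c["centroid"]) >= threshold:
                c["words"].append(word)
                n = len(c["words"])
                # Update the running mean of the cluster's vectors.
                c["centroid"] = [
                    (m * (n - 1) + x) / n for m, x in zip(c["centroid"], vec)
                ]
                break
        else:
            clusters.append({"words": [word], "centroid": list(vec)})
    return [c["words"] for c in clusters]


# Hypothetical toy embeddings; real ones would come from a trained model.
TOY_VECTORS = {
    "earthquake": [0.9, 0.1, 0.0],
    "tremor": [0.85, 0.15, 0.05],
    "election": [0.05, 0.9, 0.1],
    "vote": [0.1, 0.95, 0.0],
    "concert": [0.0, 0.1, 0.95],
}

groups = cluster_words(TOY_VECTORS)
print(groups)
```

With these toy vectors, semantically close words fall into the same group, so each group can be read as the keyword set of one candidate hotspot.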
 

Key words: hotspot mining, Twitter, word embedding, semantic clustering