• Journal of the China Computer Federation (CCF)
• China Science and Technology Core Journal
• Chinese Core Journal

Computer Engineering & Science


Individual microblog clustering by
semantic correlation based on HowNet

GAO Yongbing1,SONG Tianshu1,2,LI Jiangyu1,MA Zhanfei3   

  (1. School of Information Engineering, Inner Mongolia University of Science and Technology, Baotou 014010;
    2. School of Computer Science and Engineering, Guilin University of Aerospace Technology, Guilin 541004;
    3. Department of Computer, Baotou Teachers’ College, Baotou 014030, China)

     
  • Received: 2018-03-06; Revised: 2018-10-17; Online: 2019-06-25

Abstract:

Clustering an individual's microblogs by their correlation gives a quick picture of the blogger's professional interests and experience. Existing short-text clustering methods pay insufficient attention to semantics and to the correlation between sentences. We propose a novel method that clusters individual microblogs by semantic correlation based on HowNet. The main steps are: (1) train word vectors on a large corpus of microblog texts with the skip-gram model; (2) disambiguate the words in each sentence according to their HowNet sememes (primitive semantic units); (3) compute word-level and sentence-level similarities between microblogs and derive a correlation measure from them; (4) cluster the individual's microblogs according to this correlation. Experimental results show that the proposed method outperforms hierarchical clustering and density-based clustering.
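Steps (3) and (4) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' method: word vectors are taken as given (in the paper they come from skip-gram training in step 1), HowNet sememe disambiguation (step 2) is assumed to have already been applied to the tokens, sentence similarity is a hypothetical symmetric average best-match over word cosine similarities, and clustering is a simple threshold grouping (connected components via union-find). The names `cosine`, `sentence_similarity`, `cluster`, and the `threshold` parameter are illustrative only.

```python
import math
from typing import Dict, List

Vector = List[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two word vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def sentence_similarity(s1: List[str], s2: List[str], vecs: Dict[str, Vector]) -> float:
    """Hypothetical symmetric average best-match similarity of two tokenized posts.

    Each word in one post is matched to its most similar word in the other post;
    the two directional averages are then averaged. Words without a vector are skipped.
    """
    def directional(a: List[str], b: List[str]) -> float:
        a = [w for w in a if w in vecs]
        b = [w for w in b if w in vecs]
        if not a or not b:
            return 0.0
        return sum(max(cosine(vecs[x], vecs[y]) for y in b) for x in a) / len(a)

    return (directional(s1, s2) + directional(s2, s1)) / 2


def cluster(posts: List[List[str]], vecs: Dict[str, Vector], threshold: float) -> List[List[int]]:
    """Group post indices whose pairwise similarity reaches `threshold` (union-find)."""
    parent = list(range(len(posts)))

    def find(i: int) -> int:
        # Follow parent links to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(posts)):
        for j in range(i + 1, len(posts)):
            if sentence_similarity(posts[i], posts[j], vecs) >= threshold:
                parent[find(i)] = find(j)

    groups: Dict[int, List[int]] = {}
    for i in range(len(posts)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


# Toy 2-D vectors standing in for trained skip-gram embeddings (hypothetical data).
vecs = {
    "python": [1.0, 0.1], "code": [0.9, 0.2], "bug": [0.8, 0.3],
    "hiking": [0.1, 1.0], "mountain": [0.2, 0.9],
}
posts = [["python", "code"], ["bug", "code"], ["hiking", "mountain"]]
print(cluster(posts, vecs, threshold=0.9))  # [[0, 1], [2]]
```

With real microblog data the threshold would need tuning, and the abstract does not state which similarity combination or clustering algorithm the authors actually use; a threshold-based grouping is chosen here only because it needs no preset number of clusters.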
 
 

Key words: individual microblog, HowNet, semantics, clustering, disambiguation