• Official journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science


Analysis on topic evolution of news comments by
combining word vector and clustering algorithm

LIN Jianghao1,ZHOU Yongmei1,2,YANG Aimin1,2,WANG Wei2   

  (1. Laboratory for Language Engineering and Computing, Guangdong University of Foreign
    Studies, Guangzhou 510006;
    2. Cisco School of Informatics, Guangdong University of Foreign Studies, Guangzhou
    510006, China)
  • Received: 2016-07-01 Revised: 2016-09-05 Online: 2016-11-25 Published: 2016-11-25

Abstract:

Topic evolution analysis mines how the content of a topic changes over time. Based on the
hypothesis that topic content can be represented by key words, this article uses word2vec
to train a word vector model on 750 thousand news and microblog texts. The text stream is
then fed to the model to obtain the word vectors for each time period. K-means is used to
cluster the word vectors, after which the key words are extracted and the topic evolution
is visualized. A comparison of topic extraction by the word vector model with that of the
PLSA and LDA topic models shows that the word vector model is more effective than either
of the two. In addition, collecting abundant and varied data helps train a word vector
model with better generalization ability and supports research on real-time analysis of
topic evolution.

Key words: topic evolution, word2vec, PLSA, LDA
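
The clustering step described in the abstract can be illustrated with a minimal sketch.
It is not the authors' implementation. The toy 2-D vectors stand in for a trained
word2vec model, the function names kmeans and extract_keywords are hypothetical, and
taking the word nearest each cluster centroid as the key word is an assumption, since
the abstract does not say how key words are drawn from the clusters.

```python
import math

# Hypothetical 2-D stand-ins for trained word vectors (assumption: a real
# word2vec model would supply vectors with far more dimensions).
TOY_VECTORS = {
    "earthquake": (0.0, 0.0),
    "rescue": (0.0, 1.0),
    "aftershock": (1.0, 0.0),
    "donation": (10.0, 10.0),
    "charity": (10.0, 11.0),
    "fund": (11.0, 10.0),
}


def kmeans(points, k, iterations=20):
    """Lloyd's K-means; the first k points seed the centroids (deterministic)."""
    centroids = list(points[:k])
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return centroids, labels


def extract_keywords(words, vectors, k):
    """Cluster one time slice's word vectors; return the word nearest each centroid."""
    known = [w for w in words if w in vectors]  # skip out-of-vocabulary words
    k = min(k, len(known))
    if k == 0:
        return []
    points = [vectors[w] for w in known]
    centroids, labels = kmeans(points, k)
    keywords = []
    for c in range(k):
        members = [w for w, lab in zip(known, labels) if lab == c]
        if members:
            # Assumed selection rule: the member closest to the centroid.
            keywords.append(
                min(members, key=lambda w: math.dist(vectors[w], centroids[c]))
            )
    return keywords


# Words observed in each time slice of a hypothetical comment stream.
slices = {
    "day 1": ["earthquake", "rescue", "aftershock"],
    "day 2": ["earthquake", "rescue", "aftershock", "donation", "charity", "fund"],
}
for label, words in slices.items():
    print(label, extract_keywords(words, TOY_VECTORS, k=2))
```

Comparing the key words returned for consecutive slices gives a rough view of how a
topic shifts over time. The seeding from the first k points keeps the toy example
deterministic; a practical run would more likely use a library K-means with randomized
or k-means++ initialization.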