• Journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science


Microblogging opinion analysis based
on an improved K-means algorithm

XIE Xiu-juan1, LI Xiang-ju1, MO Ling-fei2

  (1. Department of Computer Engineering, Southeast University Chengxian College, Nanjing 210000;
    2. School of Instrument Science and Engineering, Southeast University, Nanjing 210000, China)
  • Received: 2016-02-22 Revised: 2016-06-16 Online: 2018-01-25 Published: 2018-01-25

Abstract:

To avoid selecting isolated points as initial cluster centers, which can trap the clustering result in a local optimum, we propose a density-based method for selecting the initial cluster centers of the K-means clustering algorithm. The method first calculates the average similarity between each data object and all the others, and identifies as core objects those whose average similarity is higher than a fixed threshold. The core objects that are least similar to one another are then taken as the initial cluster centers. We build a crawler for Sina Microblog to collect thousands of data items of different types. After word segmentation, preprocessing, and weight calculation, we apply the improved K-means algorithm for clustering analysis. Compared with the traditional K-means algorithm, the proposed method has more stable precision and recall, and a shorter average clustering time. Experimental results show that the improved algorithm achieves higher accuracy and better stability in microblog clustering, and can be used to discover public opinion in large volumes of microblog data.
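The center-selection step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. Several details are assumptions that the abstract does not specify: cosine similarity as the similarity measure, starting from the core object with the highest average similarity, and a greedy rule that repeatedly adds the core object whose largest similarity to the already chosen centers is smallest. The names `cosine` and `select_initial_centers` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length weight vectors (assumed measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def select_initial_centers(vectors, k, threshold):
    """Return indices of k initial centers chosen among dense ("core") objects."""
    n = len(vectors)
    if n < 2:
        raise ValueError("need at least two objects")
    sim = [[cosine(vectors[i], vectors[j]) for j in range(n)] for i in range(n)]

    # Average similarity of each object to all the others.
    avg = [sum(sim[i][j] for j in range(n) if j != i) / (n - 1) for i in range(n)]

    # Core objects: average similarity above the fixed threshold.
    # Isolated points have low average similarity and are filtered out here.
    core = [i for i in range(n) if avg[i] > threshold]
    if len(core) < k:
        raise ValueError("fewer core objects than clusters; lower the threshold")

    # Assumption: start from the densest core object, then greedily add the
    # core object least similar to the centers picked so far.
    centers = [max(core, key=lambda i: avg[i])]
    while len(centers) < k:
        candidates = [i for i in core if i not in centers]
        nxt = min(candidates, key=lambda i: max(sim[i][c] for c in centers))
        centers.append(nxt)
    return centers


# Toy term-weight vectors: two tight groups and one isolated point (index 6).
docs = [
    [1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [1.0, 0.1, 0.0],
    [0.0, 1.0, 0.0], [0.0, 0.9, 0.1], [0.1, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
print(select_initial_centers(docs, k=2, threshold=0.2))
```

The returned indices would then seed a standard K-means run in place of random initialization. The threshold value 0.2 is chosen only for this toy data; the abstract does not state the value used in the experiments.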
 
 

Key words: microblog, clustering center, K-means clustering algorithm, density