Microblogging opinion analysis based on an improved K-means algorithm

Computer Engineering & Science

XIE Xiu-juan1, LI Xiang-ju1, MO Ling-fei2

(1. Department of Computer Engineering, Southeast University Chengxian College, Nanjing 210000;
2. School of Instrument Science and Engineering, Southeast University, Nanjing 210000, China)

Received: 2016-02-22   Revised: 2016-06-16   Online: 2018-01-25   Published: 2018-01-25

Abstract

Selecting isolated points as initial clustering centers can trap K-means clustering in a local optimum. To avoid this, we propose a density-based method for selecting the initial clustering centers of the K-means algorithm. The method first calculates the average similarity between each data object and all other objects, and identifies as core objects those whose average similarity exceeds a fixed threshold. The core objects that are least similar to one another are then taken as the initial clustering centers. We built a crawler for Sina Microblog and collected thousands of posts of different types. After word segmentation, preprocessing, and term-weight calculation, we cluster the data with the improved K-means algorithm. Compared with the traditional K-means algorithm, the proposed method yields more stable precision and recall, and its average clustering time is shorter. Experimental results show that the improved algorithm achieves higher accuracy and better stability in microblog clustering, and can be used to discover public opinion in large volumes of microblog data.

Key words: microblog, clustering center, K-means clustering algorithm, density
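The center-selection step described in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: each post is assumed to already be a term-weight vector, similarity is assumed to be cosine similarity, the threshold is supplied by the caller, and "least similar to one another" is interpreted as seeding with the least similar pair of core objects and then greedily adding the core object whose highest similarity to the already chosen centers is lowest. The names `cosine`, `density_initial_centers`, `kmeans`, and `docs` are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity (an assumed measure); 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def density_initial_centers(data: List[Vector], k: int, threshold: float) -> List[int]:
    """Pick k initial centers (as indices) from density-based core objects."""
    n = len(data)
    if k < 2 or n < k:
        raise ValueError("need 2 <= k <= number of objects")
    sim = [[cosine(data[i], data[j]) for j in range(n)] for i in range(n)]
    # Average similarity of each object to all *other* objects.
    avg = [(sum(sim[i]) - sim[i][i]) / (n - 1) for i in range(n)]
    # Core objects: average similarity above the fixed threshold,
    # so isolated points are never eligible as centers.
    core = [i for i in range(n) if avg[i] > threshold]
    if len(core) < k:
        raise ValueError("fewer core objects than k; lower the threshold")
    # Assumed reading of "least similar to one another":
    # seed with the least similar pair of core objects ...
    first, second = min(
        ((i, j) for a, i in enumerate(core) for j in core[a + 1:]),
        key=lambda pair: sim[pair[0]][pair[1]],
    )
    centers = [first, second]
    # ... then greedily add the core object whose highest similarity
    # to the centers chosen so far is lowest.
    while len(centers) < k:
        rest = [c for c in core if c not in centers]
        centers.append(min(rest, key=lambda c: max(sim[c][s] for s in centers)))
    return centers


def kmeans(data: List[Vector], k: int, threshold: float, max_iter: int = 100) -> List[int]:
    """K-means loop (cosine assignment, mean update) seeded by the centers above."""
    centers = [list(data[i]) for i in density_initial_centers(data, k, threshold)]
    labels = [-1] * len(data)
    for _ in range(max_iter):
        # Assign every object to its most similar center.
        new_labels = [max(range(k), key=lambda c: cosine(x, centers[c])) for x in data]
        if new_labels == labels:  # assignments unchanged: converged
            break
        labels = new_labels
        # Move each center to the mean of its members.
        for c in range(k):
            members = [data[i] for i in range(len(data)) if labels[i] == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical toy term-weight vectors: two topics plus one isolated point (last row).
docs = [
    [1.0, 0.9, 0.0, 0.0, 0.0],
    [0.9, 1.0, 0.1, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.1, 0.0],
    [0.0, 0.1, 1.0, 0.9, 0.0],
    [0.1, 0.0, 0.9, 1.0, 0.0],
    [0.0, 0.0, 1.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 1.0],
]
print(density_initial_centers(docs, 2, threshold=0.2))  # [0, 5]
print(kmeans(docs, 2, threshold=0.2))
```

In this toy data the last vector is orthogonal to every other row, so its average similarity is 0; it never qualifies as a core object and cannot be chosen as an initial center, while the two groups of three vectors end up in separate clusters. Computing the full similarity matrix costs O(n²) time and memory, so a real microblog corpus would likely need sparse vectors or sampling.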

XIE Xiu-juan, LI Xiang-ju, MO Ling-fei. Microblogging opinion analysis based on an improved K-means algorithm [J]. Computer Engineering & Science.