• Official journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

J4 ›› 2015, Vol. 37 ›› Issue (02): 402-409.

• Articles •

Personality classification analysis for micro-blog users

ZHANG Yanfeng1, CHEN Changsong1, YANG Tao1, ZUO Lili2, DING Fei1

  1. The Third Research Institute, the Ministry of Public Security, Shanghai 200031, China
  2. Sinopec Management Institute, Beijing 100021, China
  • Received: 2013-09-02 Revised: 2013-11-08 Online: 2015-02-25 Published: 2015-02-25

Abstract:

Social networks give people a platform for freely expressing their emotions, viewpoints, interests, and suggestions. The posts users publish, the actions they take, and the social circles they build on these platforms provide new data and opportunities for data mining. In this article, we propose a method that uses a micro-blog user's publicly available data to classify that user along the MBTI personality dimensions. First, based on an analysis of users' micro-blog data, we extract text and non-text features that indicate users' psycholinguistic and behavioral characteristics. Three machine-learning classification methods (AdaBoost decision tree, support vector machine, and Bayesian logistic regression) are then used to classify the personalities of micro-blog users. Experimental results indicate that mining micro-blog data achieves a prediction accuracy between 75% and 90% across the four MBTI personality dimensions.

Key words: social network; micro-blog; personality classification; AdaBoost decision tree; support vector machine; Bayesian logistic regression
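
The pipeline outlined in the abstract (features per user, then one classifier per MBTI dimension) can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the article: the five features in `extract_features` are hypothetical stand-ins for the text and non-text features, each MBTI dimension is treated as an independent binary task, and only one of the three named classifier families is shown, as a plain AdaBoost over one-feature decision stumps written from scratch. All function names are illustrative.

```python
import math

# The four MBTI dichotomies; each one becomes a separate binary task.
DIMENSIONS = ("EI", "SN", "TF", "JP")


def mbti_to_labels(mbti_type):
    """Map a type such as 'INTJ' to four +1/-1 labels (+1 = first letter of the pair)."""
    return {dim: (1 if letter == dim[0] else -1)
            for dim, letter in zip(DIMENSIONS, mbti_type.upper())}


def extract_features(posts, followers, following):
    """Hypothetical features for one user (assumed, not the article's feature set)."""
    n_posts = max(len(posts), 1)
    return [
        sum(len(p) for p in posts) / n_posts,        # text: mean characters per post
        sum(p.count("!") for p in posts) / n_posts,  # text: exclamation marks per post
        sum(p.count("@") for p in posts) / n_posts,  # text: mentions per post
        math.log1p(followers),                       # non-text: audience size
        math.log1p(following),                       # non-text: accounts followed
    ]


def stump_predict(stump, x):
    """A decision stump: threshold test on a single feature."""
    feature, threshold, sign = stump
    return sign if x[feature] > threshold else -sign


def best_stump(X, y, weights):
    """Exhaustively pick the stump with the lowest weighted error."""
    best, best_err = None, float("inf")
    for feature in range(len(X[0])):
        for threshold in sorted({x[feature] for x in X}):
            for sign in (1, -1):
                stump = (feature, threshold, sign)
                err = sum(w for x, t, w in zip(X, y, weights)
                          if stump_predict(stump, x) != t)
                if err < best_err:
                    best, best_err = stump, err
    return best, best_err


def train_adaboost(X, y, rounds=10):
    """Discrete AdaBoost: reweight samples so later stumps focus on mistakes."""
    n = len(X)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        stump, err = best_stump(X, y, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)  # clamp so alpha stays finite
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        weights = [w * math.exp(-alpha * t * stump_predict(stump, x))
                   for x, t, w in zip(X, y, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def predict(ensemble, x):
    """Weighted vote of all stumps; returns +1 or -1."""
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1


def train_per_dimension(X, types, rounds=10):
    """Train one boosted classifier for each of the four MBTI dimensions."""
    return {dim: train_adaboost(X, [mbti_to_labels(t)[dim] for t in types], rounds)
            for dim in DIMENSIONS}


def predict_type(models, x):
    """Combine the four binary predictions into a four-letter type."""
    return "".join(dim[0] if predict(models[dim], x) == 1 else dim[1]
                   for dim in DIMENSIONS)
```

In use, `extract_features` would be applied to each labeled user to build the feature matrix, `train_per_dimension` would fit the four models, and accuracy would be measured separately per dimension on held-out users, which matches how the abstract reports its results. A support vector machine or Bayesian logistic regression could be swapped in for `train_adaboost` behind the same per-dimension interface.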