• Official journal of the China Computer Federation (CCF)
  • China Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science ›› 2022, Vol. 44 ›› Issue (03): 495-501.

• Artificial Intelligence and Data Mining •

Network rumor recognition based on naive Bayesian classification

LI Wen-li

  1. (School of Management, Shanghai University, Shanghai 200444, China)

  • Received: 2020-04-15 Revised: 2020-10-29 Accepted: 2022-03-25 Online: 2022-03-25 Published: 2022-03-24

Abstract: The spread of rumors can disrupt social order, endanger national stability, and cause public panic. The widespread use of social platforms lets information travel faster and reach further, which amplifies the harm that rumors cause, so identifying online rumors quickly and accurately has become a prominent problem in the field of information dissemination. Rumor recognition is essentially a binary classification problem. Based on the idea of Bayesian classification, this paper designs a naive Bayesian classification algorithm for online rumor recognition, builds the naive Bayesian classifier in Matlab, and validates the algorithm experimentally on data collected from Weibo microblogs. By controlling the training set and comparing the accuracy, precision, recall, and F1 score of the recognition results, the paper examines how the classifier identifies rumors and non-rumors under different training conditions and what regularities emerge. The results show that the naive Bayesian classifier is effective for online rumor recognition, that the selection and control of the training set strongly affect the recognition results, and that recognition accuracy fluctuates as the training conditions change.

Keywords: naive Bayesian classification, rumor identification, machine learning
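
The abstract names the technique (a naive Bayesian classifier for a rumor / non-rumor decision) and the four evaluation measures, but not the features, preprocessing, or Matlab code. The following is therefore only a minimal sketch in Python rather than Matlab, assuming a multinomial variant with Laplace smoothing over pre-tokenized posts. The function names (`train_naive_bayes`, `predict`, `binary_metrics`) and the toy tokens are hypothetical illustrations and are not taken from the paper or its microblog dataset.

```python
import math
from collections import Counter


def train_naive_bayes(docs, labels, alpha=1.0):
    """Fit a multinomial naive Bayes model; alpha is the Laplace smoothing constant."""
    classes = sorted(set(labels))
    vocab = {token for doc in docs for token in doc}
    class_counts = Counter(labels)
    token_counts = {c: Counter() for c in classes}
    for doc, label in zip(docs, labels):
        token_counts[label].update(doc)

    # log P(class), estimated from class frequencies in the training set
    log_prior = {c: math.log(class_counts[c] / len(labels)) for c in classes}
    # log P(token | class), smoothed so that unseen tokens never get probability 0
    log_likelihood = {}
    for c in classes:
        denom = sum(token_counts[c].values()) + alpha * len(vocab)
        log_likelihood[c] = {
            t: math.log((token_counts[c][t] + alpha) / denom) for t in vocab
        }
    return log_prior, log_likelihood


def predict(model, doc):
    """Return the class with the highest posterior log-score for one tokenized post."""
    log_prior, log_likelihood = model
    scores = {}
    for c in log_prior:
        # Tokens absent from the training vocabulary are skipped.
        scores[c] = log_prior[c] + sum(
            log_likelihood[c][t] for t in doc if t in log_likelihood[c]
        )
    return max(scores, key=scores.get)


def binary_metrics(y_true, y_pred, positive="rumor"):
    """Accuracy, precision, recall, and F1, treating `positive` as the target class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Toy, made-up tokenized posts for illustration only (not data from the paper).
train_docs = [
    ["shocking", "forward", "urgent"],
    ["urgent", "forward", "warning"],
    ["official", "report", "released"],
    ["report", "confirmed", "official"],
]
train_labels = ["rumor", "rumor", "non-rumor", "non-rumor"]

test_docs = [
    ["urgent", "forward"],
    ["official", "report"],
    ["confirmed", "urgent", "official"],
]
test_labels = ["rumor", "non-rumor", "rumor"]

model = train_naive_bayes(train_docs, train_labels)
predictions = [predict(model, doc) for doc in test_docs]
metrics = binary_metrics(test_labels, predictions)
print(predictions)
print(metrics)
```

The third toy post mixes vocabulary from both classes and is labeled as a rumor, so the sketch produces one missed rumor and the four measures do not all equal 1. Changing which posts go into `train_docs` changes the estimated probabilities and hence the scores, which is the kind of training-set sensitivity the abstract reports.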