• Journal of the China Computer Federation (CCF)
• China Science and Technology Core Journal
• Chinese Core Journal

Computer Engineering & Science, 2024, Vol. 46, Issue (04): 752-760.

• Artificial Intelligence and Data Mining

A microblog rumor detection model based on user authority and multi-feature fusion

XU Li-fen¹, CAO Zhan-mao¹, ZHENG Ming-jie¹, XIAO Bo-jian²

  1. School of Computer Science, South China Normal University, Guangzhou 510631;
  2. School of Artificial Intelligence, South China Normal University, Foshan 528200, China
  • Received: 2023-09-04  Revised: 2023-10-22  Accepted: 2024-04-25  Online: 2024-04-25  Published: 2024-04-18

Abstract: The wide spread of online rumors and their negative impact on society create an urgent need for efficient rumor detection models. Because the microblog texts in the dataset lack rich semantic information and strict syntactic structure, it is worthwhile to enrich their semantics with user features and contextual features. To this end, this paper proposes MRUAMF, a microblog rumor detection model based on user authority and multi-feature fusion. First, four indicators (user information completeness, user activity, user social breadth, and user platform verification index) are extracted to build a quantitative model of user authority. User features are then quantified by concatenating the user authority score with its constituent indicators and fusing them through a two-layer fully connected network. Second, because context helps in understanding rumors, relevant contextual features are extracted. Finally, a pre-trained BERT model extracts text features, and a Multimodal Adaptation Gate (MAG) fuses the user, contextual, and text features. Experiments on a microblog dataset show that MRUAMF outperforms the baseline models, reaching an accuracy of 0.941.

Key words: rumor detection, BERT, MAG, user authority, analytic hierarchy process (AHP)
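The user-authority step summarized in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes, as the AHP keyword suggests, that the analytic hierarchy process supplies the weights of the four indicators, and that each indicator value has already been scaled to [0, 1]. The pairwise comparison matrix `PAIRWISE`, the indicator names, and the functions `ahp_weights`, `user_authority`, and `user_feature_vector` are all hypothetical. Weights are approximated with the row geometric-mean method instead of the exact principal eigenvector, and the resulting five-value vector (authority score concatenated with its indicators) is what would be passed to the two-layer fully connected network.

```python
import math

# Hypothetical indicator order; the names are illustrative only.
INDICATORS = ["info_completeness", "activity", "social_breadth", "platform_verification"]

# Saaty's random consistency index (RI) for matrix orders 1..9.
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12,
                6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45}

# Hypothetical reciprocal pairwise comparison matrix on Saaty's 1-9 scale:
# entry [i][j] states how much more important indicator i is than indicator j.
PAIRWISE = [
    [1.0, 1 / 2, 1 / 3, 1 / 4],
    [2.0, 1.0, 1 / 2, 1 / 3],
    [3.0, 2.0, 1.0, 1 / 2],
    [4.0, 3.0, 2.0, 1.0],
]


def ahp_weights(matrix):
    """Return (weights, consistency_ratio) via the row geometric-mean approximation."""
    n = len(matrix)
    geo_means = [math.prod(row) ** (1.0 / n) for row in matrix]
    total = sum(geo_means)
    weights = [g / total for g in geo_means]
    # Estimate the principal eigenvalue from (A w)_i / w_i, averaged over rows.
    a_w = [sum(matrix[i][j] * weights[j] for j in range(n)) for i in range(n)]
    lambda_max = sum(a_w[i] / weights[i] for i in range(n)) / n
    ci = (lambda_max - n) / (n - 1) if n > 1 else 0.0
    ri = RANDOM_INDEX[n]
    cr = ci / ri if ri > 0 else 0.0
    return weights, cr


def user_authority(indicators, weights):
    """Weighted sum of indicators that are assumed to be pre-scaled to [0, 1]."""
    if len(indicators) != len(weights):
        raise ValueError("one weight is required per indicator")
    return sum(w * x for w, x in zip(weights, indicators))


def user_feature_vector(indicators, weights):
    """Concatenate the authority score with its constituent indicators."""
    return [user_authority(indicators, weights)] + list(indicators)


if __name__ == "__main__":
    weights, cr = ahp_weights(PAIRWISE)
    print("weights:", dict(zip(INDICATORS, (round(w, 3) for w in weights))))
    print("consistency ratio:", round(cr, 4))
    # Hypothetical normalized indicator values for a single user.
    print("user feature vector:", user_feature_vector([0.8, 0.5, 0.3, 1.0], weights))
```

A consistency ratio below 0.1 is the conventional AHP threshold for accepting a comparison matrix. Keeping the raw indicators next to the aggregated score lets the downstream fully connected layers re-weight them if the fixed AHP weights turn out to be suboptimal.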