• Journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science, 2024, Vol. 46, Issue (10): 1765-1774.

• Computer Network and Information Security •

Toxic comments detection based on bidirectional capsule network

LI Gong-jin1, SHAO Yu-bin1, DU Qing-zhi1, LONG Hua1,2, MA Di-nan2

  1. Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650504, Yunnan, China
  2. Yunnan Key Laboratory of Media Integration, Kunming 650032, Yunnan, China

  • Received: 2023-07-11 Revised: 2023-10-31 Accepted: 2024-10-25 Online: 2024-10-25 Published: 2024-10-29
  • Funding:
    Yunnan Key Laboratory of Media Integration Project (320225403)

Abstract: Existing detection models struggle to identify toxic comments whose language style varies widely and whose meaning is implicit. To address this, a toxic comment detection model based on a bidirectional capsule network is proposed. First, the BERT model produces word embeddings for the comment text, forming the input matrix. Second, the input matrix is passed to a bidirectional feature extraction layer composed of stacked LSTMs, a bidirectional capsule network, and an attention network. This layer captures deep semantic information from the forward and backward directions simultaneously; the resulting forward and backward matrices are concatenated and fed to the attention mechanism, which focuses on words associated with toxic comments and produces an output vector. Third, the output vector is concatenated with a contextual auxiliary feature vector to enrich the feature representation. Finally, the concatenated vector is fed to a fully connected layer, and the comment is classified through a Sigmoid activation function. Experiments on the Wikipedia toxic comment dataset show that, compared with existing work, the proposed model achieves a significant performance improvement, captures richer semantic information in comment text, and detects toxic comments effectively.

Keywords: BERT language model, bidirectional capsule network, contextual auxiliary features, toxic comments detection
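The last stages described in the abstract (concatenating the forward and backward feature matrices, attention pooling, appending the auxiliary feature vector, and a Sigmoid output) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: random numbers stand in for the BERT, LSTM, and capsule outputs, the attention is a simple dot product against a single query vector, and the output is a single probability. The abstract does not specify the attention form, the number of output labels, or any dimensions, so all names and sizes here (`classify`, `query`, `fc_weights`, `T`, `D`, `A`) are hypothetical.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def attention_pool(rows, query):
    """Weight each time step by softmax(row . query) and sum the rows.

    Assumption: dot-product attention with one query vector; the paper's
    attention network may be defined differently.
    """
    weights = softmax([dot(row, query) for row in rows])
    dim = len(rows[0])
    pooled = [sum(w * row[j] for w, row in zip(weights, rows)) for j in range(dim)]
    return pooled, weights


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def classify(forward, backward, aux, query, fc_weights, fc_bias):
    """Return (probability, attention weights) for one comment."""
    # Concatenate forward and backward features at every time step.
    merged = [f + b for f, b in zip(forward, backward)]
    # Pool the sequence into one vector, emphasising high-scoring steps.
    pooled, weights = attention_pool(merged, query)
    # Append the auxiliary feature vector to enrich the representation.
    features = pooled + list(aux)
    # Single-unit fully connected layer followed by Sigmoid (assumed shape).
    return sigmoid(dot(features, fc_weights) + fc_bias), weights


# Hypothetical sizes: T time steps, D features per direction, A auxiliary features.
rng = random.Random(0)
T, D, A = 5, 4, 3
forward = [[rng.uniform(-1, 1) for _ in range(D)] for _ in range(T)]
backward = [[rng.uniform(-1, 1) for _ in range(D)] for _ in range(T)]
aux = [rng.uniform(0, 1) for _ in range(A)]
query = [rng.uniform(-1, 1) for _ in range(2 * D)]
fc_weights = [rng.uniform(-1, 1) for _ in range(2 * D + A)]

prob, weights = classify(forward, backward, aux, query, fc_weights, 0.0)
print(f"toxicity probability: {prob:.3f}")
print("attention weights:", [round(w, 3) for w in weights])
```

In a trained model the query vector and fully connected weights would be learned, and the attention weights would indicate which tokens contributed most to the prediction.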