• Official journal of the China Computer Federation (CCF)
• Chinese Science and Technology Core Journal
• Chinese Core Journal

Computer Engineering & Science, 2024, Vol. 46, Issue 10: 1765-1774.

• Computer Network and Information Security

Toxic comments detection based on bidirectional capsule network

LI Gong-jin¹, SHAO Yu-bin¹, DU Qing-zhi¹, LONG Hua¹,², MA Di-nan²

  1. Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650504, China
  2. Yunnan Key Laboratory of Media Integration, Kunming 650032, China
  • Received: 2023-07-11  Revised: 2023-10-31  Accepted: 2024-10-25  Online: 2024-10-25  Published: 2024-10-29

Abstract: Existing detection models struggle to accurately identify toxic comments that vary in linguistic style or carry implicit semantics. To address this issue, a toxic comment detection model based on a bidirectional capsule network is proposed. First, BERT is used to produce word embeddings for the comment text, forming an input matrix. This matrix is passed to a bidirectional feature extraction layer composed of stacked LSTMs, bidirectional capsule networks, and attention networks, which captures deep semantic information from the text in both the forward and backward directions. The resulting forward and backward matrices are concatenated and fed into an attention mechanism that focuses on words associated with toxicity and produces an output vector. Second, this output vector is concatenated with a context-assisted feature vector to enrich the feature representation. Finally, the concatenated vector is passed to a fully connected layer, and the comment is classified with a Sigmoid activation function. Experiments on the Wikipedia toxic comment dataset show that the proposed model achieves significant performance improvements over existing approaches, capturing richer semantic information in comment texts and detecting toxic comments effectively.
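The classification path described in the abstract can be sketched in a few lines of dependency-free Python. This is a minimal illustration under stated assumptions, not the authors' implementation: BERT embedding and the stacked LSTM are omitted; the capsule layer uses the standard routing-by-agreement of Sabour et al. (2017), and the abstract does not confirm that the authors use this variant; the attention is a simple dot-product score against a hypothetical query vector `attn_w`; and all sizes, weights, and the auxiliary features `aux` are hypothetical placeholders rather than values from the paper.

```python
import math
import random

# Vectors are plain lists of floats; matrices are lists of row vectors.

def squash(v):
    """Capsule squash nonlinearity: keeps the direction, maps the norm into [0, 1)."""
    sq = sum(x * x for x in v)
    if sq == 0.0:
        return [0.0] * len(v)
    scale = sq / (1.0 + sq) / math.sqrt(sq)
    return [scale * x for x in v]

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dynamic_routing(u_hat, iterations=3):
    """Routing-by-agreement (assumed variant, after Sabour et al., 2017).
    u_hat[i][j] is the prediction vector from input capsule i for output capsule j.
    Returns the output capsule vectors v_j as rows of a matrix."""
    n_in, n_out, dim = len(u_hat), len(u_hat[0]), len(u_hat[0][0])
    b = [[0.0] * n_out for _ in range(n_in)]  # routing logits
    for it in range(iterations):
        c = [softmax(row) for row in b]  # coupling coefficients over output capsules
        v = []
        for j in range(n_out):
            s = [sum(c[i][j] * u_hat[i][j][d] for i in range(n_in)) for d in range(dim)]
            v.append(squash(s))
        if it < iterations - 1:
            # Raise the logit where a prediction agrees with the current output.
            for i in range(n_in):
                for j in range(n_out):
                    b[i][j] += sum(p * q for p, q in zip(u_hat[i][j], v[j]))
    return v

def attention_pool(H, w):
    """Dot-product attention against a hypothetical query w; returns (pooled, weights)."""
    scores = [sum(h_k * w_k for h_k, w_k in zip(h, w)) for h in H]
    alpha = softmax(scores)
    dim = len(H[0])
    pooled = [sum(alpha[t] * H[t][d] for t in range(len(H))) for d in range(dim)]
    return pooled, alpha

def sigmoid(x):
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)  # avoids overflow for large negative x
    return e / (1.0 + e)

def classify(forward, backward, aux, attn_w, fc_w, fc_b):
    """Concatenate forward/backward rows, attend, append aux features, FC + Sigmoid."""
    H = [f + b for f, b in zip(forward, backward)]  # row-wise concatenation
    pooled, _ = attention_pool(H, attn_w)
    z = pooled + aux  # enrich with the (hypothetical) context-assisted features
    logit = sum(w * x for w, x in zip(fc_w, z)) + fc_b
    return sigmoid(logit)  # probability that the comment is toxic

if __name__ == "__main__":
    random.seed(0)
    d, n_in, n_out = 4, 6, 3  # hypothetical sizes

    def rand_predictions():
        return [[[random.uniform(-1, 1) for _ in range(d)] for _ in range(n_out)]
                for _ in range(n_in)]

    # In the real model these predictions would come from the forward and
    # backward feature streams; here they are random placeholders.
    forward = dynamic_routing(rand_predictions())
    backward = dynamic_routing(rand_predictions())
    aux = [0.2, 0.7]
    attn_w = [random.uniform(-1, 1) for _ in range(2 * d)]
    fc_w = [random.uniform(-1, 1) for _ in range(2 * d + len(aux))]
    print("P(toxic) =", classify(forward, backward, aux, attn_w, fc_w, 0.0))
```

Appending `aux` after attention pooling, rather than before, follows the order given in the abstract: the pooled text vector is formed first and only then enriched with contextual features before the fully connected layer.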

Keywords: BERT language model, bidirectional capsule network, contextual auxiliary features, toxic comment detection