• Journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science ›› 2025, Vol. 47 ›› Issue (3): 448-458.

• Computer Networks and Information Security •

Log anomaly detection based on Transformer and Text-CNN

YIN Chunyong, ZHANG Xiaohu

  1. (School of Computer Science, School of Cyber Science and Engineering,
    Nanjing University of Information Science & Technology, Nanjing 210044, Jiangsu, China)
  • Received: 2023-08-18 Revised: 2023-12-04 Online: 2025-03-25 Published: 2025-04-01

Abstract: Log data is one of the most important data resources in a software system: it records detailed information about the system's operation, and automated log anomaly detection is crucial for maintaining system security. With the widespread application of large language models in natural language processing, many Transformer-based log anomaly detection methods have been proposed. However, conventional Transformer-based methods struggle to capture the local features of log sequences. To address this problem, this paper proposes LogTC, a log anomaly detection method based on Transformer and Text-CNN. First, logs are converted into structured log data through rule matching, preserving the useful information in the log statements. Second, the log statements are divided into log sequences using fixed windows or session windows, depending on the characteristics of the logs. Third, Sentence-BERT, a natural language processing technique, is used to generate semantic representations of the log statements. Finally, the semantic vectors of the log sequences are fed into the LogTC model for anomaly detection. Experimental results show that LogTC effectively detects anomalies in log data and achieves good results on both of the datasets evaluated.

Key words: log anomaly detection, deep learning, word embedding, Transformer, Text-CNN
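
The first two preprocessing steps named in the abstract (rule matching, then fixed-window or session-window grouping) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the HDFS-style line layout, the regular expressions, the sample log lines, and the names `parse_line`, `fixed_windows`, and `session_windows` are all assumptions made for the example.

```python
import re
from collections import defaultdict

# Assumed line layout (hypothetical, HDFS-like):
#   <date> <time> <pid> <level> <component>: <content>
LINE_RE = re.compile(
    r"^(?P<date>\d{6}) (?P<time>\d{6}) (?P<pid>\d+) (?P<level>\w+) "
    r"(?P<component>[^:]+): (?P<content>.*)$"
)
# Assumed session key: a block identifier such as "blk_1".
BLOCK_RE = re.compile(r"blk_-?\d+")

# Made-up sample lines, used only to exercise the functions below.
RAW_LOGS = [
    "081109 203518 143 INFO dfs.DataNode$DataXceiver: Receiving block blk_1 src: /10.0.0.1:5000",
    "081109 203519 145 INFO dfs.DataNode$PacketResponder: PacketResponder 1 for block blk_2 terminating",
    "081109 203520 143 INFO dfs.DataNode$PacketResponder: Received block blk_1 of size 91178",
]


def parse_line(line):
    """Rule matching: turn one raw log line into a dict of named fields.

    Returns None when the line does not fit the assumed layout.
    """
    match = LINE_RE.match(line.strip())
    return match.groupdict() if match else None


def fixed_windows(entries, size):
    """Split entries into consecutive, non-overlapping windows of `size`.

    The last window may be shorter than `size`.
    """
    return [entries[i:i + size] for i in range(0, len(entries), size)]


def session_windows(entries):
    """Group message contents by the block identifier they mention."""
    sessions = defaultdict(list)
    for entry in entries:
        # dict.fromkeys removes repeated ids within one line, keeping order.
        for block_id in dict.fromkeys(BLOCK_RE.findall(entry["content"])):
            sessions[block_id].append(entry["content"])
    return dict(sessions)


# Keep only the lines that matched the assumed layout.
ENTRIES = [e for e in map(parse_line, RAW_LOGS) if e is not None]
```

Calling `fixed_windows(ENTRIES, 2)` or `session_windows(ENTRIES)` yields the log sequences; in the pipeline the abstract describes, each statement in a sequence would then be embedded with Sentence-BERT before being passed to the detection model.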