• Journal of the China Computer Federation
• Chinese Science and Technology Core Journal
• Chinese Core Journal

Computer Engineering & Science

• Articles •

Optimization of the log pattern extraction algorithm for large-scale syslog files

ZHAO Yi-ning, XIAO Hai-li

  1. (Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190, China)
  • Received: 2017-01-05 Revised: 2017-03-17 Online: 2017-05-25 Published: 2017-05-25
  • Supported by:

    National Key Research and Development Program of China (2016YFB0201404); Major Project of the National High Technology Research and Development Program of China (863 Program) under the 12th Five-Year Plan (2014AA01A302)

Abstract:

LARGE is a log analysis framework deployed in the supercomputing environment of the Chinese Academy of Sciences. It monitors and analyzes the various log files in the environment through log collection, centralized analysis, and result feedback. When monitoring system logs, maintenance personnel rely on a log pattern extraction algorithm to reduce a large volume of historical log records to a small set of log patterns. However, as log volume grows, and given the peculiarities of messages log files, the original log pattern extraction algorithm can no longer process large-scale logs quickly enough. We propose an optimization of the log pattern extraction algorithm that introduces the MapReduce mechanism to accelerate log processing and pattern extraction when there are multiple input log files. Experiments show that when there are many input files, the optimization significantly speeds up the vocabulary consistency rate algorithm and greatly reduces its running time. We also evaluate the running time and extraction quality of the algorithm when the vocabulary conversion function is used.

Key words: log processing, MapReduce, big data analysis, grid environment
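
The MapReduce idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not define the vocabulary consistency rate, so the metric below (the share of positions at which two equal-length token lists agree), the 0.5 threshold, the `*` wildcard, and all function names are assumptions made for illustration. Each input file is mapped to a local pattern list, and the partial lists are then merged in a reduce step using the same rule.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

WILDCARD = "*"    # assumed placeholder for a variable token
THRESHOLD = 0.5   # assumed cut-off, not taken from the paper


def consistency(a, b):
    """Assumed metric: share of positions where two equal-length token lists agree."""
    if len(a) != len(b):
        return 0.0
    if not a:
        return 1.0
    same = sum(1 for x, y in zip(a, b) if x == y or x == WILDCARD or y == WILDCARD)
    return same / len(a)


def absorb(patterns, tokens):
    """Fold one token list into a pattern list, generalising the first close match."""
    for i, p in enumerate(patterns):
        if consistency(p, tokens) >= THRESHOLD:
            # Keep agreeing tokens, replace differing positions with the wildcard.
            patterns[i] = [x if x == y else WILDCARD for x, y in zip(p, tokens)]
            return patterns
    patterns.append(list(tokens))
    return patterns


def map_file(lines):
    """Map step: reduce the lines of one file to that file's local pattern list."""
    patterns = []
    for line in lines:
        tokens = line.split()
        if tokens:  # skip blank lines
            absorb(patterns, tokens)
    return patterns


def reduce_patterns(left, right):
    """Reduce step: merge two pattern lists with the same absorb rule."""
    merged = [list(p) for p in left]
    for p in right:
        absorb(merged, p)
    return merged


def extract(files):
    """Run the map step once per file concurrently, then fold the partial results."""
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(map_file, files))
    return reduce(reduce_patterns, partials, [])
```

Here `files` is a list of files, each given as a list of log lines already read into memory. The thread pool only stands in for the independent workers that a real MapReduce deployment would use. The point of the sketch is that the per-file map step has no shared state, so its cost can be spread across workers when there are many input files.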