• Official journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science


A machine-learning-based method for automatic recognition of logging functions

JIA Zhouyang, LIAO Xiangke, LIU Xiaodong, LI Shanshan, ZHOU Shulin, XIE Xinwei

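  1. (College of Computer, National University of Defense Technology, Changsha, Hunan 410073, China)
  • Received: 2015-06-15 Revised: 2015-10-27 Published: 2017-01-25 Online: 2017-01-25
  • Funding:

    National Natural Science Foundation of China (61379146, 61272483); Tencent university cooperation project "Research on log enhancement techniques for large-scale open-source software for failure detection"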

Logging function recognition based on machine learning technique

JIA Zhouyang,LIAO Xiangke,LIU Xiaodong,LI Shanshan,ZHOU Shulin,XIE Xinwei   

  1. (College of Computer, National University of Defense Technology, Changsha 410073, China)
  • Received: 2015-06-15 Revised: 2015-10-27 Online: 2017-01-25 Published: 2017-01-25

Abstract (translated from Chinese):

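As software continues to grow in scale, logs play an increasingly important role in failure detection. However, software logging currently lacks a unified standard and is often shaped by developers' personal habits, which makes automated log analysis in large-scale systems challenging. Recognizing logging functions is a prerequisite for log analysis and directly affects its results. We propose a machine-learning-based method to support automatic log recognition. By systematically analyzing widely used large-scale open-source software, we summarize the main forms in which logging functions are written and extract the features common to these forms, and on that basis implement iLog, an automatic log recognition tool based on machine learning. Experiments show that the logging-function recognition ability of iLog is on average 76 times that of matching specific keywords, and ten-fold cross-validation gives iLog's results an F-score of 0.93.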

Keywords: logging function, machine learning, static analysis, code quality, failure detection

Abstract:

As software continues to scale up, logging has become an indispensable part of failure diagnosis. Very similar symptoms may be caused by different software bugs, and the most direct evidence is usually the log messages. However, the development of most large-scale software is shaped by developers' personal habits rather than guided by a common specification, which makes log-related analysis difficult in large-scale software. Recognizing logging functions is a precondition for log analysis and directly affects its results. Because logging function recognition has received little attention in most existing log-related work, we propose a machine learning method to fill this gap. Learning from widely used software, we summarize three forms of logging functions and extract five common features to implement iLog, an automated logging-function recognition tool based on machine learning. Evaluations show that the recognition ability of iLog is 76 times that of keyword-based recognition. Additionally, 10-fold cross-validation shows an average F-score of 0.93.

Key words: logging function, machine learning, static analysis, code quality, failure diagnosis
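
The abstract evaluates recognition with an F-score under 10-fold cross-validation and compares against keyword matching. The following is a minimal sketch of those two evaluation pieces only, not the iLog implementation: the function names, the single keyword "log", and the sample identifiers with their labels are all hypothetical and chosen purely for illustration. The F-score here is assumed to be the usual F1, the harmonic mean of precision and recall.

```python
from typing import List, Sequence, Tuple


def f_score(y_true: Sequence[bool], y_pred: Sequence[bool]) -> float:
    """F1 score: harmonic mean of precision and recall."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def k_fold_indices(n: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Split indices 0..n-1 into k (train, test) pairs; each index is tested once."""
    folds = [list(range(i, n, k)) for i in range(k)]
    return [
        ([j for f in folds[:i] + folds[i + 1:] for j in f], folds[i])
        for i in range(k)
    ]


def keyword_baseline(name: str, keywords: Sequence[str] = ("log",)) -> bool:
    """Hypothetical keyword baseline: flag a function whose name contains a keyword."""
    lowered = name.lower()
    return any(kw in lowered for kw in keywords)


# Hypothetical function names with made-up labels (True = logging function).
SAMPLES = [
    ("log_error", True), ("ap_log_rerror", True), ("report_failure", True),
    ("warn_user", True), ("login", False), ("parse_config", False),
    ("open_socket", False), ("elog", True), ("dialog_show", False),
    ("read_buffer", False),
]

y_true = [label for _, label in SAMPLES]
y_pred = [keyword_baseline(name) for name, _ in SAMPLES]
print(f"keyword baseline F-score: {f_score(y_true, y_pred):.2f}")
```

On this toy data the keyword baseline misses `report_failure` and `warn_user` and wrongly flags `login` and `dialog_show`, which illustrates why name matching alone can be unreliable. A learned classifier would be trained on the train indices of each pair returned by `k_fold_indices` and scored on the matching test indices, with the per-fold F-scores then averaged.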