• Official journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

J4 ›› 2015, Vol. 37 ›› Issue (09): 1756-1760.

• Article •

基于条件随机场模型的数据异常检测算法

王文珂1,文雅玫2,蔡喆2   

  1. (1.国防科学技术大学计算机学院,湖南 长沙  410073;2.湖南省烟草专卖局(公司)经济信息中心,湖南 长沙 410004)
  • 收稿日期:2014-07-08 修回日期:2014-10-21 出版日期:2015-09-25 发布日期:2015-09-25
  • Funding:

    National Natural Science Foundation of China (61202335)

Abnormal data detection algorithm based on conditional random fields model

WANG Wenke1,WEN Yamei2,CAI Zhe2   

  1. College of Computer, National University of Defense Technology, Changsha 410073, China
  2. Information Center, Hunan Tobacco, Changsha 410004, China
  • Received:2014-07-08 Revised:2014-10-21 Online:2015-09-25 Published:2015-09-25

摘要:

企业数据中心作为辅助决策的重要工具,保证其数据的及时性、准确性和科学性是最基本的要求和最核心的原则。对于数据异常的情况,若仅依靠人为的经验在海量数据中进行判断是很困难的,也是不科学且低效的。针对企业购销存数据的准确性问题,研究了基于机器学习的数据异常检测算法。由于购销存数据是由一组相对固定的数据项组成,可以看作是一个结构化数据序列,因此选择了解决结构化序列预测问题最为有效的条件随机场模型CRFs。通过对大量历史数据进行学习,分析出数据的自身规律以及关联关系,使计算机具备自动检测异常的能力。实验结果表明了该算法的有效性。

关键词: 数据中心, 机器学习, 数据异常检测, 条件随机场模型

Abstract:

Enterprise data centers are an important decision-support tool, and keeping their data timely, accurate, and scientifically sound is the most basic requirement and core principle. Finding abnormal records in massive volumes of data by human experience alone is difficult, unscientific, and inefficient. To address the accuracy of enterprise purchase, sales, and inventory data, we study a machine-learning-based algorithm for detecting abnormal data. Because such data consist of a relatively fixed set of data items, they can be treated as a structured data sequence, so we adopt the conditional random fields (CRFs) model, which is highly effective for structured sequence prediction. By learning from a large amount of historical data, the model captures the intrinsic regularities of the data and the correlations among data items, enabling the computer to detect abnormal data automatically. Experimental results demonstrate the effectiveness of the proposed algorithm.

Key words: data center; machine learning; detection of abnormal data; conditional random fields model
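
The abstract does not specify the features, label set, or training procedure, so the following is only a minimal sketch of how a linear-chain CRF could label each item in a record as normal or abnormal. Everything named here is a hypothetical assumption for illustration: the two labels, the `observe` discretization, and the hand-set `EMISSION` and `TRANSITION` weights (a real CRF would learn such weights from labeled historical data). Only the decoding step (Viterbi) is shown.

```python
from typing import Dict, List, Tuple

# Hypothetical label set; the paper's actual labels are not given in the abstract.
LABELS = ("normal", "abnormal")

# Hypothetical hand-set weights. In a trained CRF these would be learned
# from labeled historical records rather than written by hand.
EMISSION: Dict[Tuple[str, str], float] = {
    ("small_dev", "normal"): 2.0,
    ("small_dev", "abnormal"): -1.0,
    ("large_dev", "normal"): -1.5,
    ("large_dev", "abnormal"): 1.5,
}
TRANSITION: Dict[Tuple[str, str], float] = {
    ("normal", "normal"): 0.5,
    ("normal", "abnormal"): -0.5,
    ("abnormal", "normal"): -0.5,
    ("abnormal", "abnormal"): 0.5,
}


def observe(value: float, mean: float, tol: float = 0.2) -> str:
    """Discretize an item by its relative deviation from a historical mean
    (an assumed feature, chosen only to keep the sketch small)."""
    return "small_dev" if abs(value - mean) <= tol * abs(mean) else "large_dev"


def viterbi(observations: List[str]) -> List[str]:
    """Return the highest-scoring label sequence under the weights above."""
    if not observations:
        return []
    # scores[t][y]: best score of any label path ending in label y at position t.
    scores = [{y: EMISSION[(observations[0], y)] for y in LABELS}]
    backpointers: List[Dict[str, str]] = []
    for obs in observations[1:]:
        prev = scores[-1]
        cur: Dict[str, float] = {}
        ptr: Dict[str, str] = {}
        for y in LABELS:
            best_prev = max(LABELS, key=lambda p: prev[p] + TRANSITION[(p, y)])
            cur[y] = prev[best_prev] + TRANSITION[(best_prev, y)] + EMISSION[(obs, y)]
            ptr[y] = best_prev
        scores.append(cur)
        backpointers.append(ptr)
    # Trace the best path backwards from the best final label.
    path = [max(LABELS, key=lambda y: scores[-1][y])]
    for ptr in reversed(backpointers):
        path.append(ptr[path[-1]])
    path.reverse()
    return path


# Usage: one record of four data items compared against assumed historical means.
record = [100.0, 98.0, 250.0, 102.0]
means = [100.0, 100.0, 100.0, 100.0]
labels = viterbi([observe(v, m) for v, m in zip(record, means)])
```

With these illustrative weights, the third item (250 against a mean of 100) is the only one whose deviation is large enough to outweigh the transition penalty, so it is the only item flagged.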