• Official journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science ›› 2011, Vol. 33 ›› Issue (2): 168-172. doi: 10.3969/j.issn.1007130X.2011.

• High Performance Computing •

An Outlier Removal Algorithm for Statistical Modeling Data Based on Modified Scaling and Its Application

张新荣1 ,徐保国2   

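  1. (1. Faculty of Electronic and Electrical Engineering, Huaiyin Institute of Technology, Huai'an, Jiangsu 223003; 2. School of Communication and Control Engineering, Jiangnan University, Wuxi, Jiangsu 214122)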
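  • Received: 2010-03-02; Revised: 2010-06-21; Online: 2011-02-25; Published: 2011-02-25
  • Corresponding author: ZHANG Xinrong
  • About the authors: ZHANG Xinrong (b. 1973), male, from Weinan, Shaanxi; master's degree, lecturer; research interests: industrial process monitoring and fault diagnosis. XU Baoguo (b. 1950), male, from Huai'an, Jiangsu; professor; research interests: intelligent control of light-industry processes, intelligent instruments and fieldbus technology, and wireless sensor networks and their control.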
  • Funding:

    National 863 Program of China (grant 2007AA10Z241)

The Outlier Detection Algorithm and Its Application in the Statistical Monitoring Model Based on Modified Scaling

ZHANG Xinrong1 ,X Baoguo2   

  1. (1. Faculty of Electronic and Electrical Engineering, Huaiyin Institute of Technology, Huaian 223003; 2. School of Communication and Control Engineering, Jiangnan University, Wuxi 214122, China)
  • Received: 2010-03-02; Revised: 2010-06-21; Online: 2011-02-25; Published: 2011-02-25

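Abstract (translated from the Chinese):

Traditional robust outlier-removal algorithms cannot accurately estimate the mean and covariance of sampled process data, which degrades fault diagnosis in PCA-based statistical modeling and monitoring. To overcome this limitation, this paper proposes an outlier detection algorithm that combines CDCm and MVT. A modified scaling method first yields accurate estimates for the raw sampled process data, which are then centered and standardized. Distances are computed from the maximum variable value in the sampled data, and the CDCm algorithm selects the normal points whose distance to the center is shortest. These valid data are used to compute the first Mahalanobis distance of the iterative MVT algorithm, and the sample points with the smaller distances are carried into each subsequent iteration until the outliers are removed and the normal data remain. The algorithm is applied to a fermentation process and compared with traditional robust detection algorithms; the experimental and analytical results show that it improves the efficiency and accuracy of outlier detection.

Keywords: modified scaling, outlier, closest distance to center, ellipsoidal multivariate trimming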

Abstract:

Traditional robust outlier-removal algorithms cannot accurately estimate the mean and standard deviation of sampled data, which degrades fault diagnosis in statistical monitoring models based on PCA. This paper proposes an outlier detection algorithm that combines CDCm (Closest Distance to Center, Maximum Variable Distance) and MVT (Ellipsoidal Multivariate Trimming) to overcome this limitation. A modified scale is used to estimate the mean and standard deviation of the process data, which are then centered and standardized. The CDCm algorithm, using the maximum variable distance, then extracts from the modeling database the normal observations closest to the center. These observations are used to compute the first Mahalanobis distance of MVT, and the remaining normal data are obtained by iterating the Mahalanobis distance calculation. The method is applied to outlier detection in a fermentation process and compared with traditional robust outlier detection algorithms. The analysis and experimental results show that it improves the efficiency and accuracy of outlier detection.

Key words: modified scaling, outlier, closest distance to center, ellipsoidal multivariate trimming
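
The pipeline outlined in the abstract (robust scaling, a CDCm starting set, then iterated Mahalanobis trimming) can be sketched roughly as follows. This is an illustrative reading, not the authors' implementation: the abstract does not define the modified scale, the size of the starting set, or the stopping rule, so the median/MAD scaling, the half-sample starting set, the fixed `keep` fraction, and all function names below are assumptions.

```python
import numpy as np

def robust_standardize(X):
    """Center by the median and scale by the MAD.
    Assumption: stands in for the paper's 'modified scale', which the abstract does not define.
    Columns must not be constant (MAD > 0)."""
    med = np.median(X, axis=0)
    mad = 1.4826 * np.median(np.abs(X - med), axis=0)
    return (X - med) / mad

def cdcm_initial_set(Z, h):
    """Indices of the h samples closest to the center, where a sample's
    distance is its largest absolute standardized variable."""
    d = np.max(np.abs(Z), axis=1)
    return np.argsort(d)[:h]

def mahalanobis_sq(Z, idx):
    """Squared Mahalanobis distance of every row of Z, with the mean and
    covariance estimated only from the rows listed in idx."""
    mu = Z[idx].mean(axis=0)
    cov = np.atleast_2d(np.cov(Z[idx], rowvar=False))
    diff = Z - mu
    return np.einsum("ij,ij->i", diff, np.linalg.solve(cov, diff.T).T)

def mvt_outliers(X, keep=0.9, max_iter=50):
    """Boolean mask, True for samples trimmed as outliers.
    Assumption: a fixed fraction `keep` of samples is retained per iteration."""
    Z = robust_standardize(np.asarray(X, dtype=float))
    n = Z.shape[0]
    h = int(np.ceil(keep * n))
    idx = cdcm_initial_set(Z, n // 2)        # CDCm starting set (assumed half-sample)
    for _ in range(max_iter):
        d2 = mahalanobis_sq(Z, idx)
        new_idx = np.argsort(d2)[:h]         # keep the h smallest distances
        if np.array_equal(np.sort(new_idx), np.sort(idx)):
            break                            # retained set no longer changes
        idx = new_idx
    mask = np.ones(n, dtype=bool)
    mask[idx] = False
    return mask
```

Calling `mvt_outliers(X)` on an `(n_samples, n_variables)` array returns the mask of trimmed samples, and `X[~mask]` is the retained data. Because the retained fraction is fixed in this sketch, it always trims `n - ceil(keep * n)` samples; a distance cutoff would be a natural alternative, but the abstract does not say which rule the authors use.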