• Official Journal of the China Computer Federation (CCF)
• Chinese Science and Technology Core Journal
• Chinese Core Journal

Computer Engineering & Science, 2023, Vol. 45, Issue (07): 1209-1215.

• Computer Networks and Information Security

A low-frequency log noise filtering method in business process based on string matching algorithm

HE Zi-xian, FANG Xian-wen

  1. (School of Mathematics and Big Data, Anhui University of Science & Technology, Huainan 232001, China)
  • Received: 2022-01-21  Revised: 2022-06-13  Accepted: 2023-07-25  Online: 2023-07-25  Published: 2023-07-11
  • Funding:
    National Natural Science Foundation of China (61572035, 61402011); Natural Science Foundation of Anhui Province (2008085QD178)

Abstract: Process mining focuses on analyzing the data generated by business process executions, with the aim of extracting actionable process knowledge from those data. The low-frequency logs of a model may contain noise that negatively affects the analysis. This paper therefore proposes a method that identifies and filters noise in low-frequency event logs using frequency change rules and string matching. First, based on the directly-follows graph and the eventually-follows graph, the set of invalid directly-follows activity-pair sequences is identified from the event log according to the frequency change rules. Then, using an improved KMP string matching algorithm, and exploiting the correspondence between the direct relations of the directly-follows graph and the sequence fragments of the traces in the event log, the invalid activity sequences are matched against the low-frequency log traces. This filters the noise out of the log and optimizes the mined model. Finally, a case analysis and simulation experiments verify the effectiveness of the method.

Key words: noise, directly-follows graph, eventually-follows graph, KMP, filtering, optimization
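The two-step pipeline summarized in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not give the exact frequency change rules or the improvements made to KMP, so the following are all hypothetical: the rule used here (flag a directly-follows pair a→b when a is rarely followed *directly* by b compared with how often b *eventually* follows a, i.e. a DFG/EFG count ratio below a threshold `tau`), the low-frequency cutoff `low_freq`, and the function names. Plain textbook KMP stands in for the paper's improved variant.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Trace = Tuple[str, ...]  # a trace is a sequence of activity names
Pair = Tuple[str, str]


def dfg_counts(log: Sequence[Trace]) -> Counter:
    """Directly-follows counts: how often b occurs immediately after a."""
    counts: Counter = Counter()
    for trace in log:
        for a, b in zip(trace, trace[1:]):
            counts[(a, b)] += 1
    return counts


def efg_counts(log: Sequence[Trace]) -> Counter:
    """Eventually-follows counts: how often b occurs anywhere after a."""
    counts: Counter = Counter()
    for trace in log:
        for i, a in enumerate(trace):
            for b in trace[i + 1:]:
                counts[(a, b)] += 1
    return counts


def invalid_pairs(log: Sequence[Trace], tau: float = 0.2) -> List[Pair]:
    """HYPOTHETICAL frequency-change rule (not the paper's exact rule):
    a pair is invalid if its direct count is small relative to its
    eventually-follows count, e.g. a skipped activity a->c in a->b->c."""
    dfg, efg = dfg_counts(log), efg_counts(log)
    # efg[p] >= dfg[p] >= 1 for every DFG pair, so no division by zero.
    return sorted(p for p, d in dfg.items() if d / efg[p] < tau)


def prefix_function(pattern: Sequence[str]) -> List[int]:
    """KMP failure table: longest proper prefix that is also a suffix."""
    pi = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = pi[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        pi[i] = k
    return pi


def kmp_contains(text: Sequence[str], pattern: Sequence[str]) -> bool:
    """Standard KMP search over activity symbols instead of characters."""
    if not pattern:
        return True
    pi = prefix_function(pattern)
    k = 0
    for symbol in text:
        while k > 0 and symbol != pattern[k]:
            k = pi[k - 1]
        if symbol == pattern[k]:
            k += 1
        if k == len(pattern):
            return True
    return False


def filter_noise(log: Sequence[Trace], tau: float = 0.2,
                 low_freq: float = 0.2) -> List[Trace]:
    """Drop low-frequency traces that contain any invalid activity sequence."""
    variants = Counter(log)
    total = len(log)
    bad = invalid_pairs(log, tau)
    kept: List[Trace] = []
    for trace in log:
        is_low = variants[trace] / total < low_freq  # hypothetical cutoff
        if is_low and any(kmp_contains(trace, pair) for pair in bad):
            continue  # treated as noise and filtered out
        kept.append(trace)
    return kept
```

For a log of nine `a, b, c` traces and one `a, c` trace, the direct pair `(a, c)` has a DFG count of 1 against an EFG count of 10, so it is flagged, and `filter_noise` removes the single `a, c` trace while keeping the nine others. Representing traces as tuples of activity names lets KMP match whole activities rather than characters, so multi-character labels need no encoding step. Patterns longer than a pair are also supported, which matches the abstract's notion of invalid activity *sequences*.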