• Journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science ›› 2024, Vol. 46 ›› Issue (09): 1702-1710.

• Artificial Intelligence and Data Mining •

一种基于关联程度的高效用数量比频繁模式挖掘算法

王辉1,李燕1,丁丁2,3,吴坤2,3,黄雅平2,3   

  1. (1.中国铁道科学研究院电子计算技术研究所,北京 100081;2.北京交通大学计算机科学与技术学院,北京 100044;
    3.交通数据分析与挖掘北京市重点实验室,北京 100044)
  • Received: 2023-03-29 Revised: 2023-11-29 Accepted: 2024-09-25 Online: 2024-09-25 Published: 2024-09-23
  • Funding:
    Major Scientific Research Project of China Academy of Railway Sciences Corporation Limited (2021YJ020)

A high utility quantitative frequent pattern mining algorithm based on related degree

WANG Hui¹, LI Yan¹, DING Ding²,³, WU Kun²,³, HUANG Ya-ping²,³

  (1. Institute of Computing Technologies, China Academy of Railway Sciences, Beijing 100081;
    2. School of Computer Science & Technology, Beijing Jiaotong University, Beijing 100044;
    3. Beijing Key Lab of Traffic Data Analysis and Mining, Beijing 100044, China)
  • Received: 2023-03-29 Revised: 2023-11-29 Accepted: 2024-09-25 Online: 2024-09-25 Published: 2024-09-23

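Abstract (translated from the Chinese): High utility frequent pattern mining algorithms use the importance of data items to mine more important frequent patterns from data, while high utility quantitative frequent pattern mining algorithms go further and study the quantity-ratio relationships among the items in a frequent pattern; this is a current research topic in data mining. To improve performance and practicality, this paper optimizes high utility quantitative frequent pattern mining and proposes RHUQI-Miner, a high utility quantitative frequent pattern mining algorithm based on related degree. RHUQI-Miner first introduces the concept of related degree, builds an item related degree structure from it, and gives a related pruning strategy that seeks itemsets with higher related degree and reduces redundant and invalid frequent patterns. It then applies a fixed pattern length strategy, which corrects the utility information of itemsets during mining so that the algorithm can control the length of the output frequent patterns according to the actual data, further improving performance and practicality. Analysis of experimental results on an onboard fault dataset from the EMU (electric multiple unit) PHM system shows that the algorithm effectively reduces time and memory consumption during mining, and indicates that it is suitable for real railway data and operations.

Keywords (translated from the Chinese): high utility, quantity ratio, frequent pattern mining, related pruning, fixed pattern length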

Abstract: High utility frequent pattern mining algorithms use the importance of data items to mine more important frequent patterns from data. Building on this, high utility quantitative frequent pattern mining algorithms further explore the quantitative relationships between data items, and have become a popular research topic in data mining. This paper proposes RHUQI-Miner to improve the performance and practicability of such algorithms. First, the concept of related degree is introduced, an item related degree structure is constructed from it, and a related pruning strategy is given to find itemsets with higher related degree, reducing redundant and invalid frequent patterns. Second, a fixed pattern length strategy is used to correct the utility information of itemsets during mining, so that the algorithm can control the length of the output frequent patterns according to the actual data, further improving performance and practicability. Experimental results on an onboard fault dataset from the EMU PHM system show that RHUQI-Miner effectively reduces time and memory consumption during mining, and its output can provide data support for differentiated and precise maintenance strategies.

Key words: high utility, quantitative, frequent pattern mining, related pruning, fixed pattern length
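
The abstract names two ideas, related pruning and a cap on pattern length, without giving their definitions. The Python sketch below is only an illustration of how such pruning could sit inside a high utility itemset search; it is not RHUQI-Miner. Everything in it is an assumption: the toy transactions, the unit utilities, the use of Jaccard co-occurrence as a stand-in for "related degree", the brute-force depth-first enumeration, and the names `related_degree`, `utility`, and `mine`. The quantity-ratio analysis and the utility correction described in the abstract are not modeled.

```python
from itertools import combinations

# Toy transactions mapping item -> quantity. All values are made up for illustration.
TRANSACTIONS = [
    {"a": 2, "b": 1, "c": 3},
    {"a": 1, "b": 2},
    {"a": 4, "b": 2, "d": 1},
    {"c": 1, "d": 5},
]
# Hypothetical per-unit importance (utility) of each item.
UNIT_UTILITY = {"a": 5, "b": 3, "c": 1, "d": 2}


def related_degree(x, y, transactions):
    """Assumed stand-in for 'related degree': Jaccard co-occurrence of x and y."""
    both = sum(1 for t in transactions if x in t and y in t)
    either = sum(1 for t in transactions if x in t or y in t)
    return both / either if either else 0.0


def utility(itemset, transactions, unit_utility):
    """Sum of quantity * unit utility over transactions that contain every item."""
    total = 0
    for t in transactions:
        if all(i in t for i in itemset):
            total += sum(t[i] * unit_utility[i] for i in itemset)
    return total


def mine(transactions, unit_utility, min_util, min_rel, max_len):
    """Enumerate itemsets depth-first with related pruning and a length cap."""
    items = sorted(unit_utility)
    # Precompute the pairwise related degree table once.
    rel = {
        frozenset(pair): related_degree(pair[0], pair[1], transactions)
        for pair in combinations(items, 2)
    }
    results = {}

    def extend(prefix, start):
        for k in range(start, len(items)):
            cand = items[k]
            # Related pruning: skip a candidate that is weakly related
            # to any item already in the prefix.
            if any(rel[frozenset((cand, p))] < min_rel for p in prefix):
                continue
            itemset = prefix + (cand,)
            u = utility(itemset, transactions, unit_utility)
            if u >= min_util:
                results[itemset] = u
            # Length cap: do not grow patterns beyond max_len items.
            if len(itemset) < max_len:
                extend(itemset, k + 1)

    extend((), 0)
    return results
```

On the toy data, `mine(TRANSACTIONS, UNIT_UTILITY, min_util=20, min_rel=0.5, max_len=2)` returns `{('a',): 35, ('a', 'b'): 50}`. Setting `min_rel=0.0` and `max_len=3` also admits `('a', 'd')` and `('a', 'b', 'd')`, which shows how the two thresholds shrink the search space and the output.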