A vertical mining algorithm for high average-utility itemsets based on optimal upper bound

Computer Engineering & Science

PU Rong, SHAO Jian-fei, HU Chang-li, QU Kun

(Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, China)

Received: 2019-10-21 Revised: 2019-12-11 Online: 2020-05-25 Published: 2020-05-25

Abstract:

High average-utility itemset mining is an active research topic. Existing high average-utility itemset mining algorithms generate a large number of meaningless candidate itemsets, which leads to high memory consumption and long running times. To address this problem, the dMHAUI algorithm is proposed. First, the algorithm defines an integration matrix Q and proposes four compact average-utility upper bounds based on a vertical database representation, together with three effective pruning strategies. Second, the information needed for high average-utility itemset mining is stored in the IDUL structure tree, and an improved diffset technique is used to quickly compute the average utility and the upper bounds of itemsets. Finally, the high average-utility itemsets are obtained by recursively calling the search function. Simulation comparisons with the EHAUPM and MHAI algorithms show that the dMHAUI algorithm performs better in terms of running time, number of join comparisons, and scalability.

Key words: pattern mining, high average-utility itemsets mining, dMHAUI algorithm, upper bound, utility mining
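
To make the terms in the abstract concrete, the following is a minimal sketch of the standard notions the paper builds on, not of dMHAUI itself. It assumes the usual definitions from the high average-utility literature: the average utility of an itemset is the sum of its item utilities over the transactions that contain it, divided by the itemset length, and a classical upper bound (often called auub) sums the maximum item utility of each such transaction. The toy database `DB` and all function names are hypothetical; the integration matrix Q, the four tighter upper bounds, the IDUL structure and the improved diffset technique are not reproduced here, and `diffset` shows only the basic diffset idea.

```python
# Hypothetical toy database: transaction id -> {item: utility of the item there}.
DB = {
    1: {"a": 5, "b": 2, "c": 1},
    2: {"a": 10, "c": 6},
    3: {"b": 4, "c": 3, "d": 8},
    4: {"a": 5, "b": 2, "d": 4},
}


def vertical(db):
    """Vertical layout: item -> {tid: utility}."""
    out = {}
    for tid, row in db.items():
        for item, utility in row.items():
            out.setdefault(item, {})[tid] = utility
    return out


def tidset(vdb, itemset):
    """Ids of the transactions that contain every item of a non-empty itemset."""
    return set.intersection(*(set(vdb[i]) for i in itemset))


def average_utility(vdb, itemset):
    """Total utility of the itemset in its transactions, divided by its length."""
    tids = tidset(vdb, itemset)
    return sum(vdb[i][t] for t in tids for i in itemset) / len(itemset)


def auub(db, vdb, itemset):
    """Classical bound: sum of the maximum item utility of each covering transaction."""
    return sum(max(db[t].values()) for t in tidset(vdb, itemset))


def diffset(vdb, prefix, item):
    """Basic diffset: transactions that contain prefix but not prefix + item."""
    return tidset(vdb, prefix) - set(vdb[item])


def mine(db, min_util):
    """Depth-first search that prunes a branch when its auub is below min_util."""
    vdb = vertical(db)
    items = sorted(vdb)
    result = {}

    def search(prefix, start):
        for k in range(start, len(items)):
            cand = prefix + (items[k],)
            if auub(db, vdb, cand) < min_util:
                continue  # every superset of cand has an even smaller bound
            au = average_utility(vdb, cand)
            if au >= min_util:
                result[cand] = au
            search(cand, k + 1)

    search((), 0)
    return result


V = vertical(DB)
```

On this toy data with a threshold of 10, `mine(DB, 10)` returns the itemsets {a}, {a, c}, {c} and {d}. The pruning is safe because a superset occurs in no more transactions than its subset, so the bound can only shrink as the itemset grows. Tighter bounds, such as the four the abstract mentions, would prune more candidates in the same search.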

PU Rong, SHAO Jian-fei, HU Chang-li, QU Kun. A vertical mining algorithm for high average-utility itemsets based on optimal upper bound[J]. Computer Engineering & Science.
