• 中国计算机学会会刊
  • 中国科技核心期刊
  • 中文核心期刊

计算机工程与科学

• 人工智能与数据挖掘 • 上一篇    下一篇

基于优化上界的高平均效用项集垂直挖掘算法

浦蓉,邵剑飞,胡常礼,曲坤   

  1. (昆明理工大学信息工程与自动化学院,云南 昆明 650500)
  • 收稿日期:2019-10-21 修回日期:2019-12-11 出版日期:2020-05-25 发布日期:2020-05-25

A vertical mining algorithm for high average-utility
 itemsets based on optimal upper bound

PU Rong,SHAO Jian-fei,HU Chang-li,QU Kun   

  1. (Faculty of Information Engineering and Automation,Kunming University of Science and Technology,Kunming 650500,China)
  • Received:2019-10-21 Revised:2019-12-11 Online:2020-05-25 Published:2020-05-25

摘要:

高平均效用项集挖掘是当前研究的热点之一。针对高平均效用项集挖掘算法产生大量无意义的候选项集,而导致高内存消耗和运行时间长的问题,提出了dMHAUI算法。首先定义了集成矩阵Q,并提出了4种基于垂直数据库表示的紧凑平均效用上界及3种有效的修剪策略;将高平均效用项集挖掘所需的信息存储于IDUL结构树,利用改进的diffset技术快速计算项集的平均效用和上界;最后通过递归调用搜索函数得到高平均效用项集。与EHAUPM算法和MHAI算法进行仿真比较,结果表明,dMHAUI算法在运行时间、连接比较次数和可扩展性等方面都有较优的性能。
 

关键词: 模式挖掘, 高平均效用项集挖掘, dMHAUI算法, 上界, 效用挖掘

Abstract:

Mining high average-utility itemsets is one of the hotspots in the current research. Aiming at the problem that the high average-utility itemsets mining algorithm generates a large number of meaningless candidate itemsets, which results in high memory consumption and long running time, the dMHAUI algorithm is proposed. Firstly, the algorithm defines the integration matrix Q, and proposes four compact average-utility upper bounds based on vertical database representation and three effective pru- ning strategies. Secondly, the information needed for high average-utility itemsets mining is stored in the IDUL structure tree, and the improved diffset technique is used to quickly calculate the average- utility and upper bound of itemsets. Finally, the high average-utility itemsets are obtained by recursively calling the search algorithm. Simulation results show that the dMHAUI function has better performance than the EHAUPM algorithm and the MHAI algorithm in terms of running time, join operation number and scalability.
 

Key words: pattern mining, high average-utility itemsets mining, dMHAUI algorithm, upper bound, utility mining