• Journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

J4 ›› 2011, Vol. 33 ›› Issue (7): 183-187.


Between-Items Positively Correlated Association Rule Mining Based on an Improved FP-Tree

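LIU Shangli (刘上力), YANG Qing (杨清)

  1. (Network Information Center, Hunan University of Science and Technology, Xiangtan, Hunan 411201)
  • Received: 2011-01-12 Revised: 2011-04-26 Online: 2011-07-21 Published: 2011-07-25
  • About the authors: LIU Shangli (born 1978), male, from Xiangtan, Hunan; master's degree, engineer; research interests: data mining and pattern recognition. YANG Qing (born 1969), male, from Yiyang, Hunan; Ph.D., professor; research interests: data mining and pattern recognition.
  • Funding:

    Natural Science Foundation of Hunan Province (06JJ5132); Key Scientific Research Project of the Hunan Provincial Department of Education (10A028); Science and Technology Plan Project of Hunan Province (2009JT3031)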

BetweenItems Positive Correlated Association Rules Mining Based on Node Linked List FPTree

LIU Shangli,YANG Qing   

  1. (Network Information Center,Hunan University of Science and Technology,Xiangtan 411201,China)
  • Received:2011-01-12 Revised:2011-04-26 Online:2011-07-21 Published:2011-07-25

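Abstract (translated from the Chinese):

Interestingness measures are commonly used in association rule mining to discover potentially interesting patterns. FP-growth, which is built on the FP-tree structure, is one of the more efficient association rule mining algorithms currently available, but it becomes inefficient when mining potentially valuable low-support patterns. To address this, this paper proposes a new interestingness measure, the between-items positive correlation measure. The measure has a good anti-monotone property, and in every resulting pattern the occurrence of any one item in a transaction raises the likelihood that the remaining items of the pattern also occur. The paper also proposes an improved FP mining algorithm that uses a compressed FP-tree structure and a non-recursive procedure to reduce the overhead of building additional conditional pattern trees during mining. More importantly, a pruning strategy based on the between-items positive correlation measure is introduced into frequent itemset mining; it effectively filters out long patterns that are not positively correlated as well as invalid itemsets, and widens the range of support thresholds that can be mined. Experimental results show that the algorithm is effective and feasible.

Keywords: association rules, interestingness measure, between-items positive correlation, pruning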

Abstract:

Interestingness measures are used in association rule mining to select patterns according to their potential interest to the user. The FP-growth algorithm, based on the FP-tree structure, is an efficient algorithm for mining association rules, but it is not very efficient when mining potentially valuable low-support patterns. To solve this problem, a novel interestingness measure called the between-items positive correlation interestingness measure is presented. This measure has a good anti-monotone property, and the presence of any item of a pattern in a transaction increases the likelihood that every other item of the same pattern is present. This paper also proposes an improved FP mining algorithm that builds a compact FP structure using a node linked list and uses a non-recursive function to reduce the overhead of creating an extra data structure at each mining step. More importantly, the algorithm applies an efficient pruning strategy that uses the interestingness measure to filter out long patterns that are not positively correlated as well as invalid itemsets, which expands the usable range of the support threshold. The experimental results indicate that the proposed algorithm is efficient and feasible.

Key words: association rules; interestingness measure; between-items positive correlation; pruning
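
The abstract states that, in a resulting pattern, the presence of any one item raises the likelihood of the remaining items. The abstract does not give the formula of the measure, so the following is only a minimal sketch of one possible reading of that property: for each item i in a pattern X, require supp(X) > supp({i}) * supp(X \ {i}), which is equivalent to P(X \ {i} | i) > P(X \ {i}). The function names `support` and `lifts_rest` and the toy transactions are hypothetical and are not taken from the paper.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = frozenset(items)
    if not transactions:
        return 0.0
    return sum(1 for t in transactions if items <= t) / len(transactions)


def lifts_rest(transactions, pattern):
    """Check an assumed reading of the property described in the abstract.

    Returns True when, for every item in `pattern`, seeing that item makes
    the rest of the pattern more likely than it is overall:
        supp(pattern) > supp({item}) * supp(pattern - {item})
    This is an illustration, not the paper's definition of its measure.
    """
    pattern = frozenset(pattern)
    if len(pattern) < 2:
        return False  # correlation between items needs at least two items
    joint = support(transactions, pattern)
    for item in pattern:
        rest = pattern - {item}
        if joint <= support(transactions, {item}) * support(transactions, rest):
            return False
    return True


# Toy data (hypothetical): "a" and "b" tend to co-occur, "a" and "c" do not.
transactions = [frozenset(t) for t in (
    {"a", "b"}, {"a", "b"}, {"c"}, {"c"}, {"a", "c"}, {"b", "c"},
)]

print(lifts_rest(transactions, {"a", "b"}))
print(lifts_rest(transactions, {"a", "c"}))
```

A check of this kind could serve as a pruning test during itemset enumeration, discarding candidate patterns that fail it; how the paper integrates its measure into the FP-tree traversal is not described in the abstract.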