• Official journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

J4 ›› 2010, Vol. 32 ›› Issue (10): 108-111.doi: 10.3969/j.issn.1007130X.2010.


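Research on the Target Frequent Patterns Mining Algorithms (目标频繁模式挖掘算法研究)

LIANG Bizhen (梁碧珍)1, LU Yueran (陆月然)1, GENG Lizhong (耿立中)2, QIN Liangxi (秦亮曦)3

  1. (1. Department of Mathematics and Computer Information Engineering, Baise University, Baise, Guangxi 533000; 2. School of Mechanical Engineering, Tsinghua University, Beijing 100084; 3. School of Computer and Electronic Information, Guangxi University, Nanning, Guangxi 530004)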
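  • Received: 2010-03-17 Revised: 2010-06-19 Online: 2010-09-29 Published: 2010-09-29
  • About the authors: LIANG Bizhen (1965-), female, born in Yulin, Guangxi, M.S., associate professor, research interest: data mining; LU Yueran, associate professor, research interests: computer networks and data mining; GENG Lizhong, Ph.D. candidate, research interest: computer storage; QIN Liangxi, professor, research interests: data mining and evolutionary computation
  • Funding:

    Guangxi Department of Education project (200708MS); Baise University key project (2007KA03)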

Research on the Target Frequent Patterns Mining Algorithms

LIANG Bizhen1,LU Yueran1,GENG Lizhong2,QIN Liangxi3   

  1. (1.Department of Mathematics and Computer Information Engineering,Baise University,Baise 533000;2.School of Mechanical Engineering,Tsinghua University,Beijing 100084;3.School of Computer Science and Electronic Information,Nanning 530004,China)
  • Received: 2010-03-17 Revised: 2010-06-19 Online: 2010-09-29 Published: 2010-09-29

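Abstract (translated from the Chinese):

General-purpose frequent pattern mining algorithms usually produce huge sets of frequent patterns, many of which are non-target patterns of no interest to the user. To eliminate them, the user must carry out a "second mining" pass. TFPgrowth generates all maximum target frequent patterns, but obtaining the target frequent patterns from them still requires a second mining pass. If the generation of non-target frequent patterns is restricted early in the mining process, the efficiency of the algorithm can be expected to improve. Building on TFPgrowth and SFPgrowth, this paper proposes STFPgrowth, a target frequent pattern mining algorithm that improves efficiency by sorting the TFP tree, by using different subtree construction methods for different cases of the tree's root node, and by a target frequent pattern filtering method. The result of STFPgrowth is the complete set of target frequent patterns that meet the user's requirements, with no second mining needed. Experiments show that STFPgrowth is more efficient than TFPgrowth and clearly better than Apriori and Eclat.

Keywords: frequent pattern, target frequent pattern, maximum target frequent pattern, mining algorithm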

Abstract:

General frequent pattern mining algorithms usually produce large sets of frequent patterns, many of which are non-target patterns that users are not interested in. To exclude the non-target patterns, users have to do a second mining. Although TFPgrowth can produce all maximum target frequent patterns, a second mining is still needed to get the target frequent patterns from them. Restricting the production of non-target frequent patterns early in the mining process should improve the efficiency of the algorithm. Based on TFPgrowth and SFPgrowth, a target frequent pattern mining algorithm named STFPgrowth is proposed in this paper. Its efficiency is improved by sorting the TFPtree and by adopting different ways to build subtrees and sift target frequent patterns depending on the case of the tree's root node. STFPgrowth mines all the target frequent patterns that satisfy users' requirements, so users need not do a second mining. The experiments show that STFPgrowth is more efficient than TFPgrowth and clearly outperforms Apriori and Eclat.

Key words: frequent pattern; target frequent pattern; maximum target frequent pattern; mining algorithm
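
The difference between a "second mining" pass and restricting non-target patterns early, as contrasted in the abstract, can be illustrated with a small sketch. This is not STFPgrowth or TFPgrowth: it uses brute-force enumeration instead of a tree, and it assumes that a target frequent pattern is a frequent itemset containing at least one user-specified target item, a definition the abstract does not state. All function names and the sample transactions are hypothetical.

```python
from itertools import combinations


def support(itemset, transactions):
    """Count the transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def all_frequent(transactions, min_support):
    """Brute-force enumeration of every frequent itemset (illustration only)."""
    items = sorted(set().union(*transactions))
    result = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            pattern = frozenset(combo)
            count = support(pattern, transactions)
            if count >= min_support:
                result[pattern] = count
    return result


def second_mining(frequent, targets):
    """Post-filter: keep only the patterns that share an item with `targets`."""
    return {p: s for p, s in frequent.items() if p & targets}


def target_frequent_early(transactions, min_support, targets):
    """Discard non-target candidates before their support is counted."""
    # Assumption: a target pattern must contain a target item, so a
    # transaction without any target item can never support one.
    relevant = [t for t in transactions if t & targets]
    items = sorted(set().union(*relevant))
    result = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            pattern = frozenset(combo)
            if not pattern & targets:
                continue  # non-target candidate: skipped without counting
            count = support(pattern, relevant)
            if count >= min_support:
                result[pattern] = count
    return result


# Hypothetical sample data.
transactions = [
    frozenset("abc"),
    frozenset("ac"),
    frozenset("bd"),
    frozenset("abcd"),
    frozenset("bc"),
]
targets = frozenset("a")

frequent = all_frequent(transactions, min_support=2)
filtered = second_mining(frequent, targets)            # mine everything, then filter
early = target_frequent_early(transactions, 2, targets)  # restrict while mining
```

Both routes return the same target patterns under the stated assumption. The early variant simply never counts support for candidates that the post-filter would have thrown away, which is the kind of saving the abstract attributes to restricting non-target patterns early.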