• Official journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

J4, 2005, Vol. 27, Issue 10: 53-54.

A Fast Algorithm for Mining the Collection of Frequent Itemsets

ZHAO Dong, LU Yan-sheng (赵栋, 卢炎生)

  • Online: 2005-10-01  Published: 2010-06-24

Abstract:

Given a large dataset, extracting its frequent itemsets is a challenging task in data mining, and many efficient algorithms have been proposed in the literature. The idea presented in this paper is to extract a condensed representation of the frequent itemsets, called the frequent disjunction-free sets, instead of extracting the whole collection of frequent itemsets. We show that this condensed representation can be used to regenerate all frequent itemsets and their exact supports without any access to the original data. We present an algorithm, HOPE-Ⅱ, that extracts the frequent disjunction-free sets, and practical experiments show that this representation can be extracted very efficiently. We compare it with another condensed representation from the literature, the frequent closed sets; in nearly all of our experiments, the disjunction-free sets were extracted much more efficiently than the frequent closed sets.

Key words: data mining; condensed representation; frequent itemset
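The abstract does not define disjunction-free sets, so the following is only a hedged illustration that assumes the definition commonly used in the condensed-representation literature: a disjunctive rule X → a ∨ b holds when every transaction containing X also contains a or b (a may equal b), and a set Z is disjunction-free when no such rule holds with X = Z \ {a, b}. The Python sketch brute-forces supports over a small hypothetical database and shows why the support of a set that is not disjunction-free need not be stored: by inclusion-exclusion it follows from the supports of its proper subsets. It is not HOPE-Ⅱ, whose details are not given here; `DB`, `support`, `rule_holds`, `is_disjunction_free`, and `derive_support` are hypothetical names. The database is consulted only to find which rule holds, and a complete representation typically also records a border of infrequent sets, which this sketch omits.

```python
from itertools import combinations, combinations_with_replacement

# Hypothetical toy transaction database (illustration only, not from the paper).
DB = [
    frozenset("abc"),
    frozenset("ac"),
    frozenset("bc"),
    frozenset("abcd"),
    frozenset("cd"),
]


def support(itemset, db=DB):
    """Count the transactions that contain every item of `itemset`."""
    s = frozenset(itemset)
    return sum(1 for t in db if s <= t)


def rule_holds(x, a, b, db=DB):
    """True if the disjunctive rule X -> a v b holds, i.e. every transaction
    containing X also contains a or b. By inclusion-exclusion this is
    sup(X) - sup(X+a) - sup(X+b) + sup(X+a+b) == 0 (a == b is allowed)."""
    x = frozenset(x)
    return (support(x, db) - support(x | {a}, db) - support(x | {b}, db)
            + support(x | {a, b}, db)) == 0


def is_disjunction_free(z, db=DB):
    """Assumed definition: Z is disjunction-free if no pair a, b in Z
    (possibly a == b) makes the rule (Z minus {a, b}) -> a v b hold."""
    z = frozenset(z)
    return not any(rule_holds(z - {a, b}, a, b, db)
                   for a, b in combinations_with_replacement(sorted(z), 2))


def derive_support(z, db=DB):
    """For a set that is NOT disjunction-free, rebuild sup(Z) from proper
    subsets using the rule X -> a v b that holds (X = Z minus {a, b}).
    Returns None for disjunction-free sets, whose support must be stored."""
    z = frozenset(z)
    for a, b in combinations_with_replacement(sorted(z), 2):
        x = z - {a, b}
        if rule_holds(x, a, b, db):
            if a == b:
                return support(x, db)  # X -> a means sup(X + a) = sup(X)
            # The rule gives sup(X) = sup(X+a) + sup(X+b) - sup(Z); solve for sup(Z).
            return support(x | {a}, db) + support(x | {b}, db) - support(x, db)
    return None


if __name__ == "__main__":
    items = sorted(frozenset().union(*DB))
    for k in range(len(items) + 1):
        for z in combinations(items, k):
            d = derive_support(z)
            status = "disjunction-free" if d is None else f"derived sup = {d}"
            print("".join(z) or "{}", support(z), status)
```

On this toy data, for example, {a, b, c} is not disjunction-free because every transaction containing b also contains a or c, so its support of 2 follows from sup(ab) + sup(bc) - sup(b) = 2 + 3 - 3 without counting {a, b, c} itself.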