基于向量内积的非频繁项挖掘算法研究
收稿日期: 2010-03-02
修回日期: 2010-05-30
网络出版日期: 2011-02-25
Study on Infrequent Itemsets Mining AlgorithmsBased on Vector Inner Product
Received date: 2010-03-02
Revised date: 2010-05-30
Online published: 2011-02-25
针对负关联规则中非频繁项集的生成问题,将向量内积引入到该领域。通过对事务数据库的布尔化表示及对数据存储结构的合理分配,提出了一种新的非频繁项集快速生成算法。该算法首先将布尔化所得矩阵中的向量进行内积运算,通过逐层递增的思想,用两级支持度模型来约束非频繁项集与频繁项集的产生,使非频繁项集不仅可由频繁项集之间连接产生,而且可由频繁项集与非频繁项集、非频繁项集与非频繁项集之间连接产生。实验结果表明,该方法仅需扫描一次数据库,且具有动态剪枝、不保留中间候选项、不丢失非频繁项集和节省大量内存等优点,对数据库中负关联规则及各项集中低频率、强相关模式等相关算法的研究具有重要意义。
刘彩虹1,刘 强2,李爱平3 . 基于向量内积的非频繁项挖掘算法研究[J]. 计算机工程与科学, 2011 , 33(2) : 92 -96 . DOI: 10.3969/j.issn.1007130X.2011.
Aiming at how to produce infrequent itemsets in the negative association rules, this paper introduces vector inner product to this field. By converting the transaction database to the Boolean Vector Matrix, and by allotting a equitable data storage structure, we put forward a new algorithm to produce infrequent itemsets effectively. First of all, we convert a database to a Boolean Vector Matrix; and then calculate the inner vector in the matrix, and finally produce infrequent itemsets and frequent itemsets with the restriction of the 2LS model according to the idea of incremental change layer after layer ,which makes sure that infrequent itemsets not only can be produced by the joint of frequent itemsets , but also can be produced by the joint between infrequent itemsets and frequent itemsets, and between infrequent itemsets and infrequent itemsets .The experimental results show that this method not only scans the database only once, and also has the virtues such as dynamic pruning, without saving mid items, saving lots of memories, and without losing infrequent itemsets, which has an important meaning to the negative association rule mining and all kinds of itemsets with the characteristics of low frequent appearance, strong correlation in databases.
/
| 〈 |
|
〉 |