• 中国计算机学会会刊
  • 中国科技核心期刊
  • 中文核心期刊

J4 ›› 2011, Vol. 33 ›› Issue (2): 92-96.doi: 10.3969/j.issn.1007130X.2011.

• 论文 • 上一篇    下一篇

基于向量内积的非频繁项挖掘算法研究

刘彩虹1,刘 强2,李爱平3   

  1. (1.大连外国语学院现代教育技术中心,辽宁 大连 116044)(2.海军91423部队,辽宁 大连 116043;3.国防科学技术大学计算机学院,湖南 长沙 410073)
  • 收稿日期:2010-03-02 修回日期:2010-05-30 出版日期:2011-02-25 发布日期:2011-02-25
  • 通讯作者: 刘彩虹
  • 作者简介:刘彩虹(1981),女,吉林长春人,硕士,讲师,CCF会员(E200015828M),研究方向为数据挖掘和数字图像处理。刘强(1981),男,江苏镇江人,硕士,工程师,研究方向为数据挖掘和雷达对抗。李爱平(1974),男,博士,研究方向为人工智能、分布计算和数据库。

Study on Infrequent Itemsets Mining AlgorithmsBased on Vector Inner Product

LIU Caihong1,LIU Qiang2,LI Aiping3   

  1. (1.Modern Education Technology Center,Dalian University of Foreign Languages,Dalian 116044;2.Navy Corps 91423,Dalian 116043;3.School of Computer Science,National University of Defense Technology,Changsha 410073,China)
  • Received:2010-03-02 Revised:2010-05-30 Online:2011-02-25 Published:2011-02-25

摘要:

针对负关联规则中非频繁项集的生成问题,将向量内积引入到该领域。通过对事务数据库的布尔化表示及对数据存储结构的合理分配,提出了一种新的非频繁项集快速生成算法。该算法首先将布尔化所得矩阵中的向量进行内积运算,通过逐层递增的思想,用两级支持度模型来约束非频繁项集与频繁项集的产生,使非频繁项集不仅可由频繁项集之间连接产生,而且可由频繁项集与非频繁项集、非频繁项集与非频繁项集之间连接产生。实验结果表明,该方法仅需扫描一次数据库,且具有动态剪枝、不保留中间候选项、不丢失非频繁项集和节省大量内存等优点,对数据库中负关联规则及各项集中低频率、强相关模式等相关算法的研究具有重要意义。

关键词: 数据挖掘, 负关联规则, 频繁项集, 非频繁项集

Abstract:

Aiming at how to produce infrequent itemsets in the negative association rules, this paper introduces vector inner product to this field. By converting the transaction database to the Boolean Vector Matrix, and by allotting a equitable data storage structure, we put forward a new algorithm to produce infrequent itemsets effectively. First of all, we convert a database to a Boolean Vector Matrix; and then calculate the inner vector in the matrix, and finally produce infrequent itemsets and frequent itemsets with the restriction of the 2LS model according to the idea of incremental change layer after layer ,which makes sure that infrequent itemsets not only can be produced by the joint of frequent itemsets , but also can be produced by the joint between infrequent itemsets and frequent itemsets, and between infrequent itemsets and infrequent itemsets .The experimental results show that this method not only scans the database only once, and also has the virtues such as dynamic pruning, without saving mid items, saving lots of memories, and without losing infrequent itemsets, which has an important meaning to the negative association rule mining and all kinds of itemsets with the characteristics of low frequent appearance, strong correlation in databases.

Key words: data mining;negative association rules;frequent itemsets;infrequent itemsets