Study on Infrequent Itemsets Mining Algorithms Based on Vector Inner Product

doi:10.3969/j.issn.1007130X.2011.

J4, 2011, Vol. 33, Issue 2: 92-96.

LIU Caihong1, LIU Qiang2, LI Aiping3

(1. Modern Education Technology Center, Dalian University of Foreign Languages, Dalian, Liaoning 116044, China; 2. Navy Corps 91423, Dalian, Liaoning 116043, China; 3. School of Computer Science, National University of Defense Technology, Changsha, Hunan 410073, China)

Received: 2010-03-02  Revised: 2010-05-30  Online: 2011-02-25  Published: 2011-02-25
Corresponding author: LIU Caihong
About the authors: LIU Caihong (b. 1981), female, from Changchun, Jilin; M.S., lecturer, CCF member (E200015828M); research interests: data mining and digital image processing. LIU Qiang (b. 1981), male, from Zhenjiang, Jiangsu; M.S., engineer; research interests: data mining and radar countermeasures. LI Aiping (b. 1974), male, Ph.D.; research interests: artificial intelligence, distributed computing, and databases.

Abstract:

To address the problem of generating infrequent itemsets for negative association rules, this paper introduces the vector inner product into the field. By representing the transaction database in Boolean form and allocating the data storage structure sensibly, we propose a new algorithm for generating infrequent itemsets quickly. The algorithm first computes inner products of the vectors in the Boolean matrix. Then, proceeding level by level, it uses a two-level support (2LS) model to constrain the generation of infrequent and frequent itemsets, so that infrequent itemsets can be produced not only by joining frequent itemsets with each other, but also by joining frequent itemsets with infrequent itemsets and infrequent itemsets with each other. Experimental results show that the method scans the database only once and offers dynamic pruning, no retention of intermediate candidates, no loss of infrequent itemsets, and substantial memory savings. The approach is significant for research on negative association rules in databases and on related algorithms for low-frequency, strongly correlated patterns among itemsets.

Key words: data mining; negative association rules; frequent itemsets; infrequent itemsets
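The abstract gives no pseudocode, so the following is only a minimal Python sketch of the general idea it describes, not the authors' implementation. It assumes that the two-level support model uses two thresholds, here given the hypothetical names `ms_fis` and `ms_infis`: an itemset is treated as frequent when its support is at least `ms_fis`, as infrequent when its support lies between `ms_infis` and `ms_fis`, and is pruned otherwise. The function names and the prefix-based join are also illustrative choices.

```python
from itertools import combinations


def boolean_columns(transactions):
    """Single pass over the data: one 0/1 column vector per item."""
    items = sorted({item for t in transactions for item in t})
    return {item: [int(item in t) for t in transactions] for item in items}


def inner(u, v):
    """Inner product of two 0/1 vectors: number of rows where both are 1."""
    return sum(a * b for a, b in zip(u, v))


def mine_itemsets(transactions, ms_fis, ms_infis):
    """Level-wise mining under an ASSUMED two-level support model.

    support >= ms_fis             -> frequent
    ms_infis <= support < ms_fis  -> infrequent
    support < ms_infis            -> pruned immediately
    """
    n = len(transactions)
    frequent, infrequent = {}, {}

    def classify(itemset, count):
        support = count / n
        if support >= ms_fis:
            frequent[itemset] = support
        elif support >= ms_infis:
            infrequent[itemset] = support
        else:
            return False  # dynamic pruning: never stored, never joined
        return True

    # Level 1: an item's support count is the inner product of its
    # column with itself.
    level = {}
    for item, vec in boolean_columns(transactions).items():
        if classify((item,), inner(vec, vec)):
            level[(item,)] = vec

    while level:
        next_level = {}
        # Join any two retained k-itemsets (frequent or infrequent)
        # that share their first k-1 items.
        for a, b in combinations(sorted(level), 2):
            if a[:-1] != b[:-1]:
                continue
            candidate = a + (b[-1],)
            if classify(candidate, inner(level[a], level[b])):
                next_level[candidate] = [
                    x * y for x, y in zip(level[a], level[b])
                ]
        level = next_level  # vectors of the previous level are dropped
    return frequent, infrequent


transactions = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "c"},
    {"b", "c"},
    {"a", "b", "c", "d"},
]
frequent, infrequent = mine_itemsets(transactions, ms_fis=0.5, ms_infis=0.15)
```

In this toy run, `("a", "b", "c")` has support 0.4 and is an infrequent itemset obtained by joining two frequent ones, `("a", "b", "d")` comes from joining the frequent `("a", "b")` with the infrequent `("a", "d")`, and `("a", "b", "c", "d")` comes from joining two infrequent itemsets. Raising `ms_infis` to 0.3 prunes item `d` at the first level, so none of its supersets is ever generated.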

刘彩虹1,刘强2,李爱平3. 基于向量内积的非频繁项挖掘算法研究[J]. J4, 2011, 33(2): 92-96.

LIU Caihong1,LIU Qiang2,LI Aiping3. Study on Infrequent Itemsets Mining AlgorithmsBased on Vector Inner Product[J]. J4, 2011, 33(2): 92-96.


