An improved algorithm for mining maximal frequent itemsets over data streams

J4 ›› 2014, Vol. 36 ›› Issue (05): 963-970.

HU Jian, WU Maomao

(Institute of Information Engineering, Jiangxi University of Science and Technology, Ganzhou 341000, China)

Received: 2012-12-03 Revised: 2013-04-03 Online: 2014-05-25 Published: 2014-05-25

Abstract

An improved algorithm based on DSMMFI, named DSMMFIDS (Dictionary Sequence Mining Maximal Frequent Itemsets over Data Streams), is proposed. First, it stores transaction data in the DSFIlist list according to a fixed total order (dictionary order). Second, the sorted data are stored in a tree similar to a summary data structure. Third, infrequent items are removed from the tree and from DSFIlist, and transaction items with a large window attenuation support count are deleted. Finally, a bidirectional search strategy that combines top-down and bottom-up search is used to mine the maximal frequent itemsets of the data stream. Case analysis and experiments show that DSMMFIDS runs more efficiently than DSMMFI.

Key words: data mining; data stream; landmark window; maximal frequent itemsets; window attenuation support count
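
To make the target of the mining step concrete, the following is a minimal sketch of what a maximal frequent itemset is: a frequent itemset with no frequent proper superset. It is a brute-force illustration over a small static batch, not the DSMMFIDS algorithm itself; it uses no DSFIlist, tree, window attenuation, or bidirectional search. The function name `maximal_frequent_itemsets`, the parameter `min_count`, and the sample transactions are assumptions introduced only for illustration.

```python
from itertools import combinations


def maximal_frequent_itemsets(transactions, min_count):
    """Brute-force sketch: return frequent itemsets that have no frequent proper superset."""
    # Fix a total order on items (dictionary order here), echoing the abstract's first step.
    items = sorted({item for t in transactions for item in t})
    tx_sets = [frozenset(t) for t in transactions]

    # Enumerate every candidate itemset and keep those meeting the support threshold.
    frequent = []
    for k in range(1, len(items) + 1):
        for cand in combinations(items, k):
            c = frozenset(cand)
            support = sum(1 for t in tx_sets if c <= t)
            if support >= min_count:
                frequent.append(c)

    # An itemset is maximal if no other frequent itemset strictly contains it.
    maximal = [f for f in frequent if not any(f < g for g in frequent)]
    return sorted(sorted(m) for m in maximal)


# Hypothetical sample batch, assumed for illustration only.
transactions = [
    ["a", "b", "c"],
    ["a", "b"],
    ["a", "c"],
    ["b", "c", "d"],
    ["a", "b", "c"],
]
print(maximal_frequent_itemsets(transactions, min_count=2))  # [['a', 'b', 'c']]
```

Raising `min_count` to 3 makes `{a, b, c}` infrequent, so the three pairs become the maximal sets. The enumeration is exponential in the number of items, which is why stream algorithms such as the one described above rely on compact tree structures and pruning instead.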

HU Jian, WU Maomao. An improved algorithm for mining maximal frequent itemsets over data streams[J]. J4, 2014, 36(05): 963-970.
