• Journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

J4 ›› 2014, Vol. 36 ›› Issue (05): 963-970.

• Papers •

An improved algorithm for mining maximal frequent itemsets over data streams

HU Jian, WU Maomao

  1. (Institute of Information Engineering, Jiangxi University of Science and Technology, Ganzhou, Jiangxi 341000, China)
  • Received: 2012-12-03 Revised: 2013-04-03 Online: 2014-05-25 Published: 2014-05-25

Abstract:

DSMMFIDS (Dictionary Sequence Mining Maximal Frequent Itemsets over Data Streams), an improved algorithm based on the DSMMFI algorithm, is proposed. First, it stores transaction data in the DSFIlist list according to a fixed total order (dictionary, i.e. alphabetical, order). Second, it inserts the data, in that sorted order, into a tree that resembles a summary data structure. Third, it removes non-frequent items from both the tree and DSFIlist, and at the same time deletes the transaction items with large window attenuation support counts. Finally, it mines the maximal frequent itemsets of the data stream with a bidirectional search strategy that combines top-down and bottom-up search. Case analysis and experiments show that DSMMFIDS executes more efficiently than DSMMFI.
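
To make the target of the mining step concrete, the following is a minimal Python sketch of what a maximal frequent itemset is: a frequent itemset that has no frequent proper superset. It is not the DSMMFIDS algorithm. It uses no DSFIlist, no summary tree, no landmark window and no attenuation of support counts, and it searches top-down only, by brute force over a small static batch of transactions. The function name `maximal_frequent_itemsets` and the absolute threshold `min_support` are assumptions introduced for illustration.

```python
from itertools import combinations


def maximal_frequent_itemsets(transactions, min_support):
    """Return the maximal frequent itemsets as sorted tuples (brute force).

    Illustrative only: `min_support` is an absolute count, and the whole
    batch is held in memory instead of being summarised from a stream.
    """
    # Put the items of every transaction into dictionary (alphabetical) order.
    ordered = [tuple(sorted(set(t))) for t in transactions]

    # Count single items and discard the non-frequent ones.
    item_counts = {}
    for t in ordered:
        for item in t:
            item_counts[item] = item_counts.get(item, 0) + 1
    frequent_items = sorted(i for i, c in item_counts.items() if c >= min_support)

    def support(itemset):
        wanted = set(itemset)
        return sum(1 for t in ordered if wanted.issubset(t))

    maximal = []
    # Top-down: try the largest candidates first. A frequent candidate that is
    # not contained in an already accepted itemset has no frequent superset.
    for size in range(len(frequent_items), 0, -1):
        for candidate in combinations(frequent_items, size):
            if any(set(candidate) <= set(m) for m in maximal):
                continue  # covered by a larger maximal itemset
            if support(candidate) >= min_support:
                maximal.append(candidate)
    return maximal


transactions = [
    ["a", "b", "c"],
    ["b", "a"],
    ["a", "c"],
    ["b", "c", "d"],
    ["c", "a", "b"],
]
print(maximal_frequent_itemsets(transactions, 3))
# prints [('a', 'b'), ('a', 'c'), ('b', 'c')]
```

With a threshold of 3, the item `d` is dropped as non-frequent and `{a, b, c}` is supported by only two transactions, so the three pairs are the maximal frequent itemsets. The enumeration is exponential in the number of frequent items, which is the cost that tree-based summaries and bidirectional search are meant to avoid.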
    

Key words: data mining; data stream; landmark window; maximal frequent itemsets; window attenuation support count