An Efficient Parallel Frequent Itemset Mining Method for Very Large Text Databases

J4, 2007, Vol. 29, Issue (9): 110-113.

王永恒 (Wang Yongheng), 杨树强 (Yang Shuqiang), 贾焰 (Jia Yan)

Publication date: 2007-09-01  Online date: 2010-06-02

Abstract:

Frequent itemset mining is a common and useful task in data mining, and it is also important in text mining. However, most current mining algorithms cannot be applied to very large text databases. To address the specific problems of frequent itemset mining in very large text databases, we propose a new parallel mining algorithm, parFIM. Built on a simple data structure, H-Struct, parFIM mines in parallel by partitioning the data vertically. The algorithm also takes into account removing short patterns and reducing duplicated patterns. Our experiments show that parFIM is well suited to frequent itemset mining in very large text databases.

Key words: text mining; very large text database; frequent itemset; parallel
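The abstract describes parFIM only at a high level, so the following is a minimal Python sketch, not the authors' algorithm, of one common way to parallelize frequent itemset mining by splitting the work along the item (vertical) dimension. Assumptions: H-Struct itself is not reproduced; each worker mines from plain projected suffix lists instead. The function names `parallel_frequent_itemsets` and `mine_projected`, the round-robin assignment of items to workers, and the `min_len` parameter (a stand-in for "removing short patterns") are all hypothetical. Because each itemset is enumerated only from its first item in a fixed global order, the workers' outputs never overlap, which is one simple way to avoid duplicated patterns.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def mine_projected(prefix, projected, min_sup, min_len, out):
    """Depth-first pattern growth on a projected database.

    `projected` holds, for every transaction containing `prefix`, the
    (rank-ordered) items that come after the last prefix item.
    """
    counts = Counter(item for suffix in projected for item in suffix)
    for item, sup in counts.items():
        if sup < min_sup:
            continue
        itemset = prefix + (item,)
        if len(itemset) >= min_len:  # hypothetical "drop short patterns" filter
            out[itemset] = sup
        # Project again: keep only the part of each suffix after `item`.
        next_proj = [s[s.index(item) + 1:] for s in projected if item in s]
        mine_projected(itemset, [s for s in next_proj if s], min_sup, min_len, out)


def parallel_frequent_itemsets(transactions, min_sup, min_len=1, n_workers=4):
    """Hypothetical sketch: returns {itemset_tuple: support} for all frequent itemsets."""
    # 1. Global pass: item supports and a global order (most frequent first).
    support = Counter(item for t in transactions for item in set(t))
    frequent = sorted((i for i, s in support.items() if s >= min_sup),
                      key=lambda i: (-support[i], i))
    rank = {item: r for r, item in enumerate(frequent)}

    # 2. Keep only frequent items, deduplicated and sorted by the global order.
    db = [tuple(sorted({i for i in t if i in rank}, key=rank.get))
          for t in transactions]

    # 3. Vertical split: worker w owns every itemset whose first item is in
    #    frequent[w::n_workers], so no itemset is produced by two workers.
    def work(w):
        out = {}
        for item in frequent[w::n_workers]:
            if min_len <= 1:
                out[(item,)] = support[item]
            proj = [t[t.index(item) + 1:] for t in db if item in t]
            mine_projected((item,), [s for s in proj if s], min_sup, min_len, out)
        return out

    # 4. Run the partitions concurrently and merge the disjoint results.
    result = {}
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        for part in pool.map(work, range(n_workers)):
            result.update(part)
    return result
```

For example, `parallel_frequent_itemsets([["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"], ["a", "b", "c", "d"]], min_sup=3, min_len=2)` returns the three pairs `("a", "b")`, `("a", "c")` and `("b", "c")`, each with support 3. Threads keep the sketch portable but are limited by Python's GIL; a real deployment would more likely send each item partition to a separate process or machine.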

Wang Yongheng, Yang Shuqiang, Jia Yan. An efficient parallel frequent itemset mining method for very large text databases[J]. J4, 2007, 29(9): 110-113.