An Efficient Data Stream Mining Algorithm Based on Binary Search Trees

J4, 2008, Vol. 30, Issue 11: 151-154.


He Zhaoqing (何昭青) [1,2]

Publication date: 2008-11-01  Release date: 2010-05-19


Abstract:

Data stream classification is a very challenging task in data mining. VFDT is a one-pass decision tree induction algorithm: it uses the Hoeffding inequality to obtain a probabilistic bound on the accuracy of the tree it builds, so a high-accuracy decision tree can be learned in a single scan of the stream. VFDTc improves on VFDT by enabling it to handle continuous attributes. Building on VFDT and VFDTc, we design and implement VFDT-BSTree, an efficient algorithm based on binary search trees. VFDT-BSTree addresses problems in VFDTc, speeding up both the dynamic insertion of samples and the selection of the best split node, and thereby improves classification speed. Experimental results show that, with tree size and classification accuracy unchanged, VFDT-BSTree reduces execution time on average by 32.25% relative to VFDT and by 24.96% relative to VFDTc.

Key words: data streams, binary search tree, continuous attribute
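The abstract names two mechanisms without showing them: VFDT's Hoeffding-bound split test, and a binary search tree over the observed values of a continuous attribute, which is presumably where VFDT-BSTree's faster sample insertion and split selection come from. The Python sketch below illustrates both under assumptions of our own: each tree node stores the class counts for one distinct attribute value, candidate tests have the form `value <= key`, information gain is the split measure, and the tree is an ordinary unbalanced BST. The names `hoeffding_bound`, `should_split`, `AttributeBST` and `best_split` are hypothetical; this is not the paper's implementation, and refinements such as VFDT's tie-breaking threshold are omitted.

```python
import math
from collections import Counter

# Illustrative sketch only; names and data layout are assumptions, not the paper's code.


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Hoeffding bound: epsilon = sqrt(R^2 * ln(1/delta) / (2n))."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(gain_best: float, gain_second: float, n: int,
                 n_classes: int, delta: float = 1e-7) -> bool:
    """VFDT-style test: split once the best attribute beats the runner-up by more
    than epsilon. For information gain the range R is log2(number of classes).
    (VFDT's tie-breaking threshold is deliberately left out of this sketch.)"""
    eps = hoeffding_bound(math.log2(n_classes), delta, n)
    return gain_best - gain_second > eps


def entropy(counts: Counter) -> float:
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum(c / total * math.log2(c / total) for c in counts.values() if c > 0)


class Node:
    """One distinct attribute value plus the class counts of samples carrying it."""

    def __init__(self, key: float):
        self.key = key
        self.counts = Counter()
        self.left = None
        self.right = None


class AttributeBST:
    """Hypothetical per-leaf structure for one continuous attribute (unbalanced BST)."""

    def __init__(self):
        self.root = None
        self.totals = Counter()  # class counts over all samples seen at this leaf

    def insert(self, value: float, label: str) -> None:
        """Dynamic insertion: cost is proportional to the tree height, no re-sorting."""
        self.totals[label] += 1
        parent, node = None, self.root
        while node is not None and node.key != value:
            parent, node = node, (node.left if value < node.key else node.right)
        if node is None:  # value not seen before: attach a new node
            node = Node(value)
            if parent is None:
                self.root = node
            elif value < parent.key:
                parent.left = node
            else:
                parent.right = node
        node.counts[label] += 1

    def in_order(self):
        """Iterative in-order traversal, i.e. nodes in ascending key order."""
        stack, node = [], self.root
        while stack or node is not None:
            while node is not None:
                stack.append(node)
                node = node.left
            node = stack.pop()
            yield node
            node = node.right

    def best_split(self):
        """One pass over sorted keys; returns (gain, threshold) for 'value <= threshold'."""
        n = sum(self.totals.values())
        base = entropy(self.totals)
        left = Counter()
        best_gain, best_key = 0.0, None
        for node in self.in_order():
            left.update(node.counts)       # running class counts for value <= key
            n_left = sum(left.values())
            if n_left == n:                # right side empty: not a real split
                break
            right = self.totals - left
            gain = (base - n_left / n * entropy(left)
                    - (n - n_left) / n * entropy(right))
            if gain > best_gain:
                best_gain, best_key = gain, node.key
        return best_gain, best_key


# Small demo with made-up samples (attribute value, class label).
bst = AttributeBST()
for x, y in [(1.0, "a"), (2.0, "a"), (3.0, "b"), (4.0, "b"), (2.0, "a")]:
    bst.insert(x, y)
gain, threshold = bst.best_split()
print(threshold)  # 2.0: "value <= 2.0" separates class a from class b exactly
print(should_split(gain, 0.1, n=1000, n_classes=2))  # True
```

In this sketch, insertion walks a single root-to-node path instead of re-sorting the stored values, and `best_split` evaluates every candidate threshold in one in-order pass using running class counts. Because the tree is not self-balancing, already-sorted input degrades it into a linked list; a balanced tree would avoid that at the cost of more code.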

He Zhaoqing (何昭青) [1,2]. An Efficient Data Stream Mining Algorithm Based on Binary Search Trees [J]. J4, 2008, 30(11): 151-154.
