• Official journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

J4 ›› 2007, Vol. 29 ›› Issue (10): 65-67.

• Paper •

A Method for Mining Frequent Subtrees in XML Documents

FU Shanshan (傅珊珊), WU Yangyang (吴扬扬)

  • Publication date: 2007-10-01   Release date: 2010-06-02

Abstract:

This paper studies the problem of mining embedded frequent subtrees from a forest of labeled, ordered trees. The method has two steps: XML documents are first preprocessed into SSTs (Simplest Structural Trees), and frequent subtrees are then mined from the SSTs. We present the SSTMiner algorithm, which addresses the bottleneck of the TreeMiner algorithm by exploiting the structural characteristics of the SSTs being processed, further improving execution efficiency. Experiments show that the method mines frequent subtrees in XML documents accurately and efficiently.

Key words: XML, frequent subtree, TreeMiner
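
To make the notion of an embedded frequent subtree concrete, the following minimal Python sketch counts how many XML documents in a small forest contain a given pattern as an embedded, ordered subtree (a parent-child edge in the pattern may match any ancestor-descendant pair in the document, and left-to-right order is preserved). It is an illustration only: it is neither the SST construction nor the SSTMiner algorithm, whose details the abstract does not give, and it uses a brute-force containment check rather than TreeMiner's actual search. The helper names `to_tree`, `index_tree`, `contains_embedded`, and `support`, as well as the sample documents, are hypothetical.

```python
import xml.etree.ElementTree as ET


def to_tree(elem):
    """Reduce an XML element to a (tag, [children]) tuple; text and attributes are ignored."""
    return (elem.tag, [to_tree(child) for child in elem])


def index_tree(tree):
    """Return preorder labels and, for each node, the preorder index of its last descendant."""
    labels, scope = [], []

    def visit(node):
        i = len(labels)
        labels.append(node[0])
        scope.append(i)
        for child in node[1]:
            visit(child)
        scope[i] = len(labels) - 1  # descendants of node i occupy indices (i, scope[i]]

    visit(tree)
    return labels, scope


def contains_embedded(pattern, tree):
    """Brute-force test: does `tree` contain `pattern` as an embedded ordered subtree?"""
    labels, scope = index_tree(tree)

    def match_at(p, i):
        if labels[i] != p[0]:
            return False
        return match_children(p[1], 0, i + 1, scope[i])

    def match_children(children, k, lo, hi):
        if k == len(children):
            return True
        for j in range(lo, hi + 1):
            # The next pattern sibling must lie strictly to the right of j's subtree.
            if match_at(children[k], j) and match_children(children, k + 1, scope[j] + 1, hi):
                return True
        return False

    return any(match_at(pattern, i) for i in range(len(labels)))


def support(pattern, forest):
    """Number of trees in the forest that contain the pattern at least once."""
    return sum(contains_embedded(pattern, t) for t in forest)


# Hypothetical sample documents (assumptions for illustration only).
docs = [
    "<book><title/><author><name/></author></book>",
    "<book><info><title/></info><author/></book>",
    "<book><author/><title/></book>",
]
forest = [to_tree(ET.fromstring(d)) for d in docs]
pattern = ("book", [("title", []), ("author", [])])
print(support(pattern, forest))
```

In this sketch the pattern matches the first two documents (in the second, `title` is a descendant rather than a child of `book`) but not the third, where the sibling order is reversed. With a minimum support of 2 out of 3 documents the pattern would therefore count as frequent.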