• Official journal of the China Computer Federation
  • China Science and Technology Core Journal
  • Chinese Core Journal

J4 ›› 2016, Vol. 38 ›› Issue (02): 305-311.

• Articles

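An index for continuous uncertain XML data

ZHANG Xiaolin, GUO Dandan, HAO Kun

  1. (School of Information Engineering, Inner Mongolia University of Science and Technology, Baotou 014010, Inner Mongolia, China)
  • Received: 2015-01-27 Revised: 2015-04-06 Published: 2016-02-25 Released: 2016-02-25
  • Funding:

    National Natural Science Foundation of China (61163015); Natural Science Foundation of Inner Mongolia (2013MS0909)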

An efficient index for continuous uncertain XML data  

ZHANG Xiaolin,GUO Dandan,HAO Kun   

  1. (School of Information Engineering,Inner Mongolia University of Science and Technology,Baotou 014010,China)
  • Received:2015-01-27 Revised:2015-04-06 Online:2016-02-25 Published:2016-02-25

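Abstract (translated from the Chinese):

A new CUXI index tree is proposed for probability threshold range queries over continuous uncertain XML data. Its construction borrows the U-tree idea of building an index tree over spatial data recursively from the top down: leaf nodes that share the same parent in a continuous uncertain XML document are formed into two-dimensional data rectangles, and the corresponding CUXI index tree is built on the basis of clustering, with leaf nodes storing auxiliary information about the continuous uncertain data. To improve query efficiency, a filtering strategy is defined for continuous uncertain data, and subtrees that do not satisfy the query range are filtered out while the index tree is traversed. Theoretical analysis and experimental results show that this indexing technique can improve query processing performance.

Key words: continuous uncertain XML; probability threshold range query; CUXI index tree; two-dimensional data rectangle; filtering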

Abstract:

Existing uncertain XML indexes are not fully applicable to continuous uncertain XML data. We propose a continuous uncertain XML index (CUXI) to support probability threshold range queries over such data. The approach borrows the idea of the U-tree, which builds an index tree over spatial data recursively in a top-down way. CUXI constructs two-dimensional data rectangles from leaf nodes that share the same parent in the XML document, and the index tree is then built on the basis of clustering. Leaf nodes of the index precompute and store auxiliary information about the continuous uncertain data. To improve query efficiency, a filtering strategy for continuous uncertain data is introduced: during a query, the index tree is traversed and subtrees that do not meet the query range are filtered out. Experimental results show that the proposed index technique can improve query processing performance to a certain extent.

Key words: continuous uncertain XML; probability threshold range query; CUXI index tree; two-dimensional data rectangle; filter
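The filtering idea described in the abstract can be illustrated with a minimal sketch. It is not the CUXI structure itself, whose node layout and two-dimensional rectangles are not specified on this page. The sketch assumes one-dimensional values that are uniformly distributed on an interval, a threshold in (0, 1], and a hand-built two-level tree; the names `UncertainValue`, `IndexNode`, `build_leaf`, `build_inner`, and `range_query` are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class UncertainValue:
    """A continuous uncertain leaf value, ASSUMED uniform on [lo, hi], lo < hi."""
    name: str
    lo: float
    hi: float

    def prob_in(self, q_lo: float, q_hi: float) -> float:
        """Probability that the value falls inside the query range [q_lo, q_hi]."""
        overlap = min(self.hi, q_hi) - max(self.lo, q_lo)
        if overlap <= 0:
            return 0.0
        return overlap / (self.hi - self.lo)


@dataclass
class IndexNode:
    """Index node whose interval [lo, hi] bounds everything stored below it."""
    lo: float
    hi: float
    children: List["IndexNode"] = field(default_factory=list)
    entries: List[UncertainValue] = field(default_factory=list)


def build_leaf(entries: List[UncertainValue]) -> IndexNode:
    """Group sibling values (same parent element) under one bounding interval."""
    return IndexNode(min(e.lo for e in entries), max(e.hi for e in entries),
                     entries=list(entries))


def build_inner(children: List[IndexNode]) -> IndexNode:
    """Build an inner node that bounds all of its child nodes."""
    return IndexNode(min(c.lo for c in children), max(c.hi for c in children),
                     children=list(children))


def range_query(node: IndexNode, q_lo: float, q_hi: float,
                threshold: float) -> List[str]:
    """Return names whose probability of lying in [q_lo, q_hi] is >= threshold.

    Assumes 0 < threshold <= 1, so a subtree that misses the range has no answers.
    """
    # Filter step: skip the whole subtree if its bounds miss the query range.
    if node.hi < q_lo or node.lo > q_hi:
        return []
    results = [e.name for e in node.entries
               if e.prob_in(q_lo, q_hi) >= threshold]
    for child in node.children:
        results.extend(range_query(child, q_lo, q_hi, threshold))
    return results


# Hypothetical data: two groups of sibling leaf values.
leaf1 = build_leaf([UncertainValue("price", 10, 20), UncertainValue("weight", 15, 35)])
leaf2 = build_leaf([UncertainValue("a", 50, 60), UncertainValue("b", 55, 75)])
root = build_inner([leaf1, leaf2])

print(range_query(root, 12, 18, 0.5))  # ['price']
```

For the query range [12, 18] with threshold 0.5, the second group is discarded without inspecting its entries because its bounds [50, 75] do not intersect the range. A U-tree-style index would also store precomputed probabilistic bounds so that entries can be pruned or accepted without evaluating the probability integral; that refinement is omitted here.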