• Journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

J4 ›› 2012, Vol. 34 ›› Issue (12): 134-139.

• Articles •

Query Processing Technology on Continuous Probabilistic XML

ZHANG Xiaolin, ZHENG Zhenzhen, LIU Lixin, LI Yufeng

  1. (School of Information Engineering, Inner Mongolia University of Science and Technology, Baotou 014010, Inner Mongolia, China)
  • Received: 2011-05-05 Revised: 2011-09-19 Online: 2012-12-25 Published: 2012-12-25
  • Supported by:

    National Natural Science Foundation of China (61163015); Key Project of the Natural Science Foundation of Inner Mongolia (20080404Zd21)

Abstract:

Most existing methods for querying continuous probabilistic XML data rely on discretization. Because the query operators must process a large number of histogram segments during execution, these methods are inefficient. This paper proposes a query processing technique for continuous probabilistic XML data based on the p-document model. First, the p-document model is extended with cont nodes to support arbitrary continuous distributions; each cont node encodes a probability density function and its parameters. Second, twig pattern matching is used to find the paths that satisfy the user's query. Third, the type of continuous distribution being queried determines whether the probabilistic query is evaluated with the symbolic representation, with integration, or with histogram approximation: standard continuous distributions are evaluated from the parameters of their symbolic representation or from more complex cumulative distribution functions; non-standard continuous distributions that satisfy the integration condition are evaluated by integration; all other distributions are evaluated by histogram approximation. Experimental results show that the proposed method outperforms existing methods in both the accuracy and the response time of probabilistic queries.

Key words: p-document model; probabilistic XML; continuous distribution; query processing
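
The three evaluation strategies named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The XML layout, the attribute names on the cont element (dist, mu, sigma, name, lo, hi, bins), the PDFS registry, and the function names are all hypothetical. A plain ElementTree path lookup stands in for twig pattern matching, and composite Simpson's rule is an assumed choice for the integration step.

```python
import math
import xml.etree.ElementTree as ET

# Hypothetical p-document fragment. The <cont> encoding below is an
# illustrative assumption, not the syntax used in the paper.
DOC = """
<sensors>
  <sensor id="s1"><temp><cont dist="normal" mu="20" sigma="2"/></temp></sensor>
  <sensor id="s2"><temp><cont dist="pdf" name="ramp" lo="0" hi="1"/></temp></sensor>
  <sensor id="s3"><temp><cont dist="hist" bins="0:10:0.25,10:20:0.75"/></temp></sensor>
</sensors>
"""

# Hypothetical registry of non-standard densities, looked up by name.
PDFS = {"ramp": lambda x: 2.0 * x}  # density f(x) = 2x on [0, 1]


def normal_cdf(x, mu, sigma):
    """Closed-form CDF of the normal distribution (symbolic strategy)."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def simpson(f, a, b, n=200):
    """Composite Simpson's rule on [a, b] with an even number n of intervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0


def range_probability(cont, a, b):
    """Return P(a <= X <= b) for the distribution stored in one cont node."""
    kind = cont.get("dist")
    if kind == "normal":
        # Symbolic strategy: use the stored parameters and the CDF directly.
        mu, sigma = float(cont.get("mu")), float(cont.get("sigma"))
        return normal_cdf(b, mu, sigma) - normal_cdf(a, mu, sigma)
    if kind == "pdf":
        # Integration strategy: integrate the density over the part of
        # [a, b] that lies inside the support [lo, hi].
        lo, hi = float(cont.get("lo")), float(cont.get("hi"))
        left, right = max(a, lo), min(b, hi)
        if left >= right:
            return 0.0
        return simpson(PDFS[cont.get("name")], left, right)
    if kind == "hist":
        # Histogram strategy: each bin is "low:high:mass", and the mass is
        # assumed to be spread uniformly inside the bin.
        total = 0.0
        for spec in cont.get("bins").split(","):
            lo, hi, mass = map(float, spec.split(":"))
            overlap = max(0.0, min(b, hi) - max(a, lo))
            total += mass * overlap / (hi - lo)
        return total
    raise ValueError(f"unsupported distribution type: {kind!r}")


def query(root, sensor_id, a, b):
    """Locate the cont node by path, then evaluate the range probability."""
    cont = root.find(f".//sensor[@id='{sensor_id}']/temp/cont")
    if cont is None:
        raise KeyError(sensor_id)
    return range_probability(cont, a, b)


if __name__ == "__main__":
    root = ET.fromstring(DOC)
    for sensor_id, a, b in [("s1", 18, 22), ("s2", 0, 0.5), ("s3", 5, 15)]:
        print(sensor_id, query(root, sensor_id, a, b))
```

In this sketch the symbolic branch needs only two evaluations of a closed-form function, whereas the histogram branch must visit every bin. That difference mirrors the efficiency argument the abstract makes against purely discretized approaches.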