• Journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

J4 ›› 2013, Vol. 35 ›› Issue (10): 79-88.

• Paper •

Variable-granularity queries on association rules over uniformly distributed uncertain data

陈爱东,刘国华,肖瑞,万小妹,石丹妮   

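  1. (School of Computer Science and Technology, Donghua University, Shanghai 201600)
  • Received: 2013-05-10 Revised: 2013-08-12 Publication date: 2013-10-25 Release date: 2013-10-25
  • Supported by:

    National Natural Science Foundation of China (61070032)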

Granularity transform query on association rules
mined from uniformly distributed uncertain data

CHEN Aidong,LIU Guohua,XIAO Rui,WAN Xiaomei,SHI Danni   

  1. (School of Computer Science and Technology,Donghua University,Shanghai 201600,China)
  • Received:2013-05-10 Revised:2013-08-12 Online:2013-10-25 Published:2013-10-25

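Abstract (Chinese original, translated):

Cloud computing provides a platform for association rule mining and querying over big data. To prevent privacy leakage, big data often contains deliberately added uncertainty, and making queries over association rules mined from uncertain data transparent to users is a pressing problem in querying big data mining results. In big data prepared for sharing, uncertain data is produced by generalizing precise data and is therefore uniformly distributed. This property hinders exact queries but facilitates variable-granularity queries over the set of mined association rules. First, rules are mined with the UFIDM algorithm and an association rule base is built; to improve query efficiency, Hilbert packed R-tree indexes are built separately for the generalized identifiers and the sensitive attributes. On this basis, a granularity conversion method for generalized values and the UARS query algorithm are proposed. Finally, theoretical analysis and experimental comparison demonstrate the feasibility and effectiveness of the algorithm.

Keywords (Chinese original, translated): big data, uniformly distributed uncertain data, association rules, variable-granularity query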

Abstract:

Cloud computing provides a platform for association rule mining and querying over big data. To prevent information disclosure, such data often contains artificially added uncertainty. How to let users transparently query association rules mined from uncertain data is an urgent problem in querying big data mining results. Uncertain big data intended for sharing is produced by generalizing precise data and is therefore uniformly distributed; this characteristic is not conducive to exact queries, but it facilitates variable-granularity queries over the mined association rule result set. First, an association rule library is built with the UFIDM algorithm, and R-tree indexes are constructed separately for the generalized identifiers and the sensitive attributes to improve query efficiency. Second, on this basis, a granularity transform method for generalized values and the UARS query algorithm are proposed. Finally, theoretical analysis and experimental results demonstrate the feasibility and effectiveness of the algorithm.

Key words: big data; uniformly distributed uncertain data; association rules; granularity transform query
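
The abstract notes that generalized values are uniformly distributed and that this makes variable-granularity queries convenient. The sketch below illustrates one way that property could be used: if a value is uniform over a generalized interval, the share of a rule's support that falls in a different interval is the fraction of overlap. This is not the paper's UARS algorithm or its granularity transform method, which the abstract does not describe; the function names `overlap_fraction` and `regranulate`, the half-open integer intervals, and the sample supports are all hypothetical.

```python
from fractions import Fraction


def overlap_fraction(generalized, target):
    """Fraction of the half-open interval `generalized` that lies inside `target`.

    Assumes the underlying precise value is uniformly distributed over
    `generalized` and that all endpoints are integers (hypothetical setup).
    """
    g_lo, g_hi = generalized
    t_lo, t_hi = target
    if g_hi <= g_lo:
        raise ValueError("generalized interval must have positive width")
    overlap = max(0, min(g_hi, t_hi) - max(g_lo, t_lo))
    return Fraction(overlap, g_hi - g_lo)


def regranulate(rule_support, target):
    """Expected support of a rule when its interval attribute is re-queried
    at the granularity given by `target`.

    `rule_support` maps each generalized interval to the support count of
    the rule recorded at that interval.
    """
    return sum(
        count * overlap_fraction(interval, target)
        for interval, count in rule_support.items()
    )


# Illustrative supports for one rule, recorded at a 20-unit granularity.
supports = {(20, 40): 10, (40, 60): 6}

finer = regranulate(supports, (30, 50))    # half of each stored interval
coarser = regranulate(supports, (20, 60))  # both stored intervals in full
```

Exact rational arithmetic via `Fraction` keeps the expected supports free of floating-point rounding, which makes the coarser and finer results easy to compare; a real rule base would need to handle non-integer endpoints and multi-attribute rules.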