• Official journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

J4 ›› 2014, Vol. 36 ›› Issue (01): 176-185.


  • Funding:

    National Natural Science Foundation of China (61070032, 61103046)

Aggregate query and its properties over k-anonymous data

ZHANG Junbao,LIU Guohua,WANG Biying,WANG Mei,WANG Yuting,SHI Danni,ZHAI Hongmin   

  1. (School of Computer Science and Technology,Donghua University,Shanghai 201620,China)
  • Received:2013-08-10 Revised:2013-10-20 Online:2014-01-25 Published:2014-01-25


Abstract:

A great deal of useful information exists in k-anonymous data, and extracting useful knowledge from such data is a pressing open problem. OLAP (On-Line Analytical Processing) is a principal means of knowledge discovery, and the aggregate query is the key operation of OLAP. To address aggregate queries over k-anonymous data, a data model describing k-anonymous data is first given. The aggregate query is then separated into two phases. In the first phase, the properties that k-anonymous data satisfies and the notion of an Independent Attribute Set are presented. Using these properties and the Independent Attribute Set, an algorithm is given that computes the set of values, together with their probabilities, that satisfy the query constraint; this set is the input to the second phase. In the second phase, the semantics of the aggregate query over k-anonymous data are defined. To meet users' differing query requirements, WITH clause constraints and the semantics of each kind of WITH clause constraint are defined as a supplement to the first phase. Finally, the properties of the aggregate query are discussed, and experiments are used to verify the effectiveness of the query method.

Key words: data sharing; on-line analytical processing; privacy preservation; k-anonymity; aggregate query
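
The two-phase idea described in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm: the abstract does not specify the data model, the Independent Attribute Set, or the WITH clause, so none of them appear here. The sketch assumes a single generalized attribute (an integer age range), assumes the true value is uniformly distributed over that range, and uses hypothetical names (`GeneralizedRecord`, `phase_one`, `expected_count`, `expected_sum`, `count_bounds`). Phase one produces (value, probability) pairs for a range constraint; phase two aggregates over them.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class GeneralizedRecord:
    """One row of a k-anonymous table (hypothetical layout).

    The quasi-identifier `age` has been generalized to the inclusive
    integer range [age_lo, age_hi]; `salary` is kept exact.
    """
    age_lo: int
    age_hi: int
    salary: int


def match_probability(rec, lo, hi):
    """Probability that the true age lies in [lo, hi].

    ASSUMPTION: the true age is uniform over the integers in the
    generalized range. The paper may use a different model.
    """
    overlap = min(rec.age_hi, hi) - max(rec.age_lo, lo) + 1
    if overlap <= 0:
        return Fraction(0)
    width = rec.age_hi - rec.age_lo + 1
    return Fraction(overlap, width)


def phase_one(records, lo, hi):
    """Phase 1: pair each record's value with its match probability."""
    return [(r.salary, match_probability(r, lo, hi)) for r in records]


def expected_count(pairs):
    """Phase 2: expected COUNT is the sum of match probabilities."""
    return sum(p for _, p in pairs)


def expected_sum(pairs):
    """Phase 2: expected SUM weights each value by its probability."""
    return sum(v * p for v, p in pairs)


def count_bounds(pairs):
    """Smallest and largest COUNT consistent with the generalized data."""
    certain = sum(1 for _, p in pairs if p == 1)
    possible = sum(1 for _, p in pairs if p > 0)
    return certain, possible


# Illustrative 2-anonymous table: two equivalence classes of two rows each.
RECORDS = [
    GeneralizedRecord(20, 29, 3000),
    GeneralizedRecord(20, 29, 4000),
    GeneralizedRecord(30, 39, 5000),
    GeneralizedRecord(30, 39, 6000),
]

if __name__ == "__main__":
    pairs = phase_one(RECORDS, 20, 34)   # WHERE age BETWEEN 20 AND 34
    print(expected_count(pairs))         # 3
    print(expected_sum(pairs))           # 12500
    print(count_bounds(pairs))           # (2, 4)
```

For the query range [20, 34], the two rows generalized to [20, 29] match with probability 1 and the two rows generalized to [30, 39] match with probability 1/2, which gives the expected count of 3 and the count bounds of 2 and 4. Exact fractions are used so that the probabilities are not subject to floating-point rounding.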