H-PCPIR-V：基于Huffman编码的PCPIR-V优化算法

计算机工程与科学

H-PCPIR-V：基于Huffman编码的PCPIR-V优化算法

王波涛，李昂，陈月梅，邓诗卓，常博涵，吴俊学

（东北大学计算机科学与工程学院,辽宁沈阳 110169）

收稿日期:2017-09-11 修回日期:2017-11-08 出版日期:2018-03-25 发布日期:2018-03-25
基金资助:
国家自然科学基金（61173030）

H-PCPIR-V：An optimized PCPIR-V

algorithm based on Huffman code

WANG Botao，LI Ang，CHEN Yuemei，DENG Shizhuo，CHANG Bohan，WU Junxue

（College of Computer Science and Engineering,Northeastern University,Shenyang 110169,China）

Received:2017-09-11 Revised:2017-11-08 Online:2018-03-25 Published:2018-03-25

摘要/Abstract

摘要：

隐私问题受到越来越多的关注，基于计算的私有信息检索（CPIR）的隐私保护技术允许用户从服务提供商检索数据并且不会泄露查询信息。但是，对于大规模应用，隐私保护技术与可用性之间存在较大差距。针对CPIR算法计算量大、计算时间长而不适合应用于大规模数据隐私保护的问题，提出了基于Spark和Huffman编码的CPIR最近邻查询隐私保护算法（HPCPIR-V）。HPCPIRV算法主要是在数据预处理阶段将最近邻矩阵使用Huffman编码进行压缩减少计算位数，然后通过压缩后矩阵中元素的最大位数对其他元素进行补位，在服务端使用Spark并行框架对查询网格进行并行计算。通过对比实验及实验结果分析发现，相比PCPIR-V算法，H-PCPIRV算法在服务端的计算代价下降30%左右，客户端的计算代价下降10%左右，通信代价下降40%左右。

关键词: 查询隐私保护, 基于计算能力的私有信息检索, 哈夫曼编码, 最近邻查询

Abstract:

With the privacy issues drawing more and more concerns, privacy protection techniques based on Computational Private Information Retrieval (CPIR) allow a user to retrieve data from a service provider without revealing the users query information. For largescale applications, there exists a gap between privacy protection techniques and its feasibility. For the problem that the CPIR algorithm needs long computing time so as not to be suitable for largescale data privacy protection, this paper proposes a CPIR nearest neighbor privacy protection algorithm (HPCPIRV) based on Spark and Huffman code. The HPCPIRV algorithm partitions the spatial data into Voronoi diagrams according to the points of interest in the data preprocessing stage, and then utilizes the Huffman code to compress the candidate data in order to reduce bit computation operation. Spark parallel framework is used for query grid parallel computing in the server side. The experimental results show that the computational cost of HPCPIRV algorithm is about 30% lower than that of PCPIRV algorithm on the server side, the computational cost of client is about 10% lower, and the communication cost is about 40% lower.

Key words: query privacy protection, computational private information retrieval, Huffman code, nearest neighbor query

王波涛，李昂，陈月梅，邓诗卓，常博涵，吴俊学. H-PCPIR-V：基于Huffman编码的PCPIR-V优化算法[J]. 计算机工程与科学.

WANG Botao，LI Ang，CHEN Yuemei，DENG Shizhuo，CHANG Bohan，WU Junxue.

H-PCPIR-V：An optimized PCPIR-V

algorithm based on Huffman code

[J]. Computer Engineering & Science.

[1]	郑道文1, 周一开1, 唐忆滨2, 3, 刘博生1, 武继刚1. ReHuff：基于ReRAM的Huffman编码硬件结构设计[J]. 计算机工程与科学, 2025, 47(6): 988-997.
[2]	王兆丽，肖春霞，王梅娟，蒋园园，董会. 哈夫曼编码在图像无损压缩中的应用[J]. 计算机工程与科学, 2019, 41(增刊S1): 200-202.
[3]	许彬彬，戴清平，朱敏，谢端强. 基于哈夫曼编码的稀疏矩阵的存储与计算[J]. J4, 2013, 35(11): 134-138.