• Official journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science ›› 2024, Vol. 46 ›› Issue (02): 244-252.

• Computer Network and Information Security •

Research on differential privacy protection for Stacking algorithm

DONG Yan-ling1,2,3, ZHANG Shu-fen1,2,3,4, XU Jing-cheng1,2,3, WANG Hao-shi1,2,3

  1. College of Science, North China University of Science and Technology, Tangshan 063210, Hebei, China
  2. Hebei Key Laboratory of Data Science and Application, Tangshan 063210, Hebei, China
  3. Tangshan Key Laboratory of Data Science, Tangshan 063210, Hebei, China
  4. Tangshan Key Laboratory of Big Data Security and Intelligent Computing, Tangshan 063210, Hebei, China
  • Received: 2023-07-14 Revised: 2023-09-12 Accepted: 2024-02-25 Online: 2024-02-25 Published: 2024-02-24
  • Supported by:
    National Natural Science Foundation of China (U20A20179)

Abstract: Homogeneous ensemble learning algorithms are comparatively sensitive to noise, which makes it difficult to achieve both good predictive performance and effective privacy protection. To address this problem, a differential-privacy-based algorithm, DPStacking, is proposed. It combines the heterogeneous Stacking algorithm with differential privacy to improve both privacy protection and predictive performance. However, because both the low-level and high-level models of Stacking can be built from different learners, a privacy budget allocation scheme designed for one specific learner often does not carry over to Stacking algorithms composed of arbitrary base learners and meta-learners. A privacy budget allocation scheme based on the meta-learner is therefore designed: it assigns different privacy budgets to the different components of the meta-learner's input according to the Pearson correlation coefficient and the parallel composition property of differential privacy. Theoretical analysis and experiments show that DPStacking satisfies ε-differential privacy. Compared with the differentially private random forest (DiffRFs), AdaBoost (DP-AdaBoost), and XGBoost (DPXGB) algorithms, it protects data privacy effectively while achieving better predictive performance, and it better addresses the noise sensitivity of single homogeneous ensemble learning algorithms.

Key words: differential privacy, privacy budget allocation, Stacking algorithm, ensemble learning
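
As background for the terms used in the abstract, the standard textbook definitions of ε-differential privacy, parallel composition, and the Pearson correlation coefficient are summarized below. These are general definitions, not formulas quoted from the paper. The symbols ($M$, $D$, $D'$, $S$, $\varepsilon_i$, $r_{XY}$) are notation assumed here for illustration, and the abstract does not state how correlation values are mapped to privacy budgets.

```latex
% epsilon-differential privacy: a randomized mechanism M satisfies it if,
% for all neighboring datasets D, D' (differing in one record) and all output sets S,
\[
  \Pr[M(D) \in S] \;\le\; e^{\varepsilon}\,\Pr[M(D') \in S].
\]

% Parallel composition: if each M_i satisfies epsilon_i-differential privacy
% and is applied to a disjoint subset D_i of the data, then the combined release
\[
  M(D) = \bigl(M_1(D_1), \dots, M_k(D_k)\bigr)
  \quad\text{satisfies}\quad
  \Bigl(\max_{1 \le i \le k} \varepsilon_i\Bigr)\text{-differential privacy}.
\]

% Pearson correlation coefficient between samples x_1..x_n and y_1..y_n
\[
  r_{XY} =
  \frac{\sum_{j=1}^{n} (x_j - \bar{x})(y_j - \bar{y})}
       {\sqrt{\sum_{j=1}^{n} (x_j - \bar{x})^2}\;\sqrt{\sum_{j=1}^{n} (y_j - \bar{y})^2}}.
\]
```

Under parallel composition the overall guarantee is governed by the largest individual budget rather than by the sum, which is what allows different budgets to be assigned to different components without the total cost growing with the number of components.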