• Journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science ›› 2022, Vol. 44 ›› Issue (08): 1402-1408.

• Computer Network and Information Security •

A Stacking ensemble clustering algorithm based on differential privacy protection

LI Shuai1,2, CHANG Jin-cai1,2, LI-L Mu-zhi1,2, CAI Kun-jie1,2

  (1. College of Science, North China University of Science and Technology, Tangshan 063210;
    2. Hebei Provincial Key Laboratory of Data Science and Application, Tangshan 063210, China)
  • Received:2021-07-21 Revised:2021-09-17 Accepted:2022-08-25 Online:2022-08-25 Published:2022-08-25

Abstract: To address the insufficient accuracy and security of single clustering algorithms under differential privacy protection, a Stacking ensemble clustering algorithm based on differential privacy protection is proposed. Stacking is used to integrate several heterogeneous clustering algorithms: K-means clustering, BIRCH hierarchical clustering, spectral clustering, and Gaussian mixture clustering serve as the primary clustering algorithms. The clustering results generated by the primary clustering algorithms are weighted by the silhouette coefficient and merged into the original data, and K-means is then used as the secondary clustering algorithm to cluster the expanded data set. Based on the clustering results of the original data and of the primary clustering algorithms, adaptive ε functions are proposed to determine the privacy budget, and Laplace noise of different magnitudes is allocated to data with different sensitivities. Theoretical analysis and experimental results show that, compared with a single clustering algorithm, the proposed algorithm effectively improves clustering accuracy while satisfying ε-differential privacy, and achieves a good balance between privacy protection and data availability.
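The noise-allocation step can be illustrated with the standard Laplace mechanism, in which noise of scale sensitivity/ε is added to each released value, so a smaller privacy budget ε means larger noise. The sketch below is only an illustration under stated assumptions: the function names `laplace_noise` and `perturb`, the per-record budgets, and the default sensitivity of 1.0 are hypothetical, and the adaptive ε functions of the paper are not reproduced here because the abstract does not define them.

```python
import math
import random


def laplace_noise(scale, rng):
    """Draw one sample from Laplace(0, scale) by inverse-CDF sampling."""
    u = rng.random() - 0.5  # uniform on [-0.5, 0.5)
    # Guard against log(0) at the edge case u == -0.5.
    magnitude = max(1.0 - 2.0 * abs(u), 1e-300)
    return -scale * math.copysign(1.0, u) * math.log(magnitude)


def perturb(points, epsilons, sensitivity=1.0, seed=0):
    """Add Laplace noise of scale sensitivity/epsilon to each coordinate.

    points      -- list of feature vectors (lists of floats)
    epsilons    -- one privacy budget per point (assumed to be supplied
                   by the caller; not the paper's adaptive function)
    sensitivity -- assumed sensitivity of the released values
    """
    rng = random.Random(seed)
    noisy = []
    for point, eps in zip(points, epsilons):
        if eps <= 0:
            raise ValueError("epsilon must be positive")
        scale = sensitivity / eps  # smaller budget -> larger noise
        noisy.append([x + laplace_noise(scale, rng) for x in point])
    return noisy


# Hypothetical usage: the first record gets a tighter budget than the second.
data = [[0.2, 0.4], [0.9, 0.1]]
budgets = [0.5, 2.0]
noisy = perturb(data, budgets)
```

With these hypothetical budgets, the first record is perturbed with noise of scale 2.0 and the second with scale 0.5, which is the sense in which different records receive different degrees of noise.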

Key words: differential privacy, ensemble clustering, Stacking algorithm, adaptive ε, privacy protection