• Official journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science, 2021, Vol. 43, Issue (08): 1387-1397.

• Computer Network and Information Security •

Research on multi-source data security based on a federated ensemble algorithm

LUO Chang-yin (1,2,3), CHEN Xue-bin (1,2,3), LIU Yang (1,2,3), ZHANG Shu-fen (1,2,3)

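  1. College of Science, North China University of Science and Technology, Tangshan, Hebei 063210
  2. Hebei Key Laboratory of Data Science and Application, Tangshan, Hebei 063210
  3. Tangshan Key Laboratory of Data Science, Tangshan, Hebei 063210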

  • Received: 2020-06-15 Revised: 2020-09-09 Accepted: 2021-08-25 Online: 2021-08-25 Published: 2021-08-24
  • Funding:
    National Natural Science Foundation of China (61572170, 61170254); Tangshan Science and Technology Project (18120203A)

A federated ensemble algorithm for multi-source data security

LUO Chang-yin (1,2,3), CHEN Xue-bin (1,2,3), LIU Yang (1,2,3), ZHANG Shu-fen (1,2,3)

  1. College of Science, North China University of Science and Technology, Tangshan 063210, China
  2. Hebei Key Laboratory of Data Science and Application, Tangshan 063210, China
  3. Tangshan Key Laboratory of Data Science, Tangshan 063210, China
  • Received: 2020-06-15 Revised: 2020-09-09 Accepted: 2021-08-25 Online: 2021-08-25 Published: 2021-08-24

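Abstract: Federated learning is a topic of great interest in privacy protection, but it faces two problems: local model parameters are difficult to aggregate, and gradient updates can leak data. This paper proposes a federated ensemble algorithm that uses a 256 B key to transmit initialization models of different types to each data source for training, and uses different ensemble algorithms to integrate the local model parameters, which greatly improves the security of both the data and the model. Simulation results show that, for small and medium-sized datasets, the model obtained with the AdaBoost ensemble algorithm reaches an accuracy of 92.505% with a standard deviation of about 8.6×10⁻⁸; for large datasets, the model obtained with the stacking ensemble algorithm reaches an accuracy of 92.495% with a standard deviation of about 8.85×10⁻⁸. Compared with the traditional approach of pooling multi-party data to train a model centrally, the method maintains accuracy while also protecting the data and the model.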

Keywords: federated learning, ensemble algorithm, privacy protection, federated ensemble algorithm

Abstract: Federated learning is an active topic in privacy protection, but it faces two problems: local model parameters are difficult to aggregate, and gradient updates can leak data. This paper proposes a federated ensemble algorithm. It uses a 256-byte key to transfer initialization models of different types to the data sources, where they are trained locally, and then uses different ensemble algorithms to integrate the local model parameters, which greatly improves the security of both the data and the model. Simulation results show that, for small and medium-sized datasets, the model obtained with the AdaBoost ensemble algorithm reaches an accuracy of 92.505% with a standard deviation of about 8.6×10⁻⁸. For large datasets, the model obtained with the stacking ensemble algorithm reaches an accuracy of 92.495% with a standard deviation of about 8.85×10⁻⁸. Compared with the traditional method of pooling multi-party data to train a model centrally, the proposal maintains accuracy while also protecting the data and the model.

Key words: federated learning, ensemble algorithm, privacy protection, federated ensemble algorithm
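
The abstract describes the general workflow (models trained separately at each data source, then integrated by an ensemble method) without giving implementation details. The following is a minimal sketch of that idea only, not the authors' algorithm. Everything in it is an assumption made for illustration: the synthetic one-feature data, the decision-stump local model, the coordinator-held validation set, and all function names. It borrows only AdaBoost's vote-weight formula, 0.5·ln((1 − err)/err), to combine the local models. It does not perform AdaBoost's sample reweighting, and it omits the key-protected model transfer mentioned in the abstract.

```python
import math
import random


def predict_stump(model, x):
    """Classify x with a one-feature threshold rule; returns +1 or -1."""
    threshold, polarity = model
    return polarity if x >= threshold else -polarity


def train_stump(rows):
    """Fit a decision stump on one source's local data by exhaustive search."""
    best = None
    for threshold in sorted({x for x, _ in rows}):
        for polarity in (1, -1):
            model = (threshold, polarity)
            errors = sum(predict_stump(model, x) != y for x, y in rows)
            if best is None or errors < best[0]:
                best = (errors, model)
    return best[1]


def vote_weight(model, rows, eps=1e-9):
    """AdaBoost-style vote weight, 0.5 * ln((1 - err) / err), on the given rows."""
    err = sum(predict_stump(model, x) != y for x, y in rows) / len(rows)
    err = min(max(err, eps), 1 - eps)  # clamp so the logarithm stays finite
    return 0.5 * math.log((1 - err) / err)


def ensemble_predict(models, weights, x):
    """Weighted vote over the local models; ties go to +1."""
    score = sum(w * predict_stump(m, x) for m, w in zip(models, weights))
    return 1 if score >= 0 else -1


def make_source(rng, n, noise=0.1):
    """Synthetic local dataset (assumption): label is +1 when x >= 0.5, with label noise."""
    rows = []
    for _ in range(n):
        x = rng.random()
        y = 1 if x >= 0.5 else -1
        if rng.random() < noise:
            y = -y
        rows.append((x, y))
    return rows


rng = random.Random(0)
sources = [make_source(rng, 200) for _ in range(3)]  # three hypothetical data holders
validation = make_source(rng, 200)                   # hypothetical coordinator-side set

# Each source trains on its own rows; only the fitted model is shared.
local_models = [train_stump(rows) for rows in sources]
weights = [vote_weight(m, validation) for m in local_models]

# Noise-free test grid to measure how well the combined model recovers the rule.
test = [(i / 100, 1 if i / 100 >= 0.5 else -1) for i in range(100)]
accuracy = sum(ensemble_predict(local_models, weights, x) == y for x, y in test) / len(test)
print(f"ensemble accuracy on the test grid: {accuracy:.2f}")
```

In this sketch the raw rows never leave `sources`; the coordinator sees only the `(threshold, polarity)` pairs and the validation errors. A stacking variant would replace the fixed vote weights with a second-level model trained on the local models' predictions.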