• Official journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science ›› 2024, Vol. 46 ›› Issue (07): 1237-1244.

• Computer Network and Information Security •

A data heterogeneity processing method based on asynchronous hierarchical federated learning

GUO Chang-hao, TANG Xiang-yun, WENG Yu

  1. (School of Information Engineering, Minzu University of China, Beijing 100081, China)
  • Received: 2023-10-20 Revised: 2023-11-23 Accepted: 2024-07-25 Online: 2024-07-25 Published: 2024-07-19

Abstract: Ubiquitous Internet of Things (IoT) devices continuously generate vast amounts of data whose distributions and volumes differ from node to node, so data heterogeneity is pervasive. When federated learning is applied to intelligent IoT devices, traditional synchronous mechanisms cannot effectively handle non-independent and identically distributed (non-IID) data. They also suffer from single points of failure and from the complexity of maintaining a global clock. Asynchronous mechanisms, in turn, may introduce additional communication overhead and stale updates under non-IID data. To address these challenges more flexibly, an asynchronous hierarchical federated learning method is proposed. First, the BIRCH algorithm is used to analyze the data distribution across IoT nodes and group the nodes into clusters. Next, the data within each cluster is examined and validated to identify nodes with high data quality. Nodes from high-quality clusters are then split off and reassigned to lower-quality clusters, forming new, optimized clusters. Finally, model training is carried out in two stages: intra-cluster aggregation followed by global aggregation. The proposed approach is evaluated on the MNIST dataset. The results show that, compared with the classical FedAvg baseline, the proposed approach converges faster on non-IID datasets and improves model accuracy by more than 15%.

Key words: Internet of Things (IoT), federated learning, asynchronous federated learning, hierarchical federated learning, non-independent and identically distributed data, data distribution
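
The two-stage aggregation mentioned in the abstract (intra-cluster first, then global) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes plain FedAvg-style averaging weighted by sample count at both stages, represents a model as a flat list of floats, and leaves out the asynchronous scheduling, staleness handling, BIRCH clustering, and cluster reorganization. The names `weighted_average` and `hierarchical_aggregate` are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Weights = List[float]               # a model flattened to a list of parameters
NodeUpdate = Tuple[Weights, int]    # (parameters, number of local samples)


def weighted_average(updates: Sequence[NodeUpdate]) -> NodeUpdate:
    """Average parameter vectors, weighting each by its sample count."""
    total = sum(n for _, n in updates)
    if total == 0:
        raise ValueError("cannot aggregate updates with zero samples")
    dim = len(updates[0][0])
    avg = [sum(w[i] * n for w, n in updates) / total for i in range(dim)]
    return avg, total


def hierarchical_aggregate(clusters: Dict[str, List[NodeUpdate]]) -> Weights:
    """Two-stage aggregation: within each cluster, then across clusters."""
    # Stage 1 (assumed): each cluster averages the updates of its own nodes.
    cluster_models = [weighted_average(nodes) for nodes in clusters.values()]
    # Stage 2 (assumed): the cluster models are averaged into a global model,
    # weighted by the total sample count of each cluster.
    global_weights, _ = weighted_average(cluster_models)
    return global_weights


# Hypothetical example: two clusters, three nodes, two parameters per model.
example_clusters = {
    "cluster_a": [([1.0, 2.0], 10), ([3.0, 4.0], 30)],
    "cluster_b": [([5.0, 6.0], 60)],
}
global_model = hierarchical_aggregate(example_clusters)
```

With sample-count weighting at both stages, the result equals a flat FedAvg over all nodes. A hierarchical scheme only behaves differently once the stages use different weights, run at different frequencies, or aggregate asynchronously, which is where the method described in the abstract would depart from this sketch.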