• 中国计算机学会会刊
  • 中国科技核心期刊
  • 中文核心期刊

计算机工程与科学 ›› 2024, Vol. 46 ›› Issue (08): 1381-1389.

• 高性能计算 • 上一篇    下一篇

一种基于动态空间划分和压缩布隆过滤器相结合的分布式元数据负载均衡算法#br#

薛梅婷1,俞万刚2,张纪林1,曾艳2,袁俊峰2,周丽2   

  1. (1.杭州电子科技大学网络空间安全学院,浙江 杭州 310018;2.杭州电子科技大学计算机学院,浙江 杭州 310018)
  • 收稿日期:2023-11-03 修回日期:2023-12-29 接受日期:2024-08-25 出版日期:2024-08-25 发布日期:2024-09-02
  • 基金资助:
    浙江省重点研发计划(2023C03194,2024C01211);浙江省自然科学基金(LQ23F020015,LTGG24F020007)

A distributed metadata load balancing algorithm based on dynamic space partitioning and compressed Bloom filter

XUE Mei-ting1,YU Wan-gang2,ZHANG Ji-lin1,ZENG Yan2,YUAN Jun-feng2,ZHOU Li2   

  1. (1.School of Cyberspace Security,Hangzhou Dianzi University,Hangzhou 310018;
    2.School of Computer Science,Hangzhou Dianzi University,Hangzhou 310018,China)
  • Received:2023-11-03 Revised:2023-12-29 Accepted:2024-08-25 Online:2024-08-25 Published:2024-09-02

摘要: 分布式元数据管理系统利用多个元数据服务器对大量元数据进行存储和管理。该系统将海量元数据通过不同的映射策略分配到不同的元数据服务器上,减少单台元数据服务器所处理的数据量,从而减少磁盘访问次数,进而提高整个元数据管理系统的性能。元数据管理系统通常会使用哈希函数将元数据键映射到不同的元数据服务器中。然而,当数据特征值相似时,由于散列函数的单向性,会导致数据分布不均衡的问题,造成元数据服务器性能下降。为解决上述问题,提出了一种动态空间划分和压缩布隆过滤器相结合的元数据负载均衡算法,该算法首先构建一个哈希桶来组织元数据键,通过哈希算法将元数据键映射到不同的哈希桶中;在映射过程中,根据元数据服务器的负载情况动态调整目标哈希桶,并在上述哈希桶中有序地保存元数据键的映射信息。当访问元数据时,首先通过压缩布隆过滤器对元数据键进行预处理,然后通过二分查找在指定的哈希桶中进行元数据映射信息的查找。与近年来提出的元数据管理算法相比,所提算法在映射键发生倾斜时仍能保证元数据服务器负载均衡,并通过对比实验表明,所提算法相比最优的元数据管理算法,在内存占用仅提升2%的条件下,获得了20%的搜索性能提升。

关键词: 分布式元数据管理, 负载均衡算法, 一致性哈希, 压缩布隆过滤器

Abstract: The distributed metadata management system utilizes multiple metadata servers (MDS) to store and manage a large amount of metadata. The system reduces the data load on each individual MDS by employing different mapping strategies to distribute the massive metadata across multiple MDS, thus minimizing the disk access frequency and improving the overall performance of the metadata management system. Typically, a hash function is used to map metadata keys to different MDS. However, when the feature values of the data are similar, the one-way nature of the hash function can result in data distribution imbalance, leading to performance degradation of the MDS. To address the performance degradation issue caused by uneven data distribution, this paper proposes a dynamic spatial partitioning and compressed Bloom filter-based metadata load balancing algorithm. The algorithm first constructs a hash bucket to organize the metadata keys, mapping the keys to different hash buckets using a hash algorithm. During the mapping process, the target hash bucket is dynamically adjusted based on the load condition of the MDS, and the mapping information of the metadata keys is sequentially stored within the corresponding hash bucket. When accessing metadata, the algorithm preprocesses the metadata keys using a compressed Bloom filter, and then performs a binary search within the specified hash bucket to retrieve the mapping information. Compared to recent metadata management algorithms, the proposed algorithm ensures load balancing of MDS even when key skewness occurs. Experimental results show that the algorithm achieves a 20% improvement in search performance compared to the optimal metadata management algorithm, with only a 2% increase in memory consumption.

Key words: distributed metadata management, load balancing algorithm, consistent hashing, compressed Bloom filter ,