• 中国计算机学会会刊
  • 中国科技核心期刊
  • 中文核心期刊

Computer Engineering & Science ›› 2024, Vol. 46 ›› Issue (07): 1193-1201.

• High Performance Computing • Previous Articles     Next Articles

Learning indexing method for massive high-dimensional data based on partitioned hierarchical graph

HUA Yue-lin1,2,3,ZHOU Xiao-lei2,3,FAN Qiang2,3,WANG Fang-xiao2,3,YAN Hao2,3   

  1. (1.School of Computer Science,Nanjing University of Information Science & Technology,Nanjing 210044;
    2.The 63rd Research Institute,National University of Defense Technology,Nanjing 210007;
    3.Laboratory for Big Data and Decision,National University of Defense Technology,Changsha 410073,China)
  • Received:2023-10-12 Revised:2023-11-21 Accepted:2024-07-25 Online:2024-07-25 Published:2024-07-18

Abstract: Learning to index is the key to solving the problem of approximate nearest neighbor search in massive high-dimensional data. However, existing learning to index techniques are limited to individual partitions and rely on the construction of neighborhood graph. As the dimensionality and scale of data grow, indexing struggles to accurately judge boundary data of partitions, leading to increased construction time complexity and challenges in scalability. To address the above problems, a learn to index method based on partitioned hierarchical graphs, PBO-HNSW is proposed. The method redistributes partition boundary data and constructs distributed graph index structures in parallel. It effectively addresses the challenges faced by the approximate nearest neighbor search problem. Experimental results show that PBO-HNSW method is able to achieve millisecond index construction on millions of massive high-dimensional data. When the recall is 0.93, the construction time of the PBO-HNSW method is only 36.4%  of baseline methods.


Key words: approximate nearest neighbor search, learning to index, hierarchical navigable small world (HNSW), partition learning, index structure