• Journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science


Hierarchical navigable small world graph
algorithms based on quantization coding

LI Qiuzhen1,BAI Xingqiang2,LI Lixia1,WANG Ying1   

  (1. Wuhan Digital Engineering Institute, Wuhan 430074, China;
   2. School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China)
  • Received:2018-07-03 Revised:2018-09-03 Online:2019-04-25 Published:2019-04-25

Abstract:

With the rapid development of big data and artificial intelligence, structured processing and content-based retrieval of multimedia data have received growing attention. Faced with the massive high-dimensional feature vectors produced by structuring multimedia data, obtaining fast and accurate search results is a problem that artificial intelligence must solve when dealing with large-scale data. The recently proposed hierarchical navigable small world (HNSW) graph retrieval algorithm achieves the best performance on a number of public data sets, but it suffers from large memory overhead. Retrieval algorithms based on quantization coding can greatly compress the data set vectors and thereby sharply reduce memory usage. Combining quantization coding with the hierarchical navigable small world graph algorithm, we propose two improved HNSW algorithms based on quantization coding: the HNSWSQ algorithm, which uses scalar-quantized encoded vectors, and the HNSWPQ algorithm, which uses product-quantized encoded vectors. The two algorithms adopt different quantization strategies to encode and store the original vectors so as to reduce memory overhead, and then use the HNSW algorithm to build an index that reduces retrieval time. On multiple data sets, the HNSWSQ algorithm attains a recall rate and average retrieval time similar to those of the HNSW algorithm while greatly reducing memory overhead. Experimental results show that, compared with the HNSW algorithm, the memory overhead of the HNSWSQ algorithm on the SIFT-1M and GIST-1M data sets is reduced by 45.1% and 70.4%, respectively.

Key words:
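The memory saving described in the abstract comes from storing quantized codes instead of raw floats. The scalar-quantization idea behind HNSWSQ can be sketched as follows; this is an illustrative sketch, not the paper's implementation, and the function names and the per-dimension uint8 scheme are assumptions for demonstration.

```python
import numpy as np

def sq_encode(x, lo, hi):
    """Scalar quantization: map each dimension linearly from its
    data range [lo, hi] onto the 256 levels of a uint8 code."""
    scaled = (x - lo) / (hi - lo)                       # normalize to [0, 1]
    return np.clip(np.round(scaled * 255), 0, 255).astype(np.uint8)

def sq_decode(codes, lo, hi):
    """Approximate reconstruction of the original float vectors
    from their uint8 codes (used at distance-computation time)."""
    return codes.astype(np.float32) / 255.0 * (hi - lo) + lo

# Per-dimension ranges are learned from the data set itself (toy example).
data = np.array([[0.1, 0.5, 0.9],
                 [0.2, 0.4, 0.8]], dtype=np.float32)
lo, hi = data.min(axis=0), data.max(axis=0)

codes = sq_encode(data, lo, hi)    # 1 byte per dimension instead of 4
approx = sq_decode(codes, lo, hi)  # small reconstruction error
```

Storing one byte per dimension instead of a 4-byte float gives roughly 4x compression of the vectors themselves; the overall savings reported in the abstract (45.1% and 70.4%) are smaller because the HNSW graph structure still occupies memory. Product quantization, used by HNSWPQ, instead splits each vector into subvectors and encodes each subvector with its own learned codebook, which allows stronger compression at the cost of a training step.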