• 中国计算机学会会刊
  • 中国科技核心期刊
  • 中文核心期刊

计算机工程与科学

• 高性能计算 • 上一篇    下一篇

量化编码的分层可通航小世界图算法

李秋珍1,白兴强2,李立夏1,王赢1   

  1. (1.武汉数字工程研究所,湖北 武汉 430074;2.华中科技大学计算机科学与技术学院,湖北 武汉 430074)
  • 收稿日期:2018-07-03 修回日期:2018-09-03 出版日期:2019-04-25 发布日期:2019-04-25
  • 基金资助:

    军委装备发展部科研订购局“十三五”装备预研领域基金(61401320501)

Hierarchical navigable small world graph
algorithms based on quantization coding

LI Qiuzhen1,BAI Xingqiang2,LI Lixia1,WANG Ying1   

  1. (1.Wuhan Digital Engineering Institute,Wuhan 430074;
    (2.School of Computer Science and Technology,Huazhong University of Science and Technology,Wuhan 430074,China)
  • Received:2018-07-03 Revised:2018-09-03 Online:2019-04-25 Published:2019-04-25

摘要:

随着大数据和人工智能的高速发展,针对多媒体数据的结构化处理与基于内容的检索受到极大的关注,面对多媒体数据结构化后的海量高维特征向量,如何快速、准确地检索是人工智能处理大规模数据所必须解决的问题。最近提出的分层可通航小世界图HNSW检索算法在多个公开数据集取得了最佳的性能表现,但该算法存在内存开销大的问题。而基于量化编码的检索算法能够压缩数据集向量,大幅度降低内存占用。将量化编码和分层可通航小世界图算法结合,提出了2种基于量化编码改进的HNSW算法,分别是使用标量量化编码向量的HNSWSQ算法和使用乘积量化编码向量的HNSWPQ算法,2种算法使用不同的量化策略存储原始向量编码,以降低内存开销,再通过HNSW算法建立索引达到缩短检索耗时的目的。其中HNSWSQ算法在多个数据集上获得了与HNSW算法相近的查全率和平均检索耗时,而内存开销大幅降低。实验结果表明,HNSWSQ算法在SIFT1M和GIST1M数据集上的内存开销比HNSW算法分别降低了45.1%和70.4%。
 
 

关键词: 近似最近邻检索, 分层可通航小世界图算法, 乘积量化, 标量量化, 相似性搜索, 高维数据索引

Abstract:

With the rapid development of big data and artificial intelligence, structured processing and content-based retrieval for multimedia data have received attention. Facing the massive high-dimensional feature vectors after multimedia data structuralization, how to achieve fast and accurate search results is a problem that artificial intelligence must solve when dealing with largescale data. The recently proposed hierarchical navigable small world (HNSW) graph retrieval algorithm achieves the best performance in a number of public data sets. However, the algorithm suffers large memory overhead. Retrieval algorithms based on quantization coding can greatly compress the vectors of data sets and greatly reduce memory usage. Combining quantization coding with the hierarchical navigable smallworld graph algorithm, we propose two improved hierarchical navigable small world graph algorithms based on quantization coding, namely HNSWSQ algorithm using scalar quantization-encoding vectors and HNSWPQ algorithm using productquantized coding vectors. The two algorithms adopt different quantization strategies to encode and store the original vectors to reduce memory overhead, and then the HNSW algorithm is used to create an index to reduce the timeconsumption of the retrieval. The HNSWSQ algorithm has similar recall rate and average retrieval time to those of the HNSW algorithm on multiple data sets, and the memory overhead is greatly reduced. Experimental results show that compared with the HNSW algorithm, the memory overhead of the HNSWSQ algorithm on the SIFT-1M and GIST-1M data sets is reduced by 45.1% and 70.4%, respectively.

Key words: