Efficient approximate nearest neighbor search on high-dimensional vectors by graph and quantization

Approximate nearest neighbor (ANN) search in high-dimensional Euclidean space has a broad range of applications. Among existing ANN algorithms, graph-based methods currently show superior performance in terms of the time-accuracy trade-off. Yet, they have performance bottlenecks due to the random me...

Full description

Saved in:
Bibliographic Details
Main Author: Gou, Yutong
Other Authors: Long Cheng
Format: Thesis-Master by Research
Language:English
Published: Nanyang Technological University 2024
Subjects:
Online Access:https://hdl.handle.net/10356/181650
Tags: Add Tag
No Tags, Be the first to tag this record!
Institution: Nanyang Technological University
Language: English
id sg-ntu-dr.10356-181650
record_format dspace
spelling sg-ntu-dr.10356-1816502024-12-12T02:55:42Z Efficient approximate nearest neighbor search on high-dimensional vectors by graph and quantization Gou, Yutong Long Cheng College of Computing and Data Science c.long@ntu.edu.sg Computer and Information Science Approximate nearest neighbor (ANN) search in high-dimensional Euclidean space has a broad range of applications. Among existing ANN algorithms, graph-based methods currently show superior performance in terms of the time-accuracy trade-off. Yet, they have performance bottlenecks due to the random memory accesses caused by the searching process on the graph index and the cost of computing exact distances between high-dimensional vectors. To relieve the aforementioned issues, a recent method named NGT-QG makes an initial attempt by integrating quantization and graph. It (1) replicates and stores the quantization codes of a vertex's neighbors compactly so that they can be accessed sequentially; and (2) uses a SIMD-based implementation named FastScan to efficiently estimate distances based on the quantization codes in batch for guiding the searching process. While NGT-QG achieves promising improvements over the vanilla graph-based methods, it has not fully unleashed the potential of integrating quantization and graph. For instance, it entails a re-ranking step to compute exact distances at the end, which introduces extra random memory accesses; its graph structure is not jointly designed considering the in-batch nature of FastScan, which causes wastes of computation in searching. In this work, we present a new method named SymphonyQG, which targets more symphonious integration of quantization and graph (e.g., it avoids an explicit re-ranking step and refines the graph structure to be more aligned with FastScan). Based on our experimental results on several real-world datasets, SymphonyQG establishes a new state of the art in terms of the query performance. Master's degree 2024-12-12T02:55:41Z 2024-12-12T02:55:41Z 2024 Thesis-Master by Research Gou, Y. (2024). Efficient approximate nearest neighbor search on high-dimensional vectors by graph and quantization. Master's thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/181650 https://hdl.handle.net/10356/181650 en This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0). application/pdf Nanyang Technological University
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
Singapore
content_provider NTU Library
collection DR-NTU
language English
topic Computer and Information Science
spellingShingle Computer and Information Science
Gou, Yutong
Efficient approximate nearest neighbor search on high-dimensional vectors by graph and quantization
description Approximate nearest neighbor (ANN) search in high-dimensional Euclidean space has a broad range of applications. Among existing ANN algorithms, graph-based methods currently show superior performance in terms of the time-accuracy trade-off. Yet, they have performance bottlenecks due to the random memory accesses caused by the searching process on the graph index and the cost of computing exact distances between high-dimensional vectors. To relieve the aforementioned issues, a recent method named NGT-QG makes an initial attempt by integrating quantization and graph. It (1) replicates and stores the quantization codes of a vertex's neighbors compactly so that they can be accessed sequentially; and (2) uses a SIMD-based implementation named FastScan to efficiently estimate distances based on the quantization codes in batch for guiding the searching process. While NGT-QG achieves promising improvements over the vanilla graph-based methods, it has not fully unleashed the potential of integrating quantization and graph. For instance, it entails a re-ranking step to compute exact distances at the end, which introduces extra random memory accesses; its graph structure is not jointly designed considering the in-batch nature of FastScan, which causes wastes of computation in searching. In this work, we present a new method named SymphonyQG, which targets more symphonious integration of quantization and graph (e.g., it avoids an explicit re-ranking step and refines the graph structure to be more aligned with FastScan). Based on our experimental results on several real-world datasets, SymphonyQG establishes a new state of the art in terms of the query performance.
author2 Long Cheng
author_facet Long Cheng
Gou, Yutong
format Thesis-Master by Research
author Gou, Yutong
author_sort Gou, Yutong
title Efficient approximate nearest neighbor search on high-dimensional vectors by graph and quantization
title_short Efficient approximate nearest neighbor search on high-dimensional vectors by graph and quantization
title_full Efficient approximate nearest neighbor search on high-dimensional vectors by graph and quantization
title_fullStr Efficient approximate nearest neighbor search on high-dimensional vectors by graph and quantization
title_full_unstemmed Efficient approximate nearest neighbor search on high-dimensional vectors by graph and quantization
title_sort efficient approximate nearest neighbor search on high-dimensional vectors by graph and quantization
publisher Nanyang Technological University
publishDate 2024
url https://hdl.handle.net/10356/181650
_version_ 1819113071875981312