ACCELERATION OF K-NEAREST NEIGHBOR QUERY WITH TRIANGLE INEQUALITY PRUNING ON PRODUCT QUANTIZATION

This thesis conducts research to accelerate k-Nearest Neighbor (k-NN) query execution on the Product Quantization (PQ) algorithm. Product Quantization is an approximate kNN algorithm for large and high-dimensional datasets. The main idea behind PQ is to decompose the data vectors dimension into s...

Full description

Saved in:
Bibliographic Details
Main Author: Edian, Ikraduya
Format: Final Project
Language:Indonesia
Online Access:https://digilib.itb.ac.id/gdl/view/61872
Tags: Add Tag
No Tags, Be the first to tag this record!
Institution: Institut Teknologi Bandung
Language: Indonesia
Description
Summary:This thesis conducts research to accelerate k-Nearest Neighbor (k-NN) query execution on the Product Quantization (PQ) algorithm. Product Quantization is an approximate kNN algorithm for large and high-dimensional datasets. The main idea behind PQ is to decompose the data vectors dimension into several small subdimensions, to then be quantized separately. The quantized and encoded data are used to approximate the distance between the query to each data point when query answering. The Triangle inequality theorem is used to accelerate PQ query execution by reducing unnecessary distance calculations. The aim is to show that the query response efficiency can be increased without reducing much accuracy of the nearest neighbor without relying on certain types of CPU-specific instructions. As with other methods, the key for the acceleration is SIMD instructions. The triangle inequality (TI) pruning technique was chosen as the acceleration technique because of its success in accelerating the k-NN brute-force algorithm while maintaining the search accuracy and without requiring SIMD instructions. This thesis analyzes how to develop and adapt the triangle inequality pruning technique to the Product Quantization algorithm. Experimental results using several datasets show that this technique successfully accelerates query times 6-7 fold faster with little or no compromising on the search accuracy if the appropriate parameter values are used. In addition, the triangle inequality pruning is quite competitive to speed up query time compared to other acceleration methods that rely on SIMD instructions in terms of query time and search accuracy. Finally, the use of the triangle inequality pruning technique does not limit the flexibility of the PQ configuration, and it does not have an issue with the algorithm portability, such as the method whose key to the acceleration depends on SIMD.