ACCELERATION OF K-NEAREST NEIGHBOR QUERY WITH TRIANGLE INEQUALITY PRUNING ON PRODUCT QUANTIZATION
This thesis conducts research to accelerate k-Nearest Neighbor (k-NN) query execution on the Product Quantization (PQ) algorithm. Product Quantization is an approximate kNN algorithm for large and high-dimensional datasets. The main idea behind PQ is to decompose the data vectors dimension into s...
Saved in:
Main Author: | |
---|---|
Format: | Final Project |
Language: | Indonesia |
Online Access: | https://digilib.itb.ac.id/gdl/view/61872 |
Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
Institution: | Institut Teknologi Bandung |
Language: | Indonesia |
Summary: | This thesis conducts research to accelerate k-Nearest Neighbor (k-NN) query
execution on the Product Quantization (PQ) algorithm. Product Quantization is an
approximate kNN algorithm for large and high-dimensional datasets. The main idea
behind PQ is to decompose the data vectors dimension into several small
subdimensions, to then be quantized separately. The quantized and encoded data
are used to approximate the distance between the query to each data point when
query answering.
The Triangle inequality theorem is used to accelerate PQ query execution by
reducing unnecessary distance calculations. The aim is to show that the query
response efficiency can be increased without reducing much accuracy of the nearest
neighbor without relying on certain types of CPU-specific instructions. As with
other methods, the key for the acceleration is SIMD instructions. The triangle
inequality (TI) pruning technique was chosen as the acceleration technique because
of its success in accelerating the k-NN brute-force algorithm while maintaining the
search accuracy and without requiring SIMD instructions. This thesis analyzes how
to develop and adapt the triangle inequality pruning technique to the Product
Quantization algorithm.
Experimental results using several datasets show that this technique successfully
accelerates query times 6-7 fold faster with little or no compromising on the search
accuracy if the appropriate parameter values are used. In addition, the triangle
inequality pruning is quite competitive to speed up query time compared to other
acceleration methods that rely on SIMD instructions in terms of query time and
search accuracy. Finally, the use of the triangle inequality pruning technique does
not limit the flexibility of the PQ configuration, and it does not have an issue with
the algorithm portability, such as the method whose key to the acceleration depends
on SIMD.
|
---|