DEVELOPMENT OF CLUSTERING MODELS FOR GROUPING NEAR DUPLICATE IMAGES
Main Author: | Annas Thoriq Sumarjadi, Trian |
---|---|
Format: | Final Project |
Language: | Indonesia |
Online Access: | https://digilib.itb.ac.id/gdl/view/50341 |
Institution: | Institut Teknologi Bandung |
id |
id-itb.:50341 |
---|---|
spelling |
id-itb.:50341 · 2020-09-23T18:20:42Z · DEVELOPMENT OF CLUSTERING MODELS FOR GROUPING NEAR DUPLICATE IMAGES · Annas Thoriq Sumarjadi, Trian · Indonesia · Final Project · DBSCAN, near-duplicate, feature extraction, keypoints · INSTITUT TEKNOLOGI BANDUNG · https://digilib.itb.ac.id/gdl/view/50341 |
institution |
Institut Teknologi Bandung |
building |
Institut Teknologi Bandung Library |
continent |
Asia |
country |
Indonesia |
content_provider |
Institut Teknologi Bandung |
collection |
Digital ITB |
language |
Indonesia |
description |
In image searches on e-commerce sites such as Bukalapak.com, the results contain near-duplicate
images. To address this, a clustering model is needed that can group images by their similarity.
Producing good clusters requires selecting an appropriate feature extraction method. This final
project develops such a clustering model as a solution to this problem.
The clustering model was built using DBSCAN (Density-Based Spatial Clustering of Applications
with Noise). Clustering images requires a feature extraction method, and this final project uses
four: SIFT (Scale-Invariant Feature Transform), ORB (Oriented FAST and Rotated BRIEF), PCA-SIFT
(Principal Component Analysis SIFT), and SURF (Speeded-Up Robust Features). A near-duplicate
image retrieval web application was also built on the best clustering model; it includes an
interface for adding new data to the model. In addition, this final project discusses the
modifications made to DBSCAN, the performance of each model, and the development of the
near-duplicate image retrieval web application.
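DBSCAN only needs a way to measure the distance between two items, which is why it can work directly on image-to-image distances rather than on raw feature vectors. The minimal pure-Python sketch below implements textbook DBSCAN over a precomputed distance matrix; the function name `dbscan`, the parameters `eps` and `min_pts`, and the toy distances are illustrative assumptions, not the project's own code.

```python
from typing import List, Optional, Sequence

NOISE = -1  # label given to points that belong to no cluster


def dbscan(dist: Sequence[Sequence[float]], eps: float, min_pts: int) -> List[int]:
    """Textbook DBSCAN over a precomputed n x n distance matrix.

    Returns one label per point: 0, 1, 2, ... for clusters, NOISE for noise.
    """
    n = len(dist)
    labels: List[Optional[int]] = [None] * n  # None means "not visited yet"

    def region_query(i: int) -> List[int]:
        # All points within eps of point i, including i itself.
        return [j for j in range(n) if dist[i][j] <= eps]

    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = region_query(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later become a border point of some cluster
            continue
        labels[i] = cluster  # i is a core point and starts a new cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins, but is not expanded
                continue
            if labels[j] is not None:
                continue  # already assigned
            labels[j] = cluster
            neighbours = region_query(j)
            if len(neighbours) >= min_pts:
                queue.extend(neighbours)  # j is also a core point: keep growing
        cluster += 1
    return [NOISE if lab is None else lab for lab in labels]


if __name__ == "__main__":
    # Toy data: images 0-2 are near-duplicates, 3-4 are near-duplicates, 5 is unrelated.
    groups = [0, 0, 0, 1, 1, 2]
    d = [[0.0 if i == j else (0.1 if groups[i] == groups[j] else 0.9)
          for j in range(6)] for i in range(6)]
    print(dbscan(d, eps=0.3, min_pts=2))  # [0, 0, 0, 1, 1, -1]
```

A point is a core point when at least `min_pts` points (itself included) lie within `eps`; clusters grow outward from core points, and a point reachable from no core point stays labelled as noise.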
The modifications serve two purposes: computing the distance between two images from the number
of keypoint pairs they produce, and adding a prediction function that assigns test data to
clusters. Model performance is measured by purity and accuracy.
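The abstract states only that image distance is derived from the number of keypoint pairs and that a prediction function assigns test data to clusters. The sketch below shows one plausible way to do both in pure Python, under assumptions not taken from the project: keypoint pairs are counted with Lowe's ratio test (threshold 0.75) over float descriptors such as SIFT's (ORB's binary descriptors would normally use Hamming distance instead), the distance is `1 / (1 + pairs)`, and a test image joins the cluster of the nearest core image within `eps`. All function names are hypothetical.

```python
import math
from typing import List, Sequence

NOISE = -1
Descriptor = Sequence[float]  # e.g. a 128-dimensional SIFT descriptor


def count_keypoint_pairs(a: List[Descriptor], b: List[Descriptor],
                         ratio: float = 0.75) -> int:
    """Count descriptors in a whose nearest match in b passes Lowe's ratio test."""
    if len(b) < 2:
        return 0  # the ratio test needs a first and a second nearest neighbour
    pairs = 0
    for da in a:
        nearest = sorted(math.dist(da, db) for db in b)
        if nearest[0] < ratio * nearest[1]:
            pairs += 1
    return pairs


def keypoint_distance(a: List[Descriptor], b: List[Descriptor]) -> float:
    """Hypothetical distance: more matched keypoint pairs means closer images."""
    # Taking the minimum of both directions keeps the distance symmetric.
    pairs = min(count_keypoint_pairs(a, b), count_keypoint_pairs(b, a))
    return 1.0 / (1.0 + pairs)


def predict(query: List[Descriptor], train: List[List[Descriptor]],
            labels: List[int], is_core: List[bool], eps: float) -> int:
    """Assign a new image to the cluster of its closest core image within eps."""
    best_label, best_dist = NOISE, math.inf
    for desc, label, core in zip(train, labels, is_core):
        if not core:
            continue  # only core points define a cluster's reach
        d = keypoint_distance(query, desc)
        if d <= eps and d < best_dist:
            best_label, best_dist = label, d
    return best_label


if __name__ == "__main__":
    # Toy 2-D "descriptors"; real SIFT/SURF descriptors are 128/64-dimensional.
    img_a = [(0, 0), (10, 0), (0, 10)]
    img_b = [(0.1, 0), (10, 0.1), (0, 10.1)]  # near-duplicate of img_a
    img_c = [(50, 50), (60, 50), (50, 60)]    # unrelated image
    print(keypoint_distance(img_a, img_b))    # 0.25 (3 matched pairs)
    print(predict(img_b, [img_a, img_c], [0, 1], [True, True], eps=0.5))  # 0
```

With this distance, DBSCAN's `eps` effectively becomes a minimum number of shared keypoints: for example, `eps = 0.25` makes two images neighbours only when they share at least 3 keypoint pairs, since 1/(1+p) ≤ 0.25 exactly when p ≥ 3.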
The experimental results show that two of the feature extraction methods, SIFT and SURF, cluster
the images successfully, while ORB and PCA-SIFT fail. The SIFT and SURF models both reach a purity
of 1. In terms of accuracy, the SIFT model (0.9) is slightly better than the SURF model (0.8).
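Purity rewards clusters that contain images from only one true group: each cluster is credited with the size of its most common ground-truth group, and the total is divided by the number of clustered images, so a purity of 1 means no cluster mixes different products. The minimal sketch below computes it; how the project treats noise points is not stated, so excluding them here is an assumption, and the product labels are toy data.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence

NOISE = -1


def purity(pred: Sequence[int], truth: Sequence[Hashable]) -> float:
    """Purity = (1/N) * sum over clusters of the count of its most common true group.

    Points marked as noise are excluded here (an assumption; conventions vary).
    """
    clusters: Dict[int, Counter] = {}
    for p, t in zip(pred, truth):
        if p == NOISE:
            continue
        clusters.setdefault(p, Counter())[t] += 1
    n = sum(sum(c.values()) for c in clusters.values())
    if n == 0:
        return 0.0
    return sum(max(c.values()) for c in clusters.values()) / n


if __name__ == "__main__":
    truth = ["shoe", "shoe", "shoe", "bag", "bag"]
    print(purity([0, 0, 0, 1, 1], truth))  # 1.0: every cluster holds one product
    print(purity([0, 0, 0, 0, 1], truth))  # 0.8: cluster 0 mixes shoes and a bag
```

Purity alone can be inflated by many tiny clusters (a singleton cluster is always pure), which is one reason to report it alongside accuracy.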
|
format |
Final Project |
author |
Annas Thoriq Sumarjadi, Trian |
title |
DEVELOPMENT OF CLUSTERING MODELS FOR GROUPING NEAR DUPLICATE IMAGES |
url |
https://digilib.itb.ac.id/gdl/view/50341 |