DEVELOPMENT OF CLUSTERING MODELS FOR GROUPING NEAR DUPLICATE IMAGES

Image searches on e-commerce sites such as Bukalapak.com often return near-duplicate images. To address this, a clustering model is needed that groups images by their similarity. Choosing the right feature extraction method is necessary for clustering to produce a good clust...

Bibliographic Details
Main Author: Annas Thoriq Sumarjadi, Trian
Format: Final Project
Language: Indonesia
Online Access: https://digilib.itb.ac.id/gdl/view/50341
Institution: Institut Teknologi Bandung
id id-itb.:50341
timestamp 2020-09-23T18:20:42Z
keywords DBSCAN, near-duplicate, feature extraction, keypoints
institution Institut Teknologi Bandung
building Institut Teknologi Bandung Library
continent Asia
country Indonesia
content_provider Institut Teknologi Bandung
collection Digital ITB
description Image searches on e-commerce sites such as Bukalapak.com often return near-duplicate images. To address this, a clustering model is needed that groups images by their similarity, and producing good clusters requires choosing a suitable feature extraction method. This final project develops such a clustering model using DBSCAN (Density-Based Spatial Clustering of Applications with Noise). Four feature extraction methods are compared: SIFT (Scale-Invariant Feature Transform), ORB (Oriented FAST and Rotated BRIEF), PCA-SIFT (Principal Component Analysis SIFT), and SURF (Speeded-Up Robust Features). DBSCAN is modified in two ways: the distance between two images is computed from the number of keypoint pairs matched between them, and a prediction function is added to assign test images to clusters. A near-duplicate image retrieval web application was also built on the best clustering model; it includes an interface for adding data to the model. Model performance is measured by purity and accuracy. In the experiments, SIFT and SURF clustered the images successfully, while ORB and PCA-SIFT did not. The SIFT and SURF models both reach a purity of 1; the SIFT model has an accuracy of 0.9, slightly higher than the 0.8 of the SURF model.
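The abstract does not give the exact distance formula, the DBSCAN parameters, or the form of the prediction function, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes that keypoint match counts between image pairs are already available (for example from a SIFT or SURF descriptor matcher), turns them into a distance with a hypothetical normalization (`match_distance`), runs a plain DBSCAN over the precomputed distances, assigns a new image to the cluster of its nearest core image (`predict`, also hypothetical), and computes purity. The toy counts, labels, `eps`, and `min_samples` are invented for illustration and are not taken from the project.

```python
from collections import Counter

NOISE = -1

def match_distance(n_matches, n_kp_a, n_kp_b):
    """Hypothetical distance: 1 minus the fraction of keypoints that were paired."""
    smaller = min(n_kp_a, n_kp_b)
    if smaller == 0:
        return 1.0
    return 1.0 - min(n_matches, smaller) / smaller

def dbscan(dist, eps, min_samples):
    """Plain DBSCAN over a square precomputed distance matrix."""
    n = len(dist)
    # A point counts as its own neighbor, since dist[i][i] is 0.
    neighbors = [[j for j in range(n) if dist[i][j] <= eps] for i in range(n)]
    is_core = [len(nb) >= min_samples for nb in neighbors]
    labels = [NOISE] * n
    cluster = 0
    for i in range(n):
        if labels[i] != NOISE or not is_core[i]:
            continue
        labels[i] = cluster
        frontier = [i]  # only core points are ever expanded
        while frontier:
            p = frontier.pop()
            for q in neighbors[p]:
                if labels[q] == NOISE:
                    labels[q] = cluster
                    if is_core[q]:
                        frontier.append(q)
        cluster += 1
    return labels, is_core

def predict(dist_to_train, labels, is_core, eps):
    """Assumed rule: cluster of the nearest core image within eps, else NOISE."""
    best, best_d = NOISE, eps
    for d, lab, core in zip(dist_to_train, labels, is_core):
        if core and d <= best_d:
            best, best_d = lab, d
    return best

def purity(labels, truth):
    """Fraction of images that belong to the majority class of their cluster.
    Noise points (label -1) are treated as one extra cluster here."""
    by_cluster = {}
    for lab, t in zip(labels, truth):
        by_cluster.setdefault(lab, Counter())[t] += 1
    return sum(c.most_common(1)[0][1] for c in by_cluster.values()) / len(labels)

# Toy data (invented): keypoints per image and matched keypoint pairs per image pair.
keypoints = [100, 120, 90, 80, 110]
matches = {(0, 1): 70, (0, 2): 65, (1, 2): 60, (3, 4): 60}
truth = ["shoe", "shoe", "shoe", "bag", "bag"]

n = len(keypoints)
dist = [[0.0] * n for _ in range(n)]
for a in range(n):
    for b in range(a + 1, n):
        d = match_distance(matches.get((a, b), 0), keypoints[a], keypoints[b])
        dist[a][b] = dist[b][a] = d

labels, is_core = dbscan(dist, eps=0.4, min_samples=2)
score = purity(labels, truth)
new_label = predict([1.0, 1.0, 1.0, 0.2, 0.35], labels, is_core, eps=0.4)
unmatched_label = predict([1.0] * n, labels, is_core, eps=0.4)
```

Working from a precomputed distance matrix is what lets a match-count-based measure be used at all: images are compared pairwise through their keypoint matches, so there is no fixed-length feature vector per image on which a Euclidean distance could be computed.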