DEVELOPMENT OF CLUSTERING MODELS FOR GROUPING NEAR DUPLICATE IMAGES

Image searches on e-commerce sites such as Bukalapak.com return near-duplicate images. To address this, a clustering model is needed that groups images by visual similarity. Producing a good cluster depends on selecting the right feature extraction method...

Bibliographic Details
Main Author:	Annas Thoriq Sumarjadi, Trian
Format:	Final Project
Language:	Indonesia
Online Access:	https://digilib.itb.ac.id/gdl/view/50341
Institution:	Institut Teknologi Bandung

id	id-itb.:50341
spelling	id-itb.:50341 2020-09-23T18:20:42Z DBSCAN, near-duplicate, feature extraction, keypoints
institution	Institut Teknologi Bandung
building	Institut Teknologi Bandung Library
continent	Asia
country	Indonesia
content_provider	Institut Teknologi Bandung
collection	Digital ITB
language	Indonesia
description	Image searches on e-commerce sites such as Bukalapak.com return near-duplicate images. To address this, a clustering model is needed that groups images by visual similarity, and producing good clusters depends on selecting the right feature extraction method. This final project develops such a clustering model using DBSCAN (Density-Based Spatial Clustering of Applications with Noise). Four feature extraction methods are used: SIFT (Scale-Invariant Feature Transform), ORB (Oriented FAST and Rotated BRIEF), PCA-SIFT (Principal Component Analysis SIFT), and SURF (Speeded-Up Robust Features). The project also covers the modification of DBSCAN, the performance of each model, and the development of a near-duplicate image retrieval web application. DBSCAN is modified in two ways: the distance between two images is computed from the number of keypoint pairs produced between them, and a prediction function is added to predict the cluster of test data. The web application is built on the best clustering model and includes an interface for adding data to the model. Model performance is measured by purity and accuracy. The experimental results show that SIFT and SURF succeed in clustering, while ORB and PCA-SIFT fail. The SIFT and SURF models both reach a purity of 1. The SIFT model has an accuracy of 0.9, slightly better than the SURF model's 0.8.
format	Final Project
author	Annas Thoriq Sumarjadi, Trian
title	DEVELOPMENT OF CLUSTERING MODELS FOR GROUPING NEAR DUPLICATE IMAGES
url	https://digilib.itb.ac.id/gdl/view/50341
_version_	1822928426709811200
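
The description mentions a DBSCAN variant whose distance is derived from the number of matched keypoint pairs, and an evaluation by purity. The record does not give the exact distance formula or the parameter values, so the following is only a minimal sketch under stated assumptions: the function names, the normalization `1 - matches / min(keypoints)`, and the values `eps = 0.3` and `min_samples = 2` are all hypothetical, and the match counts are toy numbers rather than the output of SIFT or SURF.

```python
from collections import Counter, deque

NOISE = -1  # label for points that belong to no cluster


def match_distance(n_matches, n_keypoints_a, n_keypoints_b):
    """Assumed distance: 1 - matched pairs / smaller keypoint count.

    Hypothetical normalization; the thesis record does not state its formula.
    """
    denom = min(n_keypoints_a, n_keypoints_b)
    if denom == 0:
        return 1.0  # no keypoints, treat as maximally distant
    return 1.0 - min(n_matches, denom) / denom


def build_distance_matrix(keypoints, matches):
    """Symmetric distance matrix from per-image keypoint counts and
    a dict {(i, j): matched pair count} with i < j."""
    n = len(keypoints)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = match_distance(matches.get((i, j), 0), keypoints[i], keypoints[j])
            dist[i][j] = dist[j][i] = d
    return dist


def dbscan_precomputed(dist, eps, min_samples):
    """Plain DBSCAN over a precomputed distance matrix."""
    n = len(dist)
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        neighbors = [j for j in range(n) if dist[i][j] <= eps]
        if len(neighbors) < min_samples:
            labels[i] = NOISE
            continue
        cluster += 1
        labels[i] = cluster
        queue = deque(neighbors)
        while queue:
            j = queue.popleft()
            if labels[j] == NOISE:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = [k for k in range(n) if dist[j][k] <= eps]
            if len(j_neighbors) >= min_samples:
                queue.extend(j_neighbors)  # j is a core point, keep expanding
    return labels


def purity(labels_pred, labels_true):
    """Fraction of items that carry the majority true label of their cluster.
    Noise (-1) is counted as one group of its own in this sketch."""
    groups = {}
    for p, t in zip(labels_pred, labels_true):
        groups.setdefault(p, []).append(t)
    majority = sum(Counter(ts).most_common(1)[0][1] for ts in groups.values())
    return majority / len(labels_true)


# Toy data: images 0-2 show product "A", images 3-4 show product "B".
keypoints = [100, 120, 110, 90, 95]
matches = {(0, 1): 80, (0, 2): 75, (1, 2): 85, (3, 4): 70}
dist = build_distance_matrix(keypoints, matches)
labels = dbscan_precomputed(dist, eps=0.3, min_samples=2)
score = purity(labels, ["A", "A", "A", "B", "B"])
print(labels, score)
```

In practice the match counts would come from a keypoint matcher run on each image pair, and a prediction step for new images could assign the cluster of the nearest core point within `eps`; both are left out here because the record does not describe them in detail.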
