IMPLEMENTATION OF TOPOLOGICAL DATA ANALYSIS AND SUPPORT VECTOR MACHINE FOR MNIST DATASET CLASSIFICATION

Bibliographic Details
Main Author: Nilam Sari, Nur
Format: Final Project
Language: Indonesia
Online Access: https://digilib.itb.ac.id/gdl/view/83398
Institution: Institut Teknologi Bandung
Description
Summary: Advances in information technology and artificial intelligence have driven innovation in pattern recognition, including work on the MNIST dataset, a classic collection of handwritten digits. MNIST comprises two main components: image data X and labels y. This research explores the application of topological data analysis to MNIST, specifically through persistence barcode analysis. Classification is performed with a support vector machine using a Radial Basis Function (RBF) kernel. Each digit in the MNIST dataset is represented as a 28x28 matrix with elements ranging from 0 to 255. The preprocessing steps are: converting the grayscale matrices to binary, skeletonization using the Zhang-Suen thinning method, forming embedded graphs, determining filtration values, and constructing persistence barcodes. Features are extracted from the persistence barcodes using the Adcock-Carlsson coordinates method. To improve accuracy, each image undergoes four rotations (north, south, west, east), resulting in 32 extracted features per image. These features serve as inputs to the classification algorithm. The MNIST dataset is divided into training data (80%, 56,000 samples) and test data (20%, 14,000 samples). The chosen SVM parameters are a gamma value of 0.006551285568595509 and a C value of 138.94954943731375. With this pipeline, the accuracy achieved on the test data reaches 70%.
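The feature extraction step can be illustrated with a minimal sketch. The summary does not state which Adcock-Carlsson polynomials were used, so the four functions below are an assumption: they are the ones commonly seen in the literature on barcode features. The split of the 32 features into 4 directions x 2 homology dimensions x 4 polynomials is likewise an assumption, and the names `adcock_carlsson_features` and `image_feature_vector` are hypothetical. Bars are assumed to be finite (birth, death) pairs.

```python
def adcock_carlsson_features(barcode):
    """Return four polynomial features of one barcode.

    `barcode` is a list of finite (birth, death) pairs. The choice of
    these four polynomials is an assumption, not taken from the summary.
    """
    if not barcode:
        return [0.0, 0.0, 0.0, 0.0]
    # Largest death value in this barcode.
    y_max = max(death for _, death in barcode)
    f1 = sum(b * (d - b) for b, d in barcode)
    f2 = sum((y_max - d) * (d - b) for b, d in barcode)
    f3 = sum(b ** 2 * (d - b) ** 4 for b, d in barcode)
    f4 = sum((y_max - d) ** 2 * (d - b) ** 4 for b, d in barcode)
    return [float(f1), float(f2), float(f3), float(f4)]


def image_feature_vector(barcodes):
    """Concatenate features over directions and homology dimensions.

    `barcodes[direction][dim]` is a barcode; this layout is hypothetical.
    Assumed breakdown: 4 directions x 2 dimensions x 4 features = 32.
    """
    features = []
    for direction in ("north", "south", "west", "east"):
        for dim in (0, 1):
            features.extend(adcock_carlsson_features(barcodes[direction][dim]))
    return features


# Toy barcode with two bars, reused for every direction and dimension.
toy = [(0, 4), (1, 3)]
toy_barcodes = {d: {0: toy, 1: toy} for d in ("north", "south", "west", "east")}
vector = image_feature_vector(toy_barcodes)
print(len(vector))
```

Under these assumptions, each image yields one 32-element vector, which would then be passed to the RBF-kernel support vector machine with the gamma and C values quoted in the summary.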