IMPLEMENTATION OF TOPOLOGICAL DATA ANALYSIS AND SUPPORT VECTOR MACHINE FOR MNIST DATASET CLASSIFICATION
Main Author: | |
---|---|
Format: | Final Project |
Language: | Indonesian |
Online Access: | https://digilib.itb.ac.id/gdl/view/83398 |
Institution: | Institut Teknologi Bandung |
Summary: | The advancement of information technology and artificial intelligence has fostered innovation
in pattern recognition, particularly on the MNIST dataset, a classic collection
of handwritten digits. MNIST comprises two main components: image data X and labels
y. This research focuses on exploring the application of topological data analysis
concepts, specifically through persistence barcode analysis. Furthermore, the classification
process employs machine learning techniques, specifically the support vector
machine with a Radial Basis Function (RBF) kernel. Each digit in the MNIST dataset
is represented as a 28x28 matrix, with matrix elements ranging from 0 to 255.
The preprocessing steps include converting grayscale matrices to binary, skeletonization
using the Zhang-Suen thinning method, forming embedded graphs, determining
filtration values, and constructing persistence barcodes. Features are extracted from
the persistence barcodes using the Adcock-Carlsson Coordinates method. To enhance
accuracy, each image in the MNIST dataset undergoes four rotations (north, south,
west, east), resulting in 32 extracted features per image. These features serve as inputs
for the classification algorithm. The MNIST dataset is divided into training data (80%,
56,000 samples) and test data (20%, 14,000 samples). The chosen parameters include a
gamma value of 0.006551285568595509 and a C value of 138.94954943731375. Through
these processes, the model achieves 70% accuracy on the test data. |
---|
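
The feature-extraction and kernel steps named in the summary can be illustrated with a minimal sketch. It assumes the four polynomial coordinates most often quoted for the Adcock-Carlsson method; the record does not state which polynomials the thesis used, and the function names `adcock_carlsson` and `rbf_kernel` and the toy barcodes are hypothetical. Only the gamma value is taken from the summary.

```python
import math

# Stated in the record's summary: gamma of the RBF-kernel SVM.
GAMMA = 0.006551285568595509


def adcock_carlsson(barcode):
    """Return four polynomial features of a barcode of (birth, death) pairs.

    Assumption: these are the four polynomials commonly quoted in the TDA
    literature; the record does not say which ones the thesis used.
    Only finite bars are supported.
    """
    if not barcode:
        return [0.0, 0.0, 0.0, 0.0]
    d_max = max(death for _, death in barcode)
    f1 = sum(b * (d - b) for b, d in barcode)
    f2 = sum((d_max - d) * (d - b) for b, d in barcode)
    f3 = sum(b ** 2 * (d - b) ** 4 for b, d in barcode)
    f4 = sum((d_max - d) ** 2 * (d - b) ** 4 for b, d in barcode)
    return [float(f1), float(f2), float(f3), float(f4)]


def rbf_kernel(u, v, gamma=GAMMA):
    """RBF kernel exp(-gamma * ||u - v||^2) between two feature vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * sq_dist)


# Hypothetical toy barcodes, not data from the thesis.
bars_a = [(0, 2), (1, 3)]
bars_b = [(0, 1), (2, 3)]
features_a = adcock_carlsson(bars_a)
features_b = adcock_carlsson(bars_b)
similarity = rbf_kernel(features_a, features_b)
```

In the pipeline the summary describes, feature vectors of this kind (32 values per image) would be passed to the support vector machine, which evaluates the RBF kernel between pairs of them during training and prediction.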