Classification with large datasets

Big data Classification is a relevant modern-day problem and various techniques are being researched on to find optimal solutions to this problem. The sheer size of data nowadays renders many state-of-the-art classification algorithms reduntant as training the classifiers on very large datasets degr...

Full description

Saved in:

Bibliographic Details
Main Author:	Souryadeep Sen
Other Authors:	Ponnuthurai N. Suganthan
Format:	Theses and Dissertations
Language:	English
Published:	2018
Subjects:	DRNTU::Engineering::Electrical and electronic engineering
Online Access:	http://hdl.handle.net/10356/76017
Tags:	Add Tag No Tags, Be the first to tag this record!
Institution:	Nanyang Technological University
Language:	English

id	sg-ntu-dr.10356-76017
record_format	dspace
spelling	sg-ntu-dr.10356-760172023-07-04T15:56:10Z Classification with large datasets Souryadeep Sen Ponnuthurai N. Suganthan School of Electrical and Electronic Engineering DRNTU::Engineering::Electrical and electronic engineering Big data Classification is a relevant modern-day problem and various techniques are being researched on to find optimal solutions to this problem. The sheer size of data nowadays renders many state-of-the-art classification algorithms reduntant as training the classifiers on very large datasets degrades time complexity and may also affect the accuracy achieved. Random Vector Functional Link (RVFL) networks is one such base classifier which does not perform well on very large datasets. Various techniques such as dimensionality reduction and divide and conquer can be used to overcome this problem of compromised performance. In this dissertation we experiment extensively with dimension reduction techniques such as Random Projection and Principal Component Analysis before applying the classification algorithm to the dataset. Intuitively, applying dimension reduction techniques to the datasets would lead to a loss in accuracy of classification, compared to when the entire dataset was used, as information is lost. However, this is traded of with improved time complexity. The accuracy of the classifiers on reduced datasets is attempted to be improved by using ensemble variants of base classifers. The accuracies are dataset dependant, but, accuracies achieved using these ensemble variants on reduced datasets are highly competitive with state-of-the-art results of classification on these datasets (these state-of-the-art results do not use reduction techniques). The time complexity with respect to the state-of-the-art result classification methods is considerably improved in our case as dimensionality reduction techniques have been employed. The results are highly promising as time complexity is improved and competitive accuracies are achieved even with reduced dimesions. Master of Science (Electronics) 2018-09-18T05:28:02Z 2018-09-18T05:28:02Z 2018 Thesis http://hdl.handle.net/10356/76017 en 69 p. application/pdf
institution	Nanyang Technological University
building	NTU Library
continent	Asia
country	Singapore Singapore
content_provider	NTU Library
collection	DR-NTU
language	English
topic	DRNTU::Engineering::Electrical and electronic engineering
spellingShingle	DRNTU::Engineering::Electrical and electronic engineering Souryadeep Sen Classification with large datasets
description	Big data Classification is a relevant modern-day problem and various techniques are being researched on to find optimal solutions to this problem. The sheer size of data nowadays renders many state-of-the-art classification algorithms reduntant as training the classifiers on very large datasets degrades time complexity and may also affect the accuracy achieved. Random Vector Functional Link (RVFL) networks is one such base classifier which does not perform well on very large datasets. Various techniques such as dimensionality reduction and divide and conquer can be used to overcome this problem of compromised performance. In this dissertation we experiment extensively with dimension reduction techniques such as Random Projection and Principal Component Analysis before applying the classification algorithm to the dataset. Intuitively, applying dimension reduction techniques to the datasets would lead to a loss in accuracy of classification, compared to when the entire dataset was used, as information is lost. However, this is traded of with improved time complexity. The accuracy of the classifiers on reduced datasets is attempted to be improved by using ensemble variants of base classifers. The accuracies are dataset dependant, but, accuracies achieved using these ensemble variants on reduced datasets are highly competitive with state-of-the-art results of classification on these datasets (these state-of-the-art results do not use reduction techniques). The time complexity with respect to the state-of-the-art result classification methods is considerably improved in our case as dimensionality reduction techniques have been employed. The results are highly promising as time complexity is improved and competitive accuracies are achieved even with reduced dimesions.
author2	Ponnuthurai N. Suganthan
author_facet	Ponnuthurai N. Suganthan Souryadeep Sen
format	Theses and Dissertations
author	Souryadeep Sen
author_sort	Souryadeep Sen
title	Classification with large datasets
title_short	Classification with large datasets
title_full	Classification with large datasets
title_fullStr	Classification with large datasets
title_full_unstemmed	Classification with large datasets
title_sort	classification with large datasets
publishDate	2018
url	http://hdl.handle.net/10356/76017
_version_	1772828405778087936

Classification with large datasets

Similar Items