Discover underlying concepts from real data

With advances in digital technology and signal acquisition tools, data in various forms are being generated and exchanged at an explosive rate. This creates a pressing need, and a good opportunity, to develop techniques that can systematically and promptly discover the underlying concepts from larg...

Bibliographic Details
Main Author:	Patwardhan, Shree Balwant.
Other Authors:	Cao Hong
Format:	Final Year Project
Language:	English
Published:	2012
Subjects:	DRNTU::Engineering::Electrical and electronic engineering::Computer hardware, software and systems
Online Access:	http://hdl.handle.net/10356/49459
Institution:	Nanyang Technological University

id	sg-ntu-dr.10356-49459
record_format	dspace
spelling	sg-ntu-dr.10356-49459 2023-07-07T15:53:46Z Discover underlying concepts from real data Patwardhan, Shree Balwant. Cao Hong Chen Lihui School of Electrical and Electronic Engineering A*STAR Institute for Infocomm Research DRNTU::Engineering::Electrical and electronic engineering::Computer hardware, software and systems Bachelor of Engineering 2012-05-18T08:26:16Z 2012-05-18T08:26:16Z 2012 2012 Final Year Project (FYP) http://hdl.handle.net/10356/49459 en Nanyang Technological University 55 p. application/pdf
institution	Nanyang Technological University
building	NTU Library
continent	Asia
country	Singapore
content_provider	NTU Library
collection	DR-NTU
language	English
topic	DRNTU::Engineering::Electrical and electronic engineering::Computer hardware, software and systems
spellingShingle	DRNTU::Engineering::Electrical and electronic engineering::Computer hardware, software and systems Patwardhan, Shree Balwant. Discover underlying concepts from real data
description	With advances in digital technology and signal acquisition tools, data in various forms are being generated and exchanged at an explosive rate. This creates a pressing need, and a good opportunity, to develop techniques that can systematically and promptly discover the underlying concepts from large amounts of real data. Real data are often unevenly distributed, containing both majority concepts (concepts with a large amount of data) and minority concepts. This imbalance adds a further challenge for data mining and machine learning, because majority and minority concepts can be equally important in practice. Most standard machine learning algorithms are designed on the assumption that the input data set is balanced, so their performance degrades significantly on imbalanced datasets. Correcting the imbalance is therefore crucial if existing algorithms are to learn effectively from such datasets. This project aims to develop a tool that corrects the ‘class imbalance’ problem using the Structure Preserving Oversampling (SPO) algorithm. The tool synthetically generates samples belonging to the minority class in order to create a balanced dataset for learning algorithms. The algorithm is implemented as a module in RapidMiner, a popular, commercially available machine learning environment.
author2	Cao Hong
author_facet	Cao Hong Patwardhan, Shree Balwant.
format	Final Year Project
author	Patwardhan, Shree Balwant.
author_sort	Patwardhan, Shree Balwant.
title	Discover underlying concepts from real data
title_sort	discover underlying concepts from real data
publishDate	2012
url	http://hdl.handle.net/10356/49459
_version_	1772826730825777152
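
The description above refers to balancing a dataset by synthetically generating minority-class samples. A minimal Python sketch of that general idea follows, assuming a two-class dataset of numeric feature vectors. It interpolates between randomly chosen pairs of minority samples, which is a much simpler stand-in and is not the SPO algorithm or the RapidMiner module developed in the project. The function name oversample_minority and the toy data are hypothetical.

```python
import random
from collections import Counter


def oversample_minority(samples, labels, seed=0):
    """Balance a two-class dataset by adding synthetic minority samples.

    Assumption: each synthetic sample is a random point on the line segment
    between two randomly chosen minority samples. This is plain interpolation
    used for illustration only; it is NOT Structure Preserving Oversampling.
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    if len(counts) != 2:
        raise ValueError("expected exactly two classes")

    # most_common orders classes from largest to smallest count.
    (majority, n_major), (minority, n_minor) = counts.most_common(2)
    minority_samples = [x for x, y in zip(samples, labels) if y == minority]

    new_samples, new_labels = list(samples), list(labels)
    # Add synthetic samples until both classes have the same size.
    for _ in range(n_major - n_minor):
        a = rng.choice(minority_samples)
        b = rng.choice(minority_samples)
        t = rng.random()  # interpolation weight in [0, 1)
        new_samples.append([ai + t * (bi - ai) for ai, bi in zip(a, b)])
        new_labels.append(minority)
    return new_samples, new_labels


# Toy imbalanced dataset: six majority samples, two minority samples.
X = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3], [0.3, 0.2], [0.2, 0.2], [0.4, 0.1],
     [2.0, 2.1], [2.2, 1.9]]
y = ["maj"] * 6 + ["min"] * 2

X_bal, y_bal = oversample_minority(X, y)
print(Counter(y_bal))
```

The original samples are kept unchanged and only the minority class grows, so a learner trained on the balanced data sees both classes equally often. Because the synthetic points lie between existing minority samples, this sketch cannot model the covariance structure of the minority class, which is the aspect a structure-preserving method is meant to address.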

Similar Items