Unsupervised feature selection based on principal components analysis

Bibliographic Details
Main Author: Fang, Ji
Other Authors: Mao, Kezhi
Format: Theses and Dissertations
Published: 2008
Online Access: http://hdl.handle.net/10356/4238
Institution: Nanyang Technological University
Description
Summary: An important issue in mining data sets that are large in both dimension and size is selecting a subset of the original features. In this thesis, we describe an unsupervised feature selection algorithm suited to such data sets. The algorithm consists of two steps: pre-selection and selection. Pre-selection is based on Procrustes analysis and aims to retain as much of the structure of the original data as possible. The second step is based on a feature similarity measure and aims to reduce feature redundancy.
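The record gives no implementation details, so the following is only a minimal sketch of how such a two-step pipeline could look, under stated assumptions. Pre-selection is modelled here as backward elimination: at each round it drops the feature whose removal gives the smallest Procrustes residual between the k-component principal-component scores of the full data and those of the remaining subset (a Krzanowski-style criterion, assumed rather than taken from the thesis). Selection uses absolute Pearson correlation as a stand-in similarity measure, because the record does not say which measure the thesis uses. The function names `pca_scores`, `procrustes_residual`, `preselect` and `remove_redundant`, and the parameters `k`, `m` and `threshold`, are all hypothetical.

```python
import numpy as np


def pca_scores(X, k):
    """Scores of the column-centred data X on its first k principal components."""
    Xc = X - X.mean(axis=0)
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :k] * s[:k]


def procrustes_residual(Y, Z):
    """Minimum of ||Y - Z R||^2 over orthogonal R (no scaling).

    Both score matrices are already centred, so translation is handled.
    The minimum equals tr(Y'Y) + tr(Z'Z) - 2 * (sum of singular values of Y'Z).
    """
    sv = np.linalg.svd(Y.T @ Z, compute_uv=False)
    return np.trace(Y.T @ Y) + np.trace(Z.T @ Z) - 2.0 * sv.sum()


def preselect(X, k, m):
    """Hypothetical step 1: backward elimination down to m features (m >= k).

    Each round removes the feature whose absence best preserves the
    k-component PCA configuration of the full data, by Procrustes residual.
    """
    if m < k:
        raise ValueError("m must be at least k")
    Y = pca_scores(X, k)
    kept = list(range(X.shape[1]))
    while len(kept) > m:
        best_resid, best_j = np.inf, None
        for j in kept:
            trial = [f for f in kept if f != j]
            resid = procrustes_residual(Y, pca_scores(X[:, trial], k))
            if resid < best_resid:
                best_resid, best_j = resid, j
        kept.remove(best_j)
    return kept


def remove_redundant(X, kept, threshold=0.9):
    """Hypothetical step 2: drop features until no pair has |corr| >= threshold.

    Absolute Pearson correlation is an assumed stand-in similarity measure;
    constant columns are assumed absent (corrcoef gives NaN for them).
    """
    kept = list(kept)
    while len(kept) > 1:
        S = np.abs(np.corrcoef(X[:, kept], rowvar=False))
        np.fill_diagonal(S, 0.0)
        i, j = np.unravel_index(np.argmax(S), S.shape)
        if S[i, j] < threshold:
            break
        # Of the most similar pair, drop the one more similar to the rest on average.
        drop = i if S[i].mean() >= S[j].mean() else j
        kept.pop(int(drop))
    return kept


# Toy data: feature 4 is a near-copy of feature 0, so one of them is redundant.
rng = np.random.default_rng(0)
base = rng.normal(size=(200, 4))
near_copy = base[:, 0] + 0.01 * rng.normal(size=200)
noise = rng.normal(size=200)
X = np.column_stack([base, near_copy, noise])  # 6 features

pre = preselect(X, k=3, m=5)
final = remove_redundant(X, pre, threshold=0.9)
print("pre-selected:", pre, "selected:", final)
```

Backward elimination needs one SVD per candidate feature per round, so this sketch favours clarity over speed; a data set that is large in dimension would call for a cheaper search strategy.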