Challenging issues in classification problems : sparsity control, key instance detection, and imbalanced data

This thesis deals with the difficulties in classification problems caused by three types of sparsity characteristics - feature, label, and instance sparsity. First, feature sparsity is usually used as prior knowledge by inducing parameter sparsity of the learned model. We show that only an appropr...

Bibliographic Details
Main Author: Liu, Guoqing.
Other Authors: School of Computer Engineering
Format: Theses and Dissertations
Language: English
Published: 2013
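Subjects: DRNTU::Engineering::Computer science and engineering::Computer applications::Computers in other systems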
Online Access: http://hdl.handle.net/10356/52422
Institution: Nanyang Technological University
id sg-ntu-dr.10356-52422
record_format dspace
spelling sg-ntu-dr.10356-52422 2023-03-04T00:34:10Z Challenging issues in classification problems : sparsity control, key instance detection, and imbalanced data Liu, Guoqing. School of Computer Engineering Centre for Computational Intelligence Wu Jianxin DRNTU::Engineering::Computer science and engineering::Computer applications::Computers in other systems This thesis deals with the difficulties in classification problems caused by three types of sparsity characteristics - feature, label, and instance sparsity. First, feature sparsity is usually used as prior knowledge by inducing parameter sparsity of the learned model. We show that only an appropriate degree of parameter sparsity is beneficial, and both over-sparsity and under-sparsity are harmful for classification. Second, label sparsity means that only a fraction of training instances are labeled, which causes failure of classic classification methods in these cases. Third, instance sparsity is caused by imbalanced composition of different categories, where instances from one category significantly outnumber the ones from the other. This always makes the classification boundary biased towards the majority category. Consequently, three contributions - sparsity control, key instance detection, and imbalanced classification - are presented to address these challenges. Sparsity control aims to regularize the sparsity of the model parameters at an appropriate level according to the intrinsic feature sparsity in data. It is proposed based on the observation that this sparsity is not always desirable in real problems, and only a proper degree of sparsity is beneficial. To address this issue, we propose a novel probit classifier using generalized Gaussian scale mixture (GGSM) priors that can adjust the induced sparsity by tuning the shape parameter of GGSM, and consequently provide either a sparse or non-sparse solution based on the intrinsic feature sparsity. Model learning is carried out by an efficient modified maximum a posteriori estimation. We show relationships of the proposed approach to the previous methods. We also study different types of likelihood working with the GGSM priors in a kernel-based setup, based on which an improved kernel-based approach is presented. Experiments demonstrate that the proposed method has better or comparable performance in both linear and non-linear classification. Doctor of Philosophy (SCE) 2013-05-07T04:57:46Z 2013-05-07T04:57:46Z 2013 2013 Thesis http://hdl.handle.net/10356/52422 en 120 p. application/pdf
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
language English
topic DRNTU::Engineering::Computer science and engineering::Computer applications::Computers in other systems
spellingShingle DRNTU::Engineering::Computer science and engineering::Computer applications::Computers in other systems
Liu, Guoqing.
Challenging issues in classification problems : sparsity control, key instance detection, and imbalanced data
description This thesis deals with the difficulties in classification problems caused by three types of sparsity characteristics - feature, label, and instance sparsity. First, feature sparsity is usually used as prior knowledge by inducing parameter sparsity of the learned model. We show that only an appropriate degree of parameter sparsity is beneficial, and both over-sparsity and under-sparsity are harmful for classification. Second, label sparsity means that only a fraction of training instances are labeled, which causes failure of classic classification methods in these cases. Third, instance sparsity is caused by imbalanced composition of different categories, where instances from one category significantly outnumber the ones from the other. This always makes the classification boundary biased towards the majority category. Consequently, three contributions - sparsity control, key instance detection, and imbalanced classification - are presented to address these challenges. Sparsity control aims to regularize the sparsity of the model parameters at an appropriate level according to the intrinsic feature sparsity in data. It is proposed based on the observation that this sparsity is not always desirable in real problems, and only a proper degree of sparsity is beneficial. To address this issue, we propose a novel probit classifier using generalized Gaussian scale mixture (GGSM) priors that can adjust the induced sparsity by tuning the shape parameter of GGSM, and consequently provide either a sparse or non-sparse solution based on the intrinsic feature sparsity. Model learning is carried out by an efficient modified maximum a posteriori estimation. We show relationships of the proposed approach to the previous methods. We also study different types of likelihood working with the GGSM priors in a kernel-based setup, based on which an improved kernel-based approach is presented. Experiments demonstrate that the proposed method has better or comparable performance in both linear and non-linear classification.
author2 School of Computer Engineering
author_facet School of Computer Engineering
Liu, Guoqing.
format Theses and Dissertations
author Liu, Guoqing.
author_sort Liu, Guoqing.
title Challenging issues in classification problems : sparsity control, key instance detection, and imbalanced data
title_short Challenging issues in classification problems : sparsity control, key instance detection, and imbalanced data
title_full Challenging issues in classification problems : sparsity control, key instance detection, and imbalanced data
title_fullStr Challenging issues in classification problems : sparsity control, key instance detection, and imbalanced data
title_full_unstemmed Challenging issues in classification problems : sparsity control, key instance detection, and imbalanced data
title_sort challenging issues in classification problems : sparsity control, key instance detection, and imbalanced data
publishDate 2013
url http://hdl.handle.net/10356/52422
_version_ 1759856230519537664