Challenging issues in classification problems : sparsity control, key instance detection, and imbalanced data
This thesis deals with the difficulties in classification problems caused by three types of sparsity characteristics - feature, label, and instance sparsity. First, feature sparsity is usually used as prior knowledge by inducing parameter sparsity of the learned model. We show that only an appropr...
Main Author: | Liu, Guoqing |
---|---|
Other Authors: | |
Format: | Theses and Dissertations |
Language: | English |
Published: | 2013 |
Subjects: | |
Online Access: | http://hdl.handle.net/10356/52422 |
Institution: | Nanyang Technological University |
id |
sg-ntu-dr.10356-52422 |
---|---|
record_format |
dspace |
spelling |
sg-ntu-dr.10356-52422 2023-03-04T00:34:10Z Challenging issues in classification problems : sparsity control, key instance detection, and imbalanced data Liu, Guoqing. School of Computer Engineering Centre for Computational Intelligence Wu Jianxin DRNTU::Engineering::Computer science and engineering::Computer applications::Computers in other systems Doctor of Philosophy (SCE) 2013-05-07T04:57:46Z 2013-05-07T04:57:46Z 2013 2013 Thesis http://hdl.handle.net/10356/52422 en 120 p. application/pdf |
institution |
Nanyang Technological University |
building |
NTU Library |
continent |
Asia |
country |
Singapore |
content_provider |
NTU Library |
collection |
DR-NTU |
language |
English |
topic |
DRNTU::Engineering::Computer science and engineering::Computer applications::Computers in other systems |
description |
This thesis deals with the difficulties in classification problems caused by three types of sparsity characteristics - feature, label, and instance sparsity. First, feature sparsity is usually used as prior knowledge by inducing parameter sparsity in the learned model. We show that only an appropriate degree of parameter sparsity is beneficial, and that both over-sparsity and under-sparsity are harmful for classification. Second, label sparsity means that only a fraction of the training instances are labeled, which causes classic classification methods to fail in these cases. Third, instance sparsity is caused by an imbalanced composition of different categories, where instances from one category significantly outnumber those from the other. This always makes the classification boundary biased towards the majority category. Consequently, three contributions - sparsity control, key instance detection, and imbalanced classification - are presented to address these challenges. Sparsity control aims to regularize the sparsity of the model parameters at an appropriate level according to the intrinsic feature sparsity in the data. It is proposed based on the observation that this sparsity is not always desirable in real problems, and only a proper degree of sparsity is beneficial. To address this issue, we propose a novel probit classifier using generalized Gaussian scale mixture (GGSM) priors that can adjust the induced sparsity by tuning the shape parameter of the GGSM, and consequently provide either a sparse or non-sparse solution based on the intrinsic feature sparsity. Model learning is carried out by an efficient modified maximum a posteriori estimation. We show the relationships of the proposed approach to previous methods. We also study different types of likelihood working with the GGSM priors in a kernel-based setup, based on which an improved kernel-based approach is presented. Experiments demonstrate that the proposed method has better or comparable performance in both linear and non-linear classification. |
author2 |
School of Computer Engineering |
author_facet |
School of Computer Engineering Liu, Guoqing. |
format |
Theses and Dissertations |
author |
Liu, Guoqing. |
title |
Challenging issues in classification problems : sparsity control, key instance detection, and imbalanced data |
publishDate |
2013 |
url |
http://hdl.handle.net/10356/52422 |
_version_ |
1759856230519537664 |