Towards efficient large-scale learning by exploiting sparsity

The last decade has witnessed explosive growth in data. Ultrahigh-dimensional, large-volume data raise critical issues such as prohibitive storage demands and poor scalability of data analysis. To enable efficient and effective big data analysis, this thesis exploits the sparsity constraints of learning tasks and investigates large-scale learning in three directions: feature selection for classification, sparse recovery for signal processing, and low-rank matrix recovery.

A Feature Generating Machine (FGM) is proposed to address large-scale, ultrahigh-dimensional feature selection for classification tasks (e.g., with O(10^12) features). Unlike traditional gradient-based approaches that optimize over all features, FGM iteratively activates a group of features and solves a sequence of subproblems with respect to the activated features only. As a result, it avoids prohibitive storage demands and scales well to big data.

A Matching Pursuit LASSO (MPL) algorithm is developed to address the large-scale sparse recovery problem. MPL is guaranteed to converge to a global solution and greatly reduces the computational cost on big dictionaries (e.g., with one million atoms). In particular, by taking advantage of its optimization scheme, a batch-mode MPL is developed to greatly speed up optimization over many signals.

A Riemannian Pursuit (RP) algorithm is proposed to address the low-rank matrix recovery problem. RP solves a sequence of fixed-rank optimization subproblems, each handled by a nonlinear Riemannian conjugate gradient method. Compared with existing methods, RP requires no a priori rank estimate and performs stably on ill-conditioned big matrices.

Extensive experiments on both synthetic and real-world problems demonstrate that the proposed methods achieve superior scalability and comparable or better performance than state-of-the-art baselines.
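The FGM paragraph above describes an active-set strategy: score every feature cheaply, activate only the most promising ones, and solve the learning subproblem over the active set alone. Below is a minimal sketch of that idea, assuming a squared loss and a ridge-regularized subproblem; the function `fgm_style_feature_selection` and all of its parameters are illustrative stand-ins, not the thesis's implementation.

```python
import numpy as np

def fgm_style_feature_selection(X, y, budget=5, rounds=3, lam=1e-2):
    """Sketch of feature-generating selection: each round scores all
    features by the gradient of the squared loss at the current model,
    activates the top-`budget` inactive ones, and re-fits a ridge model
    restricted to the active set only."""
    n, d = X.shape
    active = []                               # indices of activated features
    w = np.zeros(d)                           # full weight vector (zero off the active set)
    for _ in range(rounds):
        grad = X.T @ (X @ w - y) / n          # one cheap scoring pass over all features
        grad[active] = 0.0                    # skip features already activated
        new = np.argsort(-np.abs(grad))[:budget]
        active = sorted(set(active) | set(new.tolist()))
        Xa = X[:, active]                     # the subproblem sees active columns only
        wa = np.linalg.solve(Xa.T @ Xa + lam * np.eye(len(active)), Xa.T @ y)
        w = np.zeros(d)
        w[active] = wa
    return w, active

# Toy check: 5 informative features hidden among 1,000.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 1000))
w_true = np.zeros(1000)
w_true[:5] = [3.0, -2.0, 1.5, 2.5, -1.0]
y = X @ w_true + 0.1 * rng.standard_normal(200)
w, active = fgm_style_feature_selection(X, y)
print(active)                                 # typically recovers indices 0..4
```

The point of the scheme is that the expensive solve only ever touches the small active set; at O(10^12) features the scoring pass would stream over feature blocks rather than hold X in memory.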

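Likewise, here is a matching-pursuit-flavoured LASSO sketch for the sparse-recovery direction, assuming the standard l1-regularized least-squares objective. Atoms enter a small support by correlation with the residual, and an ISTA loop, used as a generic stand-in since the abstract does not specify MPL's inner solver, runs on the supported atoms only; `mpl_style_lasso` and its parameters are illustrative.

```python
import numpy as np

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def mpl_style_lasso(D, y, lam=0.05, atoms_per_round=2, rounds=10, inner=300):
    """Sketch of matching-pursuit LASSO: grow a small support by residual
    correlation, then solve the LASSO restricted to that support (here with
    ISTA), so the full dictionary is only touched by cheap products."""
    n, m = D.shape
    support = []
    x = np.zeros(m)
    for _ in range(rounds):
        corr = D.T @ (y - D @ x)              # correlation of every atom with the residual
        corr[support] = 0.0                   # skip atoms already in the support
        new = np.argsort(-np.abs(corr))[:atoms_per_round]
        support = sorted(set(support) | set(new.tolist()))
        Ds = D[:, support]
        L = np.linalg.norm(Ds, 2) ** 2        # Lipschitz constant of the small gradient
        xs = x[support]
        for _ in range(inner):                # ISTA on the active atoms only
            xs = soft_threshold(xs - Ds.T @ (Ds @ xs - y) / L, lam / L)
        x = np.zeros(m)
        x[support] = xs
    return x

# Toy check: recover a 4-sparse code under a 100 x 2000 dictionary.
rng = np.random.default_rng(1)
D = rng.standard_normal((100, 2000))
D /= np.linalg.norm(D, axis=0)                # unit-norm atoms
x_true = np.zeros(2000)
x_true[[10, 500, 900, 1500]] = [1.0, -1.5, 2.0, 0.8]
y = D @ x_true + 0.01 * rng.standard_normal(100)
x_hat = mpl_style_lasso(D, y)
print(np.flatnonzero(np.abs(x_hat) > 0.1))    # typically [10, 500, 900, 1500]
```

A batch-mode variant in this spirit would share the dictionary-residual products across many signals at once, which is the kind of sharing the batch-mode MPL described above exploits.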
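Finally, a rank-increasing sketch of the matrix-recovery direction, on a matrix-completion instance. The outer loop mirrors the description above, raising the rank until the observed entries fit, so no rank estimate is needed up front; the fixed-rank subproblem is solved here by simple SVD-projected gradient steps (an SVP-style update), a deliberately simple stand-in for the thesis's nonlinear Riemannian conjugate gradient. `rank_pursuit_completion` and its parameters are illustrative.

```python
import numpy as np

def project_rank(X, k):
    """Best rank-k approximation via truncated SVD."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k]

def rank_pursuit_completion(M_obs, mask, max_rank=10, tol=1e-3, steps=200):
    """Sketch of rank-increasing matrix completion: warm-start each rank
    from the previous solution and stop once the observed entries fit."""
    step = 1.0 / max(mask.mean(), 1e-12)      # SVP-style step for sampling ratio p
    X = np.zeros_like(M_obs)
    for k in range(1, max_rank + 1):
        for _ in range(steps):                # fixed-rank subproblem (stand-in solver)
            X = project_rank(X + step * mask * (M_obs - X), k)
        rel = np.linalg.norm(mask * (M_obs - X)) / np.linalg.norm(mask * M_obs)
        if rel < tol:                         # observed entries fit: rank found
            break
    return X, k

# Toy check: complete a rank-3 200 x 200 matrix from 40% of its entries.
rng = np.random.default_rng(2)
A = rng.standard_normal((200, 3)) @ rng.standard_normal((3, 200))
mask = (rng.random((200, 200)) < 0.4).astype(float)
X_hat, k_found = rank_pursuit_completion(mask * A, mask)
print(k_found, np.linalg.norm(X_hat - A) / np.linalg.norm(A))  # k_found is typically 3
```

In this sketch, warm-starting each rank from the previous solution is what keeps the sequence of fixed-rank subproblems cheap.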

Bibliographic Details
Main Author: Tan, Ming Kui
Other Authors: School of Computer Engineering
Format: Theses and Dissertations
Language: English
Published: 2014
Subjects: DRNTU::Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence
Online Access:https://hdl.handle.net/10356/61881
Institution: Nanyang Technological University
id sg-ntu-dr.10356-61881
record_format dspace
spelling sg-ntu-dr.10356-61881 2023-03-04T00:42:22Z
Towards efficient large-scale learning by exploiting sparsity
Tan, Ming Kui
School of Computer Engineering
Centre for Computational Intelligence
Ivor W. Tsang
DRNTU::Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence
The last decade has witnessed explosive growth in data. Ultrahigh-dimensional, large-volume data raise critical issues such as prohibitive storage demands and poor scalability of data analysis. To enable efficient and effective big data analysis, this thesis exploits the sparsity constraints of learning tasks and investigates large-scale learning in three directions: feature selection for classification, sparse recovery for signal processing, and low-rank matrix recovery. A Feature Generating Machine (FGM) is proposed to address large-scale, ultrahigh-dimensional feature selection for classification tasks (e.g., with O(10^12) features). Unlike traditional gradient-based approaches that optimize over all features, FGM iteratively activates a group of features and solves a sequence of subproblems with respect to the activated features only. As a result, it avoids prohibitive storage demands and scales well to big data. A Matching Pursuit LASSO (MPL) algorithm is developed to address the large-scale sparse recovery problem. MPL is guaranteed to converge to a global solution and greatly reduces the computational cost on big dictionaries (e.g., with one million atoms). In particular, by taking advantage of its optimization scheme, a batch-mode MPL is developed to greatly speed up optimization over many signals. A Riemannian Pursuit (RP) algorithm is proposed to address the low-rank matrix recovery problem. RP solves a sequence of fixed-rank optimization subproblems, each handled by a nonlinear Riemannian conjugate gradient method. Compared with existing methods, RP requires no a priori rank estimate and performs stably on ill-conditioned big matrices. Extensive experiments on both synthetic and real-world problems demonstrate that the proposed methods achieve superior scalability and comparable or better performance than state-of-the-art baselines.
DOCTOR OF PHILOSOPHY (SCE)
2014-12-04T09:02:55Z 2014-12-04T09:02:55Z 2014 2014 Thesis
Tan, M. K. (2014). Towards efficient large-scale learning by exploiting sparsity. Doctoral thesis, Nanyang Technological University, Singapore.
https://hdl.handle.net/10356/61881
10.32657/10356/61881
en
237 p. application/pdf
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
language English
topic DRNTU::Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence
spellingShingle DRNTU::Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence
Tan, Ming Kui
Towards efficient large-scale learning by exploiting sparsity
description The last decade has witnessed explosive growth in data. Ultrahigh-dimensional, large-volume data raise critical issues such as prohibitive storage demands and poor scalability of data analysis. To enable efficient and effective big data analysis, this thesis exploits the sparsity constraints of learning tasks and investigates large-scale learning in three directions: feature selection for classification, sparse recovery for signal processing, and low-rank matrix recovery. A Feature Generating Machine (FGM) is proposed to address large-scale, ultrahigh-dimensional feature selection for classification tasks (e.g., with O(10^12) features). Unlike traditional gradient-based approaches that optimize over all features, FGM iteratively activates a group of features and solves a sequence of subproblems with respect to the activated features only. As a result, it avoids prohibitive storage demands and scales well to big data. A Matching Pursuit LASSO (MPL) algorithm is developed to address the large-scale sparse recovery problem. MPL is guaranteed to converge to a global solution and greatly reduces the computational cost on big dictionaries (e.g., with one million atoms). In particular, by taking advantage of its optimization scheme, a batch-mode MPL is developed to greatly speed up optimization over many signals. A Riemannian Pursuit (RP) algorithm is proposed to address the low-rank matrix recovery problem. RP solves a sequence of fixed-rank optimization subproblems, each handled by a nonlinear Riemannian conjugate gradient method. Compared with existing methods, RP requires no a priori rank estimate and performs stably on ill-conditioned big matrices. Extensive experiments on both synthetic and real-world problems demonstrate that the proposed methods achieve superior scalability and comparable or better performance than state-of-the-art baselines.
author2 School of Computer Engineering
author_facet School of Computer Engineering
Tan, Ming Kui
format Theses and Dissertations
author Tan, Ming Kui
author_sort Tan, Ming Kui
title Towards efficient large-scale learning by exploiting sparsity
title_short Towards efficient large-scale learning by exploiting sparsity
title_full Towards efficient large-scale learning by exploiting sparsity
title_fullStr Towards efficient large-scale learning by exploiting sparsity
title_full_unstemmed Towards efficient large-scale learning by exploiting sparsity
title_sort towards efficient large-scale learning by exploiting sparsity
publishDate 2014
url https://hdl.handle.net/10356/61881
_version_ 1759854765599096832