Dissimilarity-based semi-supervised subset selection

Extracting useful information from large-scale data is a major challenge in the era of big data. As an effective means of information filtering and data summarization, the subset selection method selects the most informative subset from large-scale data to represent the entire data set to reduce the...

Bibliographic Details
Main Author: Lei, Yiran
Other Authors: Tan Yap Peng
Format: Thesis-Master by Coursework
Language: English
Published: Nanyang Technological University 2020
Subjects: Engineering::Electrical and electronic engineering
Online Access: https://hdl.handle.net/10356/140899
Institution: Nanyang Technological University
id sg-ntu-dr.10356-140899
record_format dspace
school School of Electrical and Electronic Engineering
contact EYPTan@ntu.edu.sg
degree Master of Science (Signal Processing)
record_date 2020-06-02T12:40:32Z
record_timestamp 2023-07-04T16:29:44Z
file_format application/pdf
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
topic Engineering::Electrical and electronic engineering
description Extracting useful information from large-scale data is a major challenge in the era of big data. As an effective means of information filtering and data summarization, subset selection picks the most informative subset of a large data set to represent the whole, reducing the amount of data that needs to be processed. This thesis proposes a dissimilarity-based semi-supervised subset selection method. First, subset selection is formulated as a convex optimization problem with regularization, in which the desired subset is modeled as an unknown sparse matrix whose non-zero rows indicate the source-set elements that represent the target set. An alternating optimization method is then used to solve the Lagrangian form of the objective function. To exploit the information carried by sample labels, a semi-supervised algorithm is proposed that combines unsupervised clustering with supervised judgement of representatives. The iterative process then updates the distribution of representatives based on the overall correlation coefficients of each category of the target set. Finally, the optimal matrix and the representatives are output.
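The row-sparse formulation in the description can be illustrated with a minimal sketch. It is not the thesis's algorithm: it assumes a generic objective (sum of dissimilarity-weighted assignments plus a penalty `lam` on the Euclidean norms of the rows, with each column non-negative and summing to 1) and solves it with a plain ADMM-style alternating scheme. The names `select_representatives`, `shrink_rows`, `project_columns_to_simplex`, `lam`, `mu`, and `n_iter` are hypothetical, and the semi-supervised label step is not covered.

```python
import numpy as np


def shrink_rows(V, tau):
    """Group soft-thresholding: shrink each row's Euclidean norm by tau."""
    norms = np.linalg.norm(V, axis=1, keepdims=True)
    scale = np.maximum(1.0 - tau / np.maximum(norms, 1e-12), 0.0)
    return V * scale


def project_columns_to_simplex(V):
    """Project every column of V onto {c >= 0, sum(c) = 1}."""
    M, N = V.shape
    U = -np.sort(-V, axis=0)                # each column sorted descending
    css = np.cumsum(U, axis=0) - 1.0
    k = np.arange(1, M + 1)[:, None]
    rho = np.sum(U - css / k > 0, axis=0)   # size of the active set per column
    theta = css[rho - 1, np.arange(N)] / rho
    return np.maximum(V - theta, 0.0)


def select_representatives(D, lam, mu=1.0, n_iter=300):
    """D[i, j] is the dissimilarity between source item i and target item j.

    Returns the assignment matrix C (source x target) and the indices of
    source rows whose norm is non-negligible (assumed representatives).
    """
    M, N = D.shape
    C = np.full((M, N), 1.0 / M)            # feasible starting point
    Lam = np.zeros((M, N))                  # dual variable for Z = C
    for _ in range(n_iter):
        Z = shrink_rows(C - Lam / mu, lam / mu)              # row-sparsity step
        C = project_columns_to_simplex(Z + (Lam - D) / mu)   # constraint step
        Lam += mu * (Z - C)                                  # dual update
    row_norms = np.linalg.norm(C, axis=1)
    reps = np.flatnonzero(row_norms > 0.1 * row_norms.max())
    return C, reps


# Toy data (assumed): two well-separated 1-D clusters, source set = target set.
rng = np.random.default_rng(0)
points = np.concatenate([rng.normal(0.0, 0.1, 5), rng.normal(5.0, 0.1, 5)])
D = np.abs(points[:, None] - points[None, :])
C, reps = select_representatives(D, lam=1.0)
print("representative indices:", reps)
```

Larger values of `lam` push more rows toward zero and so tend to yield fewer representatives; the 10% threshold used to read off `reps` is an arbitrary choice for this sketch.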