Clustering and semi-supervised classification with application to driver distraction detection

Bibliographic Details
Main Author: Liu, Tianchi
Other Authors: Huang Guangbin
Format: Theses and Dissertations
Language: English
Published: 2018
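Subjects: DRNTU::Engineering::Computer science and engineering::Computing methodologies::Pattern recognition
DRNTU::Engineering::Electrical and electronic engineering::Computer hardware, software and systems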
Online Access:https://hdl.handle.net/10356/89229
http://hdl.handle.net/10220/46179
Institution: Nanyang Technological University
id sg-ntu-dr.10356-89229
record_format dspace
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
language English
topic DRNTU::Engineering::Computer science and engineering::Computing methodologies::Pattern recognition
DRNTU::Engineering::Electrical and electronic engineering::Computer hardware, software and systems
description Clustering and Semi-Supervised Classification (SSC) algorithms can make use of unlabeled training data and thus have the potential to alleviate labeling costs. For example, Extreme Learning Machine (ELM) was recently extended to semi-supervised learning and clustering with promising performance. Meanwhile, in some real-world applications it is either costly or infeasible to obtain labeled training samples. The thesis investigates clustering and SSC algorithms with application to driver distraction detection. Firstly, the thesis investigates embedding-based clustering. It reviews desirable properties of embeddings identified in the literature, e.g., preserving the intrinsic data structure and maximizing class separability. To obtain a better embedding for clustering, the thesis considers both properties together and develops a novel clustering algorithm referred to as ELM for Joint Embedding and Clustering (ELM-JEC). Experimental studies on a wide range of benchmark datasets show that ELM-JEC is competitive with related methods. Secondly, the thesis investigates graph-based clustering. One limitation of existing graph learning methods is that they adjust the graph based on either the original data or linearly projected data, which may not effectively reveal the underlying low-dimensional structures. To address this limitation, the thesis develops dual data representations, i.e., the original data and their nonlinear embedding obtained via an ELM-based neural network, and uses both as the basis for graph learning. The resulting algorithm is named clustering based on ELM and Constrained Laplacian Rank (ELM-CLR). Experimental results show that ELM-CLR outperforms other adaptive graph learning methods on most benchmark datasets. Finally, the thesis applies the proposed clustering algorithms, ELM-JEC and ELM-CLR, together with several SSC algorithms, to driver distraction detection. The clustering algorithms are applied to unlabeled data to generate preliminary labels that serve as a reference to assist human experts in the labeling process. In terms of clustering accuracy, both proposed algorithms perform better than or on par with the related algorithms, and ELM-JEC achieves the best clustering accuracy. Moreover, the research question “Which type of SSC method is more suitable for driver distraction detection?” is answered by evaluating two popular types of semi-supervised methods on a real-world dataset of drivers’ eye and head movements. The experimental results show that the graph-based methods achieve twice the improvement of the low-density-separation-based method. It is also shown that 1) the graph-based methods reduce the required amount of labeled training data, and 2) the gains in detection accuracy increase with the size of the unlabeled dataset. Overall, the thesis contributes two novel clustering algorithms that make use of ELM-based embedding, and finds that 1) better clustering performance can be expected on some datasets if the embedding simultaneously preserves the intrinsic local structure and maximizes class separability, and 2) both the original and the nonlinearly embedded spaces are crucial to learning graphs with clear clusters. Moreover, the thesis contributes to research on driver distraction detection by putting forward a semi-supervised driver distraction detection system with efficient labeling assistance and verifying it on an on-road driver distraction dataset.
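Both proposed algorithms build on ELM. As background only, here is a minimal NumPy sketch of a basic supervised single-hidden-layer ELM: the hidden-layer weights are random and never trained, and only the output weights are solved in closed form by ridge regression. This is not ELM-JEC or ELM-CLR; the function names, the tanh activation, the hidden-layer size and the regularization parameter `C` are illustrative assumptions.

```python
import numpy as np


def elm_fit(X, T, n_hidden=50, C=1.0, seed=0):
    """Train a basic ELM (sketch). X: (n, d) inputs, T: (n, k) targets."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], n_hidden))  # random input weights (fixed)
    b = rng.standard_normal(n_hidden)                # random hidden biases (fixed)
    H = np.tanh(X @ W + b)                           # hidden-layer output matrix
    # Ridge-regression output weights: beta = (H^T H + I / C)^{-1} H^T T
    beta = np.linalg.solve(H.T @ H + np.eye(n_hidden) / C, H.T @ T)
    return W, b, beta


def elm_predict(X, W, b, beta):
    """Map inputs through the same random hidden layer, then apply beta."""
    return np.tanh(X @ W + b) @ beta


# Usage: two well-separated Gaussian blobs with one-hot targets.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-4, 1, (50, 2)), rng.normal(4, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
W, b, beta = elm_fit(X, np.eye(2)[y])
pred = elm_predict(X, W, b, beta).argmax(axis=1)
```

Because only `beta` is learned, training reduces to a single linear solve, which is the property that makes ELM attractive as a fast building block for embedding and clustering.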
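ELM-CLR is described as building on Constrained Laplacian Rank, which is generally formulated as learning a graph whose Laplacian has rank n − c, so that the graph has exactly c connected components (clusters). The sketch below, which assumes a nonnegative symmetric affinity matrix `W` and uses a hypothetical helper name, only illustrates that underlying spectral fact: the number of connected components equals the multiplicity of the eigenvalue 0 of L = D − W. It does not learn a graph.

```python
import numpy as np


def count_components_from_laplacian(W, tol=1e-8):
    """Count connected components as the number of (numerically) zero
    eigenvalues of the unnormalized Laplacian L = D - W."""
    W = np.asarray(W, dtype=float)
    L = np.diag(W.sum(axis=1)) - W
    eigvals = np.linalg.eigvalsh(L)  # L is symmetric PSD, eigenvalues ascending
    return int(np.sum(eigvals < tol))


# Two disconnected triangles form a graph with exactly two components.
tri = np.ones((3, 3)) - np.eye(3)
W = np.block([[tri, np.zeros((3, 3))], [np.zeros((3, 3)), tri]])
print(count_components_from_laplacian(W))  # 2
```

Adding even a weak edge between the two triangles makes the second-smallest eigenvalue positive, so the count drops to 1; a rank constraint of n − c forbids such cross-cluster edges.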
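Clustering accuracy, as commonly defined, is computed after finding the one-to-one mapping from cluster IDs to class labels that maximizes agreement, since cluster IDs are arbitrary. The abstract does not state the exact definition used in the thesis; the sketch below assumes this common definition and brute-forces all mappings, which is only practical for a few clusters. The function name is hypothetical.

```python
from itertools import permutations


def clustering_accuracy(y_true, y_pred):
    """Fraction of samples correct under the best one-to-one mapping of
    predicted cluster IDs to true class IDs.

    Assumes the number of clusters does not exceed the number of classes.
    """
    classes = sorted(set(y_true))
    clusters = sorted(set(y_pred))
    best = 0
    for perm in permutations(classes, len(clusters)):
        mapping = dict(zip(clusters, perm))  # cluster ID -> class label
        hits = sum(mapping[p] == t for p, t in zip(y_pred, y_true))
        best = max(best, hits)
    return best / len(y_true)


print(clustering_accuracy([0, 0, 1, 1, 2, 2], [1, 1, 0, 0, 2, 2]))  # 1.0
```

For many clusters, the same optimal mapping can be found with the Hungarian algorithm (for example, `scipy.optimize.linear_sum_assignment` on the negated contingency table) instead of enumerating permutations.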
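The abstract compares graph-based SSC methods with a low-density-separation-based method but does not say which graph-based methods were used. As a hedged illustration of the graph-based family only, here is a NumPy sketch of label spreading in its “local and global consistency” form, F = (I − αS)⁻¹Y with S = D^(−1/2) W D^(−1/2). The Gaussian kernel width `sigma`, the value of `alpha`, and the convention that unlabeled samples carry the label −1 are assumptions.

```python
import numpy as np


def label_spreading(X, y, alpha=0.99, sigma=1.0):
    """Graph-based semi-supervised classification (sketch).

    X: (n, d) features; y: (n,) integer labels with -1 for unlabeled samples.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n = len(X)
    classes = np.unique(y[y >= 0])
    # Dense Gaussian affinity graph without self-loops
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    # Symmetric normalization S = D^{-1/2} W D^{-1/2}
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    S = W * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    # One-hot seed matrix; rows of unlabeled samples stay zero
    Y = np.zeros((n, len(classes)))
    for j, c in enumerate(classes):
        Y[y == c, j] = 1.0
    # Closed-form fixed point of the iteration F <- alpha*S*F + (1 - alpha)*Y,
    # up to a constant factor that does not change the argmax
    F = np.linalg.solve(np.eye(n) - alpha * S, Y)
    return classes[F.argmax(axis=1)]


# Usage: two blobs, only one labeled sample per blob.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.5, (20, 2)), rng.normal(5.0, 0.5, (20, 2))])
y = np.full(40, -1)
y[0], y[20] = 0, 1
pred = label_spreading(X, y)
```

Labels propagate along high-affinity edges, so each unlabeled point inherits the label of the seed in its own dense region. This is the mechanism by which graph-based methods can reduce the amount of labeled data needed when the data form clear clusters.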
author2 Huang Guangbin
format Theses and Dissertations
author Liu, Tianchi
title Clustering and semi-supervised classification with application to driver distraction detection
publishDate 2018
url https://hdl.handle.net/10356/89229
http://hdl.handle.net/10220/46179
spelling sg-ntu-dr.10356-89229 2023-07-04T16:33:15Z Liu, Tianchi; Huang Guangbin; Lin Zhiping; School of Electrical and Electronic Engineering; Doctor of Philosophy; 2018-10-02T07:09:48Z; 2019-12-06T17:20:42Z; 2018; Thesis
Liu, T. (2018). Clustering and semi-supervised classification with application to driver distraction detection. Doctoral thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/89229
http://hdl.handle.net/10220/46179
DOI 10.32657/10220/46179; en; 146 p.; application/pdf