Clustering and semi-supervised classification with application to driver distraction detection

Bibliographic Details
Main Author: Liu, Tianchi
Other Authors: Huang Guangbin
Format: Theses and Dissertations
Language: English
Published: 2018
Online Access:https://hdl.handle.net/10356/89229
http://hdl.handle.net/10220/46179
Institution: Nanyang Technological University
Description
Summary: Clustering and Semi-Supervised Classification (SSC) algorithms can make use of unlabeled training data and thus have the potential to alleviate labeling costs. For example, Extreme Learning Machine (ELM) was recently extended to semi-supervised learning and clustering with promising performance. Meanwhile, in some real-world applications it is either costly or infeasible to obtain labeled training samples. The thesis investigates clustering and SSC algorithms with application to driver distraction detection.
Firstly, the thesis investigates embedding-based clustering. It reviews the desirable properties of an embedding reported in the literature, e.g., preserving the intrinsic data structure and maximizing the class separability. To obtain a better embedding for clustering, the thesis considers both properties together and develops a novel clustering algorithm referred to as ELM for Joint Embedding and Clustering (ELM-JEC). Experimental studies on a wide range of benchmark datasets show that ELM-JEC is competitive with related methods.
Secondly, the thesis investigates graph-based clustering. One limitation of existing graph learning methods is that they adjust the graph based on either the original data or the linearly projected data, which may not effectively reveal the underlying low-dimensional structures. To address this limitation, the thesis develops dual data representations, i.e., the original data and their nonlinear embedding obtained via an ELM-based neural network, and uses both as the basis for graph learning. The resulting algorithm is named clustering based on ELM and Constrained Laplacian Rank (ELM-CLR). The experimental results show that ELM-CLR outperforms other adaptive graph learning methods on most benchmark datasets.
Finally, the thesis applies the proposed clustering algorithms, i.e., ELM-JEC and ELM-CLR, and several SSC algorithms to driver distraction detection.
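The summary does not spell out how an ELM-based embedding is computed. As a rough illustration only, the following sketch follows the general recipe of unsupervised ELM (US-ELM) as we understand it. It is not ELM-JEC or ELM-CLR. Inputs pass through a randomly initialized, untrained sigmoid hidden layer, and the output weights are chosen so that the embedding varies smoothly over a k-nearest-neighbour graph of the data. This covers only the "preserve the intrinsic structure" property and does not model class separability. The function names, the 0/1-weighted k-NN graph, the choice to discard the leading eigenvector, and all parameter values (`n_hidden`, `lam`, `k`, `seed`) are illustrative assumptions.

```python
import numpy as np


def knn_laplacian(X, k=10):
    """Unnormalized Laplacian L = D - W of a symmetrized k-NN graph with 0/1 weights."""
    n = X.shape[0]
    sq = np.sum(X ** 2, axis=1)
    dist = sq[:, None] + sq[None, :] - 2.0 * X @ X.T   # squared Euclidean distances
    np.fill_diagonal(dist, np.inf)                      # a point is not its own neighbour
    nn = np.argsort(dist, axis=1)[:, :k]                # k nearest neighbours of each point
    W = np.zeros((n, n))
    W[np.arange(n)[:, None], nn] = 1.0
    W = np.maximum(W, W.T)                              # make the graph undirected
    return np.diag(W.sum(axis=1)) - W


def elm_spectral_embedding(X, n_dims=2, n_hidden=50, lam=0.1, k=10, seed=0):
    """Hypothetical US-ELM-style embedding: random hidden layer + graph-smooth output weights."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((X.shape[1], n_hidden))     # random input weights, never trained
    b = rng.standard_normal(n_hidden)                   # random hidden biases
    H = 1.0 / (1.0 + np.exp(-(X @ A + b)))              # sigmoid hidden-layer output (n x n_hidden)
    M = np.eye(n_hidden) + lam * H.T @ knn_laplacian(X, k) @ H
    M = (M + M.T) / 2.0                                 # remove rounding asymmetry
    B = H.T @ H
    # Generalized problem M v = g B v. The smallest g corresponds to the largest
    # mu = 1/g in B v = mu M v; M is positive definite, so whiten with M = R R^T.
    R_inv = np.linalg.inv(np.linalg.cholesky(M))
    S = R_inv @ B @ R_inv.T
    mu, Wv = np.linalg.eigh((S + S.T) / 2.0)            # eigenvalues in ascending order
    # Reverse to descending mu, drop the leading eigenvector (our reading of US-ELM),
    # and map the next n_dims eigenvectors back with v = R^{-T} w.
    V = R_inv.T @ Wv[:, ::-1][:, 1:n_dims + 1]
    V /= np.linalg.norm(H @ V, axis=0)                  # each embedding column gets unit norm
    return H @ V                                        # n x n_dims embedding


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = np.vstack([rng.normal(-3.0, 0.5, (30, 2)), rng.normal(3.0, 0.5, (30, 2))])
    E = elm_spectral_embedding(X)
    print(E.shape)  # (60, 2)
```

In a two-stage pipeline, k-means or another off-the-shelf clusterer would then be run on the rows of `E`. ELM-JEC, by contrast, is described as considering the embedding properties and the clustering together rather than in sequence.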
In the driver distraction application, the clustering algorithms are used on unlabeled data to generate preliminary labels that serve as a reference to assist human experts in the labeling process. In terms of clustering accuracy, both proposed clustering algorithms perform better than or on par with the related algorithms, and the best clustering accuracy is achieved by ELM-JEC. Moreover, the research question “Which type of SSC method is more suitable for driver distraction detection?” is answered by evaluating two popular types of semi-supervised methods on a real-world dataset of drivers’ eye and head movements. The experimental results show that the graph-based methods achieve twice the improvement achieved by the low-density-separation-based method. It is also shown that 1) the graph-based methods reduce the required amount of labeled training data, and 2) the gains in detection accuracy increase with the size of the unlabeled dataset.
Overall, the thesis contributes two novel clustering algorithms that make use of ELM-based embedding and finds that 1) better clustering performance can be expected on some datasets if the embedding simultaneously preserves the intrinsic local structure and maximizes the class separability, and 2) both the original and the nonlinear embedded spaces are crucial to learning graphs with clear clusters. In addition, the thesis contributes to research on driver distraction detection by putting forward a semi-supervised driver distraction detection system with efficient labeling assistance and verifying it on an on-road driver distraction dataset.
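The summary reports that graph-based SSC methods outperform a low-density-separation-based method, but it does not name the graph-based methods. The following sketch uses the closed-form label spreading of Zhou et al. as an assumed, representative member of the graph-based family, not necessarily one used in the thesis. Labels diffuse over a Gaussian-kernel affinity graph, so unlabeled points inherit the label of the labeled points they are strongly connected to. The function name, the kernel width `sigma`, the propagation weight `alpha`, and the toy data are illustrative assumptions.

```python
import numpy as np


def label_spreading(X, y, sigma=0.5, alpha=0.99):
    """Hypothetical helper: closed-form label spreading, F = (I - alpha * S)^{-1} Y.

    X: (n, d) features; y: (n,) integer labels with -1 marking unlabeled points.
    Returns a predicted label for every point.
    """
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)  # squared distances
    W = np.exp(-d2 / (2.0 * sigma ** 2))                # dense Gaussian affinities
    np.fill_diagonal(W, 0.0)                            # no self-loops
    deg = W.sum(axis=1)
    S = W / np.sqrt(np.outer(deg, deg))                 # symmetric normalization D^-1/2 W D^-1/2
    classes = np.unique(y[y >= 0])
    Y = (y[:, None] == classes[None, :]).astype(float)  # one-hot rows; all-zero for unlabeled
    F = np.linalg.solve(np.eye(len(X)) - alpha * S, Y)  # diffuse labels over the graph
    return classes[np.argmax(F, axis=1)]


if __name__ == "__main__":
    # Two well-separated 1-D groups with a single labeled point in each.
    X = np.concatenate([np.linspace(0.0, 2.0, 21), np.linspace(8.0, 10.0, 21)])[:, None]
    y = np.full(42, -1)
    y[0], y[21] = 0, 1
    print(label_spreading(X, y))  # 21 zeros followed by 21 ones
```

The closed form solves a dense n × n linear system, which suits a toy example. Larger unlabeled datasets, such as the ones whose growing size the summary links to accuracy gains, would typically call for a sparse k-NN graph and an iterative update instead.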