Variational maximization-maximization of Bayesian mixture models and application to unsupervised image classification

Bibliographic Details
Main Author: Lim, Kart-Leong
Other Authors: Wang Han
Format: Theses and Dissertations
Language: English
Published: 2018
Subjects: DRNTU::Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence
Online Access:http://hdl.handle.net/10356/73199
id sg-ntu-dr.10356-73199
record_format dspace
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
language English
topic DRNTU::Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence
description This thesis mainly proposes variational inference for Bayesian mixture models and their applications to machine learning problems. The mixture models addressed are the Gaussian mixture model (GMM), the Dirichlet process mixture (DPM), the sparse coding based Gaussian mixture model (sGMM) and the Field-of-Gaussian (FoG) mixture model. In most mixture models, a Bayesian treatment leads to analytically intractable posterior distributions, and most recent works apply either Gibbs sampling or variational inference to overcome this. Variational inference relies on two assumptions, namely mean field theory and conjugate priors. Although it converges faster than Gibbs sampling, its learning algorithm is still not computationally efficient. Learning in variational inference can broadly be seen as first estimating the class assignment variable and then using it to estimate the parameters of the mixture model, where each estimate is usually obtained by computing expectations under the posterior models. However, learning is not restricted to expectation: Welling and Kurihara identified other configurations that use different combinations of maximization and expectation for these estimates, under which the standard variational inference learner is generalized as the expectation-expectation (EE) algorithm. Inspired by this, a variant known as the maximization-maximization (MM) algorithm is proposed for the Bayesian mixture models mentioned above. MM is not without issues: theoretical studies comparing MM to EE are rare, and the computational efficiency and accuracy of MM are seldom compared to those of EE, so it is difficult to justify MM over a mainstream learner such as EE or even Gibbs sampling. This thesis therefore revisits the EE learning objective for a Bayesian GMM and compares it with the MM objective.
In Bayesian nonparametrics, variational inference is not the mainstream choice for fast DPM algorithms, mainly because of its high computational cost. Instead, most fast algorithms are based on MAP estimation of Gibbs sampling probabilities; however, these usually face intractable posteriors and typically degenerate the conditional likelihood to overcome the inefficiency. Inspired by fast DPM algorithms, the MM algorithm is proposed for approximating the expectations of the variational posteriors of the DPM. The proposed method outperforms variational inference in clustering accuracy and model selection, and its computational efficiency is comparable to that of fast DPM algorithms.
The Sparse Coding based Fisher Vector extends the traditional GMM based Fisher Vector with a sparse term. The original work requires an off-the-shelf Sparse Coding solver, which from a statistical perspective is a black box; a more elegant approach is to learn Sparse Coding with a probabilistic model. This thesis therefore proposes a new GMM, the Sparse Coding based GMM (sGMM), which differs from the standard GMM by an additional sparse coefficient hidden variable. Variational inference using the MM algorithm is then proposed for sGMM parameter estimation, and the estimated parameters are used to compute a Sparse Coding based Fisher Vector for image representation.
Most unsupervised learners such as K-means, GMM or Sparse Coding do not use the structure information found in the neighbourhood of an image patch for parameter estimation. A recent image prior technique combines unsupervised learning with a Markov Random Field (MRF) by replacing the MRF potential with an unsupervised learner; it learns the parameters of the unsupervised learner from a neighbourhood of image patches, thus capturing image structure information. In this thesis, this image prior technique is applied to GMM parameter estimation, so that the learnt model is more expressive than a GMM learnt from a single image patch alone.
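The contrast between the EE and MM learners described above can be made concrete with a small sketch. The following is a minimal, hypothetical Python illustration (not taken from the thesis): an MM-style learner for a Bayesian GMM in which the assignment step uses a hard argmax instead of expected responsibilities, and the parameter step uses the modes (MAP point estimates) of the conjugate posteriors instead of their expectations. It assumes diagonal covariances with Normal-Inverse-Gamma priors and a symmetric Dirichlet prior on the mixture weights; the function name, prior choices and update details are illustrative assumptions, and the thesis' actual formulation may differ.

import numpy as np

def mm_bayesian_gmm(X, K, n_iter=50, alpha0=1.0, kappa0=1.0, a0=1.0, b0=1.0, seed=0):
    """Maximization-maximization (MM) sketch for a Bayesian GMM.

    Both steps use argmax / MAP point estimates instead of the
    expectations used by standard variational inference (EE).
    Assumes diagonal covariances with Normal-Inverse-Gamma priors
    and a symmetric Dirichlet prior on the mixture weights.
    """
    rng = np.random.default_rng(seed)
    N, D = X.shape
    mu = X[rng.choice(N, K, replace=False)]   # initialize means from data points
    var = np.ones((K, D)) * X.var(axis=0)     # initialize per-dimension variances
    pi = np.full(K, 1.0 / K)                  # initialize mixture weights
    m0 = X.mean(axis=0)                       # prior mean for mu_k

    for _ in range(n_iter):
        # Assignment step: maximize over z (hard assignment per data point)
        logp = (np.log(pi)[None, :]
                - 0.5 * np.sum(np.log(2 * np.pi * var[None, :, :])
                               + (X[:, None, :] - mu[None, :, :]) ** 2 / var[None, :, :],
                               axis=2))
        z = logp.argmax(axis=1)

        # Parameter step: MAP (mode) of the conjugate posteriors
        for k in range(K):
            Xk = X[z == k]
            Nk = len(Xk)
            # Dirichlet posterior mode for the weights (normalized below)
            pi[k] = Nk + alpha0 - 1
            if Nk == 0:
                continue
            xbar = Xk.mean(axis=0)
            # Normal posterior mode (= mean) for mu_k
            mu[k] = (kappa0 * m0 + Nk * xbar) / (kappa0 + Nk)
            # Inverse-Gamma posterior mode for the per-dimension variance
            ss = ((Xk - mu[k]) ** 2).sum(axis=0)
            var[k] = (b0 + 0.5 * ss) / (a0 + 0.5 * Nk + 1)
        pi = np.maximum(pi, 1e-12)
        pi /= pi.sum()
    return z, pi, mu, var

Standard variational inference (the EE algorithm) would instead keep soft responsibilities in the assignment step and use posterior expectations in the parameter step; the appeal of MM is that both steps reduce to simple point updates, which is the source of the computational savings discussed in the thesis.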
author2 Wang Han
format Theses and Dissertations
author Lim, Kart-Leong
title Variational maximization-maximization of Bayesian mixture models and application to unsupervised image classification
publishDate 2018
url http://hdl.handle.net/10356/73199
spelling sg-ntu-dr.10356-73199 2023-07-04T17:25:24Z Variational maximization-maximization of Bayesian mixture models and application to unsupervised image classification Lim, Kart-Leong Wang Han School of Electrical and Electronic Engineering Doctor of Philosophy (EEE) 2018-01-19T04:59:11Z 2018-01-19T04:59:11Z 2018 Thesis Lim, K.-L. (2018). Variational maximization-maximization of Bayesian mixture models and application to unsupervised image classification. Doctoral thesis, Nanyang Technological University, Singapore. http://hdl.handle.net/10356/73199 10.32657/10356/73199 en 186 p. application/pdf