Stochastic optimization methods for structure learning in Gaussian graphical models and the general log-determinant optimization

Graphical models compactly represent the most significant interactions of multivariate probability distributions, provide an efficient inference framework to answer challenging statistical queries, and incorporate both expert knowledge with data to extract information from complex systems. When the...

Full description

Saved in:
Bibliographic Details
Main Author: Wu, Songwei
Other Authors: Justin Dauwels
Format: Thesis-Doctor of Philosophy
Language:English
Published: Nanyang Technological University 2020
Subjects:
Online Access:https://hdl.handle.net/10356/142033
Tags: Add Tag
No Tags, Be the first to tag this record!
Institution: Nanyang Technological University
Language: English
Description
Summary:Graphical models compactly represent the most significant interactions of multivariate probability distributions, provide an efficient inference framework to answer challenging statistical queries, and incorporate both expert knowledge with data to extract information from complex systems. When the graphical model is assumed to be Gaussian, the resulting model features attractive properties and appears frequently in cutting-edge applications. In the application of Gaussian graphical models, it is fundamental and challenging to learn the graph structure from given multivariate data. The time complexity of existing methods is subject to the cumbersome operations such as matrix inversion, full spectral decomposition, and Cholesky decompositions, which generally take $\mathcal O(N^3)$ arithmetic operations. This computational bottleneck severely impedes the scalability of those methods into problems at large scale. Moreover, structure learning in Gaussian graphical model is a specialized case of the general log-determinant optimization, which entails general matrix constraints and covers much broader fields of applications. The bottleneck of the cubic time complexity also occurs in existing methods for the general log-determinant optimization. In this thesis, we propose a novel method to learn the structure of the Gaussian graphical model with the aid of stochastic approximation so that the time complexity can be reduced from $\mathcal O(N^3)$ to $\mathcal O(N^2)$. In addition, we generalize the proposed method within the framework of primal-dual optimization so that we can settle the general log-determinant optimization problem with similar efficiency. We first address structure learning in Gaussian graphical models. Given multivariate data, the problem can be formulated as estimating a sparse precision matrix, because of the equivalence between the graph structure of the graphical model, the conditional independence relationship of the underlying Gaussian distribution, and the zero pattern of the precision matrix. The resulting objective function leads to a non-smooth positive definite convex programming problem, which contains a complex log-determinant term. Generally, there are two important questions for all methods: how to update the precision matrix and how to ensure positive definiteness of the estimate. To address the first question, we construct an efficient stochastic approximation of the matrix inverse term, obtain the stochastic realization of the subgradient at much reduced cost, and iteratively update the precision matrix following the opposite direction of the subgradient. The resulting time complexity of the stochastic subgradient is only $\mathcal O(N^2)$. As for the second challenge, we repeatedly evaluate the smallest eigenvalue of the updated precision matrix by the implicitly restarted Lanczos method, search for a proper step size, and render the estimate in the positive definite cone. The implicitly restarted Lanczos method takes only $\mathcal O(N^2)$ arithmetic operations to estimate the extreme eigenvalue of a $N$-dimensional matrix. Furthermore, we propose an adaptive step size which mitigates the influence of noise, avoids overshooting, and accelerates convergence. We theoretically and empirically analyze the resulting method. Both theoretical and numerical results demonstrate the improved scalability, certain convergence, and favorable properties of the proposed method. The general log-determinant problem imposes general linear matrix constraints on the objective function. The proposed method for structure learning in Gaussian graphical models can only solve the unconstrained optimization problem. Therefore, the matrix constraints render the proposed method inapplicable to the general log-determinant problem. In order to generalize the proposed method, we analyze the augmented Lagrangian function, establish the equivalence between the primal-dual pair and the saddle point, and convert the optimization problem into searching for a saddle point of the augmented Lagrangian function. Following this analysis, the generalized method extends the Uzawa iterations to stochastic optimization, incorporates the techniques employed in structure learning in Gaussian graphical models, and obtains the optimum at the cost of only $\mathcal O(N^2)$ arithmetic operations. Furthermore, we investigate the theoretical performance and establish the guarantee of convergence. Besides the theoretical analysis, we also numerically analyze the performance on both synthetic and real data. Through numerical experiments, we compare various values of important parameters, suggest informative guidelines on parameter settings, validate the main theorem, and provide evidence for the reduction of the time complexity from cubic to quadratic. In conclusion, we propose a novel method for structure learning in Gaussian graphical models with the aid of stochastic optimization and further extend the method into primal-dual optimization to settle the general log-determinant optimization problem. Such methods overcome the computational bottleneck of existing methods and improve the scalability for high-dimensional problems.