Topics in spectral analysis of large sample covariance matrices

Bibliographic Details
Main Author: Lin, Zeqin
Other Authors: Pan Guangming
Format: Thesis-Doctor of Philosophy
Language: English
Published: Nanyang Technological University 2024
Subjects:
Online Access: https://hdl.handle.net/10356/181489
Institution: Nanyang Technological University
Description
Summary: This thesis addresses two topics concerning spectral properties of sample covariance matrices when the data dimensionality M scales proportionally with the sample size N. In the first part, we consider the left and right singular vectors u_i and v_i of an M×N data matrix Y=Σ^(1/2) X. We establish the convergence in probability of the singular vector overlaps ⟨u_i,D_1 u_j⟩, ⟨v_i,D_2 v_j⟩ and ⟨u_i,D_3 v_j⟩ towards their deterministic counterparts, where the D_k are general deterministic matrices with bounded operator norms. Building on these findings, we offer a more precise characterization of the loss associated with Ledoit and Wolf's nonlinear shrinkage estimators. The second part examines large signal-plus-noise data matrices of the form S+Σ^(1/2) X, where S is an M×N low-rank deterministic signal matrix and Σ^(1/2) X represents the noise matrix. Under general assumptions concerning the structure of (S,Σ) and the distribution of X, we establish the asymptotic joint distribution of the spiked singular values of the model when the signals are supercritical. The asymptotic distributions turn out to be nonuniversal, in the sense that they depend on the distribution of X. As a corollary, we obtain the asymptotic distribution of the spiked eigenvalues associated with mixture models.
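
Notation: the objects named in the summary can be laid out as below. This is only a restatement under standard conventions, not a formula quoted from the thesis. The symbols c and s_i, the use of the conjugate transpose for the inner product, and the dimensions given for the D_k are assumptions inferred from the summary so that each product is well defined.

```latex
% Proportional regime: M and N grow together.
% Assumption: one common formalization, with c an assumed name for the limiting ratio.
\frac{M}{N} \to c \in (0, \infty)

% Part 1: data matrix and its singular value decomposition.
% Assumption: s_i is an assumed name for the i-th singular value of Y.
Y = \Sigma^{1/2} X, \qquad Y = \sum_{i} s_i \, u_i v_i^{*}

% Singular vector overlaps against deterministic matrices.
% Assumption: D_1 is M x M, D_2 is N x N, D_3 is M x N (forced by the shapes of u_i and v_j).
\langle u_i, D_1 u_j \rangle = u_i^{*} D_1 u_j, \qquad
\langle v_i, D_2 v_j \rangle = v_i^{*} D_2 v_j, \qquad
\langle u_i, D_3 v_j \rangle = u_i^{*} D_3 v_j

% Part 2: signal-plus-noise model, with S an M x N low-rank deterministic signal.
S + \Sigma^{1/2} X
```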