Topics in spectral analysis of large sample covariance matrices

Bibliographic Details
Main Author: Lin, Zeqin
Other Authors: Pan Guangming
Format: Thesis-Doctor of Philosophy
Language: English
Published: Nanyang Technological University 2024
Subjects:
Online Access: https://hdl.handle.net/10356/181489
Institution: Nanyang Technological University
Description
Abstract: This thesis addresses two topics concerning spectral properties of sample covariance matrices when the data dimensionality M scales proportionally with the sample size N. In the first part, we consider the left and right singular vectors u_i and v_i of an M×N data matrix Y=Σ^(1/2) X. We establish the convergence in probability of the singular vector overlaps ⟨u_i,D_1 u_j⟩, ⟨v_i,D_2 v_j⟩ and ⟨u_i,D_3 v_j⟩ towards their deterministic counterparts, where the D_k are general deterministic matrices with bounded operator norms. Building on these findings, we offer a more precise characterization of the loss associated with Ledoit and Wolf's nonlinear shrinkage estimators. The second part examines large signal-plus-noise data matrices of the form S+Σ^(1/2) X, where S is an M×N low-rank deterministic signal matrix and Σ^(1/2) X represents the noise matrix. Under general assumptions concerning the structure of (S,Σ) and the distribution of X, we establish the asymptotic joint distribution of the spiked singular values of the model when the signals are supercritical. The asymptotic distributions turn out to be nonuniversal, in the sense that they depend on the distribution of X. As a corollary, we obtain the asymptotic distribution of the spiked eigenvalues associated with mixture models.
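For readers who want a concrete picture of the objects named in the abstract, the following is a minimal NumPy sketch that simulates Y=Σ^(1/2) X and computes the empirical overlaps ⟨u_i,D_1 u_j⟩, ⟨v_i,D_2 v_j⟩ and ⟨u_i,D_3 v_j⟩. It is an illustration only and is not taken from the thesis: the function name, the two-level diagonal Σ, the Gaussian entries of X with variance 1/N, and the particular matrices D_1, D_2, D_3 are all assumptions chosen for simplicity. The sketch does not compute the deterministic limits established in the thesis.

```python
import numpy as np

def singular_vector_overlaps(M=200, N=400, seed=0):
    """Simulate Y = Sigma^(1/2) X and return its singular values and overlaps.

    Hypothetical helper for illustration; all model choices are assumptions.
    """
    rng = np.random.default_rng(seed)

    # Assumption: diagonal population covariance with eigenvalues 1 and 3.
    sigma_diag = np.where(np.arange(M) < M // 2, 1.0, 3.0)

    # Assumption: i.i.d. Gaussian entries, scaled to variance 1/N.
    X = rng.standard_normal((M, N)) / np.sqrt(N)

    # Y = Sigma^(1/2) X; Sigma is diagonal, so this is a row scaling.
    Y = np.sqrt(sigma_diag)[:, None] * X

    # Thin SVD: columns of U are the u_i, columns of V are the v_i.
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    V = Vt.T

    # Assumption: deterministic test matrices with bounded operator norm.
    D1 = np.diag(sigma_diag)            # M x M
    D2 = np.diag(np.linspace(0, 1, N))  # N x N
    D3 = np.eye(M, N)                   # M x N

    # Entry (i, j) of each matrix is the corresponding overlap.
    overlaps_uu = U.T @ D1 @ U          # <u_i, D1 u_j>
    overlaps_vv = V.T @ D2 @ V          # <v_i, D2 v_j>
    overlaps_uv = U.T @ D3 @ V          # <u_i, D3 v_j>
    return s, overlaps_uu, overlaps_vv, overlaps_uv
```

Calling `singular_vector_overlaps()` with increasing M and N at a fixed ratio M/N, and comparing the overlap entries across seeds, is one way to observe empirically the concentration that the convergence-in-probability statement describes.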