Mathematical foundation of data science
Main Author: | Fang, Xiaowei |
---|---|
Other Authors: | Li, Yi (School of Physical and Mathematical Sciences) |
Format: | Final Year Project (FYP) |
Language: | English |
Published: | Nanyang Technological University, 2020 |
Degree: | Bachelor of Science in Mathematical Sciences |
Subjects: | Science::Mathematics |
Online Access: | https://hdl.handle.net/10356/139274 |
Institution: | Nanyang Technological University |
Description:

High-dimensional probability theory is of vital importance in the mathematical foundation of data science. This project involved a thorough reading of the recent monograph "High-Dimensional Probability: An Introduction with Applications in Data Science" by Roman Vershynin. The book integrates high-dimensional probability with applications in data science, bridging the gap between mathematical sophistication and the theoretical methods used in modern research. Emphasis is divided across three parts: concentration, stochastic processes, and random projections and sections. Chapters 1-6 form the backbone of the book. We first see concentration inequalities for random vectors, random matrices, and random projections, from which applications to semidefinite programming and the maximum cut problem for graphs are developed. We are then introduced to covering and packing arguments, which lead, via bounds on sub-gaussian random matrices, to applications in error-correcting codes, community detection in networks, covariance estimation, and clustering. Then follows the concentration of Lipschitz functions, which yields the Johnson-Lindenstrauss lemma, community detection in sparse networks, and covariance estimation for general distributions. We also learn the decoupling and symmetrization tricks, from which the application to matrix completion stems.

The second part develops bounds on the expected supremum of a random process, which pave the way for the last part of the book. The theoretical tools include comparison inequalities for Gaussian processes and the technique of Gaussian interpolation, which yield a bound on the operator norm of a Gaussian random matrix, a lower bound on the Gaussian width, and bounds on the diameter of random projections of sets. Later, the method of chaining and combinatorial reasoning based on the VC dimension enable us to bound processes with sub-gaussian increments and random quadratic forms, leading to two applications: empirical processes and statistical learning theory.

The last part of the book begins with a remarkably useful uniform deviation inequality for random matrices and random projections, whose consequences include several recoveries, by different methods, of results proved earlier, together with two new results: the M* bound and the Escape theorem. We then turn to applications in the recovery of sparse signals and low-rank matrices; of particular interest is the Lasso algorithm for sparse regression. Finally, equipped with the geometry of low-dimensional random projections, the book closes with a glimpse of Gaussian images of sets, projections of ellipsoids, and random projections in the Grassmannian.

This report gives a solution manual to nearly all the exercises in the book, based on the 24 May 2019 version of the electronic copy (Chapters 1-6) and the hard copy (Chapters 7-11). Each problem is stated in full before its solution, so the report is self-contained.
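As a small numerical companion to the Johnson-Lindenstrauss lemma mentioned in the first part, the sketch below (not part of the report; the constant 8 in the target dimension and all other parameters are illustrative choices) projects a cloud of points through a scaled Gaussian matrix and checks how far the pairwise distances move:

```python
import numpy as np

# Sketch of the Johnson-Lindenstrauss lemma: projecting N points from R^n
# down to m ~ log(N) / eps^2 dimensions with a scaled Gaussian matrix
# preserves all pairwise distances up to a factor (1 +/- eps) with high
# probability. The constant 8 below is an illustrative choice.
rng = np.random.default_rng(0)

n, N, eps = 1000, 50, 0.25                    # ambient dim, #points, distortion
m = int(np.ceil(8 * np.log(N) / eps**2))      # target dimension

X = rng.standard_normal((N, n))               # N points in R^n
P = rng.standard_normal((m, n)) / np.sqrt(m)  # scaled Gaussian projection
Y = X @ P.T                                   # projected points in R^m


def pairwise_dists(Z):
    """Euclidean distances between all distinct pairs of rows of Z."""
    diff = Z[:, None, :] - Z[None, :, :]
    i, j = np.triu_indices(len(Z), k=1)
    return np.sqrt((diff**2).sum(-1))[i, j]


ratios = pairwise_dists(Y) / pairwise_dists(X)
print(f"m = {m}, distance ratios in [{ratios.min():.3f}, {ratios.max():.3f}]")
```

With these parameters the ratios typically land well inside [1 - eps, 1 + eps], even though the target dimension m is chosen independently of the ambient dimension n.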
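The covariance estimation result from the same part (the sample covariance of m sub-gaussian samples in R^n deviates from the truth by roughly sqrt(n/m) in operator norm) can be probed the same way; the sizes below are arbitrary, and the sketch is not code from the report:

```python
import numpy as np

# Covariance estimation: with m samples from N(0, I_n), the sample
# covariance satisfies ||Sigma_hat - I|| on the order of sqrt(n/m)
# in operator norm once m >= n.
rng = np.random.default_rng(3)

n = 100
for m in [200, 800, 3200]:
    X = rng.standard_normal((m, n))          # m samples in R^n
    Sigma_hat = X.T @ X / m                  # sample covariance
    err = np.linalg.norm(Sigma_hat - np.eye(n), ord=2)
    print(f"m = {m:5d}: ||Sigma_hat - I|| = {err:.3f}, sqrt(n/m) = {np.sqrt(n/m):.3f}")
```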
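For the second part, the two-sided estimate on the operator norm of a Gaussian random matrix is equally easy to check empirically; the matrix sizes and trial count in this sketch are arbitrary:

```python
import numpy as np

# Empirical check for an m x n matrix A with i.i.d. N(0, 1) entries:
# max(sqrt(m), sqrt(n)) <~ E||A|| <= sqrt(m) + sqrt(n),
# where ||A|| is the operator (spectral) norm, i.e. the largest singular value.
rng = np.random.default_rng(1)

m, n, trials = 200, 300, 50
norms = [np.linalg.norm(rng.standard_normal((m, n)), ord=2) for _ in range(trials)]

print(f"mean ||A||        ~ {np.mean(norms):.1f}")
print(f"sqrt(m) + sqrt(n) = {np.sqrt(m) + np.sqrt(n):.1f}")
```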
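And for the third part, a minimal sparse-recovery experiment with the Lasso, here via scikit-learn; the dimensions, sparsity, noise level, and regularization weight alpha are illustrative assumptions rather than values taken from the report:

```python
import numpy as np
from sklearn.linear_model import Lasso

# Sparse recovery via the Lasso: recover an s-sparse vector x in R^n from
# m << n noisy linear measurements y = A x + noise, with a normalized
# Gaussian measurement matrix A.
rng = np.random.default_rng(2)

n, m, s = 500, 100, 5                          # ambient dim, measurements, sparsity
x = np.zeros(n)
x[rng.choice(n, size=s, replace=False)] = rng.standard_normal(s)

A = rng.standard_normal((m, n)) / np.sqrt(m)   # measurement matrix
y = A @ x + 0.01 * rng.standard_normal(m)      # noisy measurements

lasso = Lasso(alpha=0.01, fit_intercept=False, max_iter=50_000)
lasso.fit(A, y)
x_hat = lasso.coef_

print(f"relative error: {np.linalg.norm(x_hat - x) / np.linalg.norm(x):.3f}")
print(f"nonzeros found: {np.count_nonzero(np.abs(x_hat) > 1e-3)} (true: {s})")
```

With on the order of s log n measurements, the estimator typically identifies the support and comes close in Euclidean norm, in line with the guarantees surveyed in the book.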