Discovery of novel biomarkers using machine learning based methods in cross disorder psychiatry cohorts

Psychiatric disorders (PD) are gaining more attention because of their profound negative impact on individuals and society. Genomic psychiatry is therefore attracting growing interest, as it holds much promise for biomarker discovery in PD. However, genomic datasets usually consist of high dime...

Bibliographic Details
Main Author:	Cao, Shuwen
Other Authors:	Jagath C Rajapakse
Format:	Final Year Project
Language:	English
Published:	Nanyang Technological University 2023
Subjects:	Engineering::Computer science and engineering
Online Access:	https://hdl.handle.net/10356/166555
Institution:	Nanyang Technological University

id	sg-ntu-dr.10356-166555
record_format	dspace
spelling	sg-ntu-dr.10356-166555 2023-05-05T15:42:49Z Discovery of novel biomarkers using machine learning based methods in cross disorder psychiatry cohorts Cao, Shuwen Jagath C Rajapakse School of Computer Science and Engineering ASJagath@ntu.edu.sg Engineering::Computer science and engineering Bachelor of Science in Data Science and Artificial Intelligence 2023-05-05T02:37:23Z 2023-05-05T02:37:23Z 2023 Final Year Project (FYP) Cao, S. (2023). Discovery of novel biomarkers using machine learning based methods in cross disorder psychiatry cohorts. Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/166555 https://hdl.handle.net/10356/166555 en application/pdf Nanyang Technological University
institution	Nanyang Technological University
building	NTU Library
continent	Asia
country	Singapore
content_provider	NTU Library
collection	DR-NTU
language	English
topic	Engineering::Computer science and engineering
spellingShingle	Engineering::Computer science and engineering Cao, Shuwen Discovery of novel biomarkers using machine learning based methods in cross disorder psychiatry cohorts
description	Psychiatric disorders (PD) are gaining more attention because of their profound negative impact on individuals and society. Genomic psychiatry is therefore attracting growing interest, as it holds much promise for biomarker discovery in PD. However, genomic datasets collected in a psychiatric outpatient clinic setting usually consist of high-dimensional data with small sample sizes, which poses a major challenge for accurate and significant clinical analysis of transcriptomic data. In this project, we address this issue by proposing a pipeline of state-of-the-art machine learning methods that extracts a salient set of genes (the features of the genomic data) as potential biomarkers for future biological analysis. Using machine learning techniques, we aim to narrow the genes down to potential biomarkers that have a significant impact on identifying bipolar disorder (BD). To better simulate a psychiatric outpatient clinic setting, we carried out the investigation on transcriptomic data from lithium-treated and non-lithium-treated bipolar patients (n=240) and healthy controls (n=240). After data pre-processing, univariate filtering with an F-test was applied to the genomic data, followed by Principal Component Analysis (PCA) for dimensionality reduction. Lastly, we implemented multivariate feature selection by recursive feature elimination, using various machine learning models with nested cross-validation, to select the set of genes giving the best prediction accuracy in distinguishing BD patients from healthy controls. The results indicate that the genes selected by our proposed pipeline achieve higher predictive accuracy in classifying BD patients, and BD patients treated with lithium, against healthy controls.
We conclude that our proposed feature selection pipeline, which combines univariate filtering, PCA and multivariate feature selection with machine learning based methods, can overcome the challenges posed by the high dimensionality of gene expression data and can select relevant salient features for further biological analysis.
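The univariate filtering step named in the description can be illustrated with a minimal sketch. It assumes a one-way ANOVA F-statistic computed per gene across the class labels, with the highest-scoring genes kept. The function names (`f_statistic`, `top_k_features`), the toy expression matrix, and the labels are all hypothetical; this record does not describe the project's actual implementation.

```python
from statistics import mean


def f_statistic(values, labels):
    """One-way ANOVA F-statistic for a single feature across class labels."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    n, k = len(values), len(groups)
    grand = mean(values)
    # Between-group and within-group sums of squares.
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values())
    ss_within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def top_k_features(X, y, k):
    """Return the indices of the k highest-scoring features and all scores."""
    n_features = len(X[0])
    scores = [f_statistic([row[j] for row in X], y) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return ranked[:k], scores


# Toy expression matrix: rows are samples, columns are hypothetical genes.
X = [
    [1.0, 5.0, 2.0],
    [1.2, 4.0, 2.1],
    [0.8, 6.0, 1.9],
    [3.0, 5.5, 2.0],
    [3.2, 4.5, 2.2],
    [2.8, 5.0, 1.8],
]
y = [0, 0, 0, 1, 1, 1]  # made-up labels: 0 = control, 1 = patient

selected, scores = top_k_features(X, y, k=1)
print(selected)  # [0]: only the first column separates the two groups
```

When a filter like this is combined with cross-validation, it is generally computed on the training folds only, so that information from held-out samples does not leak into the gene ranking.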
author2	Jagath C Rajapakse
author_facet	Jagath C Rajapakse Cao, Shuwen
format	Final Year Project
author	Cao, Shuwen
author_sort	Cao, Shuwen
title	Discovery of novel biomarkers using machine learning based methods in cross disorder psychiatry cohorts
title_short	Discovery of novel biomarkers using machine learning based methods in cross disorder psychiatry cohorts
title_full	Discovery of novel biomarkers using machine learning based methods in cross disorder psychiatry cohorts
title_fullStr	Discovery of novel biomarkers using machine learning based methods in cross disorder psychiatry cohorts
title_full_unstemmed	Discovery of novel biomarkers using machine learning based methods in cross disorder psychiatry cohorts
title_sort	discovery of novel biomarkers using machine learning based methods in cross disorder psychiatry cohorts
publisher	Nanyang Technological University
publishDate	2023
url	https://hdl.handle.net/10356/166555
_version_	1770563583610978304
