Discovery of novel biomarkers using machine learning based methods in cross disorder psychiatry cohorts

Psychiatric disorders (PD) are gaining more attention nowadays due to their profound negative impact on individuals and society. Therefore, genomic psychiatry is also gaining more interest, as it holds much promise for biomarker discovery in PD. However, genomic datasets usually consist of high dime...

Bibliographic Details
Main Author: Cao, Shuwen
Other Authors: Jagath C Rajapakse
Format: Final Year Project
Language: English
Published: Nanyang Technological University 2023
Online Access: https://hdl.handle.net/10356/166555
Institution: Nanyang Technological University
id sg-ntu-dr.10356-166555
record_format dspace
spelling sg-ntu-dr.10356-166555 2023-05-05T15:42:49Z
Discovery of novel biomarkers using machine learning based methods in cross disorder psychiatry cohorts
Cao, Shuwen
Jagath C Rajapakse
School of Computer Science and Engineering
ASJagath@ntu.edu.sg
Engineering::Computer science and engineering
Bachelor of Science in Data Science and Artificial Intelligence
2023-05-05T02:37:23Z
2023
Final Year Project (FYP)
Cao, S. (2023). Discovery of novel biomarkers using machine learning based methods in cross disorder psychiatry cohorts. Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/166555
en
application/pdf
Nanyang Technological University
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
language English
topic Engineering::Computer science and engineering
description Psychiatric disorders (PD) are gaining more attention nowadays due to their profound negative impact on individuals and society. Therefore, genomic psychiatry is also gaining more interest, as it holds much promise for biomarker discovery in PD. However, genomic datasets in a psychiatric outpatient clinic setting usually consist of high-dimensional data with small sample sizes, which poses a major challenge for accurate and statistically meaningful clinical analysis of transcriptomic data. In this project, we address this issue by proposing a pipeline that uses state-of-the-art machine learning methods to extract a salient set of genes (the features of the genomic data) as potential biomarkers for future biological analysis. By using machine learning techniques, we aim to narrow down the number of genes to those potential biomarkers that have a significant impact in identifying bipolar disorder (BD). To better simulate a psychiatric outpatient clinic setting, we carried out the investigation on transcriptomic data of lithium-treated and non-lithium-treated bipolar patients (n=240) and healthy controls (n=240). After a series of data pre-processing steps, univariate filtering using the F-test was applied to the genomic data, followed by Principal Component Analysis (PCA) for dimensionality reduction. Lastly, we implemented a multivariate feature selection method, recursive feature elimination, using various machine learning models with nested cross-validation to select the set of genes giving the best prediction accuracy in distinguishing BD patients from healthy controls. The results indicated that the genes selected by our proposed pipeline achieve higher predictive accuracy in classifying BD patients, and BD patients treated with lithium, from healthy controls. We conclude that our proposed feature selection pipeline, which combines univariate filtering, PCA and multivariate feature selection with machine learning methods, is capable of overcoming the challenges posed by the high dimensionality of gene expression data and is able to select relevant salient features for further biological analysis.
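The univariate F-test filtering step mentioned in the description can be illustrated with a minimal sketch. This is not the project's code: the function names `f_statistic` and `select_top_k`, the toy expression matrix, and the choice of keeping a fixed number `k` of genes are all assumptions made for illustration. It computes the one-way ANOVA F statistic per gene and ranks genes by it; the PCA and recursive feature elimination steps are not shown.

```python
from statistics import fmean


def f_statistic(groups):
    """One-way ANOVA F statistic for a single feature.

    `groups` is a list of lists: the feature's values for each class.
    """
    n_total = sum(len(g) for g in groups)
    n_groups = len(groups)
    grand_mean = fmean([v for g in groups for v in g])
    means = [fmean(g) for g in groups]
    # Variation of the class means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of the samples around their own class mean.
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (n_groups - 1)) / (ss_within / (n_total - n_groups))


def select_top_k(X, y, k):
    """Return the indices of the k features with the largest F statistic.

    `X` is a list of samples (rows of feature values), `y` the class labels.
    """
    labels = sorted(set(y))
    scores = []
    for j in range(len(X[0])):
        groups = [[row[j] for row, lab in zip(X, y) if lab == c] for c in labels]
        scores.append(f_statistic(groups))
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return ranked[:k]


# Hypothetical toy data: 6 samples x 3 genes, label 1 = patient, 0 = control.
X = [
    [5.1, 2.0, 0.3],
    [4.9, 2.1, 0.9],
    [5.0, 1.9, 0.1],
    [1.0, 2.0, 0.8],
    [1.2, 2.1, 0.2],
    [0.9, 1.9, 0.7],
]
y = [1, 1, 1, 0, 0, 0]
selected = select_top_k(X, y, 2)
```

In this toy matrix only the first gene separates the two classes clearly, so it ranks first. On real data the filter would need to be fitted inside each cross-validation training fold, otherwise the selection step sees the test labels and the reported accuracy is optimistic.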
author2 Jagath C Rajapakse
format Final Year Project
author Cao, Shuwen
title Discovery of novel biomarkers using machine learning based methods in cross disorder psychiatry cohorts
publisher Nanyang Technological University
publishDate 2023
url https://hdl.handle.net/10356/166555
_version_ 1770563583610978304