Encoding and decoding multimodal brain images for disease biomarker discovery

Many neurodegenerative diseases (such as Alzheimer’s disease) and neuropsychiatric disorders (such as schizophrenia) are still poorly understood. While incorporating multimodal datasets (structural and functional) would provide a more complete picture of these neurological conditions, existing studi...



Bibliographic Details
Main Author:	Chan, Yi Hao
Other Authors:	Jagath C Rajapakse
Format:	Thesis-Doctor of Philosophy
Language:	English
Published:	Nanyang Technological University, 2024
Subjects:	Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence; Engineering::Computer science and engineering::Computer applications::Life and medical sciences
Online Access:	https://hdl.handle.net/10356/173338
Institution:	Nanyang Technological University

id	sg-ntu-dr.10356-173338
record_format	dspace
building	NTU Library
continent	Asia
country	Singapore
content_provider	NTU Library
collection	DR-NTU
description	Many neurodegenerative diseases (such as Alzheimer’s disease) and neuropsychiatric disorders (such as schizophrenia) are still poorly understood. Incorporating multimodal datasets (structural and functional) would provide a more complete picture of these neurological conditions, but existing studies already struggle with two key challenges even when only a single data modality is used: data scarcity and high dimensionality. These problems make machine learning models prone to overfitting on neuroimaging datasets and lead to spurious biomarkers. Combined with inter-scanner variability and disease heterogeneity, they make biomarker discovery an onerous task, and reproducible biomarkers for these neurological conditions have proven elusive. Overcoming these obstacles would give us a better understanding of these diseases and could lead to the discovery of new cures.

One emerging trend in biomarker discovery is the use of deep learning models. Despite their perceived complexity, one relatively unexplored aspect of deep learning architectures is the flexibility they afford for modelling multimodal datasets, both in architectural design and in the construction of the loss function. In this thesis, we propose several deep learning architectures customised to the idiosyncrasies of brain imaging datasets and demonstrate how they can produce reliable biomarkers.

Firstly, we investigate the challenges of high dimensionality and data scarcity in brain imaging datasets. Existing approaches alleviate these issues by pruning less important nodes, including input features, from the neural network. However, sites with very small datasets (fewer than 100 samples) are common, and such pruning-based approaches are no longer effective there. We therefore propose an alternative approach based on data harmonisation and semi-supervised learning that incorporates more data (on the same disease, but collected at other sites) into the analysis while addressing site differences and labelling inconsistencies. We show that existing biomarker discovery methods can produce biomarkers biased towards the largest site, especially when data are imbalanced across sites. In contrast, our proposed architecture arrives at site-invariant biomarkers, which also allows site-specific biomarkers to be revealed more clearly.

Secondly, a persistent issue with existing machine learning models is their poor generalisability on neuroimaging datasets, which is often hypothesised to stem from disease heterogeneity. While some sites are carefully curated to represent a homogeneous subgroup, many still contain a heterogeneous mix of patients, so techniques that can capture this variability are needed. Our proposed approach combines the brain graph and population graph approaches to model structural and functional brain networks. It adds a fine-tuning step that incorporates clustering information, based on clusters of the brain graph embeddings, into the construction of the population graph. This makes it possible to produce subtype-specific biomarkers.

Thirdly, we explore combining multimodal neuroimaging data with multi-omics datasets. Structural and functional neuroimaging data provide whole-brain, macroscale information, but their limited spatial resolution leaves out microscale information. Such information is better captured by multi-omics data, yet multimodal, multi-scale imaging genetics research remains relatively under-explored because it further exacerbates the overfitting problem described above. To address this, we propose a scalable deep neural network architecture based on the attention mechanism that allows a wide range of omics modalities to be incorporated into the analysis. We demonstrate that combining imaging and genetics data improves model performance, and further show how the relative importance of each modality in the analysis can be determined.

Overall, these works demonstrate that existing biomarker discovery approaches are limited to a generic, class-wide view of the disease that can be biased, especially for imbalanced multi-site datasets. We address these problems through the careful design of deep learning architectures (graph neural networks with data harmonisation, semi-supervised learning, clustering and the attention mechanism) that handle issues unique to biomedical datasets. This improves model performance and yields richer biomarkers: site-specific, site-invariant, subtype-specific and imaging genetics biomarkers. These biomarkers pave the way towards more effective treatments for these complex neurological conditions.
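The data harmonisation step in the first contribution can be illustrated with a much simpler technique. The sketch below is an assumption and not the thesis's architecture: it removes per-site shifts in the mean and spread of each feature by z-scoring values within each site and mapping them back onto the pooled (all-site) statistics. This is a stripped-down location/scale adjustment, without the empirical Bayes pooling used by methods such as ComBat and without any semi-supervised component. The function name `harmonise_by_site` is hypothetical.

```python
from statistics import mean, pstdev


def harmonise_by_site(features, sites):
    """Remove per-site location/scale shifts from each feature column.

    features: list of rows (one per subject), each a list of floats.
    sites: list of site labels, one per row.
    Each feature is z-scored within its site, then rescaled to the pooled
    mean and population standard deviation across all sites.
    (Simplified illustration; not the thesis's harmonisation method.)
    """
    n_features = len(features[0])
    harmonised = [list(row) for row in features]
    for j in range(n_features):
        column = [row[j] for row in features]
        pooled_mean, pooled_sd = mean(column), pstdev(column)
        for site in set(sites):
            idx = [i for i, s in enumerate(sites) if s == site]
            values = [column[i] for i in idx]
            site_mean, site_sd = mean(values), pstdev(values)
            for i in idx:
                # A site with constant values carries no within-site spread.
                z = (column[i] - site_mean) / site_sd if site_sd > 0 else 0.0
                harmonised[i][j] = pooled_mean + z * pooled_sd
    return harmonised
```

For example, two sites whose values follow the same pattern but are offset by a constant (site A: 1, 2, 3; site B: 11, 12, 13) map onto identical harmonised values. Because this adjustment ignores diagnosis, it would also remove disease signal if diagnosis were confounded with site, which is one reason practical harmonisation methods model biological covariates explicitly.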
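The fine-tuning step in the second contribution feeds cluster information into the population graph. A minimal sketch, assuming the per-subject embeddings have already been produced by a brain graph encoder (not shown): subjects are clustered with plain k-means, and each population-graph edge is weighted by the cosine similarity of two subjects' embeddings plus a bonus when both fall in the same cluster. The additive `cluster_bonus`, the `threshold`, and all function names are illustrative assumptions rather than the thesis's formulation.

```python
import math


def kmeans(points, k, n_iter=20):
    """Plain Lloyd's k-means; centroids are initialised from the first k points."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(k), key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def population_graph(embeddings, k, threshold=0.5, cluster_bonus=0.5):
    """Build a weighted, symmetric adjacency matrix over subjects.

    Edge weight = cosine similarity of two subjects' embeddings, plus
    `cluster_bonus` when both share a k-means cluster; edges whose final
    weight is below `threshold` are dropped (left at 0.0).
    """
    labels = kmeans(embeddings, k)
    n = len(embeddings)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            w = cosine(embeddings[i], embeddings[j])
            if labels[i] == labels[j]:
                w += cluster_bonus
            if w >= threshold:
                adj[i][j] = adj[j][i] = w
    return adj, labels
```

For instance, with embeddings `[[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]` and `k=2`, the first two and last two subjects land in separate clusters and only the within-cluster edges survive the default threshold. A graph neural network run on such a graph would then propagate information mainly among subjects of the same putative subtype.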
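For the third contribution, one common way to obtain per-modality importance from an attention mechanism is single-query attention pooling, sketched below. It assumes every modality (imaging or omics) has already been encoded into a vector of the same dimension, and it uses a fixed query vector where a trained model would learn one; the modality names and `attention_fuse` are hypothetical, and this is not presented as the thesis's architecture. Because the softmax weights sum to one, they offer one way to read the relative contribution of each modality to the fused representation.

```python
import math


def softmax(scores):
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fuse(modality_embeddings, query):
    """Fuse per-modality embeddings with scaled dot-product attention.

    modality_embeddings: dict mapping modality name -> embedding (list of
    floats), each with the same dimension d as `query`.
    Returns the fused embedding and a dict of attention weights per modality.
    """
    names = list(modality_embeddings)
    d = len(query)
    # One score per modality: dot product with the query, scaled by sqrt(d).
    scores = [
        sum(q * x for q, x in zip(query, modality_embeddings[n])) / math.sqrt(d)
        for n in names
    ]
    weights = softmax(scores)
    # Fused vector: attention-weighted sum of the modality embeddings.
    fused = [
        sum(w * modality_embeddings[n][i] for w, n in zip(weights, names))
        for i in range(d)
    ]
    return fused, dict(zip(names, weights))
```

Adding another omics modality only adds one entry to the dictionary and one score to the softmax, which is the kind of scalability the description refers to; the weights themselves remain a model-dependent summary rather than a causal measure of importance.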
record_timestamp	2024-02-02T15:37:31Z
school	School of Computer Science and Engineering
lab	Biomedical Informatics Lab
contact	ASJagath@ntu.edu.sg
degree	Doctor of Philosophy
date_accessioned	2024-01-29T08:00:55Z
thesis_year	2023
citation	Chan, Y. H. (2023). Encoding and decoding multimodal brain images for disease biomarker discovery. Doctoral thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/173338
doi	10.32657/10356/173338
license	This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0).
file_format	application/pdf
