Encoding and decoding multimodal brain images for disease biomarker discovery
Main Author: | Chan, Yi Hao |
---|---|
Other Authors: | Jagath C Rajapakse |
Format: | Thesis-Doctor of Philosophy |
Language: | English |
Published: | Nanyang Technological University, 2024 |
Subjects: | Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence; Engineering::Computer science and engineering::Computer applications::Life and medical sciences |
Online Access: | https://hdl.handle.net/10356/173338 |
Institution: | Nanyang Technological University |
Many neurodegenerative diseases (such as Alzheimer’s disease) and neuropsychiatric disorders (such as schizophrenia) remain poorly understood. Incorporating multimodal datasets (structural and functional) would provide a more complete picture of these neurological conditions, yet existing studies already struggle with two key challenges even when only a single data modality is used: data scarcity and high dimensionality. These problems make machine learning models prone to overfitting on neuroimaging datasets, which in turn produces spurious biomarkers. Compounded by inter-scanner variability and disease heterogeneity, they make biomarker discovery an onerous task, and reproducible biomarkers for these conditions have proven elusive. Overcoming these obstacles would give us a better understanding of these diseases and could lead to the discovery of new cures. One emerging trend in biomarker discovery is the use of deep learning models. Despite their perceived complexity, deep learning architectures offer a relatively unexplored advantage: the flexibility they afford in modelling multimodal datasets, through both architectural design and the construction of the loss function. In this thesis, we propose several deep learning architectures tailored to the idiosyncrasies of brain imaging datasets and demonstrate how they can produce reliable biomarkers.
Firstly, we investigate the challenges of high dimensionality and data scarcity in brain imaging datasets. Existing approaches alleviate these issues by pruning less important nodes, including input features, from the neural network architecture. However, sites with very small datasets (fewer than 100 samples) are common, and for them such pruning-based approaches are no longer effective. We therefore propose an alternative approach based on data harmonisation and semi-supervised learning, which incorporates additional data (on the same disease, but collected at other sites) into the analysis while addressing site differences and labelling inconsistencies. We show that existing biomarker discovery methods can produce biomarkers biased towards the largest site, especially when data are imbalanced across sites. In contrast, our proposed architecture yields site-invariant biomarkers, which in turn allows site-specific biomarkers to be revealed more clearly.
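To make the idea of data harmonisation concrete, the sketch below applies a deliberately simple stand-in: per-site z-scoring of every feature, so that each site's features share a common location and scale before data from several sites are pooled. This is not the thesis's method (which combines harmonisation with semi-supervised learning inside a graph neural network); the function name `harmonise_by_site` and the toy data are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def harmonise_by_site(features, sites, eps=1e-8):
    """Per-site z-scoring of each feature (a simple location/scale harmonisation).

    features: list of equal-length feature vectors, one per subject.
    sites: site label for each subject, aligned with `features`.
    Returns new vectors in which every feature has mean ~0 and
    standard deviation ~1 within each site.
    """
    by_site = defaultdict(list)
    for idx, site in enumerate(sites):
        by_site[site].append(idx)

    n_feat = len(features[0])
    harmonised = [None] * len(features)
    for idxs in by_site.values():
        # Per-site, per-feature location (mean) and scale (population std).
        cols = [[features[i][j] for i in idxs] for j in range(n_feat)]
        mus = [mean(col) for col in cols]
        sds = [pstdev(col) for col in cols]
        for i in idxs:
            # eps guards against division by zero for constant features.
            harmonised[i] = [(features[i][j] - mus[j]) / (sds[j] + eps)
                             for j in range(n_feat)]
    return harmonised


if __name__ == "__main__":
    # Toy data (hypothetical): site "B" has a larger offset and spread.
    X = [[10.0, 1.0], [12.0, 3.0], [20.0, 5.0], [24.0, 9.0]]
    S = ["A", "A", "B", "B"]
    print(harmonise_by_site(X, S))
```

A caveat on this design choice: plain per-site standardisation also removes any genuine between-site difference, for example a different proportion of patients and controls at each site, which is one reason ComBat-style harmonisation methods model biological covariates explicitly.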
Secondly, a persistent issue with existing machine learning models is their poor generalisability on neuroimaging datasets, which is often hypothesised to stem from disease heterogeneity. While some sites are carefully curated to represent a homogeneous subgroup, many still contain a heterogeneous mix of patients, so techniques that can capture this variability are needed. Our proposed approach combines brain graphs and a population graph to model structural and functional brain networks. It adds a fine-tuning step that incorporates clustering information, derived from clusters of the brain-graph embeddings, into the construction of the population graph. This makes it possible to produce subtype-specific biomarkers.
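One way to picture how clustering information can enter the population graph is sketched below, under stated assumptions: subjects' brain-graph embeddings are clustered with plain k-means, and population-graph edges are then kept only between subjects in the same cluster whose embeddings are sufficiently similar (cosine similarity above a threshold). The k-means initialisation, the 0.5 threshold, the within-cluster edge rule and all function names are hypothetical illustrations, not the thesis's exact construction.

```python
import math


def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means; initialised with the first k points for determinism."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(k), key=lambda c, p=p: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def cosine(u, v):
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def population_graph(embeddings, labels, threshold=0.5):
    """Weighted adjacency: connect two subjects only if they share a cluster
    and their embedding cosine similarity exceeds `threshold`."""
    n = len(embeddings)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if labels[i] == labels[j]:
                s = cosine(embeddings[i], embeddings[j])
                if s > threshold:
                    adj[i][j] = adj[j][i] = s
    return adj


if __name__ == "__main__":
    # Hypothetical 2-D brain-graph embeddings for four subjects.
    Z = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    labels = kmeans(Z, k=2)
    A = population_graph(Z, labels)
    print(labels)  # prints [0, 0, 1, 1]
```

Restricting edges to within-cluster pairs means a graph neural network operating on this population graph aggregates information mainly from subjects of the same putative subtype, which is the intuition behind deriving subtype-specific rather than class-wide biomarkers.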
Thirdly, we explore combining multimodal neuroimaging data with multi-omics datasets. Structural and functional neuroimaging data provide whole-brain, macroscale information, but their limited spatial resolution misses microscale information. Such information is better captured by multi-omics data, yet multimodal, multi-scale imaging genetics research remains relatively under-explored because it further exacerbates the model overfitting problem described above. To address this, we propose a scalable deep neural network architecture based on the attention mechanism that allows many omics modalities to be incorporated into the analysis. We demonstrate that combining imaging and genetics data improves model performance, and further show how the relative importance of each modality in the analysis can be determined.
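The sketch below illustrates, under stated assumptions, how an attention mechanism can both fuse several modalities and expose their relative importance: each modality is assumed to have already been encoded into an embedding of the same length, a query vector scores each embedding (scaled dot product), and a softmax turns the scores into weights that sum to 1. In a trained model the query would be learned; here it is fixed by hand. The modality names, embeddings and function names are hypothetical, and this is not presented as the thesis's architecture.

```python
import math


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fuse(modality_embeddings, query):
    """Scaled dot-product attention over modality embeddings.

    modality_embeddings: dict mapping modality name -> embedding (length d).
    query: a length-d vector; in a trained model this would be learned.
    Returns the fused length-d embedding and a dict of attention weights.
    """
    names = list(modality_embeddings)
    d = len(query)
    scores = [
        sum(q * h for q, h in zip(query, modality_embeddings[name])) / math.sqrt(d)
        for name in names
    ]
    weights = softmax(scores)
    # The fused embedding is the attention-weighted average of the modalities.
    fused = [
        sum(w * modality_embeddings[name][k] for w, name in zip(weights, names))
        for k in range(d)
    ]
    return fused, dict(zip(names, weights))


if __name__ == "__main__":
    # Hypothetical per-modality embeddings (already projected to d = 3).
    H = {
        "structural_mri": [0.8, 0.1, 0.3],
        "functional_mri": [0.2, 0.9, 0.1],
        "genotype": [0.5, 0.5, 0.5],
    }
    q = [1.0, 0.0, 0.5]  # hypothetical query vector
    fused, alpha = attention_fuse(H, q)
    print(alpha)
```

Because the weights are normalised across however many modalities are present, adding another omics modality only adds one more dictionary entry, which is one way such a design can scale. Reading attention weights as modality importance is a common interpretation, though not an undisputed one.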
Overall, these works demonstrate that existing biomarker discovery approaches are limited to a generic, class-wide view of the disease, which can be biased, especially for imbalanced multi-site datasets. We address these problems through careful design of deep learning architectures (graph neural networks with data harmonisation, semi-supervised learning, clustering and the attention mechanism) that account for issues unique to biomedical datasets. This leads to better model performance and richer biomarkers: site-specific, site-invariant, subtype-specific and imaging genetics biomarkers. These biomarkers pave the way towards more effective treatments for these complex neurological conditions.
School: | School of Computer Science and Engineering |
---|---|
Lab: | Biomedical Informatics Lab |
Contact: | ASJagath@ntu.edu.sg |
Degree: | Doctor of Philosophy |
Date issued: | 2023 |
Date available: | 2024-01-29 |
Last modified: | 2024-02-02 |
Citation: | Chan, Y. H. (2023). Encoding and decoding multimodal brain images for disease biomarker discovery. Doctoral thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/173338 |
DOI: | 10.32657/10356/173338 |
License: | This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0). |
File format: | application/pdf |