Encoding and decoding multimodal brain images for disease biomarker discovery

Many neurodegenerative diseases (such as Alzheimer’s disease) and neuropsychiatric disorders (such as schizophrenia) are still poorly understood. While incorporating multimodal datasets (structural and functional) would provide a more complete picture of these neurological conditions, existing studi...

Bibliographic Details
Main Author: Chan, Yi Hao
Other Authors: Jagath C Rajapakse
Format: Thesis-Doctor of Philosophy
Language: English
Published: Nanyang Technological University 2024
Subjects:
Online Access: https://hdl.handle.net/10356/173338
Institution: Nanyang Technological University
id sg-ntu-dr.10356-173338
record_format dspace
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
language English
topic Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence
Engineering::Computer science and engineering::Computer applications::Life and medical sciences
description Many neurodegenerative diseases (such as Alzheimer’s disease) and neuropsychiatric disorders (such as schizophrenia) are still poorly understood. While incorporating multimodal datasets (structural and functional) would provide a more complete picture of these neurological conditions, existing studies already struggle with two key challenges even when a single data modality is used: data scarcity and high dimensionality. These problems make machine learning models overfit easily on neuroimaging datasets and result in the generation of spurious biomarkers. Coupled with issues such as inter-scanner variability and disease heterogeneity, biomarker discovery remains an onerous task, as reproducible biomarkers for these neurological conditions have proven to be elusive. Overcoming these obstacles would give us a better understanding of these diseases and possibly lead to the discovery of new cures. One emerging trend in the area of biomarker discovery is the use of deep learning models. Despite their perceived complexity, one relatively unexplored aspect of deep learning architectures is the flexibility they afford for modelling multimodal datasets, in terms of architectural design and construction of the loss function. In this thesis, we propose several deep learning architectures customised for the idiosyncrasies of brain imaging datasets and demonstrate how they can produce reliable biomarkers.
Firstly, the challenges of high dimensionality and data scarcity in brain imaging datasets are investigated. Existing approaches alleviated the issue by removing less important nodes from the neural network architecture, including the input features. However, it is common to have sites with very small datasets (< 100) where such pruning-based approaches are no longer effective. Thus, we propose an alternative approach based on data harmonisation and semi-supervised learning that tackles this issue by incorporating more data (on the same disease, but collected from other sites) into the analysis while addressing issues such as site differences and labelling inconsistencies. We demonstrated that existing works on biomarker discovery produce biomarkers that could be biased towards the largest site, especially when data imbalance exists across sites. In contrast, our proposed architecture provides a technique to arrive at site-invariant biomarkers, which also makes it possible to reveal site-specific biomarkers more clearly.
Secondly, a persistent issue with existing machine learning models is their poor generalisability on neuroimaging datasets. This is often hypothesised to be caused by disease heterogeneity. Furthermore, while some sites are carefully curated such that they represent a homogeneous subgroup, many sites still contain a heterogeneous mix of patients, warranting a need for techniques that can capture such variability. Our proposed approach uses both the brain graph and population graph approaches to model structural and functional brain networks. It introduces an additional fine-tuning step that adds clustering information into the construction of the population graph, based on the clusters obtained from the brain graph embeddings. This makes it possible to produce subtype-specific biomarkers.
Thirdly, we explore the possibility of combining multimodal neuroimaging data with multi-omics datasets. Structural and functional neuroimaging data provide whole-brain, macroscale information, but their limited spatial resolution leaves out microscale information. Such information is better captured via multi-omics data, but multimodal and multi-scale imaging genetics research of this kind has been relatively under-explored as it further exacerbates the above-mentioned problem of model overfitting. To address this, we propose a scalable deep neural network architecture based on the attention mechanism that allows a myriad of omics modalities to be incorporated into the analysis. We demonstrate how combining imaging and genetics data leads to better model performance and go further to show how the relative importance of the modalities involved in the analysis can be determined.
Overall, these works demonstrated how existing biomarker discovery approaches are limited to a generic, class-wide view of the disease that could be biased, especially for imbalanced multi-site datasets. These problems are addressed by careful design of deep learning architectures (graph neural networks with data harmonisation, semi-supervised learning, clustering and the attention mechanism) to take care of issues unique to biomedical datasets, leading to better model performance and producing richer biomarkers in the form of site-specific, site-invariant, subtype-specific and imaging genetics biomarkers. These biomarkers pave the way towards more effective treatments for these complex neurological conditions.
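Illustrative note (not part of the thesis record): the abstract states that an attention mechanism is used to combine modalities and to determine their relative importance, but gives no implementation details. The sketch below is a generic attention-weighted fusion written with NumPy, offered only to make that idea concrete. The function name `attention_fuse`, the modality names, the embedding sizes and the random inputs are all hypothetical; a real model would learn the query vector and the per-modality embeddings rather than sample them.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def attention_fuse(embeddings, query):
    """Fuse per-modality embeddings with scaled dot-product attention.

    embeddings: array of shape (n_subjects, n_modalities, dim)
    query:      array of shape (dim,); learned in a real model, random here
    Returns the fused embedding (n_subjects, dim) and the attention
    weights (n_subjects, n_modalities), which sum to 1 per subject.
    """
    dim = embeddings.shape[-1]
    scores = embeddings @ query / np.sqrt(dim)   # (n_subjects, n_modalities)
    weights = softmax(scores, axis=1)
    fused = (weights[..., None] * embeddings).sum(axis=1)
    return fused, weights


# Hypothetical toy data: 5 subjects, 3 modalities, 8-dimensional embeddings.
rng = np.random.default_rng(0)
modalities = ["structural", "functional", "genomics"]  # assumed names
emb = rng.normal(size=(5, len(modalities), 8))
query = rng.normal(size=8)

fused, weights = attention_fuse(emb, query)

# Averaging the attention weights over subjects gives one crude
# estimate of each modality's relative importance.
importance = dict(zip(modalities, weights.mean(axis=0)))
print(importance)
```

In this sketch, adding a further omics modality only adds one more row of embeddings per subject, which is one simple sense in which attention-based fusion scales with the number of modalities. The averaged weights are a rough importance estimate, not the thesis's measure.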
author2 Jagath C Rajapakse
author_facet Jagath C Rajapakse
Chan, Yi Hao
format Thesis-Doctor of Philosophy
author Chan, Yi Hao
author_sort Chan, Yi Hao
title Encoding and decoding multimodal brain images for disease biomarker discovery
publisher Nanyang Technological University
publishDate 2024
url https://hdl.handle.net/10356/173338
_version_ 1789968696527028224
spelling sg-ntu-dr.10356-173338 2024-02-02T15:37:31Z
Encoding and decoding multimodal brain images for disease biomarker discovery
Chan, Yi Hao
Jagath C Rajapakse
School of Computer Science and Engineering
Biomedical Informatics Lab
ASJagath@ntu.edu.sg
Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence
Engineering::Computer science and engineering::Computer applications::Life and medical sciences
Doctor of Philosophy
2024-01-29T08:00:55Z 2024-01-29T08:00:55Z 2023
Thesis-Doctor of Philosophy
Chan, Y. H. (2023). Encoding and decoding multimodal brain images for disease biomarker discovery. Doctoral thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/173338
https://hdl.handle.net/10356/173338
10.32657/10356/173338
en
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0).
application/pdf
Nanyang Technological University