Self-supervised learning for early detection of neurodegenerative diseases with small data
Main Author: | |
---|---|
Other Authors: | |
Format: | Thesis-Doctor of Philosophy |
Language: | English |
Published: | Nanyang Technological University, 2023 |
Subjects: | |
Online Access: | https://hdl.handle.net/10356/166402 |
Institution: | Nanyang Technological University |
id |
sg-ntu-dr.10356-166402 |
---|---|
record_format |
dspace |
institution |
Nanyang Technological University |
building |
NTU Library |
continent |
Asia |
country |
Singapore |
content_provider |
NTU Library |
collection |
DR-NTU |
language |
English |
topic |
Engineering::Computer science and engineering |
spellingShingle |
Engineering::Computer science and engineering Jiang, Hongchao Self-supervised learning for early detection of neurodegenerative diseases with small data |
description |
Neurodegenerative diseases are among the leading causes of disability worldwide. They are chronic diseases in which patients experience irreversible loss of neurons in the brain. The most common neurodegenerative diseases are Alzheimer’s disease (AD) and Parkinson’s disease (PD), in which patients suffer from cognitive and motor deficiencies, respectively. There is currently no cure, and the condition deteriorates progressively, affecting activities of daily living. Early intervention is therefore crucial to alleviating symptoms and improving quality of life. Automated diagnosis tools have emerged as a viable way to assess patients more frequently and objectively. However, data acquisition is particularly challenging for reasons such as the rarity of a disease or privacy legislation. A model trained from scratch on small data is prone to overfitting on trivial patterns.
Pre-training is a promising approach in which the model learns general knowledge from readily available data before being fine-tuned on the small dataset to learn task-specific knowledge. Supervised pre-training requires labeled data, which is costly when expert annotators (e.g., clinicians) are involved. Self-supervised pre-training, on the other hand, uses unlabeled data. In this thesis, we explore how pre-training can be used to improve the early detection of neurodegenerative diseases in both clinical and non-clinical settings.
Early detection in a non-clinical setting involves a variety of biomarkers collected from wearables or mobile devices. Many applications (e.g., mobile-based assessments, serious games) have been developed to encourage more frequent testing. Maximizing the utility of these applications requires their prompt deployment, but collecting sufficient data to train the underlying models is often challenging. We propose an approach that does not involve any data collection. We developed a mobile-based clock drawing test that enables patients to perform the clinical cognitive assessment task at home in an automated manner. Specifically, synthetic clock drawings are generated to pre-train an object detection model for detecting hand-drawn clock components. A rule-based classifier is then used to score the drawn clock against clinical assessment criteria.
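The rule-based scoring step described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the detection format `(label, x, y)`, the label names, the 20-degree tolerance, and the three-point rubric are hypothetical and are not the clinical criteria or the classifier used in the thesis.

```python
import math

# Hypothetical detector output: a list of (label, x, y) tuples in image
# coordinates. Labels "1".."12" are digits; "hour_hand" and "minute_hand"
# are the two hands. None of these names come from the thesis.

def expected_angle(digit):
    """Clockwise angle from 12 o'clock (degrees) where a digit belongs."""
    return (digit % 12) * 30.0

def observed_angle(x, y, cx, cy):
    """Clockwise angle from 12 o'clock of (x, y) around the centre (cx, cy).
    Image y grows downward, so 'up' corresponds to decreasing y."""
    return math.degrees(math.atan2(x - cx, cy - y)) % 360.0

def score_clock(detections, center, tolerance_deg=20.0):
    """Toy 0-3 rubric (illustrative only):
    +1 if all twelve digits are detected,
    +1 if, additionally, every digit lies near its expected angle,
    +1 if both hands are detected."""
    cx, cy = center
    digits, hands = {}, set()
    for label, x, y in detections:
        if label.isdigit() and 1 <= int(label) <= 12:
            digits[int(label)] = (x, y)
        elif label in ("hour_hand", "minute_hand"):
            hands.add(label)

    score = 0
    if len(digits) == 12:
        score += 1
        well_placed = True
        for digit, (x, y) in digits.items():
            diff = abs(observed_angle(x, y, cx, cy) - expected_angle(digit))
            diff = min(diff, 360.0 - diff)  # handle wrap-around at 12 o'clock
            if diff > tolerance_deg:
                well_placed = False
        if well_placed:
            score += 1
    if len(hands) == 2:
        score += 1
    return score
```

Because the rules operate only on detected components, a detector pre-trained on synthetic drawings could in principle be paired with such a scorer without collecting any patient data, which is the point the abstract makes.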
We also explore non-clinical applications for PD, such as a mobile-based gait assessment application. Instead of using a rule-based approach, we propose a general data-driven framework that is more scalable. We leverage the fact that assessment tasks are designed to amplify differences from healthy subjects. We can therefore formulate the problem as an anomaly detection task and model how healthy a data sample is. We assume that large amounts of data can be collected from healthy subjects, which is not difficult given the pervasiveness of smartphones and crowdsourcing techniques. We first pre-train the model on data samples from healthy subjects using self-supervised learning to obtain a good feature extractor. In the fine-tuning stage, we pull the pre-trained features close together in latent space to extract common patterns representative of healthiness. Our approach can discriminate between healthy and PD samples despite not seeing any PD samples during training.
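One common way to realise "pulling features together in latent space" is a one-class objective: fix a centre computed from healthy features, minimise the mean squared distance of healthy features to it, and use that distance as the anomaly score at test time. The sketch below shows only this scoring logic on precomputed feature vectors. It assumes a centroid-based objective for illustration, and the abstract does not confirm that the thesis uses this exact formulation; the function names are hypothetical.

```python
def centroid(features):
    """Mean feature vector of the healthy training samples."""
    n = len(features)
    dim = len(features[0])
    return [sum(f[i] for f in features) / n for i in range(dim)]

def anomaly_score(feature, center):
    """Squared Euclidean distance to the healthy centre.
    Larger values mean the sample looks less like healthy data."""
    return sum((a - b) ** 2 for a, b in zip(feature, center))

def compactness_loss(features, center):
    """Mean squared distance of healthy features to the centre.
    Fine-tuning the extractor to reduce this pulls healthy features together."""
    return sum(anomaly_score(f, center) for f in features) / len(features)

# Toy usage with made-up 2-D features from a hypothetical extractor.
healthy = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]]
center = centroid(healthy)                   # [1.0, 1.0]
near = anomaly_score([1.0, 1.0], center)     # 0.0
far = anomaly_score([4.0, 5.0], center)      # 25.0
```

Only healthy samples are needed to compute the centre and the loss, which matches the setting described above where no PD samples are seen during training.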
In a clinical setting, biomarkers are less varied (e.g., brain scans, neuropsychological tests). Both labeled and unlabeled data are available for pre-training or fine-tuning, as hospitals typically keep some form of clinical records. However, acquiring more data is not as easy as in the non-clinical setting. For example, acquiring large numbers of MRI scans is cost-prohibitive. The challenge is therefore to maximize performance with only a small pre-training dataset. We study this problem in the context of distinguishing between 3D MRI scans of subjects with mild cognitive impairment and those of subjects with prodromal AD. We propose a hybrid approach of self-supervised pre-training followed by multitask learning to make effective use of labeled and unlabeled MRI scans from the AD spectrum.
Recent work has shown that self-supervised pre-training mainly learns low-level features and struggles to learn high-level features, which are crucial for identifying anatomical atrophy patterns in MRI data. To address this limitation, we propose an Anatomy-Aware Gating Network (AAGN) that directly encodes knowledge of brain anatomy as a form of inductive bias in the model. AAGN outperforms self-supervised learning methods when trained from scratch on small data. |
author2 |
Miao Chun Yan |
author_facet |
Miao Chun Yan Jiang, Hongchao |
format |
Thesis-Doctor of Philosophy |
author |
Jiang, Hongchao |
author_sort |
Jiang, Hongchao |
title |
Self-supervised learning for early detection of neurodegenerative diseases with small data |
title_short |
Self-supervised learning for early detection of neurodegenerative diseases with small data |
title_full |
Self-supervised learning for early detection of neurodegenerative diseases with small data |
title_fullStr |
Self-supervised learning for early detection of neurodegenerative diseases with small data |
title_full_unstemmed |
Self-supervised learning for early detection of neurodegenerative diseases with small data |
title_sort |
self-supervised learning for early detection of neurodegenerative diseases with small data |
publisher |
Nanyang Technological University |
publishDate |
2023 |
url |
https://hdl.handle.net/10356/166402 |
_version_ |
1765213863005913088 |
spelling |
sg-ntu-dr.10356-166402 2023-05-02T06:33:01Z Self-supervised learning for early detection of neurodegenerative diseases with small data Jiang, Hongchao Miao Chun Yan Interdisciplinary Graduate School (IGS) Alibaba-NTU Singapore Joint Research Institute (JRI) ASCYMiao@ntu.edu.sg Engineering::Computer science and engineering Doctor of Philosophy 2023-04-27T06:13:55Z 2023-04-27T06:13:55Z 2023 Thesis-Doctor of Philosophy Jiang, H. (2023). Self-supervised learning for early detection of neurodegenerative diseases with small data. Doctoral thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/166402 10.32657/10356/166402 en This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0). application/pdf Nanyang Technological University |