Semi-supervised learning of functional connectome for disease classification


Bibliographic Details
Main Author: Yew, Wei Chee
Other Authors: Jagath C Rajapakse
Format: Final Year Project
Language: English
Published: Nanyang Technological University 2022
Online Access: https://hdl.handle.net/10356/156535
Institution: Nanyang Technological University
Description
Summary: Overfitting is a common problem when computational models are applied to neuroimaging datasets, which are high-dimensional and have small sample sizes, resulting in poor inferences such as ungeneralizable biomarkers. One way to overcome this is to augment the small dataset by pooling datasets of similar diseases collected at other sites. However, such efforts may introduce undesirable variations due to site effects and inconsistent labeling. To mitigate these issues, two encoder-decoder-classifier architectures were proposed to carry out semi-supervised learning (SSL). Novel multi-objective joint loss functions allow these architectures to be trained end-to-end. The use of SSL led to a consistent increase in model accuracy for the task of classifying between healthy subjects and patients with diseases including autism spectrum disorders (ASD) (2.5% accuracy increase on average) and attention-deficit hyperactivity disorder (ADHD) (1.0% accuracy increase on average). In addition, performing data harmonization simultaneously with SSL led to even greater improvements (+5.5% for ASD and +3.3% for ADHD on average). Biomarkers generated by the proposed method could potentially represent site-invariant biomarkers, as they were shown to place more emphasis on a subset of previously discovered site-specific biomarkers. This could provide deeper insight into differentiating between site-specific and site-invariant biomarkers. The findings in this report emphasize the importance of taking both site effects and labeling inconsistencies into account when gathering datasets from multiple sites to overcome neuroimaging data paucity. In light of the increasing reliance on retrospectively aggregated open-source datasets in neuroimaging research, our architectures provide solutions to handle site effects and data paucity.
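The summary does not spell out the joint loss functions, so the following is only a minimal sketch of the general idea behind a multi-objective loss for an encoder-decoder-classifier trained in a semi-supervised way. It assumes a reconstruction term computed over every subject and a classification term computed only over subjects whose labels are trusted. The function names, the weight `alpha`, the choice of mean squared error and binary cross-entropy, and the use of `None` to mark an unlabeled subject are all illustrative assumptions, not the report's actual formulation.

```python
import math
from typing import List, Optional, Sequence


def reconstruction_loss(x: Sequence[float], x_hat: Sequence[float]) -> float:
    """Mean squared error between an input vector and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def classification_loss(p_patient: float, label: int) -> float:
    """Binary cross-entropy for one subject (label 1 = patient, 0 = healthy)."""
    eps = 1e-12  # keeps the logarithm finite when p is exactly 0 or 1
    p = min(max(p_patient, eps), 1.0 - eps)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


def joint_loss(
    inputs: List[Sequence[float]],
    reconstructions: List[Sequence[float]],
    probs: List[float],
    labels: List[Optional[int]],
    alpha: float = 1.0,  # hypothetical weight balancing the two objectives
) -> float:
    """Reconstruction loss averaged over all subjects, plus alpha times the
    classification loss averaged over labeled subjects only.

    A label of None marks a subject treated as unlabeled (an assumption of
    this sketch); such subjects contribute only to the reconstruction term.
    """
    rec = sum(
        reconstruction_loss(x, x_hat)
        for x, x_hat in zip(inputs, reconstructions)
    ) / len(inputs)

    labeled = [(p, y) for p, y in zip(probs, labels) if y is not None]
    if not labeled:
        return rec  # a fully unlabeled batch trains the autoencoder only

    cls = sum(classification_loss(p, y) for p, y in labeled) / len(labeled)
    return rec + alpha * cls
```

Because both terms are summed into one scalar, a single optimizer step can update the encoder, decoder, and classifier together, which is one common way to obtain end-to-end training. In practice the two terms would be computed by an automatic-differentiation framework rather than in plain Python.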