Self-pretraining of 3D transformer variations with masked autoencoders for multiple instances in medical image analysis

Bibliographic Details
Main Author: Li, Linyuan
Other Authors: Jiang Xudong
Format: Thesis-Master by Coursework
Language: English
Published: Nanyang Technological University 2024
Online Access: https://hdl.handle.net/10356/173323
Institution: Nanyang Technological University
Description
Summary: Medical image analysis is a multidisciplinary field that combines medical imaging, mathematical modelling, artificial intelligence and other technologies. Its key processes include digital image processing, feature analysis, evaluation and decision making. Traditional medical image analysis methods rely mostly on manual feature engineering, which requires experts to spend considerable time and effort designing features for specific medical tasks based on their prior knowledge. Features designed explicitly for specific scenarios often do not generalise, and the adequacy and precision of their representation are limited. The mainstream research direction for feature processing in medical image analysis has therefore shifted from feature design to feature learning. Deep learning, including deep neural networks, can learn features automatically and implicitly from the medical images themselves, so it has gradually been applied to a variety of medical image analysis tasks with some success. Medical images are complex and lack simple linear features, annotated medical image data are scarce and difficult to acquire, and the rich information these images contain makes feature extraction and learning difficult. To address these challenges, a Masked Autoencoder (MAE) pre-training framework is designed, with the following advantages: 1. MAE has proven effective for pre-training Vision Transformers (ViT) on natural images: to reconstruct the full image, the ViT encoder must aggregate contextual information to infer the masked image regions. 2. Because no ImageNet-scale medical image dataset is available for pre-training, self-pretraining offers greater benefit in scenarios where data scale is limited.
My experimental results show that self-supervised pre-training based on MAE markedly improves performance on diverse downstream tasks, including MRI brain tumor segmentation, abdominal CT multi-organ segmentation, chest X-ray disease classification and breast cancer detection.
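
The masking step that MAE pre-training relies on can be illustrated with a minimal NumPy sketch for a single 3D volume. This is not the thesis implementation: the function names, the cubic patch size and the 75% mask ratio (the default commonly used for MAE on natural images) are assumptions made for illustration only. The sketch splits a volume into non-overlapping patches, hides a random subset from the encoder, and computes the reconstruction error on the hidden patches only.

```python
import numpy as np


def patchify_3d(volume, patch):
    """Split a (D, H, W) volume into non-overlapping cubic patches.

    Returns an array of shape (num_patches, patch**3).
    The patch size is a hypothetical choice, not taken from the thesis.
    """
    d, h, w = volume.shape
    if d % patch or h % patch or w % patch:
        raise ValueError("volume dimensions must be divisible by patch size")
    gd, gh, gw = d // patch, h // patch, w // patch
    x = volume.reshape(gd, patch, gh, patch, gw, patch)
    # Bring the three grid axes to the front, the three in-patch axes last.
    x = x.transpose(0, 2, 4, 1, 3, 5)
    return x.reshape(gd * gh * gw, patch ** 3)


def random_masking(patches, mask_ratio, rng):
    """Keep a random subset of patches; the rest are hidden from the encoder.

    Returns (visible_patches, mask, keep_idx), where mask[i] is True
    for patches that were masked out.
    """
    n = patches.shape[0]
    n_keep = int(n * (1 - mask_ratio))
    keep_idx = np.sort(rng.permutation(n)[:n_keep])
    mask = np.ones(n, dtype=bool)
    mask[keep_idx] = False
    return patches[keep_idx], mask, keep_idx


def masked_mse(pred, target, mask):
    """Mean squared reconstruction error over the masked patches only."""
    per_patch = ((pred - target) ** 2).mean(axis=1)
    return per_patch[mask].mean()


# Example: a synthetic 32x32x32 volume, 8x8x8 patches, 75% masked (assumed).
rng = np.random.default_rng(0)
volume = rng.standard_normal((32, 32, 32))
patches = patchify_3d(volume, patch=8)
visible, mask, keep_idx = random_masking(patches, mask_ratio=0.75, rng=rng)
```

In a full pipeline, only `visible` would be passed to the ViT encoder, and a lightweight decoder would predict all patches so that `masked_mse` can be evaluated against `patches`. Those model components are outside the scope of this sketch.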