Self-pretraining of 3D transformer variations with masked autoencoders for multiple instances in medical image analysis

Medical image analysis is a multidisciplinary field that combines medical imaging, mathematical modelling, artificial intelligence and related technologies, with key processes including digital image processing, feature analysis, evaluation and decision making. Traditional medical image analysis methods rely largely on manual feature engineering: experts spend considerable time and effort hand-designing features for specific medical tasks, drawing on their prior knowledge. Features designed explicitly for particular scenarios are often not transferable, and the adequacy and precision of the resulting representations are limited. The mainstream of feature processing in medical image analysis has therefore shifted from feature design to feature learning. Deep learning, and deep neural networks in particular, can automatically and implicitly learn features directly from medical images, and has been applied to a growing range of medical image analysis tasks with notable success. In view of the complexity of medical images and their lack of simple linear structure, the difficulty and scarcity of annotated medical imaging data, and the challenge of extracting and learning features from such information-rich images, a masked-autoencoder pre-training framework is designed, with the following advantages:

1. The Masked Autoencoder (MAE) has proven effective for pre-training Vision Transformers (ViT) on natural images: the ViT encoder aggregates contextual information from visible patches to infer the masked image regions by reconstructing the full image.
2. Because no ImageNet-scale medical image dataset exists for pre-training, self-pretraining on the target data itself is particularly valuable when the data scale is limited.

Experimental results show that MAE-based self-supervised pre-training markedly improves diverse downstream tasks, including MRI brain tumor segmentation, abdominal CT multi-organ segmentation, chest X-ray disease classification and breast cancer detection.
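
As a rough illustration of the MAE mechanism described above (the encoder sees only the unmasked patches, and the model learns to reconstruct the masked regions of the full image), a minimal PyTorch sketch follows. It is a toy 2D example under assumed settings: the class name TinyMAE, the patch size, layer counts and single-channel input are all hypothetical and are not taken from the thesis, which targets 3D transformer variants on volumetric data.

```python
# Minimal, self-contained MAE-style masking/reconstruction sketch (PyTorch).
# Illustrative only: names, sizes and layer counts are assumptions, not the
# thesis's 3D implementation.
import torch
import torch.nn as nn


class TinyMAE(nn.Module):
    def __init__(self, img_size=96, patch=16, dim=256, mask_ratio=0.75):
        super().__init__()
        self.patch, self.mask_ratio = patch, mask_ratio
        n_patches = (img_size // patch) ** 2
        # Patch embedding for single-channel 2D slices; a 3D variant would use Conv3d.
        self.to_tokens = nn.Conv2d(1, dim, kernel_size=patch, stride=patch)
        self.pos = nn.Parameter(torch.zeros(1, n_patches, dim))
        self.mask_token = nn.Parameter(torch.zeros(1, 1, dim))
        enc = nn.TransformerEncoderLayer(dim, nhead=8, batch_first=True)
        dec = nn.TransformerEncoderLayer(dim, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc, num_layers=4)
        self.decoder = nn.TransformerEncoder(dec, num_layers=2)
        self.head = nn.Linear(dim, patch * patch)  # predict raw pixels per patch

    def forward(self, x):  # x: (B, 1, H, W)
        tokens = self.to_tokens(x).flatten(2).transpose(1, 2) + self.pos  # (B, N, D)
        B, N, D = tokens.shape
        keep = int(N * (1 - self.mask_ratio))
        shuffle = torch.rand(B, N, device=x.device).argsort(dim=1)
        keep_idx, mask_idx = shuffle[:, :keep], shuffle[:, keep:]
        # The encoder aggregates context from the visible (unmasked) patches only.
        visible = torch.gather(tokens, 1, keep_idx.unsqueeze(-1).expand(-1, -1, D))
        encoded = self.encoder(visible)
        # Re-insert mask tokens, restore the original patch order, then decode.
        full = torch.cat([encoded, self.mask_token.expand(B, N - keep, D)], dim=1)
        restore = shuffle.argsort(dim=1)
        full = torch.gather(full, 1, restore.unsqueeze(-1).expand(-1, -1, D)) + self.pos
        recon = self.head(self.decoder(full))  # (B, N, patch*patch)
        # Reconstruction target: the original pixel patches; loss on masked patches only.
        target = x.unfold(2, self.patch, self.patch).unfold(3, self.patch, self.patch)
        target = target.contiguous().view(B, N, -1)
        mask = torch.zeros(B, N, device=x.device).scatter_(1, mask_idx, 1.0)
        return (((recon - target) ** 2).mean(-1) * mask).sum() / mask.sum()


if __name__ == "__main__":
    model = TinyMAE()
    loss = model(torch.randn(2, 1, 96, 96))  # random tensor stands in for image slices
    loss.backward()
    print(float(loss))
```

The random-masking and unshuffle steps mirror the standard MAE recipe; a volumetric version would mainly swap the Conv2d patch embedding for Conv3d and index a 3D patch grid.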

Bibliographic Details
Main Author: Li, Linyuan
Other Authors: Jiang Xudong
Format: Thesis-Master by Coursework
Language: English
Published: Nanyang Technological University, 2024
Subjects: Engineering::Electrical and electronic engineering
Online Access:https://hdl.handle.net/10356/173323
Institution: Nanyang Technological University
Source record: DSpace, sg-ntu-dr.10356-173323
School: School of Electrical and Electronic Engineering (EXDJiang@ntu.edu.sg)
Degree: Master's degree
Date issued: 2023
Date deposited: 2024-01-26
File format: application/pdf
Collection: DR-NTU (NTU Library, Nanyang Technological University, Singapore)
Citation: Li, L. (2023). Self-pretraining of 3D transformer variations with masked autoencoders for multiple instances in medical image analysis. Master's thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/173323