Self-pretraining of 3D transformer variations with masked autoencoders for multiple instances in medical image analysis

Medical image analysis is a multidisciplinary field that combines medical imaging, mathematical modelling, artificial intelligence and other technologies. Its key processes include digital image processing, feature analysis, evaluation and decision making. Traditional medical image analysis met...

Bibliographic Details
Main Author:	Li, Linyuan
Other Authors:	Jiang Xudong
Format:	Thesis-Master by Coursework
Language:	English
Published:	Nanyang Technological University 2024
Subjects:	Engineering::Electrical and electronic engineering
Online Access:	https://hdl.handle.net/10356/173323
Institution:	Nanyang Technological University

id	sg-ntu-dr.10356-173323
record_format	dspace
spelling	sg-ntu-dr.10356-173323 2024-01-26T15:42:30Z Self-pretraining of 3D transformer variations with masked autoencoders for multiple instances in medical image analysis Li, Linyuan Jiang Xudong School of Electrical and Electronic Engineering EXDJiang@ntu.edu.sg Engineering::Electrical and electronic engineering Master's degree 2024-01-26T03:30:10Z 2024-01-26T03:30:10Z 2023 Thesis-Master by Coursework Li, L. (2023). Self-pretraining of 3D transformer variations with masked autoencoders for multiple instances in medical image analysis. Master's thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/173323 en application/pdf Nanyang Technological University
institution	Nanyang Technological University
building	NTU Library
continent	Asia
country	Singapore
content_provider	NTU Library
collection	DR-NTU
language	English
topic	Engineering::Electrical and electronic engineering
spellingShingle	Engineering::Electrical and electronic engineering Li, Linyuan Self-pretraining of 3D transformer variations with masked autoencoders for multiple instances in medical image analysis
description	Medical image analysis is a multidisciplinary field that combines medical imaging, mathematical modelling, artificial intelligence and other technologies. Its key processes include digital image processing, feature analysis, evaluation and decision making. Traditional medical image analysis methods mostly rely on manual feature engineering, which requires experts to spend considerable time and effort designing features for specific medical tasks on the basis of their prior knowledge. Features designed explicitly for specific scenarios often do not generalise, and the adequacy and precision of their representations are limited. The mainstream research direction for feature processing in medical image analysis has now shifted from feature design to feature learning. Deep learning, including deep neural networks, can learn features automatically and implicitly, directly from medical images, so it has gradually been applied to a variety of medical image analysis tasks with some success. Medical images are complex and lack simple linear features, annotated medical image data are scarce and difficult to acquire, and features are hard to extract and learn from images that contain such rich information. To address these problems, a masked autoencoder pre-training framework is designed. It has the following advantages: 1. The Masked Autoencoder (MAE) has been shown to be effective for pre-training Vision Transformers (ViT) on natural images: the ViT encoder must aggregate contextual information to infer the masked image regions when the full image is reconstructed. 2. Because there is no ImageNet-scale medical image dataset on which models can be pre-trained, self-pretraining offers greater benefit in scenarios where the amount of data is limited. My experimental results show that self-supervised pre-training based on MAE markedly improves performance on diverse downstream tasks, including MRI brain tumor segmentation, abdominal CT multi-organ segmentation, chest X-ray disease classification and breast cancer detection.
author2	Jiang Xudong
author_facet	Jiang Xudong Li, Linyuan
format	Thesis-Master by Coursework
author	Li, Linyuan
author_sort	Li, Linyuan
title	Self-pretraining of 3D transformer variations with masked autoencoders for multiple instances in medical image analysis
title_short	Self-pretraining of 3D transformer variations with masked autoencoders for multiple instances in medical image analysis
title_full	Self-pretraining of 3D transformer variations with masked autoencoders for multiple instances in medical image analysis
title_fullStr	Self-pretraining of 3D transformer variations with masked autoencoders for multiple instances in medical image analysis
title_full_unstemmed	Self-pretraining of 3D transformer variations with masked autoencoders for multiple instances in medical image analysis
title_sort	self-pretraining of 3d transformer variations with masked autoencoders for multiple instances in medical image analysis
publisher	Nanyang Technological University
publishDate	2024
url	https://hdl.handle.net/10356/173323
_version_	1789483186074419200

Similar Items