Vision transformer as image fusion model
Vision transformers show state-of-the-art performance on vision tasks; the self-attention block is not limited to NLP but also performs well at processing images. In this report, I investigated whether this performance can be extended to more detailed image tasks by combining a ViT with a VAE decoder. I observe that the output of the ViT encoder can be reconstructed by the VAE decoder, and that by controlling the variability of the input patches, the model can perform image fusion. It also has the potential to solve other high-complexity image processing tasks.
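The report itself is not reproduced in this record. As a rough, hypothetical illustration of the pipeline the abstract describes (ViT encoder tokens decoded back to pixels by a VAE-style decoder, with fusion obtained by controlling which source image each input patch comes from), a minimal PyTorch sketch might look as follows. All module names, dimensions, and the patch-mixing rule are assumptions made for illustration, not the author's implementation.

```python
# Minimal sketch (not the thesis implementation): a ViT-style encoder whose
# token outputs are decoded back to an image by a small VAE-style decoder.
# "Fusion" is illustrated by mixing patch embeddings from two source images
# before the transformer; every name and dimension here is an assumption.
import torch
import torch.nn as nn


class ViTFusionSketch(nn.Module):
    def __init__(self, img_size=64, patch=8, dim=128, depth=4, heads=4, latent=64):
        super().__init__()
        self.patch, self.img_size = patch, img_size
        self.n_patches = (img_size // patch) ** 2
        # Patch embedding: split the image into non-overlapping patches and project them.
        self.embed = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        self.pos = nn.Parameter(torch.zeros(1, self.n_patches, dim))
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        # VAE-style head: predict a mean and log-variance per token, sample, then decode.
        self.to_mu = nn.Linear(dim, latent)
        self.to_logvar = nn.Linear(dim, latent)
        # Decoder maps each latent token back to a pixel patch.
        self.decode = nn.Linear(latent, patch * patch * 3)

    def tokens(self, x):
        # (B, 3, H, W) -> (B, N, dim) patch embeddings with positional encoding.
        return self.embed(x).flatten(2).transpose(1, 2) + self.pos

    def forward(self, img_a, img_b, keep_a_ratio=0.5):
        ta, tb = self.tokens(img_a), self.tokens(img_b)
        # Fusion by controlling which patches enter the encoder: take the first
        # `keep_a_ratio` of patch positions from image A, the rest from image B.
        split = int(self.n_patches * keep_a_ratio)
        mixed = torch.cat([ta[:, :split], tb[:, split:]], dim=1)
        enc = self.encoder(mixed)
        mu, logvar = self.to_mu(enc), self.to_logvar(enc)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterisation
        patches = self.decode(z)  # (B, N, patch*patch*3)
        # Reassemble the decoded patches into a full image.
        b, side = patches.shape[0], self.img_size // self.patch
        out = patches.view(b, side, side, 3, self.patch, self.patch)
        out = out.permute(0, 3, 1, 4, 2, 5).reshape(b, 3, self.img_size, self.img_size)
        return out, mu, logvar


# Usage: fuse two random "images" (placeholder data only).
model = ViTFusionSketch()
a, b = torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64)
fused, mu, logvar = model(a, b)
print(fused.shape)  # torch.Size([1, 3, 64, 64])
```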
Main Author: | Zhao, Fengye |
---|---|
Other Authors: | Zinovi Rabinovich; School of Computer Science and Engineering |
Format: | Final Year Project (FYP) |
Language: | English |
Published: | Nanyang Technological University, 2023 |
Degree: | Bachelor of Engineering (Computer Science) |
Subjects: | Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence |
Online Access: | https://hdl.handle.net/10356/166048 |
Institution: | Nanyang Technological University |
Citation: | Zhao, F. (2023). Vision transformer as image fusion model. Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/166048 |