Vision transformer as image fusion model
Vision transformers show state-of-the-art performance on vision tasks; the self-attention block is not limited to NLP but also performs well at processing images. In this report, I investigated whether this performance can be extended to more detailed image tasks by combining a ViT with a VAE decoder. I observe that the output of the ViT encoder can be reconstructed by the VAE decoder, and that by controlling the variability of the input patches, the model can perform image fusion. It also has the potential to solve other high-complexity image processing tasks.
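The report itself is not reproduced in this record. As a rough, hypothetical illustration of the pipeline the abstract describes (ViT encoder tokens decoded back to pixels by a VAE-style decoder, with fusion obtained by controlling which source image each input patch comes from), a minimal PyTorch sketch might look as follows. All module names, dimensions, and the patch-mixing rule are assumptions made for illustration, not the author's implementation.

```python
# Minimal sketch (not the thesis implementation): a ViT-style encoder whose
# token outputs are decoded back to an image by a small VAE-style decoder.
# "Fusion" is illustrated by mixing patch embeddings from two source images
# before the transformer; every name and dimension here is an assumption.
import torch
import torch.nn as nn


class ViTFusionSketch(nn.Module):
    def __init__(self, img_size=64, patch=8, dim=128, depth=4, heads=4, latent=64):
        super().__init__()
        self.patch, self.img_size = patch, img_size
        self.n_patches = (img_size // patch) ** 2
        # Patch embedding: split the image into non-overlapping patches and project them.
        self.embed = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        self.pos = nn.Parameter(torch.zeros(1, self.n_patches, dim))
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        # VAE-style head: predict a mean and log-variance per token, sample, then decode.
        self.to_mu = nn.Linear(dim, latent)
        self.to_logvar = nn.Linear(dim, latent)
        # Decoder maps each latent token back to a pixel patch.
        self.decode = nn.Linear(latent, patch * patch * 3)

    def tokens(self, x):
        # (B, 3, H, W) -> (B, N, dim) patch embeddings with positional encoding.
        return self.embed(x).flatten(2).transpose(1, 2) + self.pos

    def forward(self, img_a, img_b, keep_a_ratio=0.5):
        ta, tb = self.tokens(img_a), self.tokens(img_b)
        # Fusion by controlling which patches enter the encoder: take the first
        # `keep_a_ratio` of patch positions from image A, the rest from image B.
        split = int(self.n_patches * keep_a_ratio)
        mixed = torch.cat([ta[:, :split], tb[:, split:]], dim=1)
        enc = self.encoder(mixed)
        mu, logvar = self.to_mu(enc), self.to_logvar(enc)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterisation
        patches = self.decode(z)  # (B, N, patch*patch*3)
        # Reassemble the decoded patches into a full image.
        b, side = patches.shape[0], self.img_size // self.patch
        out = patches.view(b, side, side, 3, self.patch, self.patch)
        out = out.permute(0, 3, 1, 4, 2, 5).reshape(b, 3, self.img_size, self.img_size)
        return out, mu, logvar


# Usage: fuse two random "images" (placeholder data only).
model = ViTFusionSketch()
a, b = torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64)
fused, mu, logvar = model(a, b)
print(fused.shape)  # torch.Size([1, 3, 64, 64])
```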
Main Author: | Zhao, Fengye |
---|---|
Other Authors: | Zinovi Rabinovich; School of Computer Science and Engineering |
Format: | Final Year Project (FYP) |
Language: | English |
Published: | Nanyang Technological University, 2023 |
Degree: | Bachelor of Engineering (Computer Science) |
Subjects: | Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence |
Online Access: | https://hdl.handle.net/10356/166048 |
Institution: | Nanyang Technological University |
Citation: | Zhao, F. (2023). Vision transformer as image fusion model. Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/166048 |