Vision transformer as image fusion model

Bibliographic Details
Main Author: Zhao, Fengye
Other Authors: Zinovi Rabinovich
Format: Final Year Project
Language: English
Published: Nanyang Technological University 2023
Subjects:
Online Access: https://hdl.handle.net/10356/166048
Institution: Nanyang Technological University
Description
Abstract: Vision transformers show state-of-the-art performance in vision tasks; the self-attention block is not limited to NLP tasks but also performs well at processing images. In this report, I investigated whether this performance can be extended to more detailed image tasks by combining a vision transformer with a VAE decoder. I observe that the output of the ViT encoder can be reconstructed by the VAE decoder, and that by controlling the variability of the input patches, the model can perform image fusion tasks. In addition, it has the potential to solve other high-complexity image processing tasks.
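The pipeline the abstract describes can be sketched in PyTorch as follows. This is a minimal illustration, not the thesis's actual implementation: the module sizes, the per-patch mixing rule standing in for "controlling the input patches' variability", and all names are assumptions for the sake of the example.

```python
# Hypothetical sketch: a ViT-style encoder produces patch tokens, a small
# VAE-style decoder reconstructs an image from them, and fusing two images
# amounts to mixing their patch tokens before decoding. All sizes and the
# fusion rule are illustrative assumptions, not the thesis's design.
import torch
import torch.nn as nn

PATCH, DIM, IMG = 8, 64, 32  # 32x32 image, 8x8 patches -> 16 patch tokens

class ViTEncoder(nn.Module):
    def __init__(self):
        super().__init__()
        # Patchify via a strided conv, as in standard ViT implementations.
        self.embed = nn.Conv2d(3, DIM, kernel_size=PATCH, stride=PATCH)
        layer = nn.TransformerEncoderLayer(d_model=DIM, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, x):
        tokens = self.embed(x).flatten(2).transpose(1, 2)  # (B, N, DIM)
        return self.encoder(tokens)

class VAEDecoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.to_mu = nn.Linear(DIM, DIM)
        self.to_logvar = nn.Linear(DIM, DIM)
        self.out = nn.Linear(DIM, 3 * PATCH * PATCH)  # pixels per patch

    def forward(self, tokens):
        mu, logvar = self.to_mu(tokens), self.to_logvar(tokens)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterize
        patches = self.out(z)                                 # (B, N, 3*P*P)
        B, N, _ = patches.shape
        side = IMG // PATCH
        img = patches.view(B, side, side, 3, PATCH, PATCH)
        return img.permute(0, 3, 1, 4, 2, 5).reshape(B, 3, IMG, IMG)

enc, dec = ViTEncoder(), VAEDecoder()
a, b = torch.randn(1, 3, IMG, IMG), torch.randn(1, 3, IMG, IMG)
ta, tb = enc(a), enc(b)
# Patch-wise fusion: select each patch token from one image or the other.
mask = (torch.rand(1, ta.shape[1], 1) > 0.5).float()
fused = dec(mask * ta + (1 - mask) * tb)
print(fused.shape)  # torch.Size([1, 3, 32, 32])
```

With random weights the decoded image is of course noise; the sketch only shows how the encoder's patch tokens form an interface at which two images can be mixed before reconstruction.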