Transformers for computer vision

Bibliographic Details
Main Author: Deng, Yaojun
Other Authors: Wang Lipo
Format: Thesis-Master by Coursework
Language: English
Published: Nanyang Technological University, 2022
Subjects:
Online Access: https://hdl.handle.net/10356/154659
Institution: Nanyang Technological University
Description
Summary: Transformer models, built on the self-attention mechanism, were initially introduced for natural language tasks. They require minimal inductive biases in their design and can be used as standalone processing layers in network architectures. In recent years, transformer models have been applied to popular Computer Vision (CV) tasks and have led to significant progress. Previous surveys have covered applications of transformers to different tasks (e.g., object detection, activity recognition, and image enhancement). In this dissertation, we focus on image classification and introduce several outstanding and representative improved vision transformer models. We conduct comparisons and simulations between transformer models and several representative convolutional neural network (CNN) models to illustrate the advantages and limitations of vision transformers in CV tasks.
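For reference, the self-attention operation underlying these transformer models is usually the scaled dot-product attention of Vaswani et al. (2017); the formula below is that standard formulation rather than anything specific to this dissertation, with Q, K, and V denoting the query, key, and value matrices and d_k the key dimension:

    % Standard scaled dot-product self-attention (Vaswani et al., 2017)
    \mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V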