EL-VIT: Probing vision transformer with interactive visualization

Nowadays, Vision Transformer (ViT) is widely utilized in various computer vision tasks, owing to its unique self-attention mechanism. However, the model architecture of ViT is complex and often challenging to comprehend, leading to a steep learning curve. ViT developers and users frequently encounte...

Bibliographic Details
Main Authors:	ZHOU, Hong; ZHANG, Rui; LAI, Peifeng; GUO, Chaoran; WANG, Yong; SUN, Zhida; LI, Junjie
Format:	text
Language:	English
Published:	Institutional Knowledge at Singapore Management University 2023
Subjects:	Education Tool; Explainable AI; Vision Transformer; Visual Analysis; Databases and Information Systems; Numerical Analysis and Scientific Computing
Online Access:	https://ink.library.smu.edu.sg/sis_research/8708 https://ink.library.smu.edu.sg/context/sis_research/article/9711/viewcontent/EL_VIT_av.pdf
Institution:	Singapore Management University

id	sg-smu-ink.sis_research-9711
record_format	dspace
spelling	sg-smu-ink.sis_research-9711 2024-04-04T09:06:17Z 2023-01-01T08:00:00Z text application/pdf info:doi/10.1109/ICDMW60847.2023.00023 http://creativecommons.org/licenses/by-nc-nd/4.0/ Research Collection School Of Computing and Information Systems eng
institution	Singapore Management University
building	SMU Libraries
continent	Asia
country	Singapore
content_provider	SMU Libraries
collection	InK@SMU
language	English
topic	Education Tool; Explainable AI; Vision Transformer; Visual Analysis; Databases and Information Systems; Numerical Analysis and Scientific Computing
description	Nowadays, Vision Transformer (ViT) is widely utilized in various computer vision tasks, owing to its unique self-attention mechanism. However, the model architecture of ViT is complex and often challenging to comprehend, leading to a steep learning curve. ViT developers and users frequently encounter difficulties in interpreting its inner workings. Therefore, a visualization system is needed to assist ViT users in understanding its functionality. This paper introduces EL-VIT, an interactive visual analytics system designed to probe the Vision Transformer and facilitate a better understanding of its operations. The system consists of four layers of visualization views. The first three layers include model overview, knowledge background graph, and model detail view. These three layers elucidate the operation process of ViT from three perspectives: the overall model architecture, detailed explanation, and mathematical operations, enabling users to understand the underlying principles and the transition process between layers. The fourth interpretation view helps ViT users and experts gain a deeper understanding by calculating the cosine similarity between patches. Our two usage scenarios demonstrate the effectiveness and usability of EL-VIT in helping ViT users understand the working mechanism of ViT.
format	text
author	ZHOU, Hong; ZHANG, Rui; LAI, Peifeng; GUO, Chaoran; WANG, Yong; SUN, Zhida; LI, Junjie
author_facet	ZHOU, Hong; ZHANG, Rui; LAI, Peifeng; GUO, Chaoran; WANG, Yong; SUN, Zhida; LI, Junjie
author_sort	ZHOU, Hong
title	EL-VIT: Probing vision transformer with interactive visualization
title_sort	el-vit: probing vision transformer with interactive visualization
publisher	Institutional Knowledge at Singapore Management University
publishDate	2023
url	https://ink.library.smu.edu.sg/sis_research/8708 https://ink.library.smu.edu.sg/context/sis_research/article/9711/viewcontent/EL_VIT_av.pdf
_version_	1814047472471769088
