Video graph transformer for video question answering

This paper proposes a Video Graph Transformer (VGT) model for Video Question Answering (VideoQA). VGT’s uniqueness is two-fold: 1) it designs a dynamic graph transformer module which encodes video by explicitly capturing the visual objects, their relations, and dynamics for complex spatio-temporal r...

Bibliographic Details
Main Authors:	XIAO, Junbin, ZHOU, Pan, CHUA, Tat-Seng, YAN, Shuicheng
Format:	text
Language:	English
Published:	Institutional Knowledge at Singapore Management University 2022
Subjects:	Dynamic visual graph; Transformer; VideoQA; Graphics and Human Computer Interfaces
Online Access:	https://ink.library.smu.edu.sg/sis_research/8994 https://ink.library.smu.edu.sg/context/sis_research/article/9997/viewcontent/2022_ECCV_VQA.pdf
Institution:	Singapore Management University

Similar Items

Contrastive video question answering via video graph transformer
by: XIAO, Junbin, et al.
Published: (2023)

VISUAL RELATION DRIVEN VIDEO QUESTION ANSWERING
by: XIAO JUNBIN
Published: (2023)

DualFormer: Local-global stratified transformer for efficient video recognition
by: LIANG, Yuxuan, et al.
Published: (2022)

VideoQA: Question answering on news video
by: Yang, H., et al.
Published: (2013)

DiffSeer: Difference-based dynamic weighted graph visualization
by: WEN, Xiaolin, et al.
Published: (2023)

Action-centric relation transformer network for video question answering
by: ZHANG, Jipeng, et al.
Published: (2022)

Video reference: Question answering on YouTube
by: Li, G., et al.
Published: (2013)

Video reference: A video question answering engine
by: Gao, L., et al.
Published: (2013)

Constructing holistic spatio-temporal scene graph for video semantic role labeling
by: ZHAO, Yu, et al.
Published: (2023)

Vireo @ video browser showdown 2019
by: NGUYEN, Phuong Anh, et al.
Published: (2019)

Heterogeneous graph transformer with poly-tokenization
by: LU, Zhiyuan, et al.
Published: (2024)

Question answering over community-contributed web videos
by: Li, G., et al.
Published: (2013)

RELATION UNDERSTANDING IN VIDEOS
by: SHANG XINDI
Published: (2021)

Watching 360° videos together
by: TANG, Anthony, et al.
Published: (2017)

Synchronization of lecture videos and electronic slides by video text analysis
by: WANG, Feng, et al.
Published: (2003)

Annotating Objects and Relations in User-Generated Videos
by: Xindi Shang, et al.
Published: (2020)

EmotionCues: Emotion-oriented visual summarization of classroom videos
by: ZENG, Haipeng, et al.
Published: (2021)

Recent advances in content-based video analysis
by: NGO, Chong-wah, et al.
Published: (2001)

Trajectory-based visualization of web video topics
by: CAO, Juan, et al.
Published: (2010)

Galaxy browser: Exploratory search of web videos
by: PANG, Lei, et al.
Published: (2011)

Mix-DANN and dynamic-modal-distillation for video domain adaptation
by: YIN, Yuehao, et al.
Published: (2022)

Learning temporal dynamics in videos with image transformer
by: SHU, Yan, et al.
Published: (2024)

Video summarization and scene detection by graph modeling
by: NGO, Chong-wah, et al.
Published: (2005)

Long-term leap attention, short-term periodic shift for video classification
by: ZHANG, Hao, et al.
Published: (2022)

MMGCN: Multimodal Graph Convolution Network for Personalized Recommendation of Micro-video
by: Yinwei Wei, et al.
Published: (2020)

Nonfactoid question answering as query-focused summarization with graph-enhanced multihop inference
by: DENG, Yang, et al.
Published: (2024)

Exploring video streaming in public settings: Shared geocaching over distance using mobile video chat
by: PROCYK, Jason, et al.
Published: (2014)

Automatic video summarization by graph modeling
by: NGO, Chong-wah, et al.
Published: (2003)

Tracking web video topics: Discovery, visualization, and monitoring
by: CAO, Juan, et al.
Published: (2011)

AmbiguityVis: Visualization of ambiguity in graph layouts
by: WANG, Yong, et al.
Published: (2016)

Relation Understanding in Videos: A Grand Challenge Overview
by: Xindi Shang, et al.
Published: (2020)

Techniques to visualize occluded graph elements for 2.5D map editing
by: FUJITA, Kazuyuki, et al.
Published: (2020)

Multi-graph based active learning for interactive video retrieval
by: ZHANG XIAOMING
Published: (2010)

Bag-of-visual-words expansion using visual relatedness for video indexing
by: JIANG, Yu-Gang, et al.
Published: (2008)

Approach for video retrieval by video clip
by: PENG, Y., et al.
Published: (2003)

On the feasibility of Simple Transformer for dynamic graph modeling
by: WU, Yuxia, et al.
Published: (2024)

On the annotation of web videos by efficient near-duplicate search
by: ZHAO, Wan-Lei, et al.
Published: (2010)

MANDO-HGT: Heterogeneous graph transformers for smart contract vulnerability detection
by: NGUYEN, Huu Hoang, et al.
Published: (2023)

Combining multimodal external resources for event-based news video retrieval and question answering
by: NEO SHI YONG
Published: (2010)

Hierarchical visualization of video search results for topic-based browsing
by: JIANG, Yu-Gang, et al.
Published: (2016)