Contrastive video question answering via video graph transformer

We propose to perform video question answering (VideoQA) in a Contrastive manner via a Video Graph Transformer model (CoVGT). CoVGT’s uniqueness and superiority are three-fold: 1) It proposes a dynamic graph transformer module which encodes video by explicitly capturing the visual objects, their rel...

Bibliographic Details
Main Authors:	XIAO, Junbin, ZHOU, Pan, YAO, Angela, LI, Yicong, HONG, Richang, YAN, Shuicheng, CHUA, Tat-Seng
Format:	text
Language:	English
Published:	Institutional Knowledge at Singapore Management University 2023
Subjects:	VideoQA; Cross-Modal Visual Reasoning; Video-Language; Dynamic Visual Graphs; Contrastive Learning; Transformer; Graphics and Human Computer Interfaces
Online Access:	https://ink.library.smu.edu.sg/sis_research/9053 https://ink.library.smu.edu.sg/context/sis_research/article/10056/viewcontent/2023_TPAMI_ContrastiveVideo.pdf
Institution:	Singapore Management University

Similar Items

Video graph transformer for video question answering
by: XIAO, Junbin, et al.
Published: (2022)

VISUAL RELATION DRIVEN VIDEO QUESTION ANSWERING
by: XIAO JUNBIN
Published: (2023)

Action-centric relation transformer network for video question answering
by: ZHANG, Jipeng, et al.
Published: (2022)

VideoQA: Question answering on news video
by: Yang, H., et al.
Published: (2013)

RELATION UNDERSTANDING IN VIDEOS
by: SHANG XINDI
Published: (2021)

Video reference: Question answering on YouTube
by: Li, G., et al.
Published: (2013)

MMGCN: Multimodal Graph Convolution Network for Personalized Recommendation of Micro-video
by: Yinwei Wei, et al.
Published: (2020)

Annotating Objects and Relations in User-Generated Videos
by: Xindi Shang, et al.
Published: (2020)

Efficient cross-modal video retrieval with meta-optimized frames
by: HAN, Ning, et al.
Published: (2024)

Video reference: A video question answering engine
by: Gao, L., et al.
Published: (2013)

Vireo @ video browser showdown 2019
by: NGUYEN, Phuong Anh, et al.
Published: (2019)

VideoAder: A video advertising system based on intelligent analysis of visual content
by: Hu, J., et al.
Published: (2013)

Cross-modal Moment Localization in Videos
by: Meng Liu, et al.
Published: (2020)

Question answering over community-contributed web videos
by: Li, G., et al.
Published: (2013)

Relation Understanding in Videos: A Grand Challenge Overview
by: Xindi Shang, et al.
Published: (2020)

Advertising object in web videos
by: Hong, R., et al.
Published: (2014)

Video Visual Relation Detection
by: Xindi Shang, et al.
Published: (2020)

Towards semantic, debiased and moment video retrieval
by: Satar, Burak
Published: (2025)

Automatic video logo detection and removal
by: Yan, W.-Q., et al.
Published: (2013)

Video Relation Detection via Multiple Hypothesis Association
by: Zixuan Su, et al.
Published: (2020)

Anchorage: Visual Analysis of Satisfaction in Customer Service Videos Via Anchor Events
by: WONG, Kam Kwai, et al.
Published: (2023)

Temporal sentence grounding in videos: a survey and future directions
by: Zhang, Hao, et al.
Published: (2023)

Adapting Video Delivery Based on Motion Triggered Visual Attention
by: KALVA, Hari, et al.
Published: (2012)

DEEP REPRESENTATION LEARNING FOR VIDEO FOUNDATION MODELS
by: HUANG ZIYUAN
Published: (2023)

TRRNet : tiered relation reasoning for compositional visual question answering
by: Yang, Xiaofeng, et al.
Published: (2020)

Multiple Hypothesis Video Relation Detection
by: Donglin Di, et al.
Published: (2020)

EmotionCues: Emotion-oriented visual summarization of classroom videos
by: ZENG, Haipeng, et al.
Published: (2021)

Combining multimodal external resources for event-based news video retrieval and question answering
by: NEO SHI YONG
Published: (2010)

NoteVideo: Facilitating navigation of blackboard-style lecture videos
by: Monserrat, T.-J.K.P., et al.
Published: (2014)

Recent advances in content-based video analysis
by: NGO, Chong-wah, et al.
Published: (2001)

Learning temporal dynamics in videos with image transformer
by: SHU, Yan, et al.
Published: (2024)

Unifying text, tables, and images for multimodal question answering
by: LUO, Haohao, et al.
Published: (2023)

Trajectory-based visualization of web video topics
by: CAO, Juan, et al.
Published: (2010)

Galaxy browser: Exploratory search of web videos
by: PANG, Lei, et al.
Published: (2011)

Watching 360° videos together
by: TANG, Anthony, et al.
Published: (2017)

Mix-DANN and dynamic-modal-distillation for video domain adaptation
by: YIN, Yuehao, et al.
Published: (2022)

Bag-of-visual-words expansion using visual relatedness for video indexing
by: JIANG, Yu-Gang, et al.
Published: (2008)

Video segmentation by level set.
by: WANG PEI-LIN
Published: (2014)

On the annotation of web videos by efficient near-duplicate search
by: ZHAO, Wan-Lei, et al.
Published: (2010)

Real-time video copy-location detection in large-scale repositories
by: Liu, B., et al.
Published: (2013)