GroundNLQ @ Ego4D Natural Language Queries Challenge 2023

Bibliographic Details
Main Authors: HOU, Zhijian, JI, Lei, GAO, Difei, ZHONG, Wanjun, YAN, Kun, NGO, Chong-Wah, CHAN, Wing-Kwong, DUAN, Nan, SHOU, Mike Zheng
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2023
Online Access:https://ink.library.smu.edu.sg/sis_research/8416
https://ink.library.smu.edu.sg/context/sis_research/article/9419/viewcontent/2306.15255.pdf
Description
Summary: In this report, we present our champion solution for the Ego4D Natural Language Queries (NLQ) Challenge at CVPR 2023. To accurately ground a natural language query in a video, an effective egocentric feature extractor and a powerful grounding model are required. Motivated by this, we leverage a two-stage pre-training strategy to train egocentric feature extractors and the grounding model on video narrations, and further fine-tune the model on annotated data. In addition, we introduce a novel grounding model, GroundNLQ, which employs a multi-modal multi-scale grounding module for effective video-text fusion and for grounding over temporal intervals of varying lengths, which is especially important for long videos. On the blind test set, GroundNLQ achieves 25.67 and 18.18 for R1@IoU=0.3 and R1@IoU=0.5, respectively, surpassing all other teams by a noticeable margin. Our code will be released at https://github.com/houzhijian/GroundNLQ.
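The summary describes a multi-modal multi-scale grounding module that fuses video and text features and predicts moments over temporal intervals of varying lengths. The minimal PyTorch sketch below illustrates that general idea only: cross-attention fusion followed by a temporal feature pyramid. The module and parameter names (MultiScaleGroundingHead, dim, num_scales) are hypothetical and do not reproduce the authors' released implementation.

```python
import torch
import torch.nn as nn

class MultiScaleGroundingHead(nn.Module):
    """Illustrative multi-modal multi-scale grounding head (hypothetical).

    Video clip features attend to query-token features (fusion), then a
    temporal feature pyramid is built with strided convolutions so that
    coarser levels cover longer intervals, as needed for long videos.
    """

    def __init__(self, dim=256, num_scales=3, num_heads=8):
        super().__init__()
        # Cross-attention: each video clip attends to the text tokens.
        self.fuse = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # Each strided conv halves the temporal resolution.
        self.downsample = nn.ModuleList(
            nn.Conv1d(dim, dim, kernel_size=3, stride=2, padding=1)
            for _ in range(num_scales - 1)
        )
        # Shared per-clip head: 1 foreground score + 2 boundary offsets.
        self.head = nn.Conv1d(dim, 3, kernel_size=1)

    def forward(self, video, text):
        # video: (B, T, dim) clip features; text: (B, L, dim) query tokens.
        fused, _ = self.fuse(query=video, key=text, value=text)
        x = (video + fused).transpose(1, 2)   # (B, dim, T) for Conv1d
        outputs = [self.head(x)]              # finest temporal scale
        for down in self.downsample:
            x = down(x)                       # halve T
            outputs.append(self.head(x))      # coarser scale
        # Each entry: (B, 3, T_s) -- score and start/end offsets per clip.
        return outputs

# Usage sketch with random features standing in for real extractors.
model = MultiScaleGroundingHead()
video = torch.randn(2, 64, 256)   # 64 clip features per video
text = torch.randn(2, 12, 256)    # 12 query-token features
for scale, out in enumerate(model(video, text)):
    print(scale, out.shape)       # (2, 3, 64), (2, 3, 32), (2, 3, 16)
```

Sharing one prediction head across pyramid levels keeps the sketch small; the coarser levels, where each feature spans a longer stretch of video, are what make grounding in long egocentric videos tractable.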