GroundNLQ @ Ego4D natural language queries challenge 2023

In this report, we present our champion solution to the Ego4D Natural Language Queries (NLQ) Challenge at CVPR 2023. To accurately ground a query in a video, an effective egocentric feature extractor and a powerful grounding model are required. Motivated by this, we adopt a two-stage pre-training strategy: we first train egocentric feature extractors and the grounding model on video narrations, then fine-tune the model on annotated data. In addition, we introduce a novel grounding model, GroundNLQ, which employs a multi-modal, multi-scale grounding module for effective fusion of video and text across varied temporal intervals, which is especially important for long videos. On the blind test set, GroundNLQ achieves 25.67 and 18.18 for R1@IoU=0.3 and R1@IoU=0.5, respectively, surpassing all other teams by a noticeable margin. Our code will be released at https://github.com/houzhijian/GroundNLQ.
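The R1@IoU numbers reported above are the standard NLQ evaluation metric: a query counts as correct when the top-1 predicted temporal window overlaps the ground-truth window with at least the given temporal IoU. A minimal sketch of that metric (not the authors' code; the example intervals are hypothetical):

```python
def temporal_iou(pred, gt):
    """IoU between two [start, end] temporal intervals in seconds."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = max(pred[1], gt[1]) - min(pred[0], gt[0])
    return inter / union if union > 0 else 0.0

def recall_at_1(preds, gts, iou_threshold):
    """Percentage of queries whose top-1 prediction meets the IoU threshold."""
    hits = sum(temporal_iou(p, g) >= iou_threshold for p, g in zip(preds, gts))
    return 100.0 * hits / len(preds)

# Hypothetical top-1 windows and ground truth for two queries:
preds = [(2.0, 8.0), (10.0, 14.0)]
gts = [(3.0, 9.0), (20.0, 25.0)]
print(recall_at_1(preds, gts, 0.5))  # first pair has IoU 5/7, second 0 -> 50.0
```

The challenge leaderboard computes this at IoU thresholds 0.3 and 0.5, which is what the 25.67 / 18.18 figures refer to.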


Bibliographic Details
Main Authors: HOU, Zhijian, JI, Lei, GAO, Difei, ZHONG, Wanjun, YAN, Kun, NGO, Chong-Wah, CHAN, Wing-Kwong, DUAN, Nan, SHOU, Mike Zheng
Format: text
Language:English
Published: Institutional Knowledge at Singapore Management University 2023
Subjects: Databases and Information Systems
Online Access:https://ink.library.smu.edu.sg/sis_research/8416
https://ink.library.smu.edu.sg/context/sis_research/article/9419/viewcontent/2306.15255.pdf
Institution: Singapore Management University
Date: 2023-06-01
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Collection: Research Collection School Of Computing and Information Systems (InK@SMU, SMU Libraries)