GroundNLQ @ Ego4D natural language queries challenge 2023
In this report, we present our champion solution for the Ego4D Natural Language Queries (NLQ) Challenge at CVPR 2023. Essentially, to accurately ground a query in a video, an effective egocentric feature extractor and a powerful grounding model are required. Motivated by this, we leverage a two-stage pre-training strategy to train egocentric feature extractors and the grounding model on video narrations, and further fine-tune the model on annotated data. In addition, we introduce a novel grounding model, GroundNLQ, which employs a multi-modal multi-scale grounding module for effective fusion of video and text across various temporal intervals, especially for long videos. On the blind test set, GroundNLQ achieves 25.67 and 18.18 for R1@IoU=0.3 and R1@IoU=0.5, respectively, and surpasses all other teams by a noticeable margin. Our code will be released at https://github.com/houzhijian/GroundNLQ.
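The headline numbers above are standard NLQ recall metrics: R1@IoU=θ is the percentage of queries whose top-1 predicted temporal window overlaps the annotated window with temporal IoU of at least θ. The following is a minimal sketch of how such a score can be computed; the function names and data layout are illustrative and not taken from the GroundNLQ codebase.

```python
from typing import List, Tuple

def temporal_iou(pred: Tuple[float, float], gt: Tuple[float, float]) -> float:
    """Intersection-over-union of two temporal windows (start, end) in seconds."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = max(pred[1], gt[1]) - min(pred[0], gt[0])
    return inter / union if union > 0 else 0.0

def recall_at_1(predictions: List[Tuple[float, float]],
                ground_truths: List[Tuple[float, float]],
                iou_threshold: float) -> float:
    """R1@IoU: share of queries whose top-1 prediction reaches the IoU threshold."""
    hits = sum(
        temporal_iou(pred, gt) >= iou_threshold
        for pred, gt in zip(predictions, ground_truths)
    )
    return 100.0 * hits / len(ground_truths)

# Example: one query localized well, one badly.
preds = [(12.0, 18.0), (40.0, 41.0)]
gts   = [(11.0, 17.5), (50.0, 60.0)]
print(recall_at_1(preds, gts, iou_threshold=0.3))  # 50.0
print(recall_at_1(preds, gts, iou_threshold=0.5))  # 50.0
```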
Main Authors: HOU, Zhijian; JI, Lei; GAO, Difei; ZHONG, Wanjun; YAN, Kun; NGO, Chong-wah; CHAN, Wing-Kwong; DUAN, Nan; SHOU, Mike Zheng
Format: text (application/pdf)
Language: English
Published: Institutional Knowledge at Singapore Management University, 2023
Subjects: Databases and Information Systems
Online Access: https://ink.library.smu.edu.sg/sis_research/8416 https://ink.library.smu.edu.sg/context/sis_research/article/9419/viewcontent/2306.15255.pdf
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Collection: Research Collection School Of Computing and Information Systems, InK@SMU
Institution: Singapore Management University
Record ID: sg-smu-ink.sis_research-9419