Partial annotation-based video moment retrieval via iterative learning
Given a descriptive language query, Video Moment Retrieval (VMR) aims to locate the semantically consistent moment clip in the video, represented as a pair of start and end timestamps. Although current methods achieve satisfactory performance, training these models relies heavily on fully annotated VMR datasets. However, precise temporal annotations are extremely labor-intensive and ambiguous due to the diverse preferences of different annotators. Although several works explore weakly supervised VMR with scattered annotated frames as labels, there is still much room for improvement in accuracy. We therefore design a new VMR setting in which users simply point to a small, non-controversial segment of the target moment, and our method automatically fills in the remaining parts based on the video and query semantics. To support this, we propose a new framework named Video Moment Retrieval via Iterative Learning (VMRIL). It treats the partially annotated temporal region as a seed and expands the pseudo label through iterative training. To keep the expansion within reasonable boundaries, we use a pretrained video action localization model to provide coarse guidance on potential video segments. Compared with other VMR methods, VMRIL achieves a favorable trade-off between performance and annotation efficiency. Experimental results show that our method achieves state-of-the-art performance in the weakly supervised VMR setting and is even comparable with some fully supervised VMR methods, at a much lower annotation cost.
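The abstract's key mechanism, growing a pseudo label outward from a user-annotated seed segment under coarse bounds supplied by a pretrained action localizer, can be illustrated with a minimal sketch. The function name, threshold, and toy scores below are illustrative assumptions rather than details from the paper, and the iterative retraining that would refresh the frame scores each round is only hinted at.

```python
# Minimal, illustrative sketch of the pseudo-label expansion idea described in
# the abstract: a user-annotated partial segment acts as a seed, and the label
# grows outward while the current model still scores neighbouring frames as
# relevant, capped by coarse bounds from a pretrained action localizer.
# All names, the threshold, and the toy scores are assumptions for
# illustration, not details taken from the paper.

def expand_pseudo_label(seed, frame_scores, action_bounds, threshold=0.5):
    """Grow a (start, end) frame-index pseudo label outward from the seed."""
    start, end = seed
    lo, hi = action_bounds
    # Push the left boundary outward while the next frame looks relevant and
    # stays inside the coarse action segment.
    while start - 1 >= lo and frame_scores[start - 1] >= threshold:
        start -= 1
    # Push the right boundary outward under the same conditions.
    while end + 1 <= hi and frame_scores[end + 1] >= threshold:
        end += 1
    return start, end


if __name__ == "__main__":
    # Toy example: 10 frames, seed annotation covers frames 4-5, and the
    # pretrained localizer bounds the candidate segment to frames 2-8.
    scores = [0.1, 0.2, 0.7, 0.8, 0.9, 0.9, 0.6, 0.3, 0.2, 0.1]
    label = (4, 5)
    coarse_bounds = (2, 8)
    # In the full iterative scheme the model would be retrained on the new
    # pseudo label and the scores refreshed each round; here the scores are
    # reused, so the label simply converges.
    for _ in range(3):
        label = expand_pseudo_label(label, scores, coarse_bounds)
    print(label)  # -> (2, 6)
```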
Main Authors: JI, Wei; LIANG, Renjie; LIAO, Lizi; FEI, Hao; FENG, Fuli
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University, 2023
Subjects: Current; Coarse guidance; Iterative learning; Labour-intensive; Performance; Pseudo label; Query video; Retrieval methods; Time-stamp; Video moment retrieval; Databases and Information Systems
Online Access: https://ink.library.smu.edu.sg/sis_research/8585 https://ink.library.smu.edu.sg/context/sis_research/article/9588/viewcontent/Partial_Annotation_based_Video_Moment_Retrieval_via_Iterative_Learning.pdf
Institution: Singapore Management University
id
sg-smu-ink.sis_research-9588
record_format
dspace
spelling
sg-smu-ink.sis_research-9588 2024-01-25T08:53:38Z Partial annotation-based video moment retrieval via iterative learning JI, Wei; LIANG, Renjie; LIAO, Lizi; FEI, Hao; FENG, Fuli. Given a descriptive language query, Video Moment Retrieval (VMR) aims to locate the semantically consistent moment clip in the video, represented as a pair of start and end timestamps. Although current methods achieve satisfactory performance, training these models relies heavily on fully annotated VMR datasets. However, precise temporal annotations are extremely labor-intensive and ambiguous due to the diverse preferences of different annotators. Although several works explore weakly supervised VMR with scattered annotated frames as labels, there is still much room for improvement in accuracy. We therefore design a new VMR setting in which users simply point to a small, non-controversial segment of the target moment, and our method automatically fills in the remaining parts based on the video and query semantics. To support this, we propose a new framework named Video Moment Retrieval via Iterative Learning (VMRIL). It treats the partially annotated temporal region as a seed and expands the pseudo label through iterative training. To keep the expansion within reasonable boundaries, we use a pretrained video action localization model to provide coarse guidance on potential video segments. Compared with other VMR methods, VMRIL achieves a favorable trade-off between performance and annotation efficiency. Experimental results show that our method achieves state-of-the-art performance in the weakly supervised VMR setting and is even comparable with some fully supervised VMR methods, at a much lower annotation cost. 2023-11-01T07:00:00Z text application/pdf https://ink.library.smu.edu.sg/sis_research/8585 info:doi/10.1145/3581783.3612088 https://ink.library.smu.edu.sg/context/sis_research/article/9588/viewcontent/Partial_Annotation_based_Video_Moment_Retrieval_via_Iterative_Learning.pdf http://creativecommons.org/licenses/by-nc-nd/4.0/ Research Collection School Of Computing and Information Systems eng Institutional Knowledge at Singapore Management University Current Coarse guidance Iterative learning Labour-intensive Performance Pseudo label Query video Retrieval methods Time-stamp Video moment retrieval Databases and Information Systems
institution
Singapore Management University
building
SMU Libraries
continent
Asia
country
Singapore
content_provider
SMU Libraries
collection
InK@SMU
language
English
topic
Current; Coarse guidance; Iterative learning; Labour-intensive; Performance; Pseudo label; Query video; Retrieval methods; Time-stamp; Video moment retrieval; Databases and Information Systems
description
Given a descriptive language query, Video Moment Retrieval (VMR) aims to locate the semantically consistent moment clip in the video, represented as a pair of start and end timestamps. Although current methods achieve satisfactory performance, training these models relies heavily on fully annotated VMR datasets. However, precise temporal annotations are extremely labor-intensive and ambiguous due to the diverse preferences of different annotators. Although several works explore weakly supervised VMR with scattered annotated frames as labels, there is still much room for improvement in accuracy. We therefore design a new VMR setting in which users simply point to a small, non-controversial segment of the target moment, and our method automatically fills in the remaining parts based on the video and query semantics. To support this, we propose a new framework named Video Moment Retrieval via Iterative Learning (VMRIL). It treats the partially annotated temporal region as a seed and expands the pseudo label through iterative training. To keep the expansion within reasonable boundaries, we use a pretrained video action localization model to provide coarse guidance on potential video segments. Compared with other VMR methods, VMRIL achieves a favorable trade-off between performance and annotation efficiency. Experimental results show that our method achieves state-of-the-art performance in the weakly supervised VMR setting and is even comparable with some fully supervised VMR methods, at a much lower annotation cost.
format
text
author
JI, Wei; LIANG, Renjie; LIAO, Lizi; FEI, Hao; FENG, Fuli
author_sort
JI, Wei
title
Partial annotation-based video moment retrieval via iterative learning
publisher
Institutional Knowledge at Singapore Management University
publishDate
2023
url
https://ink.library.smu.edu.sg/sis_research/8585 https://ink.library.smu.edu.sg/context/sis_research/article/9588/viewcontent/Partial_Annotation_based_Video_Moment_Retrieval_via_Iterative_Learning.pdf
_version_
1789483280715743232