Delving deep into many-to-many attention for few-shot video object segmentation

Bibliographic Details
Main Authors: CHEN, Haoxin; WU, Hanjie; ZHAO, Nanxuan; REN, Sucheng; HE, Shengfeng
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University, 2021
Online Access: https://ink.library.smu.edu.sg/sis_research/8527
https://ink.library.smu.edu.sg/context/sis_research/article/9530/viewcontent/Delving_deep_into_many_to_many_attention_for_few_shot_video_object_segmentation.pdf
Institution: Singapore Management University
Description
Summary: This paper tackles the task of Few-Shot Video Object Segmentation (FSVOS), i.e., segmenting objects of a certain class, specified in a few labeled support images, in the query videos. The key is to model the relationship between the query videos and the support images so as to propagate the object information. This is a many-to-many problem that often relies on full-rank attention, which is computationally intensive. In this paper, we propose a novel Domain Agent Network (DAN) that breaks the full-rank attention down into two smaller ones. We consider one single frame of the query video as the domain agent, bridging between the support images and the query video. Our DAN achieves linear space and time complexity, as opposed to the original quadratic form, with no loss of performance. In addition, we introduce a learning strategy that combines meta-learning with online learning to further improve segmentation accuracy. We build an FSVOS benchmark on the YouTube-VIS dataset and conduct experiments demonstrating that our method outperforms baselines in both computational cost and accuracy, achieving state-of-the-art performance.
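
The decomposition described in the abstract can be illustrated with a minimal PyTorch sketch. The function names, feature shapes, and the choice of the first query frame as the agent below are illustrative assumptions, not the paper's actual implementation:

    import torch

    def attn(q, k, v):
        # Scaled dot-product attention; q: (N, C), k and v: (M, C) -> (N, C).
        w = torch.softmax(q @ k.t() / (q.shape[-1] ** 0.5), dim=-1)
        return w @ v

    def full_rank_attention(query, support):
        # Direct many-to-many attention between every query pixel and every
        # support pixel: the affinity matrix has (T*HW) x (K*HW) entries.
        return attn(query, support, support)

    def domain_agent_attention(query, support, hw, agent_frame=0):
        # Break the full-rank attention into two smaller ones:
        #   1) support images -> agent frame: HW x (K*HW), independent of T;
        #   2) agent frame -> query frames:   (T*HW) x HW, linear in T.
        # query:   (T*HW, C) features of the T query-video frames
        # support: (K*HW, C) features of the K labeled support images
        # hw:      number of pixels per frame (H*W)
        agent = query[agent_frame * hw : (agent_frame + 1) * hw]   # (HW, C)
        # The agent frame first aggregates object information from the supports,
        agent = attn(agent, support, support)                      # (HW, C)
        # then every query pixel reads that information back from the agent.
        return attn(query, agent, agent)                           # (T*HW, C)

    # Toy usage: T = 8 query frames, K = 5 support images, 16x16 feature maps.
    T, K, H, W, C = 8, 5, 16, 16, 64
    query = torch.randn(T * H * W, C)
    support = torch.randn(K * H * W, C)
    out = domain_agent_attention(query, support, hw=H * W)         # (T*H*W, C)

Replacing the single (T*HW) x (K*HW) affinity matrix with two matrices of sizes HW x (K*HW) and (T*HW) x HW is what lets memory and computation grow linearly with the number of query frames, rather than with the product of the query and support set sizes.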