Delving deep into many-to-many attention for few-shot video object segmentation

This paper tackles the task of Few-Shot Video Object Segmentation (FSVOS), i.e., segmenting objects in query videos with a certain class specified in a few labeled support images. The key is to model the relationship between the query videos and the support images for propagating the object information. This is a many-to-many problem and often relies on full-rank attention, which is computationally intensive. In this paper, we propose a novel Domain Agent Network (DAN), breaking down the full-rank attention into two smaller ones. We consider one single frame of the query video as the domain agent, bridging between the support images and the query video. Our DAN allows a linear space and time complexity as opposed to the original quadratic form with no loss of performance. In addition, we introduce a learning strategy combining meta-learning with online learning to further improve the segmentation accuracy. We build an FSVOS benchmark on the Youtube-VIS dataset and conduct experiments to demonstrate that our method outperforms baselines in both computational cost and accuracy, achieving state-of-the-art performance.
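The factorization described in the abstract can be illustrated with a minimal sketch: instead of attending the full support set against every query-video token (quadratic cost), a small set of agent tokens first attends to the support features, and the query tokens then attend only to the agents. This is an illustrative NumPy sketch of the general agent-attention idea, not the paper's implementation; the function name `agent_attention` and all shapes are assumptions for the example.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def agent_attention(query, agent, support):
    """Two-stage attention through a small agent set.

    Stage 1: agent tokens attend to the support tokens.
    Stage 2: query tokens attend to the support-informed agents.
    Cost is O(n_a * (n_s + n_q)) rather than O(n_s * n_q), i.e.
    linear in the number of support/query tokens for fixed n_a.
    """
    d = agent.shape[-1]
    # Stage 1: gather object information from the support features.
    attn_as = softmax(agent @ support.T / np.sqrt(d))   # (n_a, n_s)
    agent_out = attn_as @ support                       # (n_a, d)
    # Stage 2: each query-video token reads from the agents.
    attn_qa = softmax(query @ agent_out.T / np.sqrt(d)) # (n_q, n_a)
    return attn_qa @ agent_out                          # (n_q, d)

rng = np.random.default_rng(0)
d, n_s, n_q, n_a = 32, 500, 2000, 50
out = agent_attention(rng.normal(size=(n_q, d)),
                      rng.normal(size=(n_a, d)),
                      rng.normal(size=(n_s, d)))
print(out.shape)  # (2000, 32)
```

With n_a fixed (one frame's tokens), the two small attention maps replace a single (n_s × n_q) map, which is where the linear space and time complexity comes from.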

Full Description

Bibliographic Details
Main Authors: CHEN, Haoxin, WU, Hanjie, ZHAO, Nanxuan, REN, Sucheng, HE, Shengfeng
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2021
Subjects:
Online Access: https://ink.library.smu.edu.sg/sis_research/8527
https://ink.library.smu.edu.sg/context/sis_research/article/9530/viewcontent/Delving_deep_into_many_to_many_attention_for_few_shot_video_object_segmentation.pdf
Institution: Singapore Management University
id sg-smu-ink.sis_research-9530
record_format dspace
spelling sg-smu-ink.sis_research-9530 2024-01-22T15:00:08Z
date 2021-06-01T07:00:00Z
format text application/pdf
doi info:doi/10.1109/CVPR46437.2021.01382
rights http://creativecommons.org/licenses/by-nc-nd/4.0/
collection Research Collection School Of Computing and Information Systems
language eng
publisher Institutional Knowledge at Singapore Management University
url https://ink.library.smu.edu.sg/sis_research/8527
url https://ink.library.smu.edu.sg/context/sis_research/article/9530/viewcontent/Delving_deep_into_many_to_many_attention_for_few_shot_video_object_segmentation.pdf
institution Singapore Management University
building SMU Libraries
continent Asia
country Singapore
Singapore
content_provider SMU Libraries
collection InK@SMU
language English
topic Agent network; Breakings; Linear spaces; Linear time; Many to many; Novel domain; Object information; Query video; Single frames; Video objects segmentations
Graphics and Human Computer Interfaces
Numerical Analysis and Scientific Computing
description This paper tackles the task of Few-Shot Video Object Segmentation (FSVOS), i.e., segmenting objects in query videos with a certain class specified in a few labeled support images. The key is to model the relationship between the query videos and the support images for propagating the object information. This is a many-to-many problem and often relies on full-rank attention, which is computationally intensive. In this paper, we propose a novel Domain Agent Network (DAN), breaking down the full-rank attention into two smaller ones. We consider one single frame of the query video as the domain agent, bridging between the support images and the query video. Our DAN allows a linear space and time complexity as opposed to the original quadratic form with no loss of performance. In addition, we introduce a learning strategy combining meta-learning with online learning to further improve the segmentation accuracy. We build an FSVOS benchmark on the Youtube-VIS dataset and conduct experiments to demonstrate that our method outperforms baselines in both computational cost and accuracy, achieving state-of-the-art performance.
format text
author CHEN, Haoxin
WU, Hanjie
ZHAO, Nanxuan
REN, Sucheng
HE, Shengfeng
author_sort CHEN, Haoxin
title Delving deep into many-to-many attention for few-shot video object segmentation
publisher Institutional Knowledge at Singapore Management University
publishDate 2021
url https://ink.library.smu.edu.sg/sis_research/8527
https://ink.library.smu.edu.sg/context/sis_research/article/9530/viewcontent/Delving_deep_into_many_to_many_attention_for_few_shot_video_object_segmentation.pdf