EPIC-KITCHENS VISOR benchmark: Video segmentations and object relations
We introduce VISOR, a new dataset of pixel annotations and a benchmark suite for segmenting hands and active objects in egocentric video. VISOR annotates videos from EPIC-KITCHENS, which comes with a new set of challenges not encountered in current video segmentation datasets. Specifically, we need to ensure both short- and long-term consistency of pixel-level annotations as objects undergo transformative interactions, e.g. an onion is peeled, diced and cooked, where we aim to obtain accurate pixel-level annotations of the peel, onion pieces, chopping board, knife, pan, as well as the acting hands. VISOR introduces an annotation pipeline, AI-powered in parts, for scalability and quality. In total, we publicly release 272K manual semantic masks of 257 object classes, 9.9M interpolated dense masks, and 67K hand-object relations, covering 36 hours of 179 untrimmed videos. Along with the annotations, we introduce three challenges in video object segmentation, interaction understanding and long-term reasoning. For data, code and leaderboards: http://epic-kitchens.github.io/VISOR
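The released annotations are distributed as per-frame segmentation masks together with hand-object relations. As a rough illustration of how such annotations might be consumed, here is a minimal Python sketch that rasterises the polygon masks for a single frame. The JSON layout, field names (`video_annotations`, `image`, `annotations`, `segments`) and the file/frame names are assumptions for illustration, not the benchmark's documented schema; consult http://epic-kitchens.github.io/VISOR for the actual format.

```python
# Minimal sketch (not the official loader): rasterise polygon masks
# from a VISOR-style JSON file. The schema assumed here
# (video_annotations -> image/annotations -> segments) is hypothetical;
# see http://epic-kitchens.github.io/VISOR for the real format.
import json

import cv2
import numpy as np


def load_frame_masks(json_path, frame_name, height=1080, width=1920):
    """Return a {entity_name: binary mask} dict for one annotated frame."""
    with open(json_path) as f:
        data = json.load(f)

    masks = {}
    for frame in data["video_annotations"]:      # assumed: one entry per frame
        if frame["image"]["name"] != frame_name:
            continue
        for entity in frame["annotations"]:      # assumed: one entry per object/hand
            mask = np.zeros((height, width), dtype=np.uint8)
            for polygon in entity["segments"]:   # assumed: list of [x, y] vertices
                pts = np.array(polygon, dtype=np.int32).reshape(-1, 1, 2)
                cv2.fillPoly(mask, [pts], 255)
            masks[entity["name"]] = mask
    return masks


if __name__ == "__main__":
    # Hypothetical file and frame names, for illustration only.
    masks = load_frame_masks("P01_01.json", "P01_01_frame_0000000010.jpg")
    for name, mask in masks.items():
        print(name, int(np.count_nonzero(mask)), "pixels")
```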
Saved in:
Main Authors: DAR KHALIL, Ahmad AK; SHAN, Dandan; ZHU, Bin; MA, Jian; KAR, Amlan; HIGGINS, Richard; FOUHEY, David; FIDLER, Sanja; DAMEN, Dima
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University, 2022
Subjects: Graphics and Human Computer Interfaces
Online Access: https://ink.library.smu.edu.sg/sis_research/9013
https://ink.library.smu.edu.sg/context/sis_research/article/10016/viewcontent/98_epic_kitchens_visor_benchmark_.pdf
Institution: Singapore Management University
Record ID: sg-smu-ink.sis_research-10016 (last updated 2024-07-25T08:12:28Z)
Date Published: 2022-11-01
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Collection: Research Collection School Of Computing and Information Systems, InK@SMU, SMU Libraries