EPIC-KITCHENS VISOR benchmark: Video segmentations and object relations

We introduce VISOR, a new dataset of pixel annotations and a benchmark suite for segmenting hands and active objects in egocentric video. VISOR annotates videos from EPIC-KITCHENS, which comes with a new set of challenges not encountered in current video segmentation datasets. Specifically, we need to ensure both short- and long-term consistency of pixel-level annotations as objects undergo transformative interactions, e.g. an onion is peeled, diced and cooked, where we aim to obtain accurate pixel-level annotations of the peel, onion pieces, chopping board, knife, pan, as well as the acting hands. VISOR introduces an annotation pipeline, AI-powered in parts, for scalability and quality. In total, we publicly release 272K manual semantic masks of 257 object classes, 9.9M interpolated dense masks, 67K hand-object relations, covering 36 hours of 179 untrimmed videos. Along with the annotations, we introduce three challenges in video object segmentation, interaction understanding and long-term reasoning. For data, code and leaderboards: http://epic-kitchens.github.io/VISOR
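The video object segmentation challenge mentioned in the abstract is conventionally scored with a region-similarity measure: intersection-over-union (the Jaccard index) between predicted and ground-truth masks. The official VISOR evaluation code is not part of this record; the sketch below is only a minimal NumPy illustration of the measure itself, and `mask_iou` is a hypothetical helper name, not an API from the benchmark.

```python
import numpy as np

def mask_iou(pred: np.ndarray, gt: np.ndarray) -> float:
    """Intersection-over-union (Jaccard index) of two boolean masks."""
    pred = pred.astype(bool)
    gt = gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    # Convention: two empty masks agree perfectly.
    return 1.0 if union == 0 else float(inter) / float(union)

# Toy 4x4 example: a 4-pixel prediction against a 6-pixel ground truth.
pred = np.zeros((4, 4), dtype=bool)
pred[1:3, 1:3] = True          # 4 pixels
gt = np.zeros((4, 4), dtype=bool)
gt[1:3, 1:4] = True            # 6 pixels, fully containing pred
print(mask_iou(pred, gt))      # 4 / 6 ≈ 0.667
```

Per-frame scores like this are typically averaged over objects and frames to rank submissions; benchmarks often pair the region measure with a boundary measure as well.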


Bibliographic Details
Main Authors: DAR KHALIL, Ahmad AK, SHAN, Dandan, ZHU, Bin, MA, Jian, KAR, Amlan, HIGGINS, Richard, FOUHEY, David, FIDLER, Sanja, DAMEN, Dima
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2022
Subjects: Graphics and Human Computer Interfaces
Online Access:https://ink.library.smu.edu.sg/sis_research/9013
https://ink.library.smu.edu.sg/context/sis_research/article/10016/viewcontent/98_epic_kitchens_visor_benchmark_.pdf
Institution: Singapore Management University
id sg-smu-ink.sis_research-10016
record_format dspace
institution Singapore Management University
building SMU Libraries
continent Asia
country Singapore
content_provider SMU Libraries
collection InK@SMU
language English
topic Graphics and Human Computer Interfaces
description We introduce VISOR, a new dataset of pixel annotations and a benchmark suite for segmenting hands and active objects in egocentric video. VISOR annotates videos from EPIC-KITCHENS, which comes with a new set of challenges not encountered in current video segmentation datasets. Specifically, we need to ensure both short- and long-term consistency of pixel-level annotations as objects undergo transformative interactions, e.g. an onion is peeled, diced and cooked, where we aim to obtain accurate pixel-level annotations of the peel, onion pieces, chopping board, knife, pan, as well as the acting hands. VISOR introduces an annotation pipeline, AI-powered in parts, for scalability and quality. In total, we publicly release 272K manual semantic masks of 257 object classes, 9.9M interpolated dense masks, 67K hand-object relations, covering 36 hours of 179 untrimmed videos. Along with the annotations, we introduce three challenges in video object segmentation, interaction understanding and long-term reasoning. For data, code and leaderboards: http://epic-kitchens.github.io/VISOR
format text
publisher Institutional Knowledge at Singapore Management University
publishDate 2022-11-01
url https://ink.library.smu.edu.sg/sis_research/9013
https://ink.library.smu.edu.sg/context/sis_research/article/10016/viewcontent/98_epic_kitchens_visor_benchmark_.pdf
license http://creativecommons.org/licenses/by-nc-nd/4.0/
collection Research Collection School Of Computing and Information Systems