Localizing volumetric motion for action recognition in realistic videos

This paper presents a novel motion localization approach for recognizing actions and events in real videos. Examples include StandUp and Kiss in Hollywood movies. The challenge can be attributed to the large visual and motion variations imposed by realistic action poses. Previous works mainly focus...

Bibliographic Details
Main Authors:	WU, Xiao; NGO, Chong-wah; LI, Jintao; ZHANG, Yongdong
Format:	text
Language:	English
Published:	Institutional Knowledge at Singapore Management University 2009
Subjects:	Human action recognition; Keypoint trajectory; Mean-shift clustering; Motion subspace learning; Realistic videos; Artificial Intelligence and Robotics; Graphics and Human Computer Interfaces
Online Access:	https://ink.library.smu.edu.sg/sis_research/6370 https://ink.library.smu.edu.sg/context/sis_research/article/7373/viewcontent/10.1.1.567.4273.pdf
Institution:	Singapore Management University

id	sg-smu-ink.sis_research-7373
record_format	dspace
spelling	sg-smu-ink.sis_research-7373 2021-11-23T02:48:38Z 2009-10-01T07:00:00Z text application/pdf info:doi/10.1145/1631272.1631342 http://creativecommons.org/licenses/by-nc-nd/4.0/ Research Collection School Of Computing and Information Systems eng
institution	Singapore Management University
building	SMU Libraries
continent	Asia
country	Singapore
content_provider	SMU Libraries
collection	InK@SMU
language	English
topic	Human action recognition; Keypoint trajectory; Mean-shift clustering; Motion subspace learning; Realistic videos; Artificial Intelligence and Robotics; Graphics and Human Computer Interfaces
description	This paper presents a novel motion localization approach for recognizing actions and events in real videos. Examples include StandUp and Kiss in Hollywood movies. The challenge can be attributed to the large visual and motion variations imposed by realistic action poses. Previous works mainly focus on learning from descriptors of cuboids around space-time interest points (STIP) to characterize actions. The size, shape and space-time position of the cuboids are fixed without considering the underlying motion dynamics. This often results in a large set of fragmented cuboids that fail to capture the long-term dynamic properties of realistic actions. This paper proposes detecting spatio-temporal motion volumes (namely Volume of Interest, VOI) with adaptive scale and position to localize actions. First, motions are described as bags of point trajectories by tracking keypoints along the time dimension. VOIs are then adaptively extracted by clustering trajectories on the motion manifold. The resulting VOIs, of varying scales and centered at arbitrary positions depending on motion dynamics, are finally described by SIFT and 3D gradient features for action recognition. Compared with fixed-size cuboids, VOIs allow comprehensive modeling of long-term motion and show better capability in capturing contextual information associated with motion dynamics. Experiments on a realistic Hollywood movie dataset show that the proposed approach can achieve a 20% relative improvement over the state-of-the-art STIP-based algorithm.
format	text
author	WU, Xiao; NGO, Chong-wah; LI, Jintao; ZHANG, Yongdong
title	Localizing volumetric motion for action recognition in realistic videos
publisher	Institutional Knowledge at Singapore Management University
publishDate	2009
url	https://ink.library.smu.edu.sg/sis_research/6370 https://ink.library.smu.edu.sg/context/sis_research/article/7373/viewcontent/10.1.1.567.4273.pdf
_version_	1770575943483523072
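
The description in this record outlines a pipeline of tracking keypoints into trajectories and then clustering those trajectories to obtain Volumes of Interest. As a rough illustration only, the sketch below groups toy trajectories with a flat-kernel mean shift and takes the space-time bounding box of each group as a VOI. It is not the authors' implementation: the hand-made feature (mean position plus weighted mean velocity) stands in for the paper's motion manifold, and the names `trajectory_feature`, `mean_shift`, `extract_vois`, `bandwidth` and `motion_weight` are all hypothetical.

```python
import math


def trajectory_feature(traj, motion_weight):
    """Summarize one trajectory [(t, x, y), ...] as mean position plus weighted mean velocity."""
    n = len(traj)
    mean_x = sum(p[1] for p in traj) / n
    mean_y = sum(p[2] for p in traj) / n
    steps = max(n - 1, 1)
    vel_x = (traj[-1][1] - traj[0][1]) / steps
    vel_y = (traj[-1][2] - traj[0][2]) / steps
    return (mean_x, mean_y, motion_weight * vel_x, motion_weight * vel_y)


def mean_shift(points, bandwidth, iters=50, tol=1e-6):
    """Flat-kernel mean shift; returns one cluster label per input point."""
    modes = [tuple(p) for p in points]
    for _ in range(iters):
        moved = 0.0
        shifted = []
        for m in modes:
            # Move each mode to the mean of the data points within the bandwidth.
            near = [p for p in points if math.dist(m, p) <= bandwidth]
            c = tuple(sum(v) / len(near) for v in zip(*near))
            moved = max(moved, math.dist(m, c))
            shifted.append(c)
        modes = shifted
        if moved < tol:
            break
    # Points whose modes ended up close together share a label.
    centers, labels = [], []
    for m in modes:
        for i, c in enumerate(centers):
            if math.dist(m, c) <= bandwidth / 2:
                labels.append(i)
                break
        else:
            centers.append(m)
            labels.append(len(centers) - 1)
    return labels


def extract_vois(trajectories, bandwidth=20.0, motion_weight=10.0):
    """Cluster trajectories and return (labels, {label: ((t0, x0, y0), (t1, x1, y1))})."""
    feats = [trajectory_feature(t, motion_weight) for t in trajectories]
    labels = mean_shift(feats, bandwidth)
    vois = {}
    for traj, lab in zip(trajectories, labels):
        lo = tuple(min(p[k] for p in traj) for k in range(3))
        hi = tuple(max(p[k] for p in traj) for k in range(3))
        if lab in vois:
            lo = tuple(map(min, vois[lab][0], lo))
            hi = tuple(map(max, vois[lab][1], hi))
        vois[lab] = (lo, hi)
    return labels, vois


# Toy data: three trajectories drifting right, three moving up elsewhere and later.
group_a = [[(t, 10 + o + t, 10 + o) for t in range(5)] for o in range(3)]
group_b = [[(t, 100 + o, 100 + o - t) for t in range(3, 9)] for o in range(3)]
labels, vois = extract_vois(group_a + group_b)
```

Because each box is derived from the trajectories in a cluster, its extent and position follow the motion rather than being fixed in advance, which is the contrast with fixed-size cuboids that the description draws. The feature scaling and bandwidth here are arbitrary choices for the toy data.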

Similar Items