Middle-level representation for human activities recognition : the role of spatio-temporal relationships
We tackle the challenging problem of human activity recognition in realistic video sequences. Unlike local feature-based methods or global template-based methods, we propose to represent a video sequence by a set of middle-level parts. A part, or component, has consistent spatial structure and cons...
Main Authors: | Yuan, Fei; Prinet, Véronique; Yuan, Junsong |
---|---|
Other Authors: | Kutulakos, Kiriakos N. |
Format: | Book Chapter |
Language: | English |
Published: | Springer Berlin Heidelberg, 2014 |
Subjects: | DRNTU::Engineering::Electrical and electronic engineering |
Online Access: | https://hdl.handle.net/10356/103853 http://hdl.handle.net/10220/19350 |
Institution: | Nanyang Technological University |
id |
sg-ntu-dr.10356-103853 |
---|---|
record_format |
dspace |
spelling |
sg-ntu-dr.10356-103853 2020-03-07T14:05:46Z Middle-level representation for human activities recognition : the role of spatio-temporal relationships Yuan, Fei Prinet, Véronique Yuan, Junsong Kutulakos, Kiriakos N. School of Electrical and Electronic Engineering DRNTU::Engineering::Electrical and electronic engineering We tackle the challenging problem of human activity recognition in realistic video sequences. Unlike local feature-based methods or global template-based methods, we propose to represent a video sequence by a set of middle-level parts. A part, or component, has consistent spatial structure and consistent motion. We first segment the visual motion patterns and generate a set of middle-level components by clustering keypoint-based trajectories extracted from the video. To further exploit the interdependencies of the moving parts, we then define spatio-temporal relationships between pairs of components. The resulting descriptive middle-level components and pairwise components thereby capture the essential motion characteristics of human activities. They also give a very compact representation of the video. We apply our framework to two popular and challenging video datasets: the Weizmann dataset and the UT-Interaction dataset. We demonstrate experimentally that our middle-level representation combined with a χ²-SVM classifier matches or outperforms the state-of-the-art results on these datasets. 2014-05-15T07:06:27Z 2019-12-06T21:21:35Z 2014-05-15T07:06:27Z 2019-12-06T21:21:35Z 2012 2012 Book Chapter Yuan, F., Prinet, V., & Yuan, J. (2012). Middle-Level Representation for Human Activities Recognition: The Role of Spatio-Temporal Relationships. In K.N. Kutulakos (Ed.), Trends and Topics in Computer Vision, ECCV 2010 Workshops, Part I, LNCS 6553 (pp. 168–180). Springer-Verlag Berlin Heidelberg. 
978-3-642-35748-0; 978-3-642-35749-7 https://hdl.handle.net/10356/103853 http://hdl.handle.net/10220/19350 10.1007/978-3-642-35749-7 en © 2012 Springer-Verlag Berlin Heidelberg. application/pdf Springer Berlin Heidelberg |
institution |
Nanyang Technological University |
building |
NTU Library |
country |
Singapore |
collection |
DR-NTU |
language |
English |
topic |
DRNTU::Engineering::Electrical and electronic engineering |
spellingShingle |
DRNTU::Engineering::Electrical and electronic engineering Yuan, Fei Prinet, Véronique Yuan, Junsong Middle-level representation for human activities recognition : the role of spatio-temporal relationships |
description |
We tackle the challenging problem of human activity recognition in realistic video sequences. Unlike local feature-based methods or global template-based methods, we propose to represent a video sequence by a set of middle-level parts. A part, or component, has consistent spatial structure and consistent motion. We first segment the visual motion patterns and generate a set of middle-level components by clustering keypoint-based trajectories extracted from the video. To further exploit the interdependencies of the moving parts, we then define spatio-temporal relationships between pairs of components. The resulting descriptive middle-level components and pairwise components thereby capture the essential motion characteristics of human activities. They also give a very compact representation of the video. We apply our framework to two popular and challenging video datasets: the Weizmann dataset and the UT-Interaction dataset. We demonstrate experimentally that our middle-level representation combined with a χ²-SVM classifier matches or outperforms the state-of-the-art results on these datasets. |
author2 |
Kutulakos, Kiriakos N. |
author_facet |
Kutulakos, Kiriakos N. Yuan, Fei Prinet, Véronique Yuan, Junsong |
format |
Book Chapter |
author |
Yuan, Fei Prinet, Véronique Yuan, Junsong |
author_sort |
Yuan, Fei |
title |
Middle-level representation for human activities recognition : the role of spatio-temporal relationships |
title_short |
Middle-level representation for human activities recognition : the role of spatio-temporal relationships |
title_full |
Middle-level representation for human activities recognition : the role of spatio-temporal relationships |
title_fullStr |
Middle-level representation for human activities recognition : the role of spatio-temporal relationships |
title_full_unstemmed |
Middle-level representation for human activities recognition : the role of spatio-temporal relationships |
title_sort |
middle-level representation for human activities recognition : the role of spatio-temporal relationships |
publisher |
Springer Berlin Heidelberg |
publishDate |
2014 |
url |
https://hdl.handle.net/10356/103853 http://hdl.handle.net/10220/19350 |
_version_ |
1681045343167315968 |