Max-margin structured output regression for spatio-temporal action localization

Structured output learning has been successfully applied to object localization, where the mapping between an image and an object bounding box can be well captured. Its extension to action localization in videos, however, is much more challenging, because one needs to predict the locations of the ac...

Bibliographic Details
Main Authors: Tran, Du; Yuan, Junsong
Other Authors: School of Electrical and Electronic Engineering
Format: Conference or Workshop Item
Language: English
Published: 2014
Subjects: DRNTU::Engineering::Electrical and electronic engineering
Online Access: https://hdl.handle.net/10356/103011
http://hdl.handle.net/10220/19131
http://papers.nips.cc/paper/4794-max-margin-structured-output-regression-for-spatio-temporal-action-localization
Institution: Nanyang Technological University
id sg-ntu-dr.10356-103011
record_format dspace
spelling sg-ntu-dr.10356-103011
2019-12-06T21:03:54Z
Advances in Neural Information Processing Systems 25 (NIPS 2012)
Published version
2014-04-07T01:46:24Z
2012
Conference Paper
Tran, D., & Yuan, J. (2012). Max-Margin Structured Output Regression for Spatio-Temporal Action Localization. Advances in Neural Information Processing Systems 25 (NIPS 2012), 1-9.
en
© 2012 Massachusetts Institute of Technology Press. This paper was published in Advances in Neural Information Processing Systems 25 (NIPS 2012) and is made available as an electronic reprint (preprint) with permission of Massachusetts Institute of Technology Press. The paper can be found at the following official URL: [http://papers.nips.cc/paper/4794-max-margin-structured-output-regression-for-spatio-temporal-action-localization]. One print or electronic copy may be made for personal use only. Systematic or multiple reproduction, distribution to multiple locations via electronic or other means, duplication of any material in this paper for a fee or for commercial purposes, or modification of the content of the paper is prohibited and is subject to penalties under law.
application/pdf
institution Nanyang Technological University
building NTU Library
country Singapore
collection DR-NTU
language English
topic DRNTU::Engineering::Electrical and electronic engineering
description Structured output learning has been successfully applied to object localization, where the mapping between an image and an object bounding box can be well captured. Its extension to action localization in videos, however, is much more challenging, because one needs to predict the locations of the action patterns both spatially and temporally, i.e., to identify a sequence of bounding boxes that tracks the action in the video. The problem becomes intractable due to the exponentially large size of the structured video space where actions could occur. We propose a novel structured learning approach for spatio-temporal action localization. The mapping between a video and a spatio-temporal action trajectory is learned. The intractable inference and learning problems are addressed by leveraging an efficient Max-Path search method, which makes it feasible to optimize the model over the whole structured space. Experiments on two challenging benchmark datasets show that our proposed method outperforms the state-of-the-art methods.
author2 School of Electrical and Electronic Engineering
format Conference or Workshop Item
author Tran, Du
Yuan, Junsong
author_sort Tran, Du
title Max-margin structured output regression for spatio-temporal action localization
publishDate 2014