A compact representation of human actions by sliding coordinate coding

Human action recognition remains challenging in realistic videos, where scale and viewpoint changes make the problem complicated. Many complex models have been developed to overcome these difficulties, while we explore using low-level features and typical classifiers to achieve the state-of-the-art...

Bibliographic Details
Main Authors:	Ding, Runwei; Sun, Qianru; Liu, Mengyuan; Liu, Hong
Other Authors:	School of Electrical and Electronic Engineering
Format:	Article
Language:	English
Published:	2018
Subjects:	Human Action Recognition; Bag-of-words Model
Online Access:	https://hdl.handle.net/10356/87514 http://hdl.handle.net/10220/44460
Institution:	Nanyang Technological University

id	sg-ntu-dr.10356-87514
record_format	dspace
spelling	sg-ntu-dr.10356-87514 2020-03-07T13:57:31Z A compact representation of human actions by sliding coordinate coding Ding, Runwei; Sun, Qianru; Liu, Mengyuan; Liu, Hong School of Electrical and Electronic Engineering Human Action Recognition; Bag-of-words Model Published version 2018-02-28T04:50:07Z 2019-12-06T16:43:31Z 2018-02-28T04:50:07Z 2019-12-06T16:43:31Z 2017 Journal Article Ding, R., Sun, Q., Liu, M., & Liu, H. (2017). A compact representation of human actions by sliding coordinate coding. International Journal of Advanced Robotic Systems, 14(6), 1-12. 1729-8806 https://hdl.handle.net/10356/87514 http://hdl.handle.net/10220/44460 10.1177/1729881417746114 en International Journal of Advanced Robotic Systems © 2017 The Author(s). Creative Commons CC BY: This article is distributed under the terms of the Creative Commons Attribution 4.0 License (http://www.creativecommons.org/licenses/by/4.0/) which permits any use, reproduction and distribution of the work without further permission provided the original work is attributed as specified on the SAGE and Open Access pages (https://us.sagepub.com/en-us/nam/open-access-at-sage). 12 p. application/pdf
institution	Nanyang Technological University
building	NTU Library
country	Singapore
collection	DR-NTU
language	English
topic	Human Action Recognition Bag-of-words Model
spellingShingle	Human Action Recognition Bag-of-words Model Ding, Runwei Sun, Qianru Liu, Mengyuan Liu, Hong A compact representation of human actions by sliding coordinate coding
description	Human action recognition remains challenging in realistic videos, where scale and viewpoint changes make the problem complicated. Many complex models have been developed to overcome these difficulties, while we explore using low-level features and typical classifiers to achieve state-of-the-art performance. The baseline model of feature encoding for action recognition is the bag-of-words model, which has shown high efficiency but ignores the arrangement of local features. Refined methods compensate for this problem by using a large number of co-occurrence descriptors or a concatenation of the local distributions in designed segments. In contrast, this article proposes to encode the relative position of visual words using a simple but very compact method called sliding coordinates coding (SCC). The SCC vector of each kind of word is only an eight-dimensional vector, which is more compact than many of the spatial or spatial–temporal pooling methods in the literature. Our key observation is that the relative position is robust to variations in video scale and view angle. Additionally, we design a temporal cutting scheme to define the margin of coding within video clips, since visual words far away from each other have little relationship. In experiments, four action data sets, including KTH, Rochester Activities, IXMAS, and UCF YouTube, are used for performance evaluation. Results show that our method achieves comparable or better performance than the state of the art, while using more compact and less complex models.
author2	School of Electrical and Electronic Engineering
author_facet	School of Electrical and Electronic Engineering Ding, Runwei Sun, Qianru Liu, Mengyuan Liu, Hong
format	Article
author	Ding, Runwei Sun, Qianru Liu, Mengyuan Liu, Hong
author_sort	Ding, Runwei
title	A compact representation of human actions by sliding coordinate coding
title_short	A compact representation of human actions by sliding coordinate coding
title_full	A compact representation of human actions by sliding coordinate coding
title_fullStr	A compact representation of human actions by sliding coordinate coding
title_full_unstemmed	A compact representation of human actions by sliding coordinate coding
title_sort	compact representation of human actions by sliding coordinate coding
publishDate	2018
url	https://hdl.handle.net/10356/87514 http://hdl.handle.net/10220/44460
_version_	1681043729069113344

Similar Items