A compact representation of human actions by sliding coordinate coding

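Human action recognition remains challenging in realistic videos, where changes in scale and viewpoint complicate the problem. Whereas many complex models have been developed to overcome these difficulties, we explore using low-level features and standard classifiers to achieve state-of-the-art performance. The baseline feature-encoding model for action recognition is the bag-of-words model, which is highly efficient but ignores the arrangement of local features. Refined methods compensate for this by using a large number of co-occurrence descriptors or by concatenating local distributions over designed segments. In contrast, this article proposes to encode the relative positions of visual words with a simple and very compact method called sliding coordinate coding (SCC). The SCC vector for each kind of visual word is only eight-dimensional, which is more compact than many spatial or spatio-temporal pooling methods in the literature. Our key observation is that relative position is robust to variations in video scale and viewing angle. Additionally, we design a temporal cutting scheme that defines the coding margin within video clips, since visual words far away from each other have little relationship. Four action data sets (KTH, Rochester Activities, IXMAS, and UCF YouTube) are used for performance evaluation. Results show that our method achieves performance comparable to or better than the state of the art while using more compact and less complex models.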
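
The bag-of-words baseline that the abstract contrasts with SCC can be sketched as follows. This is a minimal illustration of generic bag-of-words encoding, not the authors' SCC method; the toy codebook, features, and function names are hypothetical, and hard nearest-codeword assignment is assumed. It shows why the baseline ignores the arrangement of local features: two videos containing the same visual words at swapped positions yield identical histograms.

```python
from collections import Counter
from typing import List, Sequence, Tuple

# A local feature: (descriptor vector, (x, y, t) position in the video).
# Hypothetical toy representation for illustration only.
Feature = Tuple[Sequence[float], Tuple[float, float, float]]


def nearest_word(descriptor: Sequence[float], codebook: List[Sequence[float]]) -> int:
    # Hard assignment (assumed): index of the codeword with the
    # smallest squared Euclidean distance to the descriptor.
    return min(
        range(len(codebook)),
        key=lambda k: sum((d - c) ** 2 for d, c in zip(descriptor, codebook[k])),
    )


def bow_histogram(features: List[Feature], codebook: List[Sequence[float]]) -> List[int]:
    # Only the descriptor is used; the (x, y, t) position is discarded,
    # which is why bag-of-words ignores the arrangement of local features.
    counts = Counter(nearest_word(desc, codebook) for desc, _pos in features)
    return [counts.get(k, 0) for k in range(len(codebook))]


codebook = [(0.0, 0.0), (1.0, 1.0)]  # hypothetical 2-word vocabulary
video_a = [((0.1, 0.0), (0, 0, 0)), ((0.9, 1.1), (5, 5, 3))]
video_b = [((0.1, 0.0), (5, 5, 3)), ((0.9, 1.1), (0, 0, 0))]  # same words, swapped positions

print(bow_histogram(video_a, codebook))  # [1, 1]
print(bow_histogram(video_b, codebook))  # [1, 1]
```

Encoding the relative positions of visual words, as SCC does, targets exactly the information this histogram throws away.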

Bibliographic Details
Main Authors: DING, Runwei; SUN, Qianru; LIU, Mengyuan; LIU, Hong
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University, 2017
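Subjects: Human action recognition; bag-of-words model; local feature; Artificial Intelligence and Robotics; Numerical Analysis and Scientific Computing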
Online Access:https://ink.library.smu.edu.sg/sis_research/4451
https://ink.library.smu.edu.sg/context/sis_research/article/5454/viewcontent/1729881417746114.pdf
Institution: Singapore Management University
Date: 2017-12-01
DOI: 10.1177/1729881417746114
License: CC BY-NC-ND 4.0 (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Collection: Research Collection School Of Computing and Information Systems