Learning spatio-temporal co-occurrence correlograms for efficient human action classification
Main Authors: | , |
---|---|
Format: | text |
Language: | English |
Published: | Institutional Knowledge at Singapore Management University, 2013 |
Subjects: | |
Online Access: | https://ink.library.smu.edu.sg/sis_research/4465 https://ink.library.smu.edu.sg/context/sis_research/article/5468/viewcontent/Template.pdf |
Institution: | Singapore Management University |
Summary: | Spatio-temporal interest point (STIP) based features show great promise in human action analysis, offering high efficiency and robustness. However, they are typically used in a bag-of-visual-words (BoVW) representation, which ignores correlations among words and shows limited discrimination in real-world videos. In this paper, we propose a novel approach that adds the spatio-temporal co-occurrence relationships of visual words to BoVW for a richer representation. Rather than fixing a particular scale for the videos, we adopt the normalized Google-like distance (NGLD) to measure the words' co-occurrence semantics, which captures the videos' structural information in a statistical way. All pairwise distances in the spatial and temporal domains compose the corresponding NGLD correlograms; their combined form is then incorporated with BoVW by training a multi-channel kernel SVM classifier. Experiments on real-world datasets (KTH and UCF Sports) validate the efficiency of our approach for the classification of human actions. |
---|
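The summary does not give the NGLD formula, so the following is only a minimal sketch of how a co-occurrence correlogram of this kind might be computed. It assumes the distance takes the same form as the standard normalized Google distance, and that co-occurrence is counted over groups of visual words (for example, words falling in the same spatial or temporal neighbourhood). The function names `ngd` and `ngd_correlogram` and the grouping scheme are hypothetical and are not taken from the paper.

```python
import math


def ngd(fx, fy, fxy, n):
    """Normalized Google distance computed from occurrence counts.

    fx, fy : number of groups containing word x / word y
    fxy    : number of groups containing both words
    n      : total number of groups
    (Assumption: NGLD in the paper has this general form.)
    """
    if fxy == 0:
        return math.inf  # the two words never co-occur
    lx, ly, lxy = math.log(fx), math.log(fy), math.log(fxy)
    denom = math.log(n) - min(lx, ly)
    if denom == 0:
        return 0.0  # both words occur in every group
    return (max(lx, ly) - lxy) / denom


def ngd_correlogram(groups, vocab_size):
    """Pairwise distance matrix over a visual vocabulary.

    groups : iterable of collections of word ids; each collection holds
             the words observed together (hypothetical grouping).
    """
    groups = [set(g) for g in groups]
    n = len(groups)
    count = [0] * vocab_size
    pair = [[0] * vocab_size for _ in range(vocab_size)]
    for g in groups:
        for a in g:
            count[a] += 1
            for b in g:
                if a != b:
                    pair[a][b] += 1
    # Diagonal is 0 by definition; off-diagonal entries use the counts.
    return [
        [0.0 if a == b else ngd(count[a], count[b], pair[a][b], n)
         for b in range(vocab_size)]
        for a in range(vocab_size)
    ]


if __name__ == "__main__":
    # Toy data: four groups over a vocabulary of three visual words.
    toy_groups = [{0, 1}, {0, 1}, {0}, {2}]
    for row in ngd_correlogram(toy_groups, vocab_size=3):
        print(row)
```

Under these assumptions, one matrix would be built from spatial groupings and another from temporal groupings. How the two are combined and fed into the multi-channel kernel SVM is described in the paper itself and is not reproduced here.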