Human action recognition based on comparative similarity


Bibliographic Details
Main Author: Cao, Zhiguang
Other Authors: School of Electrical and Electronic Engineering
Format: Theses and Dissertations
Language:English
Published: 2013
Subjects:
Online Access:http://hdl.handle.net/10356/54714
Institution: Nanyang Technological University
Description
Summary: Human action recognition is a key problem in computer vision. This thesis addresses human action recognition, especially when few or no positive examples are available. This case is important because of the intrinsically long-tailed distribution of categories in the real world: for some categories, only a few examples actually exist. Traditional classifiers perform poorly in this situation, because most of them must be trained on sufficient positive examples for each category to reach a satisfactory accuracy. This thesis employs comparative similarity to tackle the problem: humans seem to manage with few or no visual examples by being told what an action is "like" and "unlike", so a new action category can be defined in terms of existing ones. By comparing against other actions, better recognition can be achieved when positive examples are scarce. In this thesis, human actions are recognized from videos rather than still images, and to the best of our knowledge this is the first time comparative similarity has been applied to video-based human action recognition. Experiments are performed on three popular action datasets, in two main steps: human action representation and classification. For a strong representation, interest points are detected in each video, described with HOG/HOF features, and then quantized into visual words to represent the video. For classification, two conventional SVM kernel machines are trained as baselines; two further baselines related to the comparative similarity machine are also constructed. Results are reported in each chapter. The final results indicate that when there are enough positive examples for each action category, all methods achieve satisfactory results.
However, when there are no or only a few positive examples for some categories, the classifier based on comparative similarity achieves a much higher accuracy than the other methods, demonstrating the effectiveness of comparative similarity for the case of few or no positive examples in human action recognition.
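The representation-plus-classification pipeline described in the abstract (local descriptors quantized into visual words, then an SVM baseline) can be sketched roughly as below. This is a minimal illustration, not the thesis's implementation: the descriptor extractor is a synthetic stand-in for HOG/HOF features, and the descriptor dimensionality, vocabulary size, and SVM kernel are all assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic stand-in for HOG/HOF descriptors: each "video" yields a
# variable number of 162-dimensional local descriptors (the exact
# dimensionality is an assumption, not taken from the thesis).
def extract_descriptors(label, n_points):
    return rng.normal(loc=label * 2.0, scale=1.0, size=(n_points, 162))

videos = [(extract_descriptors(y, int(rng.integers(20, 40))), y)
          for y in [0, 1] * 10]

# 1) Build a visual vocabulary by clustering all local descriptors.
all_desc = np.vstack([d for d, _ in videos])
vocab = KMeans(n_clusters=16, n_init=10, random_state=0).fit(all_desc)

# 2) Represent each video as a normalized histogram of visual words.
def bow_histogram(desc):
    words = vocab.predict(desc)
    hist = np.bincount(words, minlength=16).astype(float)
    return hist / hist.sum()

X = np.array([bow_histogram(d) for d, _ in videos])
y = np.array([label for _, label in videos])

# 3) Train a conventional SVM kernel machine as a baseline classifier.
clf = SVC(kernel="rbf").fit(X[:16], y[:16])
acc = (clf.predict(X[16:]) == y[16:]).mean()
```

A comparative-similarity classifier would replace step 3 with a decision based on how similar a test video's histogram is to existing ("like"/"unlike") categories, rather than on a classifier trained from positive examples of the target category.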