Human action recognition based on comparative similarity
Format: Theses and Dissertations
Language: English
Published: 2013
Online Access: http://hdl.handle.net/10356/54714
Institution: Nanyang Technological University
Summary: Human action recognition is a key problem in computer vision. This thesis addresses human action recognition, especially when there are few or no positive examples. This case is important because category distributions in the real world are intrinsically long-tailed, which means that some categories have only a few examples. It also means that traditional classifiers do not work well in this situation, because most of them must be trained on sufficient positive examples for each category to achieve a satisfactory accuracy rate.
This thesis employs comparative similarity to tackle the problem: humans seem to manage with few or no visual examples by being told what an action is "like" and "unlike", so a new action category can be defined in terms of existing ones. By comparing with other actions, better recognition results can be obtained when there are not enough positive examples. In this thesis, human actions are recognized from videos rather than images, and to the best of our knowledge, this is the first time comparative similarity has been applied to video-based human action recognition.
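The "like"/"unlike" idea can be illustrated with a minimal sketch: score a query by how much closer its feature vector is to exemplars of similar actions than to exemplars of dissimilar ones. All names here are illustrative, and cosine similarity is one assumed choice of similarity measure, not necessarily the one used in the thesis.

```python
import numpy as np

def comparative_similarity_score(query, like_examples, dislike_examples):
    """Score a query feature vector by how much more similar it is to
    'like' exemplars than to 'dislike' exemplars (cosine similarity).
    A positive score means the query resembles the 'like' set more."""
    def cos(a, b):
        return float(np.dot(a, b) /
                     (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))
    like = np.mean([cos(query, e) for e in like_examples])
    dislike = np.mean([cos(query, e) for e in dislike_examples])
    return like - dislike
```

With such a score, a new action category needs no positive training examples of its own: it can be defined purely by pointing at existing categories it is like and unlike.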
All experiments are performed on three popular action datasets, following two main steps: human action representation and classification. For a strong representation, interest points are detected in each video, described with the HOGHOF feature, and then quantized into visual words to represent the video; for classification, two conventional SVM kernel machines are trained as baselines for comparison, along with two further baselines related to the comparative similarity machine.
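The representation-and-baseline pipeline above can be sketched as a standard bag-of-visual-words setup, assuming scikit-learn is available. The descriptors here are synthetic stand-ins for HOG/HOF features at detected interest points, and the cluster/kernel choices are illustrative assumptions, not the thesis's actual settings.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic stand-in for HOG/HOF descriptors: each "video" yields a set
# of local descriptors (in practice, computed at detected spatio-temporal
# interest points).
def fake_descriptors(center, n=50, dim=8):
    return center + rng.normal(scale=0.1, size=(n, dim))

centers = rng.normal(size=(2, 8))            # two action "classes"
videos = [fake_descriptors(centers[i % 2]) for i in range(20)]
labels = [i % 2 for i in range(20)]

# 1) Build a visual-word codebook by clustering all descriptors.
codebook = KMeans(n_clusters=16, n_init=10, random_state=0)
codebook.fit(np.vstack(videos))

# 2) Represent each video as a normalized visual-word histogram.
def bow_histogram(desc):
    words = codebook.predict(desc)
    hist = np.bincount(words, minlength=16).astype(float)
    return hist / hist.sum()

X = np.array([bow_histogram(v) for v in videos])

# 3) Train a conventional SVM kernel machine as a baseline.
clf = SVC(kernel="rbf").fit(X, labels)
```

The SVM here plays the role of the conventional baselines; the comparative-similarity machine would replace step 3 while reusing the same histogram representation.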
The corresponding results are reported in each chapter. The final results indicate that when there are enough positive examples for each action category, all methods obtain satisfactory results. But when some categories have no or only a few positive examples, the classifier based on comparative similarity achieves a much higher accuracy rate than the other methods, demonstrating the effectiveness of comparative similarity for the case of few or no positive examples in human action recognition.