Human action recognition based on comparative similarity



Bibliographic Details
Main Author: Cao, Zhiguang
Other Authors: School of Electrical and Electronic Engineering
Format: Theses and Dissertations
Language: English
Published: 2013
Subjects:
Online Access: http://hdl.handle.net/10356/54714
Description
Summary: Human action recognition is a key problem in computer vision. This thesis addresses human action recognition, especially when few or no positive examples are available. This case is important because category distributions in the real world are intrinsically long-tailed, meaning that some categories genuinely have only a few examples. It also implies that traditional classifiers cannot work well in this situation, because most of them must be trained on sufficient positive examples per category to achieve a satisfactory accuracy rate. This thesis employs comparative similarity to tackle the problem: humans seem to manage with few or no visual examples by being told what an action is "like" and "unlike", so a new action category can be defined in terms of existing ones. By comparing against other actions, better recognition results can be obtained when there are not enough positive examples. In this thesis, human actions are recognized from videos rather than images, and to the best of our knowledge this is the first time comparative similarity has been applied to video-based human action recognition. The experiments are performed on three popular action datasets and consist of two main steps: human action representation and classification. For a strong representation, interest points are detected in each video, described by HOG/HOF features, and then quantized into visual words that represent the video; for classification, two conventional SVM kernel machines are trained as baselines for comparison, along with two further baselines related to the comparative similarity machine. The corresponding results are shown in each chapter. The final results indicate that when there are sufficient positive examples for each action category, all methods can obtain satisfactory results.
But when some categories have no or only a few positive examples, the classifier based on comparative similarity achieves a much higher accuracy rate than the other methods, confirming the value of comparative similarity for human action recognition with few or no positive examples.
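The pipeline summarized above (local HOG/HOF-style descriptors, a visual-word vocabulary, per-video histograms, an SVM baseline) can be sketched as follows. Everything concrete here is an assumption for demonstration only: the synthetic descriptor generator stands in for real HOG/HOF extraction, the vocabulary size is arbitrary, and the closing prototype-comparison step is a loose illustration of the comparative-similarity idea rather than the thesis's actual formulation.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Hypothetical stand-in for HOG/HOF descriptors: each "video" yields a
# variable number of 162-D local descriptors (a commonly used HOG+HOF
# dimensionality; the exact value is an assumption here).
def fake_descriptors(n_points, center):
    return rng.normal(loc=center, scale=1.0, size=(n_points, 162))

# Two toy action classes, 20 videos each, with class-dependent descriptor
# statistics so the pipeline has something to learn.
videos, labels = [], []
for label, center in [(0, 0.0), (1, 2.0)]:
    for _ in range(20):
        videos.append(fake_descriptors(int(rng.integers(30, 60)), center))
        labels.append(label)

# Step 1: build a visual vocabulary by clustering all descriptors.
vocab_size = 16
kmeans = KMeans(n_clusters=vocab_size, n_init=10, random_state=0)
kmeans.fit(np.vstack(videos))

# Step 2: represent each video as a normalized bag-of-visual-words histogram.
def bow_histogram(descriptors):
    words = kmeans.predict(descriptors)
    hist = np.bincount(words, minlength=vocab_size).astype(float)
    return hist / hist.sum()

X = np.array([bow_histogram(v) for v in videos])
y = np.array(labels)

# Step 3: train a conventional SVM kernel machine as a baseline.
clf = SVC(kernel="rbf", gamma="scale").fit(X, y)
train_acc = clf.score(X, y)

# Loose illustration of comparative similarity: a test video from a category
# described only as "like class 1" is scored by its similarity (negative
# distance) to each class prototype instead of by a trained classifier.
prototype = {c: X[y == c].mean(axis=0) for c in (0, 1)}

def similarity(hist, proto):
    return -np.linalg.norm(hist - proto)

test_hist = bow_histogram(fake_descriptors(50, 2.0))
looks_like_class_1 = (
    similarity(test_hist, prototype[1]) > similarity(test_hist, prototype[0])
)
```

On well-separated synthetic data like this, the baseline fits the training histograms easily; the point of the sketch is only the shape of the pipeline, not the numbers.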