Advanced maximum margin learning techniques for large scale vision tasks
Main Author: | |
---|---|
Other Authors: | |
Format: | Theses and Dissertations |
Language: | English |
Published: | 2014 |
Subjects: | |
Online Access: | http://hdl.handle.net/10356/61769 |
Institution: | Nanyang Technological University |
Summary: | Computer science has been experiencing a paradigm shift with the emergence of “big data”, and so has computer vision. The explosion of images and videos on the Internet and the availability of large amounts of annotated data have brought unprecedented opportunities and fundamental challenges in exploiting large-scale datasets for computer vision tasks such as image retrieval, image classification, object recognition, and event recognition. With more data available, more sophisticated models can be learned to accomplish more difficult tasks; however, leveraging large-scale datasets effectively and efficiently is challenging because of the lack of high-quality labeled data, differences between domain distributions, and the scalability problems posed by large numbers of samples and classes. Considering that maximum margin learning has proven effective in many real-world applications, this thesis aims to address these issues in several large-scale vision tasks by developing advanced maximum margin learning techniques. In this thesis, we investigate challenging issues in four large-scale computer vision tasks: 1. tag-based image retrieval with weakly labeled training web images; 2. visual event recognition in consumer videos with weakly labeled training web images and videos; 3. semi-supervised learning with a large amount of unlabeled data for visual recognition; and 4. class hierarchy learning with many classes for visual recognition. These four tasks are connected not only because they all involve large-scale datasets, but also because they are all solved by the advanced maximum margin learning techniques we develop. For both Task 1 and Task 2, we leverage a large amount of weakly labeled web data (images/videos) without requiring accurately labeled training data. 
In Task 1, we observe that some unpopular tags are associated with only a small number of relevant training web images; therefore, we propose a classification method called SVM with Augmented Features (AFSVM) that exploits the inter-correlation among concepts by leveraging the pre-learned SVM classifiers of popular tags, which are associated with a large number of relevant training web images, so that knowledge can be transferred from popular concepts to unpopular concepts. While Task 1 mainly focuses on cross-category knowledge transfer, Task 2 is about cross-domain knowledge transfer. In particular, we observe that the web domain (i.e., the source domain) and the consumer domain (i.e., the target domain) have different data distributions; in Task 2, we therefore formulate the task as a domain adaptation problem and propose a maximum margin learning technique to solve it. In Task 3 and Task 4, we handle the scalability issues that arise when there is a large amount of unlabeled data or a large number of classes, respectively. To leverage the large amount of unlabeled data, in Task 3 we propose an efficient and effective Semi-Supervised Learning (SSL) framework based on manifold regularization. To conduct rapid multi-class prediction when there are many classes, in Task 4 we propose an efficient discriminative learning framework that learns a class hierarchical model by decomposing the original many-class problem into a sequence of much smaller sub-problems, with an adaptive classifier updating method and an active class selection strategy. |
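The augmented-feature idea described for Task 1 can be illustrated with a minimal sketch. This is not the thesis's AFSVM formulation, which the record does not spell out. It assumes, purely for illustration, that the pre-learned classifiers of popular tags are linear and given as (weights, bias) pairs, that their decision values are appended to each feature vector, and that the classifier for the unpopular tag is a plain linear SVM trained by stochastic subgradient descent on the hinge loss. The names `decision_value`, `augment`, and `train_linear_svm`, the hyperparameters, and the toy data are all hypothetical.

```python
import random


def decision_value(w, b, x):
    """Linear decision value w.x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def augment(x, popular_classifiers):
    """Append the decision values of pre-learned popular-tag classifiers to x.

    popular_classifiers is a list of (weights, bias) pairs. Assumption: the
    pre-learned classifiers are linear; the source does not say so.
    """
    return list(x) + [decision_value(w, b, x) for w, b in popular_classifiers]


def train_linear_svm(samples, labels, lam=0.01, lr=0.1, epochs=200, seed=0):
    """Train a linear SVM by stochastic subgradient descent on the
    L2-regularized hinge loss (a generic trainer, not the thesis's solver).

    labels must be +1 or -1. Returns (weights, bias).
    """
    rng = random.Random(seed)
    w = [0.0] * len(samples[0])
    b = 0.0
    order = list(range(len(samples)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            x, y = samples[i], labels[i]
            margin = y * decision_value(w, b, x)
            # Shrink the weights (gradient step on the L2 regularizer).
            w = [wi * (1.0 - lr * lam) for wi in w]
            # Hinge-loss subgradient step when the margin is violated.
            if margin < 1.0:
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b


if __name__ == "__main__":
    # Hypothetical toy data: one pre-learned "popular tag" classifier and
    # four training images for an "unpopular tag".
    popular = [([1.0, 0.0], 0.0)]
    images = [[2.0, 0.0], [3.0, 0.0], [-2.0, 0.0], [-3.0, 0.0]]
    labels = [1, 1, -1, -1]
    augmented = [augment(x, popular) for x in images]
    w, b = train_linear_svm(augmented, labels)
    print("weights:", w, "bias:", b)
```

In this sketch the transfer happens only through the extra feature dimensions: an unpopular tag with few training images can lean on the outputs of classifiers that were trained on many images.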