Few-shot visual understanding with deep neural networks

Bibliographic Details
Main Author: Zhang, Chi
Other Authors: Lin, Guosheng
Format: Thesis-Doctor of Philosophy
Language: English
Published: Nanyang Technological University 2022
Online Access:https://hdl.handle.net/10356/154696
Institution: Nanyang Technological University
Description
Summary: Deep Neural Networks (DNNs) have become indispensable for a variety of computer vision tasks, such as image recognition, image segmentation, and object detection. The availability of large-scale labeled datasets and the powerful fitting capability of deep models are two crucial factors that contribute to their success. However, models trained under fully supervised learning have many limitations that hinder their application in real-world scenarios. For example, a trained CNN model applies only to a set of pre-defined classes, and a large amount of labeled data is needed to fine-tune the model for tasks on new categories. Moreover, data labeling can be very expensive for some vision tasks, such as image segmentation.

Few-shot learning is a promising direction for alleviating the need for exhaustively labeled data: it studies the setting where only a few labeled examples are available for a novel task, which must be solved using prior knowledge learned on previous tasks. Typical application scenarios are few-shot image classification and few-shot image segmentation.

In this thesis, we propose novel algorithms to address few-shot learning problems in image recognition and image segmentation. For few-shot image segmentation, we cast the task as a message-passing problem that aims to extract useful information from the limited training data for predictions on test images. We present two frameworks to achieve this goal: one employs the idea of prototype learning by representing the message in a category as a class-specific prototype, and the other employs graph models to conduct message passing between local regions. For few-shot classification, we propose an algorithm that utilizes local representations of images and a structured distance to determine image similarity for classification.
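The abstract does not spell out how a class-specific prototype is built or used, so the following is only a minimal illustrative sketch of the general prototype-learning idea: average the support-image feature vectors at foreground locations into one prototype, then label each query location by its cosine similarity to that prototype. All function names, the list-based feature representation, and the similarity threshold are assumptions for illustration, not the thesis's actual method.

```python
import math

def masked_average_prototype(features, mask):
    """Average the feature vectors at foreground locations (mask == 1)
    into a single class-specific prototype vector.
    `features` is a list of per-location feature vectors; `mask` is a
    parallel list of 0/1 foreground indicators."""
    fg = [f for f, m in zip(features, mask) if m == 1]
    dim = len(fg[0])
    return [sum(v[d] for v in fg) / len(fg) for d in range(dim)]

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def segment(query_features, prototype, threshold=0.5):
    """Label each query location foreground (1) if its similarity to
    the prototype exceeds the threshold, else background (0)."""
    return [1 if cosine(f, prototype) > threshold else 0
            for f in query_features]

# Toy usage: two foreground support vectors near [1, 0] yield a
# prototype close to [1, 0]; query locations are labeled accordingly.
proto = masked_average_prototype([[1, 0], [0.9, 0.1], [0, 1]], [1, 1, 0])
labels = segment([[1, 0], [0, 1]], proto)
```

In practice the features would come from a CNN backbone and the comparison would be made densely over a feature map; the sketch only shows how a single prototype can carry the category's "message" from support to query.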
To address a further limitation of current few-shot learning methods, namely that different algorithms often excel in different few-shot learning scenarios, we propose to automate the selection among various few-shot learning designs and present a search-based framework, inspired by recent successes in the Automated Machine Learning (AutoML) literature.
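The abstract gives no details of the search-based framework, so the following is only a toy sketch of the underlying selection idea, under the assumption that candidate few-shot learners can be scored on sampled validation episodes and the best-scoring one chosen. The function names and episode interface are hypothetical.

```python
import random

def select_best_learner(learners, make_episode, n_episodes=20, seed=0):
    """Pick the learner with the highest mean accuracy over sampled
    validation episodes.

    Each learner is a function (support_x, support_y, query_x) -> labels;
    make_episode(rng) returns ((support_x, support_y), (query_x, query_y)).
    """
    rng = random.Random(seed)
    episodes = [make_episode(rng) for _ in range(n_episodes)]

    def score(learner):
        accs = []
        for (sx, sy), (qx, qy) in episodes:
            pred = learner(sx, sy, qx)
            accs.append(sum(p == t for p, t in zip(pred, qy)) / len(qy))
        return sum(accs) / len(accs)

    return max(learners, key=score)
```

A real AutoML-style search would explore a structured space of algorithm components rather than a flat list of candidates, but episode-level validation scoring of this kind is the signal such a search would optimize.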