Towards robust visual recognition: learning from imperfect data
Format: Thesis-Doctor of Philosophy
Language: English
Published: Nanyang Technological University, 2023
Online Access: https://hdl.handle.net/10356/164415
Institution: Nanyang Technological University
Summary: Though Deep Convolutional Neural Networks (DCNNs) have shown success in many computer vision tasks, their success rests on the huge effort of constructing large-scale annotated datasets. Even the prevailing models can fail when the dataset does not cover enough samples. For example, in the most fundamental vision task, classification, halving the dataset size roughly doubles the error rate.
Such data-hungry neural networks raise a question: can creating a large enough dataset improve performance once and for all? The answer is no. First, in many real-world applications, such as medical imaging and autonomous driving, large quantities of labeled data can be severely limited by the scarcity of certain cases. Beyond the difficulty of collection and the cost of expert knowledge, bias in the annotation process is unavoidable. One representative example is the long-tail distribution: a few head classes occupy most of the instances, owing either to human preferences or to the natural observation probability. Moreover, even if we could post-process the dataset so that instances are independently and identically distributed over all the classes, the dataset can become out of date as society changes; images of a "phone" today look totally different from those of ten years ago. Novel classes and data emerge continuously, which makes any current dataset limited by time. Based on these observations, we argue that a robust AI recognizer needs to handle unbalanced, data-scarce, and data-changing datasets, and we generalize these kinds of data as "imperfect" data. In other words, for current data-driven DCNNs, learning from imperfect data means learning with biased, limited, or changing supervision.
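To make the long-tail notion concrete, here is a minimal, self-contained Python sketch (not from the thesis; the class count, dataset size, and power-law exponent are illustrative assumptions) showing how a Zipf-like label distribution concentrates most instances in a few head classes.

```python
num_classes = 1000           # assumed ImageNet-scale label space
total_instances = 1_000_000  # assumed dataset size
gamma = 1.0                  # power-law exponent; larger => heavier imbalance

# Zipf-like frequency for class k (1-indexed): f_k proportional to 1 / k^gamma
weights = [1.0 / (k ** gamma) for k in range(1, num_classes + 1)]
norm = sum(weights)
counts = [round(total_instances * w / norm) for w in weights]

head = sum(counts[:10])  # the 10 most frequent ("head") classes
print(f"{head / total_instances:.1%} of instances come from "
      f"{10 / num_classes:.1%} of the classes")
```

Under these assumed numbers, the ten most frequent classes, only 1% of the label space, already account for roughly 39% of all instances.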
In this thesis, we study imperfect data from these three aspects: unbalanced data, insufficient data, and time-varying data. Specifically, in Chapter 3 we analyze large-scale long-tailed instance segmentation, where a few head classes occupy most of the instances. In Chapter 4 we further discuss the semi-supervised case, where only a small fraction of the data is labeled and the labeled data is itself imbalanced. In Chapter 5 we focus on lifelong learning scenarios, where the labeled data changes with time.
In the face of these different imperfections, different solutions are proposed. For long-tail data, we identify two essential challenges, the imbalance across classes and few-shot learning for the tail classes, and address them jointly in one incremental few-shot learning paradigm.
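As a point of reference for the class-imbalance side, below is a hedged sketch of class-balanced re-sampling with PyTorch's WeightedRandomSampler, a standard long-tail baseline; it is not the incremental few-shot learning paradigm proposed in the thesis, and the toy labels and features are assumptions.

```python
from collections import Counter

import torch
from torch.utils.data import DataLoader, TensorDataset, WeightedRandomSampler

labels = [0, 0, 0, 0, 0, 1, 1, 2]        # toy long-tailed class ids (assumed)
features = torch.randn(len(labels), 16)  # toy inputs (assumed)

# Weight each sample inversely to its class frequency so that every class is
# drawn with roughly equal probability in each batch.
freq = Counter(labels)
sample_weights = [1.0 / freq[y] for y in labels]

sampler = WeightedRandomSampler(sample_weights, num_samples=len(labels),
                                replacement=True)
loader = DataLoader(TensorDataset(features, torch.tensor(labels)),
                    batch_size=4, sampler=sampler)

for _, y in loader:
    print(y.tolist())  # tail classes 1 and 2 now appear far above raw frequency
```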
For insufficient labeled data, we exploit large-scale unlabeled data. We observe that the variety in the unlabeled data helps correct the erroneous predictions produced by a model trained on scarce labeled data, especially when the labeled data is also biased. For time-varying data, we build a unified framework that explains how previous methods fight the forgetting of old classes while learning new knowledge, and we derive a principled solution from it.
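To illustrate how unlabeled data can be brought to bear, here is a minimal sketch of confidence-thresholded pseudo-labelling, the standard recipe popularized by methods such as FixMatch; the model, threshold, and single-view setup are assumptions for illustration, not the thesis's algorithm.

```python
import torch
import torch.nn.functional as F

def pseudo_label_loss(model, unlabeled, threshold=0.95):
    """Cross-entropy on unlabeled samples whose max predicted probability
    exceeds `threshold`; low-confidence samples are masked out."""
    with torch.no_grad():
        probs = F.softmax(model(unlabeled), dim=-1)
        conf, pseudo_targets = probs.max(dim=-1)
        mask = (conf >= threshold).float()
    # Second forward pass; in practice this would be a strongly augmented view.
    loss = F.cross_entropy(model(unlabeled), pseudo_targets, reduction="none")
    return (loss * mask).mean()

# Toy usage with an assumed linear classifier; the low threshold is only so
# that some random predictions pass the confidence gate in this demo.
model = torch.nn.Linear(16, 10)
unlabeled = torch.randn(32, 16)
print(pseudo_label_loss(model, unlabeled, threshold=0.2).item())
```

Likewise, one common ingredient of the anti-forgetting methods such a unified framework would cover is a knowledge-distillation penalty in the style of Learning without Forgetting; the sketch below shows only that generic loss term, not the thesis's principled solution.

```python
import torch
import torch.nn.functional as F

def distillation_loss(new_logits, old_logits, T=2.0):
    """KL divergence pushing the new model's predictions on old classes
    toward those of the frozen old model, softened by temperature T."""
    old_p = F.softmax(old_logits / T, dim=-1)
    new_logp = F.log_softmax(new_logits / T, dim=-1)
    return F.kl_div(new_logp, old_p, reduction="batchmean") * (T * T)

# Toy usage: logits from the current model and the frozen old model.
print(distillation_loss(torch.randn(8, 10), torch.randn(8, 10)).item())
```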
To summarize, we discuss several kinds of data imperfection in this thesis. We hope this work offers an alternative to constructing ever larger datasets when applying neural networks to real-world applications.