Towards robust visual recognition: learning from imperfect data

Though Deep Convolutional Neural Networks (DCNNs) have shown success in many computer vision tasks, their success rests on the huge effort of constructing large-scale annotated datasets. Even the prevailing models can fail when the dataset does not cover enough samples. For example, fo...


Bibliographic Details
Main Author: Hu, Xinting
Other Authors: Miao Chun Yan
Format: Thesis-Doctor of Philosophy
Language: English
Published: Nanyang Technological University 2023
Subjects:
Online Access:https://hdl.handle.net/10356/164415
Institution: Nanyang Technological University
Language: English
id sg-ntu-dr.10356-164415
record_format dspace
spelling sg-ntu-dr.10356-1644152023-02-01T03:20:55Z Towards robust visual recognition: learning from imperfect data Hu, Xinting Miao Chun Yan Zhang Hanwang School of Computer Science and Engineering hanwangzhang@ntu.edu.sg, ASCYMiao@ntu.edu.sg Engineering::Computer science and engineering Though Deep Convolutional Neural Networks (DCNNs) have shown success in many computer vision tasks, the huge effort of constructing large-scale annotated datasets remains indispensable. Even the prevailing models can fail when the dataset does not cover enough samples. For example, for the most fundamental vision task, classification, halving the dataset size roughly doubles the error rate. Such data-hungry neural networks raise a question: can creating a large enough dataset improve performance once and for all? The answer is NO. First, in many real-world applications, such as medical imaging and autonomous driving, labeled data can be severely scarce for certain cases. Apart from the difficulty of collection and the cost of expert knowledge, bias in the annotation process is unavoidable. One representative example is the long-tail distribution: a few head classes occupy most of the instances, due to either human preferences or the natural observation probability. Moreover, even if we could post-process the dataset so that it is independently and identically distributed over all classes, the dataset can become out-of-date due to social change. For example, images of a "phone" today are totally different from those of ten years ago. Novel classes and data continuously appear, which makes any current dataset limited by time. Based on the above discussion, we argue that a robust AI recognizer needs to handle unbalanced, data-scarce, and changing datasets, and we generalize these kinds of data as "imperfect" data. 
In other words, for current data-driven DCNNs, learning from imperfect data means learning with biased, limited, or changing supervision. In this thesis, we study imperfect data from these three aspects: unbalanced data, insufficient data, and time-varying data. Specifically, in Chapter 3, we analyze large-scale long-tailed instance segmentation, where a few head classes occupy most of the instances. In Chapter 4, we further discuss the semi-supervised case, where only a small fraction of the data is labeled and the labeled data is itself imbalanced. In Chapter 5, we focus on lifelong learning scenarios, where the labeled data changes over time. In the face of these different imperfections, we propose different solutions. For long-tail data, we identify two essential challenges, the imbalance across classes and few-shot learning for tail classes, and address them together in one incremental few-shot learning paradigm. For insufficient labeled data, we exploit large-scale unlabeled data: we observe that the variety in the unlabeled data can help correct the erroneous predictions produced from scarce labeled data, especially when the labeled data is also biased. For time-varying data, we build a unified framework that explains how previous methods fight the forgetting of old classes when learning new knowledge, and that yields a principled solution. To summarize, this thesis studies several kinds of data imperfection. We hope this work provides an alternative to constructing ever-larger datasets when applying neural networks to real-world applications. Doctor of Philosophy 2023-01-25T05:16:11Z 2023-01-25T05:16:11Z 2022 Thesis-Doctor of Philosophy Hu, X. (2022). Towards robust visual recognition: learning from imperfect data. Doctoral thesis, Nanyang Technological University, Singapore. 
https://hdl.handle.net/10356/164415 https://hdl.handle.net/10356/164415 10.32657/10356/164415 en This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0). application/pdf Nanyang Technological University
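The abstract's long-tail distribution (a few head classes occupying most of the instances) can be made concrete with a small sketch: an exponentially decaying per-class histogram, and the inverse-frequency class reweighting that is one common baseline response to it. The decay profile, function names, and weight normalization here are illustrative assumptions, not the thesis's actual method.

```python
def long_tail_counts(num_classes, head_count, decay):
    """Per-class sample counts that fall off exponentially from the head class."""
    return [max(1, int(head_count * (decay ** c))) for c in range(num_classes)]

def inverse_frequency_weights(counts):
    """Weight each class by 1/frequency, normalized so the weights average to 1."""
    inv = [1.0 / c for c in counts]
    scale = len(inv) / sum(inv)
    return [w * scale for w in inv]

# A 10-class long tail: the head class has 1000 samples, the last tail class 1.
counts = long_tail_counts(num_classes=10, head_count=1000, decay=0.5)
weights = inverse_frequency_weights(counts)
# Head classes receive small loss weights, tail classes large ones.
```

In practice such weights would be passed to a loss function (e.g. a per-class weight vector in a cross-entropy loss) so that rare tail classes contribute as much gradient signal as the abundant head classes.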
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
Singapore
content_provider NTU Library
collection DR-NTU
language English
topic Engineering::Computer science and engineering
spellingShingle Engineering::Computer science and engineering
Hu, Xinting
Towards robust visual recognition: learning from imperfect data
description Though Deep Convolutional Neural Networks (DCNNs) have shown success in many computer vision tasks, the huge effort of constructing large-scale annotated datasets remains indispensable. Even the prevailing models can fail when the dataset does not cover enough samples. For example, for the most fundamental vision task, classification, halving the dataset size roughly doubles the error rate. Such data-hungry neural networks raise a question: can creating a large enough dataset improve performance once and for all? The answer is NO. First, in many real-world applications, such as medical imaging and autonomous driving, labeled data can be severely scarce for certain cases. Apart from the difficulty of collection and the cost of expert knowledge, bias in the annotation process is unavoidable. One representative example is the long-tail distribution: a few head classes occupy most of the instances, due to either human preferences or the natural observation probability. Moreover, even if we could post-process the dataset so that it is independently and identically distributed over all classes, the dataset can become out-of-date due to social change. For example, images of a "phone" today are totally different from those of ten years ago. Novel classes and data continuously appear, which makes any current dataset limited by time. Based on the above discussion, we argue that a robust AI recognizer needs to handle unbalanced, data-scarce, and changing datasets, and we generalize these kinds of data as "imperfect" data. In other words, for current data-driven DCNNs, learning from imperfect data means learning with biased, limited, or changing supervision. In this thesis, we study imperfect data from these three aspects: unbalanced data, insufficient data, and time-varying data. 
Specifically, in Chapter 3, we analyze large-scale long-tailed instance segmentation, where a few head classes occupy most of the instances. In Chapter 4, we further discuss the semi-supervised case, where only a small fraction of the data is labeled and the labeled data is itself imbalanced. In Chapter 5, we focus on lifelong learning scenarios, where the labeled data changes over time. In the face of these different imperfections, we propose different solutions. For long-tail data, we identify two essential challenges, the imbalance across classes and few-shot learning for tail classes, and address them together in one incremental few-shot learning paradigm. For insufficient labeled data, we exploit large-scale unlabeled data: we observe that the variety in the unlabeled data can help correct the erroneous predictions produced from scarce labeled data, especially when the labeled data is also biased. For time-varying data, we build a unified framework that explains how previous methods fight the forgetting of old classes when learning new knowledge, and that yields a principled solution. To summarize, this thesis studies several kinds of data imperfection. We hope this work provides an alternative to constructing ever-larger datasets when applying neural networks to real-world applications.
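The idea that a large unlabeled pool can help correct predictions learned from scarce labels is commonly realized through confidence-thresholded pseudo-labeling. A minimal sketch of that basic recipe follows; the 0.9 threshold, the function name, and the toy probabilities are illustrative assumptions, not the thesis's actual algorithm.

```python
def pseudo_label(probs, threshold=0.9):
    """Keep only the unlabeled examples the model predicts with high confidence.

    probs: list of per-example class-probability lists.
    Returns (example index, predicted label) pairs for confident predictions;
    these would be added to the labeled set for the next training round.
    """
    selected = []
    for i, p in enumerate(probs):
        best = max(range(len(p)), key=p.__getitem__)  # argmax class
        if p[best] >= threshold:
            selected.append((i, best))
    return selected

# Toy model outputs over 3 unlabeled examples and 3 classes.
probs = [[0.95, 0.03, 0.02],   # confident -> kept
         [0.40, 0.35, 0.25],   # uncertain -> discarded
         [0.05, 0.02, 0.93]]   # confident -> kept
print(pseudo_label(probs))     # [(0, 0), (2, 2)]
```

The threshold controls the trade-off the abstract alludes to: a low threshold admits more (possibly erroneous) pseudo-labels, while a high one keeps only predictions the model is confident about.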
author2 Miao Chun Yan
author_facet Miao Chun Yan
Hu, Xinting
format Thesis-Doctor of Philosophy
author Hu, Xinting
author_sort Hu, Xinting
title Towards robust visual recognition: learning from imperfect data
title_short Towards robust visual recognition: learning from imperfect data
title_full Towards robust visual recognition: learning from imperfect data
title_fullStr Towards robust visual recognition: learning from imperfect data
title_full_unstemmed Towards robust visual recognition: learning from imperfect data
title_sort towards robust visual recognition: learning from imperfect data
publisher Nanyang Technological University
publishDate 2023
url https://hdl.handle.net/10356/164415
_version_ 1757048190333353984