Towards robust visual recognition: learning from imperfect data

Though Deep Convolutional Neural Networks (DCNNs) have shown success in many computer vision tasks, their success rests on the huge effort of constructing large-scale annotated datasets. Even the prevailing models can fail when the dataset does not cover enough samples. For example, fo...


Bibliographic Details
Main Author: Hu, Xinting
Other Authors: Miao Chun Yan
Format: Thesis-Doctor of Philosophy
Language: English
Published: Nanyang Technological University 2023
Subjects:
Online Access:https://hdl.handle.net/10356/164415
Institution: Nanyang Technological University
Language: English
id sg-ntu-dr.10356-164415
record_format dspace
spelling sg-ntu-dr.10356-1644152023-02-01T03:20:55Z Towards robust visual recognition: learning from imperfect data Hu, Xinting Miao Chun Yan Zhang Hanwang School of Computer Science and Engineering hanwangzhang@ntu.edu.sg, ASCYMiao@ntu.edu.sg Engineering::Computer science and engineering Though Deep Convolutional Neural Networks (DCNNs) have shown success in many computer vision tasks, the huge effort of constructing large-scale annotated datasets remains indispensable. Even the prevailing models can fail when the dataset does not cover enough samples. For example, for the most fundamental vision task, classification, halving the dataset size roughly doubles the error rate. Such data-hungry neural networks raise a question: can creating a large enough dataset improve performance once and for all? The answer is NO. First, in many real-world applications, such as medical imaging and autonomous driving, labeled data can be severely scarce for certain cases. Apart from the difficulty of collection and the cost of expert knowledge, bias in the annotation process is unavoidable. One representative example is the long-tail distribution: a few head classes occupy most of the instances, due to either human preferences or the natural observation probability. Moreover, even if we could post-process the dataset so that it is independently and identically distributed over all classes, the dataset can become out-of-date due to social change. For example, images of a "phone" today are totally different from those of ten years ago. Novel classes and data continuously appear, which makes any current dataset limited by time. Based on the above discussion, we argue that a robust AI recognizer needs to handle unbalanced, data-scarce, and changing datasets, and we generalize these kinds of data as "imperfect" data. 
In other words, for current data-driven DCNNs, learning from imperfect data means learning with biased, limited, or changing supervision. In this thesis, we study imperfect data from these three aspects: unbalanced data, insufficient data, and time-varying data. Specifically, in Chapter 3, we analyze large-scale long-tailed instance segmentation, where a few head classes occupy most of the instances. In Chapter 4, we further discuss the semi-supervised case, where only a small fraction of the data is labeled and the labeled data is itself imbalanced. In Chapter 5, we focus on lifelong learning scenarios, where the labeled data changes over time. In the face of these different imperfections, we propose different solutions. For long-tail data, we identify two essential challenges, the imbalance across classes and few-shot learning for tail classes, and address them together in one incremental few-shot learning paradigm. For insufficient labeled data, we exploit large-scale unlabeled data: we observe that the variety in the unlabeled data can help correct the erroneous predictions produced from scarce labeled data, especially when the labeled data is also biased. For time-varying data, we build a unified framework that explains how previous methods fight the forgetting of old classes when learning new knowledge, and that yields a principled solution. To summarize, this thesis studies several kinds of data imperfection. We hope this work provides an alternative to constructing ever-larger datasets when applying neural networks to real-world applications. Doctor of Philosophy 2023-01-25T05:16:11Z 2023-01-25T05:16:11Z 2022 Thesis-Doctor of Philosophy Hu, X. (2022). Towards robust visual recognition: learning from imperfect data. Doctoral thesis, Nanyang Technological University, Singapore. 
https://hdl.handle.net/10356/164415 https://hdl.handle.net/10356/164415 10.32657/10356/164415 en This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0). application/pdf Nanyang Technological University
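The abstract's long-tail distribution (a few head classes occupying most of the instances) can be made concrete with a small sketch: an exponentially decaying per-class histogram, and the inverse-frequency class reweighting that is one common baseline response to it. The decay profile, function names, and weight normalization here are illustrative assumptions, not the thesis's actual method.

```python
def long_tail_counts(num_classes, head_count, decay):
    """Per-class sample counts that fall off exponentially from the head class."""
    return [max(1, int(head_count * (decay ** c))) for c in range(num_classes)]

def inverse_frequency_weights(counts):
    """Weight each class by 1/frequency, normalized so the weights average to 1."""
    inv = [1.0 / c for c in counts]
    scale = len(inv) / sum(inv)
    return [w * scale for w in inv]

# A 10-class long tail: the head class has 1000 samples, the last tail class 1.
counts = long_tail_counts(num_classes=10, head_count=1000, decay=0.5)
weights = inverse_frequency_weights(counts)
# Head classes receive small loss weights, tail classes large ones.
```

In practice such weights would be passed to a loss function (e.g. a per-class weight vector in a cross-entropy loss) so that rare tail classes contribute as much gradient signal as the abundant head classes.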
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
Singapore
content_provider NTU Library
collection DR-NTU
language English
topic Engineering::Computer science and engineering
spellingShingle Engineering::Computer science and engineering
Hu, Xinting
Towards robust visual recognition: learning from imperfect data
description Though Deep Convolutional Neural Networks (DCNNs) have shown success in many computer vision tasks, the huge effort of constructing large-scale annotated datasets remains indispensable. Even the prevailing models can fail when the dataset does not cover enough samples. For example, for the most fundamental vision task, classification, halving the dataset size roughly doubles the error rate. Such data-hungry neural networks raise a question: can creating a large enough dataset improve performance once and for all? The answer is NO. First, in many real-world applications, such as medical imaging and autonomous driving, labeled data can be severely scarce for certain cases. Apart from the difficulty of collection and the cost of expert knowledge, bias in the annotation process is unavoidable. One representative example is the long-tail distribution: a few head classes occupy most of the instances, due to either human preferences or the natural observation probability. Moreover, even if we could post-process the dataset so that it is independently and identically distributed over all classes, the dataset can become out-of-date due to social change. For example, images of a "phone" today are totally different from those of ten years ago. Novel classes and data continuously appear, which makes any current dataset limited by time. Based on the above discussion, we argue that a robust AI recognizer needs to handle unbalanced, data-scarce, and changing datasets, and we generalize these kinds of data as "imperfect" data. In other words, for current data-driven DCNNs, learning from imperfect data means learning with biased, limited, or changing supervision. In this thesis, we study imperfect data from these three aspects: unbalanced data, insufficient data, and time-varying data. 
Specifically, in Chapter 3, we analyze large-scale long-tailed instance segmentation, where a few head classes occupy most of the instances. In Chapter 4, we further discuss the semi-supervised case, where only a small fraction of the data is labeled and the labeled data is itself imbalanced. In Chapter 5, we focus on lifelong learning scenarios, where the labeled data changes over time. In the face of these different imperfections, we propose different solutions. For long-tail data, we identify two essential challenges, the imbalance across classes and few-shot learning for tail classes, and address them together in one incremental few-shot learning paradigm. For insufficient labeled data, we exploit large-scale unlabeled data: we observe that the variety in the unlabeled data can help correct the erroneous predictions produced from scarce labeled data, especially when the labeled data is also biased. For time-varying data, we build a unified framework that explains how previous methods fight the forgetting of old classes when learning new knowledge, and that yields a principled solution. To summarize, this thesis studies several kinds of data imperfection. We hope this work provides an alternative to constructing ever-larger datasets when applying neural networks to real-world applications.
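The idea that a large unlabeled pool can help correct predictions learned from scarce labels is commonly realized through confidence-thresholded pseudo-labeling. A minimal sketch of that basic recipe follows; the 0.9 threshold, the function name, and the toy probabilities are illustrative assumptions, not the thesis's actual algorithm.

```python
def pseudo_label(probs, threshold=0.9):
    """Keep only the unlabeled examples the model predicts with high confidence.

    probs: list of per-example class-probability lists.
    Returns (example index, predicted label) pairs for confident predictions;
    these would be added to the labeled set for the next training round.
    """
    selected = []
    for i, p in enumerate(probs):
        best = max(range(len(p)), key=p.__getitem__)  # argmax class
        if p[best] >= threshold:
            selected.append((i, best))
    return selected

# Toy model outputs over 3 unlabeled examples and 3 classes.
probs = [[0.95, 0.03, 0.02],   # confident -> kept
         [0.40, 0.35, 0.25],   # uncertain -> discarded
         [0.05, 0.02, 0.93]]   # confident -> kept
print(pseudo_label(probs))     # [(0, 0), (2, 2)]
```

The threshold controls the trade-off the abstract alludes to: a low threshold admits more (possibly erroneous) pseudo-labels, while a high one keeps only predictions the model is confident about.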
author2 Miao Chun Yan
author_facet Miao Chun Yan
Hu, Xinting
format Thesis-Doctor of Philosophy
author Hu, Xinting
author_sort Hu, Xinting
title Towards robust visual recognition: learning from imperfect data
title_short Towards robust visual recognition: learning from imperfect data
title_full Towards robust visual recognition: learning from imperfect data
title_fullStr Towards robust visual recognition: learning from imperfect data
title_full_unstemmed Towards robust visual recognition: learning from imperfect data
title_sort towards robust visual recognition: learning from imperfect data
publisher Nanyang Technological University
publishDate 2023
url https://hdl.handle.net/10356/164415
_version_ 1757048190333353984