2D and 3D visual understanding with limited supervision

Existing fully supervised deep learning methods usually require a large number of training samples with abundant annotations for model training, which is extremely expensive and labor-intensive. Therefore, to alleviate the huge labeling costs, it is highly desirable to develop weakly superv...

Bibliographic Details
Main Author:	Wu, Zhonghua
Other Authors:	Lin Guosheng
Format:	Thesis-Doctor of Philosophy
Language:	English
Published:	Nanyang Technological University 2023
Subjects:	Engineering::Computer science and engineering::Computing methodologies::Image processing and computer vision; Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence
Online Access:	https://hdl.handle.net/10356/164693
Institution:	Nanyang Technological University

id	sg-ntu-dr.10356-164693
record_format	dspace
institution	Nanyang Technological University
building	NTU Library
continent	Asia
country	Singapore
content_provider	NTU Library
collection	DR-NTU
language	English
topic	Engineering::Computer science and engineering::Computing methodologies::Image processing and computer vision; Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence
description	Existing fully supervised deep learning methods usually require a large number of training samples with abundant annotations for model training, which is extremely expensive and labor-intensive. Therefore, to alleviate the huge labeling costs, it is highly desirable to develop weakly supervised learning methods. Here, weakly supervised learning refers to training a model with labels that may be inexact, incomplete, or inaccurate, which gives three typical scenarios: inexact supervision, incomplete supervision, and inaccurate supervision. In this thesis, we study all three scenarios through three fundamental 2D and 3D recognition tasks: weakly supervised object detection (WSOD), few-shot image segmentation (FSS), and weakly supervised point cloud segmentation (WSPCS). Specifically, for WSOD, we only have image-level annotations for the novel class images in the web domain, corresponding to inexact supervision. For FSS, we only have a few pixel-level labeled images (e.g., one or five images) for the novel classes, corresponding to incomplete supervision. For WSPCS, we treat partially labeled samples as weak annotations for model training, i.e., only a few sparse points in the whole scene are labeled and all other points are unlabeled, which also corresponds to incomplete supervision. Moreover, we observe one major limitation in existing consistency-based weakly supervised point cloud segmentation methods: the conventional confidence-based selection yields unsatisfactory pseudo labels, which in turn leads to inaccurate supervision.
For weakly supervised object detection, in Chapter 3, we propose a novel webly supervised object detection (WebSOD) method for novel class detection, which only requires web images retrieved from the internet using class names as keywords. Only image-level annotations are available for these web images during model training. Our proposed method combines bottom-up and top-down cues. We introduce a bottom-up mechanism that uses a well-trained fully supervised object detector (e.g., Faster R-CNN) as an object region estimator for web images, exploiting the common objectness shared between base and novel classes. Given the estimated regions on the web images, we then use top-down attention cues as guidance for region classification. Furthermore, we propose a residual feature refinement (RFR) block to tackle the domain mismatch between the web domain and the target domain.
For few-shot image segmentation, current state-of-the-art methods treat the task as a conditional foreground-background segmentation problem, assuming that each class is independent. In contrast, in Chapter 4, we introduce the concept of a meta-class, which is the meta-information (e.g., certain middle-level features) shareable among all classes. To explicitly learn meta-class representations, we propose a novel Meta-class Memory-based few-shot segmentation method (MM-Net), in which a set of learnable memory embeddings memorizes the meta-class information during base class training and transfers it to novel classes at inference.
For weakly supervised point cloud segmentation, we only have a few sparse labeled points together with a large number of unlabeled points. To exploit the unlabeled data, we design two methods from the perspectives of adversarial training and consistency training. First, in Chapter 5, given the promising progress of smoothness-based methods, we advocate applying a consistency constraint under various perturbations to effectively regularize unlabeled 3D points. In particular, we propose a novel Dual Adaptive Transformations (DAT) model for weakly supervised point cloud segmentation, in which the dual adaptive transformations are performed via an adversarial strategy at both the point level and the region level, with the aim of enforcing local and structural smoothness constraints on 3D point clouds. Second, in Chapter 6, we observe that the straightforward way of applying consistency constraints to weakly supervised point cloud segmentation has two major limitations: unsatisfactory pseudo labels due to the conventional confidence-based selection, and insufficient consistency constraints due to discarding unreliable pseudo labels. Therefore, we propose a novel Reliability-Adaptive Consistency Network (RAC-Net), which uses both prediction confidence and model uncertainty to measure the reliability of pseudo labels and applies consistency training to all unlabeled points, imposing different consistency constraints on different points according to the reliability of their pseudo labels.
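The reliability-adaptive idea described at the end of the abstract can be illustrated with a small sketch. This is not the thesis implementation: the reliability formula, the threshold of 0.8, the use of several stochastic forward passes as a stand-in for model uncertainty, and the function names `mean_probs`, `reliability`, and `consistency_loss` are all assumptions made for illustration. The sketch scores each unlabeled point by combining prediction confidence with disagreement between passes, then applies a hard pseudo-label loss to reliable points and a softer divergence-based loss to the rest, so that no point is discarded.

```python
import math


def mean_probs(passes):
    """Average class probabilities over several stochastic forward passes."""
    n = len(passes)
    return [sum(p[c] for p in passes) / n for c in range(len(passes[0]))]


def reliability(passes):
    """Hypothetical reliability score: confidence discounted by pass disagreement."""
    avg = mean_probs(passes)
    confidence = max(avg)
    top = avg.index(confidence)
    # Variance of the winning class probability across passes (0 if all agree).
    var = sum((p[top] - confidence) ** 2 for p in passes) / len(passes)
    # The std of a value in [0, 1] is at most 0.5, so 2 * std lies in [0, 1].
    uncertainty = min(1.0, 2.0 * math.sqrt(var))
    return confidence * (1.0 - uncertainty)


def consistency_loss(passes, perturbed, threshold=0.8, eps=1e-12):
    """Loss for one unlabeled point, given its prediction on a perturbed view.

    Assumed scheme: reliable points get a hard pseudo label (cross-entropy),
    less reliable points get a soft constraint (KL divergence).
    """
    avg = mean_probs(passes)
    if reliability(passes) >= threshold:
        label = avg.index(max(avg))
        return -math.log(perturbed[label] + eps)
    return sum(a * math.log((a + eps) / (q + eps)) for a, q in zip(avg, perturbed))


# Two points with two classes each: one with agreeing, confident passes,
# and one whose passes disagree about the winning class.
confident_point = [[0.95, 0.05], [0.95, 0.05]]
ambiguous_point = [[0.9, 0.1], [0.3, 0.7]]
hard_loss = consistency_loss(confident_point, [0.9, 0.1])
soft_loss = consistency_loss(ambiguous_point, [0.5, 0.5])
```

In this sketch the ambiguous point still contributes a training signal through the soft term, which mirrors the abstract's stated goal of applying consistency training to all unlabeled points rather than only to those that pass a confidence threshold.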
author2	Lin Guosheng
author_facet	Lin Guosheng Wu, Zhonghua
format	Thesis-Doctor of Philosophy
author	Wu, Zhonghua
author_sort	Wu, Zhonghua
title	2D and 3D visual understanding with limited supervision
publisher	Nanyang Technological University
publishDate	2023
url	https://hdl.handle.net/10356/164693
_version_	1759856369931911168
spelling	sg-ntu-dr.10356-164693 2023-03-06T07:30:04Z 2D and 3D visual understanding with limited supervision Wu, Zhonghua Lin Guosheng School of Computer Science and Engineering gslin@ntu.edu.sg Doctor of Philosophy 2023-02-10T06:13:47Z 2023-02-10T06:13:47Z 2023 Thesis-Doctor of Philosophy Wu, Z. (2023). 2D and 3D visual understanding with limited supervision. Doctoral thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/164693 10.32657/10356/164693 en This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0). application/pdf Nanyang Technological University

Similar Items