Image segmentation with less manual labeling effort


Bibliographic Details
Main Author: Liu, Weide
Other Authors: Lin Guosheng
Format: Thesis-Doctor of Philosophy
Language:English
Published: Nanyang Technological University 2022
Subjects:
Online Access:https://hdl.handle.net/10356/162800
Institution: Nanyang Technological University
Language: English
id sg-ntu-dr.10356-162800
record_format dspace
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
Singapore
content_provider NTU Library
collection DR-NTU
language English
topic Engineering::Computer science and engineering
spellingShingle Engineering::Computer science and engineering
Liu, Weide
Image segmentation with less manual labeling effort
description Semantic segmentation is the task of classifying each pixel of an image into a particular class. With the help of deep learning, fully supervised segmentation has achieved remarkable performance. However, fully supervised learning has a critical intrinsic limitation: it often requires a prohibitively large number of pixel-level annotated images for model training. Collecting such labels is notoriously expensive in dense prediction tasks like semantic segmentation, instance segmentation, and video segmentation. To alleviate, or even free researchers from, the high cost of laborious annotation, this thesis tackles the problem from two directions: few-shot segmentation and weakly supervised segmentation. Few-shot segmentation aims to learn a network that predicts segmentation masks for novel classes from only a few newly annotated training samples. Weakly supervised segmentation, on the other hand, learns a pixel-level prediction network from weaker annotations, which can be obtained far more cheaply than labeling every pixel in an image: bounding boxes, scribbles, image-level labels, or points. On the first front, we aim to improve few-shot segmentation performance with the following innovations. First, we propose a Cross-Reference and Local-Global Condition Network (CRCNet) that concurrently makes predictions for both the support image and the query image to mine out objects of the same category for few-shot segmentation. To further improve the object feature representation, we develop a local-global condition module that captures both global and local relations. Because object appearances vary widely, mining foreground regions in images may require multiple steps, so we also develop a mask refinement module that recurrently refines the predicted target object regions.
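To make the few-shot setting concrete, the common baseline that support/query networks such as CRCNet build on pools the support features over the annotated foreground mask into a class prototype, then scores every query location against that prototype. The sketch below is a minimal NumPy illustration of that generic baseline, not of CRCNet itself; the function names and toy data are ours.

```python
import numpy as np

def masked_average_pool(feat, mask):
    """Pool support features over the annotated foreground into a prototype.
    feat: (C, H, W) feature map; mask: (H, W) binary foreground mask."""
    weights = mask / (mask.sum() + 1e-8)
    return (feat * weights).reshape(feat.shape[0], -1).sum(axis=1)  # (C,)

def cosine_similarity_map(query_feat, prototype):
    """Score every query location against the class prototype.
    query_feat: (C, H, W); prototype: (C,). Returns (H, W) scores in [-1, 1]."""
    q = query_feat.reshape(query_feat.shape[0], -1)              # (C, H*W)
    q_norm = q / (np.linalg.norm(q, axis=0, keepdims=True) + 1e-8)
    p_norm = prototype / (np.linalg.norm(prototype) + 1e-8)
    return (p_norm @ q_norm).reshape(query_feat.shape[1:])

# Toy example: support and query each contain one "object" region whose
# features are shifted by +3 on every channel.
rng = np.random.default_rng(0)
support = rng.normal(size=(8, 4, 4)); support[:, :2, :2] += 3.0
query   = rng.normal(size=(8, 4, 4)); query[:, 2:, 2:] += 3.0
mask = np.zeros((4, 4)); mask[:2, :2] = 1.0

proto = masked_average_pool(support, mask)
score = cosine_similarity_map(query, proto)
pred = score > 0.5  # threshold the score map into a binary query mask
```

One-directional prototype matching like this is exactly what the cross-reference idea extends: CRCNet additionally predicts in the reverse direction, from query back to support.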
We then propose a Query Guided Network (QGNet) that extracts information from the query itself, independently of the support, to benefit the few-shot segmentation task. We propose a prior extractor that learns query information from unlabeled images with our proposed global-local contrastive learning. With the prior extractor, the extraction of query information is detached from the support branch, overcoming the limitations imposed by the support and yielding more informative query cues for better interaction. On the second front, we focus on weakly supervised segmentation, aiming to predict pixel-level masks by learning a network supervised only with image-level annotations. The quality of the Class Activation Maps (CAMs) has a crucial impact on the performance of a weakly supervised segmentation model. Weakly supervised image segmentation trained with image-level labels usually suffers from inaccurate coverage of object areas when generating the pseudo ground truth, because the CAMs are trained with a classification objective and lack the ability to generalize. We aim to improve the quality of the CAMs, and thereby weakly supervised segmentation performance, from several directions. First, we discuss using a bipartite graph to locate the object-activated areas in two images containing common classes; the matched areas are then used to refine the predicted object regions in the CAMs. In particular, we propose the Maximum Bipartite Matching Network (MBMNet), which models the paired images with a bipartite graph and uses a maximum matching algorithm to locate corresponding areas in the pair. The matched areas are used to enhance the corresponding feature representations, from which we can generate better CAMs that cover more object regions.
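The core operation behind matching region features across a pair of images can be illustrated with a maximum-weight bipartite assignment, solved by the Hungarian algorithm as implemented in SciPy. This is a generic sketch of the matching step, not the thesis's MBMNet; the function and toy data are ours.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_regions(feats_a, feats_b):
    """Match region descriptors of two images via bipartite assignment.
    feats_a: (Na, C), feats_b: (Nb, C) L2-normalised region features.
    Negating the similarity turns the minimum-cost assignment solved by
    linear_sum_assignment into a maximum-similarity matching."""
    sim = feats_a @ feats_b.T                  # (Na, Nb) cosine similarities
    rows, cols = linear_sum_assignment(-sim)   # maximise total similarity
    return list(zip(rows, cols)), sim[rows, cols]

# Toy example: three regions per image; region i of image A corresponds
# to a cyclically shifted region of image B.
a = np.eye(3)                        # regions of image A: e0, e1, e2
b = np.roll(np.eye(3), -1, axis=0)   # regions of image B: e1, e2, e0
pairs, scores = match_regions(a, b)
```

Once region pairs are found, their features can be mutually enhanced before recomputing the CAMs, which is the role the matching plays in the paragraph above.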
Finally, we propose a Region Prototypical Network (RPNet) that explores the cross-image object diversity of the training set to enhance the object activation maps for weakly supervised segmentation. Similar object parts across images are identified via region feature comparison; object confidence is propagated between regions to discover and re-activate new object areas, while background regions are suppressed. Based on the re-activated feature maps, we obtain a more complete pseudo ground truth for weakly supervised segmentation. In summary, with CRCNet and QGNet we improve few-shot segmentation performance through a cross-reference mechanism and global-local contrastive learning, and with MBMNet and RPNet we enhance the object activation maps and improve weakly supervised segmentation performance by discovering new object areas. We achieve new state-of-the-art segmentation performance on public benchmarks for both tasks.
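Propagating object confidence between similar regions can be sketched in a few lines: each region inherits the highest confidence among the regions whose feature similarity to it exceeds a threshold, re-activating object parts the classifier under-scored. This is a hypothetical illustration under our own assumptions (cosine similarity, a fixed threshold, max-propagation), not RPNet itself.

```python
import numpy as np

def reactivate(region_feats, confidences, sim_threshold=0.8):
    """Propagate object confidence between similar regions.
    region_feats: (N, C) L2-normalised region descriptors.
    confidences:  (N,) initial object confidence per region.
    Each region takes the maximum confidence over its similarity
    neighbourhood (which always includes the region itself)."""
    sim = region_feats @ region_feats.T        # (N, N) pairwise similarity
    neighbours = sim >= sim_threshold          # boolean adjacency
    return np.where(neighbours, confidences, -np.inf).max(axis=1)

# Toy example: regions 0 and 1 look alike, but only region 0 was strongly
# activated; region 2 is background-like and stays suppressed.
feats = np.array([[1.00, 0.00],
                  [0.96, 0.28],   # similar to region 0 (cosine = 0.96)
                  [0.00, 1.00]])  # dissimilar to both
conf = np.array([0.9, 0.2, 0.1])
new_conf = reactivate(feats, conf)
```

Here region 1 inherits region 0's high confidence while the background-like region keeps its low score, mirroring the discover-and-re-activate behaviour described above.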
author2 Lin Guosheng
author_facet Lin Guosheng
Liu, Weide
format Thesis-Doctor of Philosophy
author Liu, Weide
author_sort Liu, Weide
title Image segmentation with less manual labeling effort
title_short Image segmentation with less manual labeling effort
title_full Image segmentation with less manual labeling effort
title_fullStr Image segmentation with less manual labeling effort
title_full_unstemmed Image segmentation with less manual labeling effort
title_sort image segmentation with less manual labeling effort
publisher Nanyang Technological University
publishDate 2022
url https://hdl.handle.net/10356/162800
_version_ 1753801149871292416
spelling sg-ntu-dr.10356-162800 2022-12-07T06:25:18Z Image segmentation with less manual labeling effort Liu, Weide Lin Guosheng School of Computer Science and Engineering gslin@ntu.edu.sg Engineering::Computer science and engineering Doctor of Philosophy 2022-11-09T05:57:47Z 2022-11-09T05:57:47Z 2022 Thesis-Doctor of Philosophy Liu, W. (2022). Image segmentation with less manual labeling effort. Doctoral thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/162800 10.32657/10356/162800 en This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0). application/pdf Nanyang Technological University