Weakly-supervised semantic segmentation

Semantic segmentation is a fundamental task in computer vision that assigns a label to every pixel in an image according to the semantic class of the objects present. Training deep models for this task demands a large number of pixel-level labeled images. Weakly-supervised semantic segmentation (WSSS) is a more feasible approach that learns the segmentation task from weak annotations only. Image-level WSSS, in which only class labels for the entire image are provided as supervision, is the most popular and also the most challenging setting. To address this challenge, the Class Activation Map (CAM) has emerged as a powerful technique in WSSS: it visualizes the areas of an image that are most relevant to a particular class without requiring pixel-level annotations. However, because CAM is generated from a classification model, it often highlights only the most discriminative parts of an object, a consequence of the discriminative nature of the model. This dissertation examines the key issues behind conventional CAM and proposes corresponding solutions. Two completed works address the two crucial steps of CAM generation: training a classification model and computing CAM from that model.

The first work analyzes a drawback of a key component in training a good classification model, the binary cross-entropy (BCE) loss. We introduce a simple method: reactivating the CAM of a BCE-converged model with a softmax cross-entropy (SCE) loss. Thanks to the contrastive nature of SCE, the pixel response is disentangled into different classes, and hence less mask ambiguity is expected.

The second work improves the quality of CAM given an already trained classification model. Specifically, we introduce a new CAM computation that also captures non-discriminative features, expanding the CAM to cover whole objects. This is achieved by clustering all local features of an object class to derive local prototypes that represent local semantics such as the “head”, “leg”, and “body” of a “sheep”. The resulting CAM captures all local features of the class without discrimination.

Although these two works bring significant improvements over conventional CAM, the improved CAM may still face a bottleneck caused by limited training data and the co-occurrence of objects and backgrounds. The dissertation therefore investigates the applicability of recent visual foundation models, such as the Segment Anything Model (SAM), in the context of WSSS. SAM is a recent image segmentation model with superior performance across a variety of segmentation tasks, notable for its ability to interpret diverse prompts and generate the corresponding object masks. We examine SAM in two scenarios, text prompting and zero-shot learning, and propose pipelines for applying it to WSSS. We provide insights into the potential and challenges of deploying visual foundation models for WSSS, facilitating future developments in this research area.
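To make the CAM mechanism referenced above concrete, here is a minimal sketch of how a conventional CAM is computed from a trained classifier: the final fully-connected weights of the class are used to weight the backbone's local feature maps. The choice of model, class index, and normalization is illustrative and not taken from the dissertation.

```python
import torch
import torch.nn.functional as F
import torchvision

# Illustrative classifier; in WSSS this would be a multi-label model trained
# with image-level labels only.
model = torchvision.models.resnet50(weights=None).eval()
backbone = torch.nn.Sequential(*list(model.children())[:-2])  # keep conv features, drop pool/fc
fc_weight = model.fc.weight                                   # (num_classes, C)

@torch.no_grad()
def compute_cam(image: torch.Tensor, target_class: int) -> torch.Tensor:
    """image: (1, 3, H, W) normalized tensor; returns an (H, W) CAM in [0, 1]."""
    feats = backbone(image)                                   # (1, C, h, w) local features
    cam = torch.einsum("c,bchw->bhw", fc_weight[target_class], feats)
    cam = F.relu(cam)                                         # keep positive class evidence
    cam = F.interpolate(cam[None], size=image.shape[-2:],
                        mode="bilinear", align_corners=False)[0, 0]
    return cam / (cam.max() + 1e-8)                           # normalize to [0, 1]

cam = compute_cam(torch.randn(1, 3, 224, 224), target_class=3)
```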
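The claim about the contrastive nature of SCE can be illustrated with a toy comparison: under a per-class sigmoid (as in BCE training), several class channels can respond strongly at the same pixel, whereas a softmax over classes forces them to compete, disentangling the pixel response. The tensor shapes below are arbitrary and only demonstrate the normalization difference; this is not the reactivation procedure itself.

```python
import torch

# Toy per-class score maps: (batch, num_classes, h, w).
score_maps = torch.randn(1, 20, 32, 32)

bce_style = torch.sigmoid(score_maps)           # classes scored independently
sce_style = torch.softmax(score_maps, dim=1)    # classes compete at every pixel

# Under sigmoid, per-pixel responses of all classes can sum to well above 1,
# so the same pixel may be "claimed" by several classes (mask ambiguity).
print(bce_style.sum(dim=1).max())   # often much larger than 1
# Under softmax, responses sum to 1 at every pixel, so classes are disentangled.
print(sce_style.sum(dim=1).max())   # always 1 (up to floating-point error)
```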
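The local-prototype idea in the second work can be sketched as follows: cluster the local features that a seed mask assigns to a class into a few prototypes, then score every pixel by its best cosine similarity to any prototype, so that non-discriminative parts of the object also receive high activation. The function name, the use of k-means, and the choice of a thresholded conventional CAM as the seed mask are assumptions for illustration, not the dissertation's exact formulation.

```python
import torch
import torch.nn.functional as F
from sklearn.cluster import KMeans

def prototype_cam(feats: torch.Tensor, fg_mask: torch.Tensor, k: int = 4) -> torch.Tensor:
    """feats: (C, h, w) local features; fg_mask: (h, w) bool mask of pixels
    believed to belong to the class (e.g. a thresholded conventional CAM)."""
    C, h, w = feats.shape
    flat = F.normalize(feats.reshape(C, -1).T, dim=1)          # (h*w, C), unit-norm local features
    fg = flat[fg_mask.reshape(-1)].cpu().numpy()               # features inside the class seed
    # Cluster the class-local features into k prototypes (e.g. "head", "leg", "body").
    prototypes = KMeans(n_clusters=k, n_init=10).fit(fg).cluster_centers_
    prototypes = F.normalize(torch.as_tensor(prototypes, dtype=flat.dtype), dim=1).to(flat.device)
    sim = flat @ prototypes.T                                  # cosine similarity to each prototype
    cam = sim.max(dim=1).values.reshape(h, w)                  # best-matching prototype per pixel
    return cam.clamp(min=0) / (cam.max() + 1e-8)
```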
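Finally, as one concrete way to connect such CAM seeds to a visual foundation model, the sketch below derives a bounding box from a thresholded CAM and feeds it to SAM as a prompt, taking the returned mask as a pseudo label. This is only an illustrative pipeline built on the public segment_anything API; it does not reproduce the text-prompting and zero-shot pipelines studied in the dissertation, and the checkpoint path is a placeholder.

```python
import numpy as np
import torch
from segment_anything import sam_model_registry, SamPredictor

# Placeholder checkpoint path; the official SAM weights must be downloaded separately.
sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b.pth")
predictor = SamPredictor(sam)

def cam_to_box(cam: torch.Tensor, threshold: float = 0.4) -> np.ndarray:
    """Turn a normalized (H, W) CAM into an XYXY box around its high-activation region."""
    ys, xs = torch.nonzero(cam > threshold, as_tuple=True)
    return np.array([xs.min().item(), ys.min().item(), xs.max().item(), ys.max().item()])

def sam_pseudo_mask(image_rgb: np.ndarray, cam: torch.Tensor) -> np.ndarray:
    """image_rgb: (H, W, 3) uint8 RGB array; returns a boolean (H, W) pseudo mask."""
    predictor.set_image(image_rgb)
    masks, scores, _ = predictor.predict(box=cam_to_box(cam), multimask_output=False)
    return masks[0]
```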


Bibliographic Details
Main Author: CHEN, Zhaozheng
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2023
Subjects: Computer Vision; Semantic Segmentation; Weakly-Supervised Learning; Artificial Intelligence and Robotics; Computer Sciences
Online Access:https://ink.library.smu.edu.sg/etd_coll/544
https://ink.library.smu.edu.sg/context/etd_coll/article/1542/viewcontent/GPIS_AY2019_PhD_CHEN_Zhaozheng.pdf
Institution: Singapore Management University