Weakly-supervised semantic segmentation
Semantic segmentation is a fundamental task in computer vision that assigns a label to every pixel in an image based on the semantic meaning of the objects present. It demands a large amount of pixel-level labeled images for training deep models. Weakly-supervised semantic segmentation (WSSS) is a more feasible approach that uses only weak annotations to learn the segmentation task.
Saved in:
Main Author: CHEN, Zhaozheng
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University, 2023
Subjects: Computer Vision; Semantic Segmentation; Weakly-Supervised Learning; Artificial Intelligence and Robotics; Computer Sciences
Online Access: https://ink.library.smu.edu.sg/etd_coll/544
https://ink.library.smu.edu.sg/context/etd_coll/article/1542/viewcontent/GPIS_AY2019_PhD_CHEN_Zhaozheng.pdf
Institution: Singapore Management University
Author: CHEN, Zhaozheng
Subjects: Computer Vision; Semantic Segmentation; Weakly-Supervised Learning; Artificial Intelligence and Robotics; Computer Sciences
Collection: Dissertations and Theses Collection (Open Access)
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Date: 2023-12-01
Description:
Semantic segmentation is a fundamental task in computer vision that assigns a label to every pixel in an image based on the semantic meaning of the objects present. It demands a large amount of pixel-level labeled images for training deep models. Weakly-supervised semantic segmentation (WSSS) is a more feasible alternative that learns the segmentation task from weak annotations only. WSSS based on image-level labels, where only the class labels of the entire image are provided as supervision, is the most popular and most challenging setting. To address this challenge, the Class Activation Map (CAM) has emerged as a powerful technique in WSSS: it visualizes the areas of an image that are most relevant to a particular class without requiring pixel-level annotations. However, CAM is generated from a classification model, and because of the discriminative nature of that model, it often highlights only the most discriminative parts of the object.
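The classic CAM computation referred to above can be sketched as a weighted sum of the final convolutional feature maps, weighted by the classifier weights of the target class. The tensors below are random stand-ins for real network activations, and the 0.5 threshold is an illustrative choice, not a value from this dissertation.

```python
import numpy as np

rng = np.random.default_rng(0)

C, H, W = 8, 7, 7            # channels and spatial size of the last conv layer
num_classes = 3

features = rng.random((C, H, W))           # last conv feature maps, f_k(x, y)
fc_weights = rng.random((num_classes, C))  # linear classifier weights, w_k^c

def compute_cam(features, fc_weights, cls):
    """CAM_c(x, y) = sum_k w_k^c * f_k(x, y), min-max normalized to [0, 1]."""
    cam = np.tensordot(fc_weights[cls], features, axes=1)  # (H, W)
    cam -= cam.min()
    if cam.max() > 0:
        cam /= cam.max()
    return cam

cam = compute_cam(features, fc_weights, cls=1)
mask = cam > 0.5  # a hard pseudo-mask via a fixed threshold
```

Thresholding such a map is what exposes the problem the abstract describes: only the highest-scoring (most discriminative) regions survive as foreground.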
This dissertation examines the key issues behind conventional CAM and proposes corresponding solutions. Two of our completed works focus on the two crucial steps of CAM generation: training a classification model and computing the CAM from that model. The first work discusses a disadvantage of a key component in training a good classification model: the binary cross-entropy (BCE) loss function. We introduce a simple remedy that reactivates the CAM of a BCE-converged model using the softmax cross-entropy (SCE) loss. Thanks to the contrastive nature of SCE, the pixel response is disentangled into different classes, so less mask ambiguity is expected. In our second completed work, we aim to improve the quality of the CAM given a trained classification model. Specifically, we introduce a new computation method for CAM that captures non-discriminative features, expanding the CAM to cover whole objects. This is achieved by clustering all local features of an object class to derive local prototypes, which represent local semantics such as the “head”, “leg”, and “body” of a “sheep”. The resulting CAM captures all local features of the class without discrimination.
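The prototype idea in the second work can be sketched as follows: cluster the local features of one class into K local prototypes, then score every spatial position by its best prototype match, so non-discriminative parts also activate. The feature tensor and the tiny k-means below are illustrative stand-ins under assumed shapes, not the dissertation's actual code.

```python
import numpy as np

rng = np.random.default_rng(1)

C, H, W, K = 16, 7, 7, 3
feats = rng.random((H * W, C))  # local features of one object class, one row per pixel

def kmeans(x, k, iters=20, seed=0):
    """Minimal k-means: random initial centers, alternate assign/update."""
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(iters):
        dists = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(-1)  # (N, k)
        labels = dists.argmin(1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = x[labels == j].mean(0)
    return centers

prototypes = kmeans(feats, K)  # K local prototypes, e.g. "head"/"leg"/"body"

# Prototype-based CAM: cosine similarity to the closest prototype per pixel.
fn = feats / np.linalg.norm(feats, axis=1, keepdims=True)
pn = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
proto_cam = (fn @ pn.T).max(1).reshape(H, W)  # (H, W), values in [-1, 1]
```

Because every pixel is scored against its nearest local prototype rather than a single global classifier direction, less distinctive regions (e.g. the “body”) can score as highly as the most discriminative ones (e.g. the “head”).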
Although the two completed works bring significant improvements over conventional CAM, the improved CAM may still hit a bottleneck due to the limited training data and the co-occurrence of objects and backgrounds. In this dissertation, we therefore investigate the applicability of recent visual foundation models, such as the Segment Anything Model (SAM), in the context of WSSS. SAM is a recent image segmentation model that exhibits superior performance across various segmentation tasks and is remarkable for its capability to interpret diverse prompts and generate the corresponding object masks. We scrutinize SAM in two intriguing scenarios, text prompting and zero-shot learning, and propose pipelines for its application in WSSS. We provide insights into the potential and challenges of deploying visual foundation models for WSSS, facilitating future developments in this exciting research area.
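One way such a pipeline could pair CAM with a promptable segmenter like SAM is to take confident CAM peaks as point prompts and union the returned masks into a pseudo-label. This is only a structural sketch: `segment_anything_stub` is a hypothetical placeholder for a real SAM predictor, and the threshold and prompt budget are assumed values, not the dissertation's.

```python
import numpy as np

rng = np.random.default_rng(2)

H, W = 32, 32
cam = rng.random((H, W))  # stand-in for a normalized CAM of one class

def cam_to_point_prompts(cam, threshold=0.95, max_points=3):
    """Pick up to `max_points` highest-activation pixels as (row, col) prompts."""
    ys, xs = np.unravel_index(np.argsort(cam, axis=None)[::-1], cam.shape)
    points = [(int(y), int(x)) for y, x in zip(ys, xs) if cam[y, x] >= threshold]
    return points[:max_points]

def segment_anything_stub(image_shape, point):
    """Placeholder for a promptable segmenter: a small square mask around the point."""
    mask = np.zeros(image_shape, dtype=bool)
    y, x = point
    mask[max(0, y - 2): y + 3, max(0, x - 2): x + 3] = True
    return mask

prompts = cam_to_point_prompts(cam)
pseudo_mask = np.zeros((H, W), dtype=bool)
for p in prompts:
    pseudo_mask |= segment_anything_stub((H, W), p)  # union of per-prompt masks
```

The appeal of this division of labor is that the weakly-supervised side only needs to localize a few reliable points per class, while the foundation model supplies the precise object boundary.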