Addressing the cold start problem in active learning using self-supervised learning

Active learning promises to improve annotation efficiency by iteratively selecting the most important data to be annotated first. However, we uncover a striking contrast to this promise: active querying strategies fail to select data as effectively as random selection at the first choice. We identify this as the cold start problem in vision active learning. Systematic ablation experiments and qualitative visualizations reveal that the level of label uniformity (the uniform distribution of categories in a query) is an explicit criterion for determining annotation importance. However, computing label uniformity requires manual annotation, which is unavailable by the very nature of active learning. In this paper, we find that without manual annotation, contrastive learning can approximate label uniformity based on pseudo-labeled features generated from image feature clustering. Moreover, within each cluster, selecting hard-to-contrast data (low confidence in instance discrimination with low variability along the contrastive learning trajectory) is preferable to selecting ambiguous, easy-to-contrast data. Extensive benchmark experiments show that our initial query surpasses random sampling on medical imaging datasets (e.g. Colon Pathology, Dermatoscope, and Blood Cell Microscope).


Bibliographic Details
Main Author: Chen, Liangyu
Other Authors: Wen Bihan
Format: Final Year Project
Language: English
Published: Nanyang Technological University, 2022
Subjects:
Online Access:https://hdl.handle.net/10356/158461
Institution: Nanyang Technological University
Language: English
id sg-ntu-dr.10356-158461
record_format dspace
spelling sg-ntu-dr.10356-1584612023-07-07T19:12:17Z Addressing the cold start problem in active learning using self-supervised learning Chen, Liangyu Wen Bihan School of Electrical and Electronic Engineering bihan.wen@ntu.edu.sg Engineering::Electrical and electronic engineering Active learning promises to improve annotation efficiency by iteratively selecting the most important data to be annotated first. However, we uncover a striking contrast to this promise: active querying strategies fail to select data as effectively as random selection at the first choice. We identify this as the cold start problem in vision active learning. Systematic ablation experiments and qualitative visualizations reveal that the level of label uniformity (the uniform distribution of categories in a query) is an explicit criterion for determining annotation importance. However, computing label uniformity requires manual annotation, which is unavailable by the very nature of active learning. In this paper, we find that without manual annotation, contrastive learning can approximate label uniformity based on pseudo-labeled features generated from image feature clustering. Moreover, within each cluster, selecting hard-to-contrast data (low confidence in instance discrimination with low variability along the contrastive learning trajectory) is preferable to selecting ambiguous, easy-to-contrast data. Extensive benchmark experiments show that our initial query surpasses random sampling on medical imaging datasets (e.g. Colon Pathology, Dermatoscope, and Blood Cell Microscope).
In summary, this study (1) illustrates the cold start problem in vision active learning, (2) investigates the underlying causes of the problem with rigorous analysis and visualization, and (3) determines effective initial queries to start the “human-in-the-loop” procedure. We hope our potential solution to the cold start problem can be used as a simple yet strong baseline to sample the initial query for active learning in image classification. Bachelor of Engineering (Information Engineering and Media) 2022-06-04T07:43:25Z 2022-06-04T07:43:25Z 2022 Final Year Project (FYP) Chen, L. (2022). Addressing the cold start problem in active learning using self-supervised learning. Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/158461 https://hdl.handle.net/10356/158461 en A3282-211 application/pdf Nanyang Technological University
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
Singapore
content_provider NTU Library
collection DR-NTU
language English
topic Engineering::Electrical and electronic engineering
spellingShingle Engineering::Electrical and electronic engineering
Chen, Liangyu
Addressing the cold start problem in active learning using self-supervised learning
description Active learning promises to improve annotation efficiency by iteratively selecting the most important data to be annotated first. However, we uncover a striking contrast to this promise: active querying strategies fail to select data as effectively as random selection at the first choice. We identify this as the cold start problem in vision active learning. Systematic ablation experiments and qualitative visualizations reveal that the level of label uniformity (the uniform distribution of categories in a query) is an explicit criterion for determining annotation importance. However, computing label uniformity requires manual annotation, which is unavailable by the very nature of active learning. In this paper, we find that without manual annotation, contrastive learning can approximate label uniformity based on pseudo-labeled features generated from image feature clustering. Moreover, within each cluster, selecting hard-to-contrast data (low confidence in instance discrimination with low variability along the contrastive learning trajectory) is preferable to selecting ambiguous, easy-to-contrast data. Extensive benchmark experiments show that our initial query surpasses random sampling on medical imaging datasets (e.g. Colon Pathology, Dermatoscope, and Blood Cell Microscope). In summary, this study (1) illustrates the cold start problem in vision active learning, (2) investigates the underlying causes of the problem with rigorous analysis and visualization, and (3) determines effective initial queries to start the “human-in-the-loop” procedure. We hope our potential solution to the cold start problem can be used as a simple yet strong baseline to sample the initial query for active learning in image classification.
author2 Wen Bihan
author_facet Wen Bihan
Chen, Liangyu
format Final Year Project
author Chen, Liangyu
author_sort Chen, Liangyu
title Addressing the cold start problem in active learning using self-supervised learning
title_short Addressing the cold start problem in active learning using self-supervised learning
title_full Addressing the cold start problem in active learning using self-supervised learning
title_fullStr Addressing the cold start problem in active learning using self-supervised learning
title_full_unstemmed Addressing the cold start problem in active learning using self-supervised learning
title_sort addressing the cold start problem in active learning using self-supervised learning
publisher Nanyang Technological University
publishDate 2022
url https://hdl.handle.net/10356/158461
_version_ 1772828195186278400