S2match: self-paced sampling for data-limited semi-supervised learning

Data-limited semi-supervised learning tends to be severely degraded by miscalibration (i.e., misalignment between the confidence and correctness of predicted pseudo labels) and to become stuck at poor local minima while learning repeatedly from the same set of over-confident yet incorrect pseudo labels. We design a simple and effective self-paced sampling technique that greatly alleviates the impact of miscalibration and learns more accurate semi-supervised models from limited training data. Instead of employing static or dynamic confidence thresholds, which are sensitive to miscalibration, the proposed self-paced sampling follows a simple linear policy to select pseudo labels. This eases repeated learning from the same set of falsely predicted pseudo labels at the early training stage and effectively lowers the chance of becoming stuck at local minima. Despite its simplicity, extensive evaluations over multiple data-limited semi-supervised tasks show that the proposed self-paced sampling consistently outperforms the state of the art by large margins.
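The "simple linear policy" in the abstract can be illustrated with a minimal sketch: instead of thresholding on (possibly miscalibrated) confidence, keep only a fraction of the most confident pseudo labels, and grow that fraction linearly over training. The top-k formulation, the `start_frac`/`end_frac` hyperparameters, and the function name below are illustrative assumptions, not the paper's exact method.

```python
import numpy as np

def self_paced_select(confidences, step, total_steps,
                      start_frac=0.2, end_frac=1.0):
    """Sketch of a linear self-paced pseudo-label selection schedule.

    Keeps a fraction of pseudo labels that grows linearly with training
    progress, rather than applying a fixed confidence threshold.
    All parameter values here are illustrative assumptions.
    """
    # Linearly interpolate the kept fraction from start_frac to end_frac.
    frac = start_frac + (end_frac - start_frac) * min(step / total_steps, 1.0)
    k = max(1, int(round(frac * len(confidences))))
    # Return indices of the k most confident pseudo labels at this step.
    return np.argsort(confidences)[::-1][:k]
```

Early in training, only the few most confident pseudo labels are used, which limits repeated exposure to over-confident errors; by the end of the schedule all pseudo labels participate.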

Bibliographic Details
Main Authors: Guan, Dayan, Xing, Yun, Huang, Jiaxing, Xiao, Aoran, El Saddik, Abdulmotaleb, Lu, Shijian
Other Authors: College of Computing and Data Science
Format: Article
Language: English
Published: 2025
Subjects: Computer and Information Science; Semi-supervised learning; Self-paced learning
Online Access: https://hdl.handle.net/10356/182563
Institution: Nanyang Technological University
Published in: Pattern Recognition, vol. 159, article 111121 (2025)
ISSN: 0031-3203
DOI: 10.1016/j.patcog.2024.111121
Citation: Guan, D., Xing, Y., Huang, J., Xiao, A., El Saddik, A. & Lu, S. (2025). S2match: self-paced sampling for data-limited semi-supervised learning. Pattern Recognition, 159, 111121. https://dx.doi.org/10.1016/j.patcog.2024.111121
Funding: Talent Scientific Research Start-up Project of Harbin Institute of Technology
Rights: © 2024 Elsevier Ltd. All rights are reserved, including those for text and data mining, AI training, and similar technologies.
Collection: DR-NTU, NTU Library, Nanyang Technological University (Singapore)