Advanced topics in weakly supervised learning
Main Author: | |
---|---|
Other Authors: | |
Format: | Thesis-Doctor of Philosophy |
Language: | English |
Published: | Nanyang Technological University, 2021 |
Subjects: | |
Online Access: | https://hdl.handle.net/10356/145854 |
Institution: | Nanyang Technological University |
Summary:

Machine learning has achieved great advances in various tasks, especially in supervised learning tasks. However, supervised learning requires the correct labels of all training examples to train an effective model, and collecting training examples with such strong supervision can incur unaffordable monetary or time costs. Therefore, weakly supervised learning, which aims to build predictive models by learning with weak supervision, has attracted increasing attention in recent years. This doctoral thesis investigates several advanced topics in weakly supervised learning, including complementary-label learning and partial-label learning.

Complementary-label learning addresses the problem where each training example is supplied with a single complementary label (CL), which specifies only one of the classes that the example does not belong to. Although existing complementary-label learning approaches have provided solid theoretical foundations and achieved promising performance, they are all restricted to the case where each example is associated with a single CL. This restriction notably limits their potential, since labelers can often easily identify multiple complementary labels (MCLs) for one example. To address this problem, we propose an extended problem setting that allows MCLs for each example, along with two ways of learning with MCLs. In the first way, we design two wrappers that decompose MCLs into many single CLs in different manners, so that any method for learning with CLs can be applied. However, we find that the supervision information held by MCLs is conceptually diluted after decomposition. In the second way, we therefore derive an unbiased risk estimator that processes each set of MCLs as a whole and possesses an estimation error bound; for practical implementation, we further improve this approach by minimizing properly chosen upper bounds. Experiments show that the first way works well for learning with MCLs, while the second is even better. A toy sketch contrasting the two ways follows below.
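As a concrete illustration of the two ways of learning with MCLs, the following minimal PyTorch-style sketch contrasts a decomposition wrapper, which turns every complementary label in the set into an independent single-CL penalty, with a whole-set loss that processes each MCL set jointly. The function names and exact loss forms here are illustrative assumptions, not the thesis's actual wrappers or risk estimator.

```python
import torch
import torch.nn.functional as F

def decomposition_loss(logits, mcl_mask):
    """Way 1 (decomposition): treat each complementary label in the MCL
    set as an independent single-CL term and penalise the probability
    assigned to it. `mcl_mask` is a (batch, num_classes) 0/1 float tensor
    marking each example's complementary labels (at least one per row)."""
    probs = F.softmax(logits, dim=1).clamp(max=1 - 1e-6)
    per_cl = -torch.log1p(-probs) * mcl_mask  # -log(1 - p_k) for each CL k
    return (per_cl.sum(dim=1) / mcl_mask.sum(dim=1)).mean()

def whole_set_loss(logits, mcl_mask):
    """Way 2 (whole set): process each MCL set as a whole by maximising
    the total probability of the classes outside the set, so the joint
    supervision carried by the set is not diluted."""
    probs = F.softmax(logits, dim=1)
    p_rest = (probs * (1 - mcl_mask)).sum(dim=1)
    return -torch.log(p_rest.clamp(min=1e-12)).mean()
```

The whole-set loss keeps each MCL set intact during training, matching the intuition stated above that decomposition dilutes the information the set carries; the thesis's unbiased risk estimator is a principled, provably unbiased treatment of this idea rather than the heuristic form shown here.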
Partial-label learning addresses the problem where each training example is supplied with a set of candidate labels, only one of which is the correct label. By regarding each CL as a non-candidate label, complementary-label learning can also be considered a special case of partial-label learning; hence this thesis focuses more on partial-label learning. We propose several effective methods that alleviate various issues in partial-label learning, illustrated by the sketches after this summary.

Firstly, most existing methods fail to take into account the fact that different candidate labels make different contributions to model training. We formalize the probabilities of different candidate labels being the correct label as latent label distributions, and propose a novel unified formulation that estimates the latent label distributions while simultaneously training the model.

Secondly, self-training is a representative semi-supervised learning strategy that directly assigns a label to an unlabeled instance once the prediction confidence is sufficiently high; however, incorrectly labeled data can have contagiously negative impacts on the final predictions, so it has remained unclear whether the idea of self-training can improve the practical performance of partial-label learning. We provide the first attempt to improve self-training for partial-label learning, presenting a unified formulation with proper constraints to jointly train the desired model and perform pseudo-labeling.

Thirdly, a theoretical understanding of the consistency of partial-label learning methods through the lens of the generation process of partially labeled data is still lacking. We propose the first generation model of partially labeled data and develop two novel partial-label learning methods that are provably consistent: one is risk-consistent and the other is classifier-consistent. Extensive experimental results clearly demonstrate the effectiveness of our proposed partial-label learning methods.
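To make the first two partial-label learning contributions concrete, here is a minimal PyTorch-style sketch. All function names, the renormalisation update, and the confidence threshold are assumptions for illustration, not the thesis's actual formulations: a latent label distribution is maintained over each candidate set and re-estimated from the model, and a self-training step pseudo-labels an example only when some candidate label is predicted with high confidence.

```python
import torch
import torch.nn.functional as F

def weighted_partial_loss(logits, label_dist):
    """Cross-entropy against the latent label distribution, which encodes
    how likely each candidate label is to be the correct one (it is zero
    outside the candidate set)."""
    log_probs = F.log_softmax(logits, dim=1)
    return -(label_dist * log_probs).sum(dim=1).mean()

@torch.no_grad()
def update_label_dist(logits, candidate_mask):
    """Re-estimate the latent label distributions: renormalise the model's
    predicted probabilities over each example's candidate set, so that
    more plausible candidates contribute more to the next model update."""
    probs = F.softmax(logits, dim=1) * candidate_mask
    return probs / probs.sum(dim=1, keepdim=True).clamp(min=1e-12)

@torch.no_grad()
def pseudo_label(logits, candidate_mask, threshold=0.95):
    """Self-training step with a candidate-set constraint: an example is
    pseudo-labeled only when its most probable candidate label exceeds
    the confidence threshold, which keeps pseudo-labels consistent with
    the weak supervision and limits the propagation of labeling errors."""
    probs = F.softmax(logits, dim=1) * candidate_mask
    conf, labels = probs.max(dim=1)
    return labels, conf >= threshold  # hard labels plus a selection mask
```

In a training loop one would initialise `label_dist` uniformly over each candidate set and alternate minimising `weighted_partial_loss` with calls to `update_label_dist`; restricting `pseudo_label` to the candidate set is one simple instance of the "proper constraints" idea mentioned above.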
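Finally, regarding the consistency results: a known classifier-consistent objective under a uniform generation model of partially labeled data maximises the total probability the model assigns to the candidate set. The sketch below illustrates that idea under the assumed uniform generation model; it is not necessarily the exact method developed in the thesis.

```python
import torch
import torch.nn.functional as F

def candidate_set_nll(logits, candidate_mask):
    """Negative log-likelihood of the event 'the correct label lies in
    the candidate set': -log sum_{k in S} p_k(x). Under a uniform
    candidate-generation process this objective is classifier-consistent,
    i.e. its minimiser shares the optimal classifier of fully supervised
    learning."""
    probs = F.softmax(logits, dim=1)
    p_set = (probs * candidate_mask).sum(dim=1)
    return -torch.log(p_set.clamp(min=1e-12)).mean()
```

A risk-consistent counterpart would typically importance-weight the per-candidate losses according to the generation model; since its exact form depends on that model, it is omitted here.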