Budget efficient online active learning and its applications

Bibliographic Details
Main Author: Hao, Shuji
Other Authors: Miao Chun Yan
Format: Theses and Dissertations
Language: English
Published: 2017
Subjects:
Online Access: http://hdl.handle.net/10356/69462
Institution: Nanyang Technological University
id sg-ntu-dr.10356-69462
record_format dspace
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
Singapore
content_provider NTU Library
collection DR-NTU
language English
topic DRNTU::Engineering::Computer science and engineering
spellingShingle DRNTU::Engineering::Computer science and engineering
Hao, Shuji
Budget efficient online active learning and its applications
description Online Active Learning (OAL) is an important research area in machine learning that aims to minimize the number of labeled instances while maximizing predictive performance. OAL combines the efficiency and effectiveness of online learning with the labeling frugality of active learning. Owing to these advantages, OAL has been widely used in large-scale real-world applications such as information retrieval, data mining, and recommender systems. However, several problems remain in current OAL designs.
First, in the setting of online learning with expert advice, most existing OAL algorithms assume that all experts are comparably reliable, which is usually not true in practice. For example, noisy workers are common on crowdsourcing platforms. To relax this assumption, this study proposes two robust online active learning algorithms that consider not only the experts' predictions on the current instance but also their cumulative performance on past instances. A series of experiments shows that the proposed algorithms greatly outperform state-of-the-art algorithms and achieve robust performance in both normal and noisy scenarios.
Second, to obtain reliable labels in crowdsourcing, most algorithms require either a set of golden questions to filter out noisy workers or several labels for each instance. Both requirements are costly in terms of money and time. To reduce these costs, a framework of Active Crowdsourcing for Annotation (ACA) is proposed, based on online learning with expert advice. The framework consists of two main components: “Who to label” and “When to query”. The first component actively allocates each instance to reliable workers for labeling, and the second actively decides which instances are worth using as golden questions. Empirical studies on both simulated and real-world crowdsourcing datasets show that the framework can robustly learn the reliability of each worker and allocate tasks to the more reliable workers.
Third, in the typical online learning setting, most OAL algorithms adopt margin-based query strategies, which usually assume that the model is well trained and that the margin value is accurate. However, this assumption often does not hold in practice, for example in the early training phase. To relax this assumption, a second-order online active learning algorithm is proposed that considers not only the margin value of the current instance but also the confidence of the current model. To validate the algorithm, a theoretical mistake bound is provided and a set of empirical studies is conducted on real-world datasets. Both the theoretical and empirical studies show that the proposed second-order algorithm can achieve the best performance in terms of accuracy.
Last, for the online relative similarity learning problem, most studies assume that large-scale labeled triplets are available. However, labeling datasets is usually costly and time consuming, especially for large-scale similarity learning problems. To reduce the high labeling cost, this study proposes two online active relative similarity learning algorithms: (i) first-order Passive-Aggressive Active Similarity learning (PAAS); (ii) second-order Confidence-Weighted Active Similarity learning (CWAS). The proposed algorithms are first analyzed theoretically and then evaluated empirically on several real-world applications. The experiments show that PAAS and CWAS can greatly reduce the labeling cost of relative similarity learning.
In sum, to tackle the critical challenges of existing OAL algorithms, this study proposes four main OAL algorithms, most of which are theoretically sound. All of the proposed algorithms are carefully evaluated on a large number of large-scale real-world applications and achieve promising results. Although these results are promising, the proposed OAL algorithms are far from perfect. Several directions remain for future study, such as OAL for concept drift problems, distributed OAL algorithms, and OAL for crowdsourcing.
author2 Miao Chun Yan
author_facet Miao Chun Yan
Hao, Shuji
format Theses and Dissertations
author Hao, Shuji
author_sort Hao, Shuji
title Budget efficient online active learning and its applications
title_short Budget efficient online active learning and its applications
title_full Budget efficient online active learning and its applications
title_fullStr Budget efficient online active learning and its applications
title_full_unstemmed Budget efficient online active learning and its applications
title_sort budget efficient online active learning and its applications
publishDate 2017
url http://hdl.handle.net/10356/69462
_version_ 1696984374352805888
spelling sg-ntu-dr.10356-69462 2021-03-20T13:04:59Z Budget efficient online active learning and its applications Hao, Shuji Miao Chun Yan Interdisciplinary Graduate School (IGS) Joint NTU-UBC Research Centre of Excellence in Active Living for the Elderly (LILY) DRNTU::Engineering::Computer science and engineering Doctor of Philosophy (IGS) 2017-01-24T04:54:36Z 2017-01-24T04:54:36Z 2017 Thesis Hao, S. (2017). Budget efficient online active learning and its applications. Doctoral thesis, Nanyang Technological University, Singapore. http://hdl.handle.net/10356/69462 10.32657/10356/69462 en 146 p. application/pdf