Active learning with confidence-based answers for crowdsourcing labeling tasks

Collecting labels for data is important for many practical applications (e.g., data mining). However, this process can be expensive and time-consuming since it requires extensive effort from domain experts. To decrease the cost, many recent works combine crowdsourcing, which outsources labeling tasks (usually in the form of questions) to a large group of non-expert workers, and active learning, which actively selects the best instances to be labeled, to acquire labeled datasets. However, for difficult tasks where workers are uncertain about their answers, asking for discrete labels might lead to poor performance due to low-quality labels. In this paper, we design questions that elicit continuous worker responses, which are more informative and contain workers’ labels as well as their confidence. As crowd workers may make mistakes, multiple workers are hired to answer each question. Then, we propose a new aggregation method to integrate the responses. By considering workers’ confidence information, the accuracy of the integrated labels is improved. Furthermore, based on the new answers, we propose a novel active learning framework to iteratively select instances for “labeling”. We define a score function for instance selection by combining the uncertainty derived from the classifier model and the uncertainty derived from the answer sets. The uncertainty derived from uncertain answers is more effective than that derived from labels. We also propose batch methods that select multiple instances at a time to further improve the efficiency of our approach. Experimental studies on both simulated and real data show that our methods are effective in increasing the labeling accuracy and achieve significantly better performance than existing methods.
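The abstract only outlines the approach, so the sketch below is purely illustrative and not the authors' actual formulas: it assumes each worker answer to a binary task is encoded as a signed confidence in [-1, 1] (sign = label, magnitude = confidence), aggregates answers with a simple confidence-weighted vote, and scores instances for selection by mixing an entropy-based classifier uncertainty with the uncertainty in the answer set. All names (aggregate_answers, selection_score, alpha) are hypothetical.

```python
# Illustrative sketch only; hypothetical names, not the paper's actual formulas.
# Convention assumed here: a worker's answer for a binary task is a signed
# confidence in [-1, 1] -- the sign is the chosen label, the magnitude is the
# worker's confidence in it.
from math import log2
from typing import List


def aggregate_answers(answers: List[float]) -> int:
    """Confidence-weighted vote: integrate several workers' answers into one label."""
    return 1 if sum(answers) >= 0 else -1


def answer_uncertainty(answers: List[float]) -> float:
    """Uncertainty carried by the answer set: near 1 when workers are split or
    unsure, near 0 when they agree with high confidence."""
    mean = sum(answers) / len(answers)
    return 1.0 - abs(mean)


def model_uncertainty(p_positive: float) -> float:
    """Classifier uncertainty as the binary entropy of the predicted probability."""
    p = min(max(p_positive, 1e-12), 1.0 - 1e-12)
    return -(p * log2(p) + (1.0 - p) * log2(1.0 - p))


def selection_score(p_positive: float, answers: List[float], alpha: float = 0.5) -> float:
    """Combined selection score: higher means the instance is more worth
    (re-)querying, mixing model-side and answer-side uncertainty."""
    return alpha * model_uncertainty(p_positive) + (1.0 - alpha) * answer_uncertainty(answers)


# Example: three workers answer (+0.9, +0.2, -0.4); the classifier predicts p(+) = 0.55.
answers = [0.9, 0.2, -0.4]
print(aggregate_answers(answers))                 # 1 (positive label wins the weighted vote)
print(round(selection_score(0.55, answers), 3))   # ~0.88: still an informative instance
```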

Bibliographic Details
Main Authors: Song, Jinhua; Wang, Hao; Gao, Yang; An, Bo
Other Authors: School of Computer Science and Engineering
Format: Article
Language: English
Published: 2020
Subjects: Engineering::Computer science and engineering; Confidence-based Answer; Active Learning
Online Access: https://hdl.handle.net/10356/139581
Institution: Nanyang Technological University
Citation: Song, J., Wang, H., Gao, Y., & An, B. (2018). Active learning with confidence-based answers for crowdsourcing labeling tasks. Knowledge-Based Systems, 159, 244-258. doi:10.1016/j.knosys.2018.07.010
ISSN: 0950-7051
Scopus ID: 2-s2.0-85049966648
Rights: © 2018 Elsevier B.V. All rights reserved.
Collection: DR-NTU (NTU Library, Nanyang Technological University, Singapore)