Active crowdsourcing for annotation
Main Authors: | , , , |
---|---|
Format: | text |
Language: | English |
Published: | Institutional Knowledge at Singapore Management University, 2015 |
Subjects: | |
Online Access: | https://ink.library.smu.edu.sg/sis_research/3173 https://ink.library.smu.edu.sg/context/sis_research/article/4174/viewcontent/Active_Crowdsourcing_for_Annotation_accepted.pdf |
Institution: | Singapore Management University |
Summary: | Crowdsourcing has shown great potential for obtaining large-scale, cheap labels for a variety of tasks. However, obtaining reliable labels is challenging for several reasons, such as noisy annotators and limited budgets. State-of-the-art approaches either suffer in some noisy scenarios or rely on unlimited resources to acquire reliable labels. In this article, we adopt the learning with expert advice framework (an expert corresponds to a worker in crowdsourcing) to robustly infer accurate labels by considering the reliability of each worker. However, to accurately predict the reliability of each worker, traditional learning with expert advice consults external oracles (i.e., domain experts) on the true label of each instance. To reduce the cost of consultation, we propose two active learning approaches: one margin-based and one based on the weighted difference of advice. Meanwhile, to address the problem of a limited annotation budget, we propose a reliability-based assignment approach that actively decides which worker annotates the next instance based on each worker's cumulative performance. Experimental results on both real and simulated datasets show that our algorithms achieve robust and promising performance in both normal and noisy scenarios with a limited budget. |
---|
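The summary combines learning with expert advice and margin-based querying of an oracle. The block below is only a minimal sketch of that general idea, not the algorithm from the article: it uses a standard multiplicative weight update for binary labels and asks the oracle only when the weighted vote is close to a tie. The function name `run_weighted_majority`, the parameters `eta` and `margin_threshold`, and the `oracle` callable are all assumptions introduced for illustration.

```python
import math


def run_weighted_majority(advice, oracle, eta=0.5, margin_threshold=0.2):
    """Aggregate worker labels with weights, querying an oracle on low margin.

    advice: list of rounds; each round is a list of worker labels in {-1, +1}.
    oracle: hypothetical callable mapping a round index to the true label.
    Returns (predictions, final_weights, number_of_oracle_queries).
    """
    n_workers = len(advice[0])
    weights = [1.0] * n_workers  # every worker starts equally trusted
    predictions = []
    queries = 0

    for t, labels in enumerate(advice):
        total = sum(weights)
        # Normalized weighted vote in [-1, 1]; |score| acts as the margin.
        score = sum(w * y for w, y in zip(weights, labels)) / total
        predictions.append(1 if score >= 0 else -1)

        # Assumed query rule: consult the oracle only when the vote is close.
        if abs(score) <= margin_threshold:
            truth = oracle(t)
            queries += 1
            # Multiplicative penalty for workers who disagreed with the oracle.
            weights = [
                w * math.exp(-eta) if y != truth else w
                for w, y in zip(weights, labels)
            ]

    return predictions, weights, queries
```

A reliability-based assignment step, as described in the summary, could reuse the same weights to decide which workers to ask next, for example by preferring workers with larger weights; that step is left out here because the summary does not specify it.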