Real: A representative error-driven approach for active learning

Given a limited labeling budget, active learning (AL) aims to sample the most informative instances from an unlabeled pool to acquire labels for subsequent model training. To achieve this, AL typically measures the informativeness of unlabeled instances based on uncertainty and diversity. However, i...

Bibliographic Details
Main Authors:	CHEN, Cheng, WANG, Yong, LIAO, Lizi, CHEN, Yueguo, DU, Xiaoyong
Format:	text
Language:	English
Published:	Institutional Knowledge at Singapore Management University 2023
Subjects:	Active Learning Error density Error-driven Informativeness Labelings Model training Neighbourhood Pseudo errors Text classification Databases and Information Systems
Online Access:	https://ink.library.smu.edu.sg/sis_research/8586 https://ink.library.smu.edu.sg/context/sis_research/article/9589/viewcontent/real.pdf
Institution:	Singapore Management University

id	sg-smu-ink.sis_research-9589
record_format	dspace
spelling	sg-smu-ink.sis_research-9589 2024-01-25T08:53:12Z Real: A representative error-driven approach for active learning CHEN, Cheng WANG, Yong LIAO, Lizi CHEN, Yueguo DU, Xiaoyong Given a limited labeling budget, active learning (AL) aims to sample the most informative instances from an unlabeled pool to acquire labels for subsequent model training. To achieve this, AL typically measures the informativeness of unlabeled instances based on uncertainty and diversity. However, it does not consider erroneous instances with their neighborhood error density, which have great potential to improve the model performance. To address this limitation, we propose Real, a novel approach to select data instances with Representative Errors for Active Learning. It identifies minority predictions as pseudo errors within a cluster and allocates an adaptive sampling budget for the cluster based on estimated error density. Extensive experiments on five text classification datasets demonstrate that Real consistently outperforms all best-performing baselines regarding accuracy and F1-macro scores across a wide range of hyperparameter settings. Our analysis also shows that Real selects the most representative pseudo errors that match the distribution of ground-truth errors along the decision boundary. Our code is publicly available at https://github.com/withchencheng/ECML_PKDD_23_Real. 2023-09-01T07:00:00Z text application/pdf https://ink.library.smu.edu.sg/sis_research/8586 info:doi/10.1007/978-3-031-43412-9_2 https://ink.library.smu.edu.sg/context/sis_research/article/9589/viewcontent/real.pdf http://creativecommons.org/licenses/by-nc-nd/4.0/ Research Collection School Of Computing and Information Systems eng Institutional Knowledge at Singapore Management University Active Learning Error density Error-driven Informativeness Labelings Model training Neighbourhood Pseudo errors Text classification Databases and Information Systems
institution	Singapore Management University
building	SMU Libraries
continent	Asia
country	Singapore
content_provider	SMU Libraries
collection	InK@SMU
language	English
topic	Active Learning Error density Error-driven Informativeness Labelings Model training Neighbourhood Pseudo errors Text classification Databases and Information Systems
spellingShingle	Active Learning Error density Error-driven Informativeness Labelings Model training Neighbourhood Pseudo errors Text classification Databases and Information Systems CHEN, Cheng WANG, Yong LIAO, Lizi CHEN, Yueguo DU, Xiaoyong Real: A representative error-driven approach for active learning
description	Given a limited labeling budget, active learning (AL) aims to sample the most informative instances from an unlabeled pool to acquire labels for subsequent model training. To achieve this, AL typically measures the informativeness of unlabeled instances based on uncertainty and diversity. However, it does not consider erroneous instances with their neighborhood error density, which have great potential to improve the model performance. To address this limitation, we propose Real, a novel approach to select data instances with Representative Errors for Active Learning. It identifies minority predictions as pseudo errors within a cluster and allocates an adaptive sampling budget for the cluster based on estimated error density. Extensive experiments on five text classification datasets demonstrate that Real consistently outperforms all best-performing baselines regarding accuracy and F1-macro scores across a wide range of hyperparameter settings. Our analysis also shows that Real selects the most representative pseudo errors that match the distribution of ground-truth errors along the decision boundary. Our code is publicly available at https://github.com/withchencheng/ECML_PKDD_23_Real.
format	text
author	CHEN, Cheng WANG, Yong LIAO, Lizi CHEN, Yueguo DU, Xiaoyong
author_facet	CHEN, Cheng WANG, Yong LIAO, Lizi CHEN, Yueguo DU, Xiaoyong
author_sort	CHEN, Cheng
title	Real: A representative error-driven approach for active learning
title_short	Real: A representative error-driven approach for active learning
title_full	Real: A representative error-driven approach for active learning
title_fullStr	Real: A representative error-driven approach for active learning
title_full_unstemmed	Real: A representative error-driven approach for active learning
title_sort	real: a representative error-driven approach for active learning
publisher	Institutional Knowledge at Singapore Management University
publishDate	2023
url	https://ink.library.smu.edu.sg/sis_research/8586 https://ink.library.smu.edu.sg/context/sis_research/article/9589/viewcontent/real.pdf
_version_	1789483280896098304
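
The description in this record says that Real treats minority predictions within a cluster as pseudo errors and gives each cluster a sampling budget based on its estimated error density. The following is only a rough illustration of that idea, not the authors' implementation (see the linked repository for that). It assumes cluster assignments and model predictions are already available, estimates error density as the fraction of minority predictions in a cluster, and splits the budget proportionally with rounding down. All function names here are hypothetical.

```python
from collections import Counter, defaultdict


def pseudo_errors_by_cluster(cluster_ids, predictions):
    """Group instance indices by cluster and flag minority predictions.

    Returns {cluster_id: (pseudo_error_indices, cluster_size)}.
    Assumption: the most frequent predicted label in a cluster is
    treated as correct; every other prediction is a pseudo error.
    """
    members = defaultdict(list)
    for idx, cid in enumerate(cluster_ids):
        members[cid].append(idx)

    result = {}
    for cid, idxs in members.items():
        counts = Counter(predictions[i] for i in idxs)
        majority_label, _ = counts.most_common(1)[0]
        errors = [i for i in idxs if predictions[i] != majority_label]
        result[cid] = (errors, len(idxs))
    return result


def allocate_budget(clusters, budget):
    """Split `budget` across clusters in proportion to error density.

    Assumption: density = pseudo errors / cluster size, and shares are
    rounded down, so the total allocated may be below `budget`.
    """
    density = {cid: len(errs) / size for cid, (errs, size) in clusters.items()}
    total = sum(density.values())
    if total == 0:
        return {cid: 0 for cid in clusters}
    alloc = {}
    for cid, (errs, _) in clusters.items():
        share = int(budget * density[cid] / total)
        # Never ask for more pseudo errors than the cluster contains.
        alloc[cid] = min(share, len(errs))
    return alloc


def select_instances(cluster_ids, predictions, budget):
    """Return indices of pseudo errors to send for labeling."""
    clusters = pseudo_errors_by_cluster(cluster_ids, predictions)
    alloc = allocate_budget(clusters, budget)
    picked = []
    for cid in sorted(alloc):
        errs, _ = clusters[cid]
        picked.extend(errs[:alloc[cid]])
    return picked
```

In this sketch, clusters whose predictions all agree receive no budget, while clusters with many disagreeing predictions receive more. Which pseudo errors to take within a cluster is left as simple index order here; the record does not say how Real ranks them.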


Similar Items