Batch Mode Active Learning with Applications to Text Categorization and Image Retrieval

Most machine learning tasks in data classification and information retrieval require manually labeled data examples in the training stage. The goal of active learning is to select the most informative examples for manual labeling in these learning tasks. Most of the previous studies in active learni...
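
The full description in this record says the framework uses the Fisher information matrix to pick a batch of non-redundant examples. The following is a minimal sketch of that idea, not the authors' algorithm. It assumes a plain two-feature logistic model with no bias term, and a greedy search that adds whichever pool example most reduces trace(inv(I_selected) * I_pool). The names `fisher_2d`, `fisher_ratio`, and `select_batch`, the smoothing term `delta`, and the toy pool are all hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fisher_2d(points, weights, delta=0.1):
    """Fisher information of a 2-feature logistic model over `points`.

    Returns (a, b, d) for the symmetric matrix [[a, b], [b, d]].
    delta * identity is added so the matrix is always invertible
    (an assumption of this sketch).
    """
    a = d = delta
    b = 0.0
    for x0, x1 in points:
        p = sigmoid(weights[0] * x0 + weights[1] * x1)
        w = p * (1.0 - p)  # largest when the model is least certain (p = 0.5)
        a += w * x0 * x0
        b += w * x0 * x1
        d += w * x1 * x1
    return a, b, d


def fisher_ratio(selected, pool):
    """trace(inv(I_selected) @ I_pool) for symmetric 2x2 matrices."""
    a, b, d = selected
    pa, pb, pd = pool
    det = a * d - b * b  # positive because delta > 0
    return (d * pa - 2.0 * b * pb + a * pd) / det


def select_batch(pool, weights, k, delta=0.1):
    """Greedily pick k pool indices that minimise the Fisher ratio."""
    pool_info = fisher_2d(pool, weights, delta)
    chosen = []
    remaining = list(range(len(pool)))
    for _ in range(k):
        best = min(
            remaining,
            key=lambda i: fisher_ratio(
                fisher_2d([pool[j] for j in chosen + [i]], weights, delta),
                pool_info,
            ),
        )
        chosen.append(best)
        remaining.remove(best)
    return chosen


if __name__ == "__main__":
    # Toy pool: indices 0 and 1 are near-duplicates, index 2 points elsewhere.
    toy_pool = [(1.0, 0.0), (1.0, 0.05), (0.0, 1.0), (0.1, 0.1)]
    print(select_batch(toy_pool, weights=(0.0, 0.0), k=2))
```

On the toy pool, the second pick is the example along the other axis rather than the near-duplicate of the first pick. A selector that ranks examples by individual uncertainty alone cannot tell the duplicates apart here, because with zero weights every example has the same p = 0.5.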

Bibliographic Details
Main Authors: HOI, Steven C. H., JIN, Rong, LYU, Michael R.
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2009
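Subjects: Batch mode active learning; convex optimization; image retrieval; kernel logistic regressions; logistic regressions; text categorization; Databases and Information Systems; Theory and Algorithms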
Online Access: https://ink.library.smu.edu.sg/sis_research/2310
https://ink.library.smu.edu.sg/context/sis_research/article/3310/viewcontent/BMAL_TextCat_2009_afv.pdf
Institution: Singapore Management University
id sg-smu-ink.sis_research-3310
record_format dspace
spelling sg-smu-ink.sis_research-3310 2018-12-05T09:06:28Z 2009-09-01T07:00:00Z text application/pdf info:doi/10.1109/TKDE.2009.60 http://creativecommons.org/licenses/by-nc-nd/4.0/ Research Collection School Of Computing and Information Systems eng
institution Singapore Management University
building SMU Libraries
continent Asia
country Singapore
content_provider SMU Libraries
collection InK@SMU
language English
topic Batch mode active learning
convex optimization
image retrieval
kernel logistic regressions
logistic regressions
text categorization
Databases and Information Systems
Theory and Algorithms
description Most machine learning tasks in data classification and information retrieval require manually labeled data examples in the training stage. The goal of active learning is to select the most informative examples for manual labeling in these learning tasks. Most of the previous studies in active learning have focused on selecting a single unlabeled example in each iteration. This could be inefficient, since the classification model has to be retrained for every acquired labeled example. It is also inappropriate for the setup of information retrieval tasks where the user's relevance feedback is often provided for the top K retrieved items. In this paper, we present a framework for batch mode active learning, which selects a number of informative examples for manual labeling in each iteration. The key feature of batch mode active learning is to reduce the redundancy among the selected examples such that each example provides unique information for model updating. To this end, we employ the Fisher information matrix as the measurement of model uncertainty, and choose the set of unlabeled examples that can efficiently reduce the Fisher information of the classification model. We apply our batch mode active learning framework to both text categorization and image retrieval. Promising results show that our algorithms are significantly more effective than the active learning approaches that select unlabeled examples based only on their informativeness for the classification model.
format text
author HOI, Steven C. H.
JIN, Rong
LYU, Michael R.
title Batch Mode Active Learning with Applications to Text Categorization and Image Retrieval
publisher Institutional Knowledge at Singapore Management University
publishDate 2009
url https://ink.library.smu.edu.sg/sis_research/2310
https://ink.library.smu.edu.sg/context/sis_research/article/3310/viewcontent/BMAL_TextCat_2009_afv.pdf
_version_ 1770572077511737344