On the sampling of web images for learning visual concept classifiers

Visual concept learning often requires a large set of training images. In practice, however, acquiring noise-free training labels with enough positive examples is expensive. A plausible solution for training data collection is to sample the widely available user-tagged images from...

Bibliographic Details
Main Authors:	ZHU, Shiai, WANG, Gang, NGO, Chong-wah, JIANG, Yu-Gang
Format:	text
Language:	English
Published:	Institutional Knowledge at Singapore Management University 2010
Subjects:	Concept detection Sampling Web images Data Storage Systems Graphics and Human Computer Interfaces
Online Access:	https://ink.library.smu.edu.sg/sis_research/6479 https://ink.library.smu.edu.sg/context/sis_research/article/7482/viewcontent/On_the_sampling_of_web_images_for_learning_visual_concept_classifiers.pdf
Institution:	Singapore Management University

id	sg-smu-ink.sis_research-7482
record_format	dspace
spelling	sg-smu-ink.sis_research-74822022-01-10T05:36:57Z On the sampling of web images for learning visual concept classifiers ZHU, Shiai WANG, Gang NGO, Chong-wah JIANG, Yu-Gang Visual concept learning often requires a large set of training images. In practice, nevertheless, acquiring noise-free training labels with sufficient positive examples is always expensive. A plausible solution for training data collection is by sampling the largely available user-tagged images from social media websites. With the general belief that the probability of correct tagging is higher than that of incorrect tagging, such a solution often sounds feasible, though is not without challenges. First, user-tags can be subjective and, to certain extent, are ambiguous. For instance, an image tagged with “whales” may be simply a picture about ocean museum. Learning concept “whales” with such training samples will not be effective. Second, user-tags can be overly abbreviated. For instance, an image about concept “wedding” may be tagged with “love” or simply the couple’s names. As a result, crawling sufficient positive training examples is difficult. This paper empirically studies the impact of exploiting the tagged images towards concept learning, investigating the issue of how the quality of pseudo training images affects concept detection performance. In addition, we propose a simple approach, named semantic field, for predicting the relevance between a target concept and the tag list associated with the images. Specifically, the relevance is determined through concept-tag co-occurrence by exploring external sources such as WordNet and Wikipedia. The proposed approach is shown to be effective in selecting pseudo training examples, exhibiting better performance in concept learning than other approaches such as those based on keyword sampling and tag voting. 
2010-07-01T07:00:00Z text application/pdf https://ink.library.smu.edu.sg/sis_research/6479 info:doi/10.1145/1816041.1816051 https://ink.library.smu.edu.sg/context/sis_research/article/7482/viewcontent/On_the_sampling_of_web_images_for_learning_visual_concept_classifiers.pdf http://creativecommons.org/licenses/by-nc-nd/4.0/ Research Collection School Of Computing and Information Systems eng Institutional Knowledge at Singapore Management University Concept detection Sampling Web images Data Storage Systems Graphics and Human Computer Interfaces
institution	Singapore Management University
building	SMU Libraries
continent	Asia
country	Singapore
content_provider	SMU Libraries
collection	InK@SMU
language	English
topic	Concept detection Sampling Web images Data Storage Systems Graphics and Human Computer Interfaces
spellingShingle	Concept detection Sampling Web images Data Storage Systems Graphics and Human Computer Interfaces ZHU, Shiai WANG, Gang NGO, Chong-wah JIANG, Yu-Gang On the sampling of web images for learning visual concept classifiers
description	Visual concept learning often requires a large set of training images. In practice, however, acquiring noise-free training labels with enough positive examples is expensive. A plausible solution for training data collection is to sample the widely available user-tagged images from social media websites. Given the general belief that the probability of correct tagging is higher than that of incorrect tagging, such a solution often sounds feasible, though it is not without challenges. First, user tags can be subjective and, to a certain extent, ambiguous. For instance, an image tagged with “whales” may simply be a picture of an ocean museum. Learning the concept “whales” from such training samples will not be effective. Second, user tags can be overly abbreviated. For instance, an image of the concept “wedding” may be tagged with “love” or simply the couple’s names. As a result, crawling sufficient positive training examples is difficult. This paper empirically studies the impact of exploiting tagged images for concept learning, investigating how the quality of pseudo training images affects concept detection performance. In addition, we propose a simple approach, named semantic field, for predicting the relevance between a target concept and the tag list associated with an image. Specifically, the relevance is determined through concept-tag co-occurrence by exploring external sources such as WordNet and Wikipedia. The proposed approach is shown to be effective in selecting pseudo training examples, exhibiting better performance in concept learning than other approaches such as those based on keyword sampling and tag voting.
format	text
author	ZHU, Shiai WANG, Gang NGO, Chong-wah JIANG, Yu-Gang
author_facet	ZHU, Shiai WANG, Gang NGO, Chong-wah JIANG, Yu-Gang
author_sort	ZHU, Shiai
title	On the sampling of web images for learning visual concept classifiers
title_short	On the sampling of web images for learning visual concept classifiers
title_full	On the sampling of web images for learning visual concept classifiers
title_fullStr	On the sampling of web images for learning visual concept classifiers
title_full_unstemmed	On the sampling of web images for learning visual concept classifiers
title_sort	on the sampling of web images for learning visual concept classifiers
publisher	Institutional Knowledge at Singapore Management University
publishDate	2010
url	https://ink.library.smu.edu.sg/sis_research/6479 https://ink.library.smu.edu.sg/context/sis_research/article/7482/viewcontent/On_the_sampling_of_web_images_for_learning_visual_concept_classifiers.pdf
_version_	1770575972791222272

Similar Items