ImageSpirit: Verbal guided image parsing

Humans describe images in terms of nouns and adjectives while algorithms operate on images represented as sets of pixels. Bridging this gap between how humans would like to access images versus their typical representation is the goal of image parsing, which involves assigning object and attribute l...

Bibliographic Details
Main Authors:	CHENG, Ming-Ming, ZHENG, Shuai, LIN, Wen-yan, VINEET, Vibhav, STURGESS, Paul, CROOK, Nigel, MITRA, Niloy J., TORR, Philip
Format:	text
Language:	English
Published:	Institutional Knowledge at Singapore Management University 2014
Subjects:	Design; Human Factors; Languages; Image parsing; natural language control; speech interface; object class segmentation; visual attributes; multilabel CRF; Graphics and Human Computer Interfaces
Online Access:	https://ink.library.smu.edu.sg/sis_research/4854 https://ink.library.smu.edu.sg/context/sis_research/article/5857/viewcontent/ImageSpirit__Verbal_Guided_Image_Parsing__AV.pdf
Institution:	Singapore Management University

id	sg-smu-ink.sis_research-5857
record_format	dspace
date	2014-11-01T07:00:00Z
file_format	application/pdf
doi	10.1145/2682628
license	http://creativecommons.org/licenses/by-nc-nd/4.0/
series	Research Collection School Of Computing and Information Systems
building	SMU Libraries
continent	Asia
country	Singapore
content_provider	SMU Libraries
collection	InK@SMU
description	Humans describe images in terms of nouns and adjectives while algorithms operate on images represented as sets of pixels. Bridging this gap between how humans would like to access images versus their typical representation is the goal of image parsing, which involves assigning object and attribute labels to pixels. In this article we propose treating nouns as object labels and adjectives as visual attribute labels. This allows us to formulate the image parsing problem as one of jointly estimating per-pixel object and attribute labels from a set of training images. We propose an efficient (interactive time) solution. Using the extracted labels as handles, our system empowers a user to verbally refine the results. This enables hands-free parsing of an image into pixel-wise object/attribute labels that correspond to human semantics. Verbally selecting objects of interest enables a novel and natural interaction modality that can possibly be used to interact with new generation devices (e.g., smartphones, Google Glass, livingroom devices). We demonstrate our system on a large number of real-world images with varying complexity. To help understand the trade-offs compared to traditional mouse-based interactions, results are reported for both a large-scale quantitative evaluation and a user study.
