Snap-and-ask: Answering multimodal question by naming visual instance
In real life, it is easier to provide a visual cue when asking a question about a possibly unfamiliar topic, for example by asking, “Where was this crop circle found?”. Providing an image of the instance is far more convenient than texting a verbose description of its visual properties, especially when the name of the query instance is not known. Nevertheless, having to identify the visual instance before processing the question and eventually returning the answer makes multimodal question answering technically challenging. This paper addresses the problem of visual-to-text naming through the paradigm of answering-by-search in a two-stage computational framework composed of instance search (IS) and similar question ranking (QR). In IS, names of the instances are inferred from similar visual examples retrieved from a million-scale image dataset. To recall instances of non-planar and non-rigid shapes, spatial configurations that emphasize topology consistency while allowing for local variations in matches are incorporated. In QR, the candidate names of the instance are statistically identified from the search results and used directly to retrieve similar questions from community-contributed QA (cQA) archives. By parsing questions into syntactic trees, a fuzzy matching between the inquirer’s question and cQA questions is performed to locate answers and recommend related questions to the inquirer. The proposed framework is evaluated on a wide range of visual instances (e.g., fashion, art, food, pet, logo, and landmark) over various QA categories (e.g., factoid, definition, how-to, and opinion).
Saved in:
Main Authors: | ZHANG, Wei; PANG, Lei; NGO, Chong-wah |
---|---|
Format: | text |
Language: | English |
Published: | Institutional Knowledge at Singapore Management University, 2012 |
Subjects: | multimedia question answering; similar question search; visual instance search; Graphics and Human Computer Interfaces; Theory and Algorithms |
Online Access: | https://ink.library.smu.edu.sg/sis_research/6441 https://ink.library.smu.edu.sg/context/sis_research/article/7444/viewcontent/2393347.2393432.pdf |
Institution: | Singapore Management University |
Language: | English |
id |
sg-smu-ink.sis_research-7444 |
---|---|
record_format |
dspace |
spelling |
sg-smu-ink.sis_research-7444 2022-01-10T06:25:49Z Snap-and-ask: Answering multimodal question by naming visual instance ZHANG, Wei PANG, Lei NGO, Chong-wah In real life, it is easier to provide a visual cue when asking a question about a possibly unfamiliar topic, for example by asking, “Where was this crop circle found?”. Providing an image of the instance is far more convenient than texting a verbose description of its visual properties, especially when the name of the query instance is not known. Nevertheless, having to identify the visual instance before processing the question and eventually returning the answer makes multimodal question answering technically challenging. This paper addresses the problem of visual-to-text naming through the paradigm of answering-by-search in a two-stage computational framework composed of instance search (IS) and similar question ranking (QR). In IS, names of the instances are inferred from similar visual examples retrieved from a million-scale image dataset. To recall instances of non-planar and non-rigid shapes, spatial configurations that emphasize topology consistency while allowing for local variations in matches are incorporated. In QR, the candidate names of the instance are statistically identified from the search results and used directly to retrieve similar questions from community-contributed QA (cQA) archives. By parsing questions into syntactic trees, a fuzzy matching between the inquirer’s question and cQA questions is performed to locate answers and recommend related questions to the inquirer. The proposed framework is evaluated on a wide range of visual instances (e.g., fashion, art, food, pet, logo, and landmark) over various QA categories (e.g., factoid, definition, how-to, and opinion). 
2012-11-01T07:00:00Z text application/pdf https://ink.library.smu.edu.sg/sis_research/6441 info:doi/10.1145/2393347.2393432 https://ink.library.smu.edu.sg/context/sis_research/article/7444/viewcontent/2393347.2393432.pdf http://creativecommons.org/licenses/by-nc-nd/4.0/ Research Collection School Of Computing and Information Systems eng Institutional Knowledge at Singapore Management University multimedia question answering similar question search visual instance search Graphics and Human Computer Interfaces Theory and Algorithms |
institution |
Singapore Management University |
building |
SMU Libraries |
continent |
Asia |
country |
Singapore Singapore |
content_provider |
SMU Libraries |
collection |
InK@SMU |
language |
English |
topic |
multimedia question answering; similar question search; visual instance search; Graphics and Human Computer Interfaces; Theory and Algorithms |
spellingShingle |
multimedia question answering similar question search visual instance search Graphics and Human Computer Interfaces Theory and Algorithms ZHANG, Wei PANG, Lei NGO, Chong-wah Snap-and-ask: Answering multimodal question by naming visual instance |
description |
In real life, it is easier to provide a visual cue when asking a question about a possibly unfamiliar topic, for example by asking, “Where was this crop circle found?”. Providing an image of the instance is far more convenient than texting a verbose description of its visual properties, especially when the name of the query instance is not known. Nevertheless, having to identify the visual instance before processing the question and eventually returning the answer makes multimodal question answering technically challenging. This paper addresses the problem of visual-to-text naming through the paradigm of answering-by-search in a two-stage computational framework composed of instance search (IS) and similar question ranking (QR). In IS, names of the instances are inferred from similar visual examples retrieved from a million-scale image dataset. To recall instances of non-planar and non-rigid shapes, spatial configurations that emphasize topology consistency while allowing for local variations in matches are incorporated. In QR, the candidate names of the instance are statistically identified from the search results and used directly to retrieve similar questions from community-contributed QA (cQA) archives. By parsing questions into syntactic trees, a fuzzy matching between the inquirer’s question and cQA questions is performed to locate answers and recommend related questions to the inquirer. The proposed framework is evaluated on a wide range of visual instances (e.g., fashion, art, food, pet, logo, and landmark) over various QA categories (e.g., factoid, definition, how-to, and opinion). |
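The two-stage pipeline the abstract describes can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the real IS stage searches a million-scale image collection with spatially verified visual matching, and the real QR stage performs fuzzy matching over syntactic parse trees. Here, instance search is stood in for by a list of labels of visually similar neighbors, and question matching by plain string similarity; all function names and data are hypothetical.

```python
from collections import Counter
from difflib import SequenceMatcher

def name_instance(neighbor_labels):
    """IS stage (sketch): statistically identify the candidate name
    as the most frequent label among top-ranked visual neighbors."""
    name, _count = Counter(neighbor_labels).most_common(1)[0]
    return name

def rank_questions(user_question, instance_name, cqa_questions):
    """QR stage (sketch): substitute the inferred name into the
    inquirer's question, then rank cQA questions by fuzzy string
    similarity (a stand-in for syntactic-tree matching)."""
    query = user_question.replace("this", instance_name).lower()
    scored = [
        (SequenceMatcher(None, query, q.lower()).ratio(), q)
        for q in cqa_questions
    ]
    return [q for _score, q in sorted(scored, reverse=True)]

if __name__ == "__main__":
    # Labels of the visually most similar images (hypothetical data).
    labels = ["Merlion", "Merlion", "fountain", "Merlion"]
    name = name_instance(labels)
    cqa = [
        "Where is the Merlion located?",
        "How do I bake bread?",
    ]
    ranked = rank_questions("Where was this found?", name, cqa)
    print(name, "->", ranked[0])
```

The voting step mirrors the abstract's "statistically identified from search results", and the substitution-then-match step mirrors retrieving similar cQA questions once the instance has a textual name.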
format |
text |
author |
ZHANG, Wei PANG, Lei NGO, Chong-wah |
author_facet |
ZHANG, Wei PANG, Lei NGO, Chong-wah |
author_sort |
ZHANG, Wei |
title |
Snap-and-ask: Answering multimodal question by naming visual instance |
title_short |
Snap-and-ask: Answering multimodal question by naming visual instance |
title_full |
Snap-and-ask: Answering multimodal question by naming visual instance |
title_fullStr |
Snap-and-ask: Answering multimodal question by naming visual instance |
title_full_unstemmed |
Snap-and-ask: Answering multimodal question by naming visual instance |
title_sort |
snap-and-ask: answering multimodal question by naming visual instance |
publisher |
Institutional Knowledge at Singapore Management University |
publishDate |
2012 |
url |
https://ink.library.smu.edu.sg/sis_research/6441 https://ink.library.smu.edu.sg/context/sis_research/article/7444/viewcontent/2393347.2393432.pdf |
_version_ |
1770575961027248128 |