Cross-modal recipe retrieval with stacked attention model

Taking a picture of delicious food and sharing it on social media has become a popular trend. The ability to recommend recipes along with such pictures would benefit users who want to cook a particular dish, yet this feature is not yet available. The challenge of recipe retrieval, however, comes from two aspects. First, current food-recognition technology scales only to a few hundred categories, which is far from practical for recognizing tens of thousands of food categories. Second, even a single food category can have recipe variants that differ in ingredient composition. Finding the best-match recipe requires knowledge of ingredients, which is a fine-grained recognition problem. In this paper, we consider the problem from the viewpoint of cross-modality analysis. Given a large number of image-recipe pairs acquired from the Internet, a joint space is learnt to locally capture the ingredient correspondence between images and recipes. Because learning happens at the region level for images and the ingredient level for recipes, the model can generalize recognition to unseen food categories. Furthermore, the embedded multi-modal ingredient feature sheds light on the retrieval of best-match recipes. On an in-house dataset, our model can double the retrieval performance of DeViSE, a popular cross-modality model that does not consider region information during learning.
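
The abstract describes a joint image-recipe embedding in which image region features are re-weighted by attention against an ingredient representation, and the two modalities are compared in a shared space. The following PyTorch sketch is an illustration of that general idea only, not the authors' implementation: the module names, dimensions, two-hop configuration, and margin value are all assumptions, and the recipe-side ingredient vector (e.g., averaged ingredient word embeddings) is assumed to be computed elsewhere.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class AttentionHop(nn.Module):
        # One attention hop: the ingredient query re-weights image region features.
        def __init__(self, dim, hidden):
            super().__init__()
            self.w_img = nn.Linear(dim, hidden)
            self.w_qry = nn.Linear(dim, hidden)
            self.w_att = nn.Linear(hidden, 1)

        def forward(self, regions, query):
            # regions: (B, R, dim) CNN grid features; query: (B, dim) ingredient vector
            h = torch.tanh(self.w_img(regions) + self.w_qry(query).unsqueeze(1))
            alpha = F.softmax(self.w_att(h).squeeze(-1), dim=1)    # (B, R) region weights
            attended = (alpha.unsqueeze(-1) * regions).sum(dim=1)  # (B, dim) attended image feature
            return attended + query                                # refined query for the next hop

    class StackedAttentionImageEncoder(nn.Module):
        # Stacks attention hops and normalizes the result into the joint space.
        def __init__(self, dim=512, hidden=256, hops=2):
            super().__init__()
            self.hops = nn.ModuleList(AttentionHop(dim, hidden) for _ in range(hops))

        def forward(self, regions, ingredient_vec):
            q = ingredient_vec
            for hop in self.hops:
                q = hop(regions, q)
            return F.normalize(q, dim=-1)

    def triplet_ranking_loss(img_emb, rec_emb, margin=0.3):
        # Bidirectional hinge loss over in-batch negatives; matched pairs sit on the diagonal.
        sim = img_emb @ rec_emb.t()                                # (B, B) cosine similarities
        pos = sim.diag().unsqueeze(1)                              # (B, 1) positive similarities
        mask = torch.eye(sim.size(0), dtype=torch.bool, device=sim.device)
        cost_i2r = (margin + sim - pos).clamp(min=0).masked_fill(mask, 0)
        cost_r2i = (margin + sim - pos.t()).clamp(min=0).masked_fill(mask, 0)
        return cost_i2r.mean() + cost_r2i.mean()

At retrieval time, both modalities would be embedded once and recipes ranked by cosine similarity to the query image in the joint space; the exact scoring and attention wiring in the paper may differ from this sketch.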

Bibliographic Details
Main Authors: CHEN, Jing-Jing; PANG, Lei; NGO, Chong-wah
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University, 2018
Subjects: Recipe retrieval; Cross-modal retrieval; Multi-modality embedding; Computer Sciences; Graphics and Human Computer Interfaces
Online Access:https://ink.library.smu.edu.sg/sis_research/6302
https://ink.library.smu.edu.sg/context/sis_research/article/7305/viewcontent/2018_CrossmodalRecipe.pdf
Institution: Singapore Management University
DOI: 10.1007/s11042-018-5964-y
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Collection: Research Collection School Of Computing and Information Systems (InK@SMU)