Cross-modal recipe retrieval: How to cook this dish?
In social media, users like to share food pictures. One intelligent feature, potentially attractive to amateur chefs, is the recommendation of a recipe along with a food picture. Providing this feature, unfortunately, is still technically challenging. First, current food recognition technology scales only...
Main Authors: CHEN, Jingjing; PANG, Lei; NGO, Chong-wah
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University, 2017
Subjects: Cross-modal retrieval; Multi-modality embedding; Recipe retrieval; Databases and Information Systems; Graphics and Human Computer Interfaces
Online Access: https://ink.library.smu.edu.sg/sis_research/6674
https://ink.library.smu.edu.sg/context/sis_research/article/7677/viewcontent/10.1007_978_3_319_51811_4.pdf
Institution: Singapore Management University
id: sg-smu-ink.sis_research-7677
record_format: dspace
spelling:
sg-smu-ink.sis_research-7677 (2023-08-21T00:38:54Z)
Cross-modal recipe retrieval: How to cook this dish? CHEN, Jingjing; PANG, Lei; NGO, Chong-wah
2017-01-01T08:00:00Z; text; application/pdf
https://ink.library.smu.edu.sg/sis_research/6674
info:doi/10.1007/978-3-319-51811-4_48
https://ink.library.smu.edu.sg/context/sis_research/article/7677/viewcontent/10.1007_978_3_319_51811_4.pdf
http://creativecommons.org/licenses/by-nc-nd/4.0/
Research Collection School Of Computing and Information Systems
eng
Institutional Knowledge at Singapore Management University
Cross-modal retrieval; Multi-modality embedding; Recipe retrieval; Databases and Information Systems; Graphics and Human Computer Interfaces
institution: Singapore Management University
building: SMU Libraries
continent: Asia
country: Singapore
content_provider: SMU Libraries
collection: InK@SMU
language: English
topic: Cross-modal retrieval; Multi-modality embedding; Recipe retrieval; Databases and Information Systems; Graphics and Human Computer Interfaces
description:
In social media, users like to share food pictures. One intelligent feature, potentially attractive to amateur chefs, is the recommendation of a recipe along with a food picture. Providing this feature, unfortunately, is still technically challenging. First, current food recognition technology scales only to a few hundred categories, which is far from practical for recognizing tens of thousands of food categories. Second, even a single food category can have variant recipes that differ in ingredient composition. Finding the best-matching recipe requires knowledge of the ingredients, which is a fine-grained recognition problem. In this paper, we consider the problem from the viewpoint of cross-modality analysis. Given a large number of image-recipe pairs acquired from the Internet, a joint space is learnt that locally captures the ingredient correspondence between images and recipes. Because learning happens at the region level for images and at the ingredient level for recipes, the model is able to generalize recognition to unseen food categories. Furthermore, the embedded multi-modal ingredient feature sheds light on the retrieval of best-matching recipes. On an in-house dataset, our model can double the retrieval performance of DeViSE, a popular cross-modality model that does not consider region information during learning. (See the illustrative embedding sketch after this record.)
format: text
author: CHEN, Jingjing; PANG, Lei; NGO, Chong-wah
author_sort: CHEN, Jingjing
title: Cross-modal recipe retrieval: How to cook this dish?
publisher: Institutional Knowledge at Singapore Management University
publishDate: 2017
url: https://ink.library.smu.edu.sg/sis_research/6674
https://ink.library.smu.edu.sg/context/sis_research/article/7677/viewcontent/10.1007_978_3_319_51811_4.pdf
_version_: 1779156890359431168
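The description above outlines a joint image-recipe embedding learnt at the region level for images and the ingredient level for recipes, compared against DeViSE. As a rough illustration of that family of models, and not the authors' implementation, the following Python (PyTorch) sketch projects image-region features and ingredient features into a shared space and trains with a DeViSE-style margin ranking loss; the feature dimensions, the margin value, and the placeholder inputs are all assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

class JointEmbedding(nn.Module):
    """Project image-region and ingredient features into one shared space."""
    def __init__(self, img_dim=2048, ing_dim=300, emb_dim=512):
        super().__init__()
        self.img_proj = nn.Linear(img_dim, emb_dim)  # image-region features -> shared space
        self.ing_proj = nn.Linear(ing_dim, emb_dim)  # ingredient features -> shared space

    def forward(self, img_feat, ing_feat):
        # L2-normalise so a dot product equals cosine similarity
        v = F.normalize(self.img_proj(img_feat), dim=-1)
        r = F.normalize(self.ing_proj(ing_feat), dim=-1)
        return v, r

def ranking_loss(v, r, margin=0.2):
    """Hinge loss: matched image-recipe pairs should score higher than
    mismatched pairs in both retrieval directions."""
    scores = v @ r.t()                            # (B, B) cosine similarities
    pos = scores.diag().unsqueeze(1)              # matched pairs on the diagonal
    cost_img = F.relu(margin + scores - pos)      # image query vs. wrong recipes
    cost_rec = F.relu(margin + scores - pos.t())  # recipe query vs. wrong images
    mask = torch.eye(scores.size(0), dtype=torch.bool)
    cost_img = cost_img.masked_fill(mask, 0.0)
    cost_rec = cost_rec.masked_fill(mask, 0.0)
    return cost_img.mean() + cost_rec.mean()

# Toy usage with random stand-ins for pooled CNN region features and
# averaged ingredient word embeddings (both assumed, not from the paper).
model = JointEmbedding()
img = torch.randn(8, 2048)
ing = torch.randn(8, 300)
v, r = model(img, ing)
loss = ranking_loss(v, r)
loss.backward()
print(float(loss))

At retrieval time, a query image would be embedded once and recipes ranked by cosine similarity in the shared space; the region-level pooling and ingredient modelling that the paper emphasizes are abstracted away in this sketch.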