Deep understanding of cooking procedure for cross-modal recipe retrieval

Finding the right recipe that describes the cooking procedure for a dish from just one picture is an inherently difficult problem. Food preparation undergoes a complex process involving raw ingredients, utensils, and cutting and cooking operations. This process gives clues to the multimedia presentation of a dish (e.g., taste, colour, shape). However, the description of the process is implicit, implying only the cause of the dish's presentation rather than the visual effect that can be vividly observed in a picture. Therefore, unlike other cross-modal retrieval problems in the literature, recipe search requires understanding the textually described procedure to predict its possible consequence on visual appearance. In this paper, we approach this problem from the perspective of attention modeling. Specifically, we model the attention of words and sentences in a recipe and align them with its image feature such that both text and visual features share high similarity in a multi-dimensional space. On a large food dataset, Recipe1M, we empirically demonstrate that understanding the cooking procedure leads to improvement by a large margin over existing methods, which mostly consider only ingredient information. Furthermore, with attention modeling, we show that language-specific named-entity extraction becomes optional. This result sheds light on the feasibility of performing cross-lingual cross-modal recipe retrieval with off-the-shelf machine translation engines.
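The abstract describes a hierarchical attention scheme: attention is computed over the words of each instruction sentence and over the sentences of the whole procedure, and the resulting recipe vector is aligned with the image feature in a shared embedding space. The record gives no implementation details, so the following is only a minimal sketch of that general idea, assuming a PyTorch-style setup; the GRU encoders, dimensions, and all names below are illustrative assumptions, not the paper's actual architecture.

# Minimal sketch of hierarchical attention for cross-modal recipe retrieval.
# Assumptions (not from the paper): PyTorch, GRU word/sentence encoders,
# a 512-d joint embedding, and cosine similarity for ranking.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Attention(nn.Module):
    """Soft attention: pools a sequence of hidden states into one vector."""
    def __init__(self, dim):
        super().__init__()
        self.score = nn.Sequential(nn.Linear(dim, dim), nn.Tanh(),
                                   nn.Linear(dim, 1, bias=False))

    def forward(self, h):                       # h: (batch, seq, dim)
        w = F.softmax(self.score(h), dim=1)     # attention weights over steps
        return (w * h).sum(dim=1)               # weighted sum: (batch, dim)

class HierarchicalRecipeEncoder(nn.Module):
    """Word-level attention inside each instruction sentence, then
    sentence-level attention over the whole cooking procedure."""
    def __init__(self, vocab=20000, emb=300, hid=512):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb)
        self.word_rnn = nn.GRU(emb, hid, batch_first=True)
        self.word_attn = Attention(hid)
        self.sent_rnn = nn.GRU(hid, hid, batch_first=True)
        self.sent_attn = Attention(hid)

    def forward(self, tokens):                  # tokens: (batch, n_sent, n_word)
        b, s, w = tokens.shape
        x = self.embed(tokens.view(b * s, w))           # embed every word
        h, _ = self.word_rnn(x)
        sent_vecs = self.word_attn(h).view(b, s, -1)    # one vector per sentence
        h, _ = self.sent_rnn(sent_vecs)
        return self.sent_attn(h)                        # recipe vector: (b, hid)

class JointEmbedding(nn.Module):
    """Projects a CNN image feature and the recipe vector into one space."""
    def __init__(self, img_dim=2048, hid=512):
        super().__init__()
        self.recipe = HierarchicalRecipeEncoder(hid=hid)
        self.img_proj = nn.Linear(img_dim, hid)

    def forward(self, tokens, img_feat):
        r = F.normalize(self.recipe(tokens), dim=-1)
        v = F.normalize(self.img_proj(img_feat), dim=-1)
        return r @ v.t()                        # cosine similarity matrix

model = JointEmbedding()
sims = model(torch.randint(0, 20000, (4, 8, 20)),   # 4 recipes, 8 sentences, 20 words
             torch.randn(4, 2048))                   # 4 pooled image features
print(sims.shape)                                    # torch.Size([4, 4])

Training such a model would typically use a pairwise or triplet ranking loss so that matching image-recipe pairs score higher than mismatched ones; that is a common choice in cross-modal retrieval, and the record does not state which objective the paper actually uses.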

Bibliographic Details
Main Authors: CHEN, Jingjing; NGO, Chong-wah; FENG, Fu-Li; CHUA, Tat-Seng
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University, 2018
DOI: 10.1145/3240508.3240627
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Collection: Research Collection School Of Computing and Information Systems
Subjects: Cross-modal learning; Hierarchical attention; Recipe retrieval; Databases and Information Systems; Graphics and Human Computer Interfaces
Online Access: https://ink.library.smu.edu.sg/sis_research/6461
https://ink.library.smu.edu.sg/context/sis_research/article/7464/viewcontent/2018_p1020_chen.pdf
Institution: Singapore Management University