Learning from web recipe-image pairs for food recognition: Problem, baselines and performance

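Cross-modal recipe retrieval has recently been explored for food recognition and understanding. A text-rich recipe provides not only visual content information (e.g., ingredients, dish presentation) but also the procedure of food preparation (cutting and cooking styles). The paired data is leveraged to train deep models to retrieve recipes for food images. Most recipes on the Web include sample pictures as references. The paired multimedia data is not noise-free, due to errors such as pairing of images containing partially prepared dishes with recipes. The content of recipes and food images is not always consistent due to free-style writing and preparation of food in different environments. As a consequence, the effectiveness of learning cross-modal deep models from such noisy web data is questionable. This paper conducts an empirical study to provide insights into whether the features learnt with noisy paired data are resilient and can capture the modality correspondence between visual and text.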

Bibliographic Details
Main Authors:	ZHU, Bin; NGO, Chong-wah; CHAN, Wing-Kwong
Format:	text
Language:	English
Published:	Institutional Knowledge at Singapore Management University 2022
Subjects:	Image recognition;Training;Generative adversarial networks;Feature extraction;Visualization;Data models;Context modeling;Food recognition;image-to-recipe retrieval;image-to-image retrieval;Databases and Information Systems
Online Access:	https://ink.library.smu.edu.sg/sis_research/7246 https://ink.library.smu.edu.sg/context/sis_research/article/8249/viewcontent/tmm2021_zhu_ngo_chan.pdf
Institution:	Singapore Management University

id	sg-smu-ink.sis_research-8249
record_format	dspace
spelling	sg-smu-ink.sis_research-8249 2024-07-12T10:04:58Z Learning from web recipe-image pairs for food recognition: Problem, baselines and performance ZHU, Bin; NGO, Chong-wah; CHAN, Wing-Kwong Cross-modal recipe retrieval has recently been explored for food recognition and understanding. A text-rich recipe provides not only visual content information (e.g., ingredients, dish presentation) but also the procedure of food preparation (cutting and cooking styles). The paired data is leveraged to train deep models to retrieve recipes for food images. Most recipes on the Web include sample pictures as references. The paired multimedia data is not noise-free, due to errors such as pairing of images containing partially prepared dishes with recipes. The content of recipes and food images is not always consistent due to free-style writing and preparation of food in different environments. As a consequence, the effectiveness of learning cross-modal deep models from such noisy web data is questionable. This paper conducts an empirical study to provide insights into whether the features learnt with noisy paired data are resilient and can capture the modality correspondence between visual and text. 2022-01-01T08:00:00Z text application/pdf https://ink.library.smu.edu.sg/sis_research/7246 info:doi/10.1109/TMM.2021.3123474 https://ink.library.smu.edu.sg/context/sis_research/article/8249/viewcontent/tmm2021_zhu_ngo_chan.pdf http://creativecommons.org/licenses/by-nc-nd/4.0/ Research Collection School Of Computing and Information Systems eng Institutional Knowledge at Singapore Management University Image recognition;Training;Generative adversarial networks;Feature extraction;Visualization;Data models;Context modeling;Food recognition;image-to-recipe retrieval;image-to-image retrieval;Databases and Information Systems
institution	Singapore Management University
building	SMU Libraries
continent	Asia
country	Singapore
content_provider	SMU Libraries
collection	InK@SMU
language	English
topic	Image recognition;Training;Generative adversarial networks;Feature extraction;Visualization;Data models;Context modeling;Food recognition;image-to-recipe retrieval;image-to-image retrieval;Databases and Information Systems
description	Cross-modal recipe retrieval has recently been explored for food recognition and understanding. A text-rich recipe provides not only visual content information (e.g., ingredients, dish presentation) but also the procedure of food preparation (cutting and cooking styles). The paired data is leveraged to train deep models to retrieve recipes for food images. Most recipes on the Web include sample pictures as references. The paired multimedia data is not noise-free, due to errors such as pairing of images containing partially prepared dishes with recipes. The content of recipes and food images is not always consistent due to free-style writing and preparation of food in different environments. As a consequence, the effectiveness of learning cross-modal deep models from such noisy web data is questionable. This paper conducts an empirical study to provide insights into whether the features learnt with noisy paired data are resilient and can capture the modality correspondence between visual and text.
format	text
author	ZHU, Bin; NGO, Chong-wah; CHAN, Wing-Kwong
author_facet	ZHU, Bin; NGO, Chong-wah; CHAN, Wing-Kwong
author_sort	ZHU, Bin
title	Learning from web recipe-image pairs for food recognition: Problem, baselines and performance
title_sort	learning from web recipe-image pairs for food recognition: problem, baselines and performance
publisher	Institutional Knowledge at Singapore Management University
publishDate	2022
url	https://ink.library.smu.edu.sg/sis_research/7246 https://ink.library.smu.edu.sg/context/sis_research/article/8249/viewcontent/tmm2021_zhu_ngo_chan.pdf
_version_	1814047649620295680

Similar Items