Retrieval augmented recipe generation

The growing interest in generating recipes from food images has drawn substantial research attention in recent years. Existing works for recipe generation primarily utilize a two-stage training method: first predicting ingredients from a food image, then generating instructions from both the image and the ingredients. Large Multi-modal Models (LMMs), which have achieved notable success across a variety of vision and language tasks, shed light on generating both ingredients and instructions directly from images. Nevertheless, LMMs still face the common issue of hallucinations during recipe generation, leading to suboptimal performance. To tackle this issue, we propose a retrieval-augmented large multimodal model for recipe generation. We first introduce Stochastic Diversified Retrieval Augmentation (SDRA) to retrieve recipes semantically related to the image from an existing datastore as a supplement, integrating them into the prompt to add diverse and rich context to the input image. Additionally, a Self-Consistency Ensemble Voting mechanism is proposed to determine the most confident predicted recipe as the final output: it measures the consistency among generated recipe candidates, each of which uses different retrieved recipes as context for generation. Extensive experiments validate the effectiveness of the proposed method, which achieves state-of-the-art (SOTA) performance in recipe generation on the Recipe1M dataset.
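The abstract describes two mechanisms: retrieving related recipes and sampling diverse subsets of them as prompt context (SDRA), then voting among the resulting generation candidates by mutual consistency. The sketch below is a toy illustration of those two ideas only, not the paper's implementation: the datastore with a precomputed `similarity` field, the subset sampling, and the use of `SequenceMatcher` as a consistency proxy are all assumptions made for the example; the actual LMM generation step is omitted.

```python
import random
from difflib import SequenceMatcher


def retrieve_recipes(image_embedding, datastore, k=5):
    """Toy stand-in for image-to-recipe retrieval: rank datastore entries
    by a precomputed 'similarity' score (hypothetical field). A real
    system would score recipes against the image embedding."""
    return sorted(datastore, key=lambda r: r["similarity"], reverse=True)[:k]


def stochastic_diversified_contexts(retrieved, num_prompts=3, per_prompt=2, seed=0):
    """Sketch of the SDRA idea: each prompt receives a different random
    subset of the retrieved recipes, so the generation candidates are
    conditioned on diverse contexts."""
    rng = random.Random(seed)
    return [rng.sample(retrieved, per_prompt) for _ in range(num_prompts)]


def self_consistency_vote(candidates):
    """Pick the candidate most similar on average to all the others --
    one plausible reading of consistency voting; the paper's exact
    scoring may differ."""
    def avg_sim(i):
        return sum(
            SequenceMatcher(None, candidates[i], candidates[j]).ratio()
            for j in range(len(candidates)) if j != i
        ) / (len(candidates) - 1)
    return max(range(len(candidates)), key=avg_sim)
```

In use, each context subset would be formatted into a prompt, the LMM would generate one recipe candidate per prompt, and `self_consistency_vote` would return the index of the candidate to emit as the final output.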

Bibliographic Details
Main Authors: LIU, Guoshan, YIN, Hailong, ZHU, Bin, CHEN, Jingjing, NGO, Chong-wah, JIANG, Yu-Gang
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2025
Subjects: Retrieval augmented generation; Recipe generation; Large Multi-modal Model; Databases and Information Systems
Online Access:https://ink.library.smu.edu.sg/sis_research/9824
https://ink.library.smu.edu.sg/context/sis_research/article/10824/viewcontent/WACV_2025_Author_Kit_RARG.pdf
Institution: Singapore Management University
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Collection: Research Collection School Of Computing and Information Systems