Retrieval augmented recipe generation
The growing interest in generating recipes from food images has drawn substantial research attention in recent years. Existing works for recipe generation primarily utilize a two-stage training method: first predicting ingredients from a food image and then generating instructions from both the image and ingredients. Large Multi-modal Models (LMMs), which have achieved notable success across a variety of vision and language tasks, shed light on generating both ingredients and instructions directly from images. Nevertheless, LMMs still face the common issue of hallucinations during recipe generation, leading to suboptimal performance. To tackle this issue, we propose a retrieval augmented large multimodal model for recipe generation. We first introduce Stochastic Diversified Retrieval Augmentation (SDRA) to retrieve recipes semantically related to the image from an existing datastore as a supplement, integrating them into the prompt to add diverse and rich context to the input image. Additionally, a Self-Consistency Ensemble Voting mechanism is proposed to determine the most confident predicted recipe as the final output. It calculates the consistency among generated recipe candidates, which use different retrieved recipes as context for generation. Extensive experiments validate the effectiveness of our proposed method, which demonstrates state-of-the-art (SOTA) performance in recipe generation on the Recipe1M dataset.
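The abstract's two mechanisms can be illustrated with a minimal sketch. The paper does not publish its algorithm here, so everything below is an assumption: `sdra_sample` stands in for SDRA by drawing several different subsets of retrieved recipes so each generation sees a distinct context, and `self_consistency_vote` stands in for the ensemble voting step using token-level Jaccard overlap as a placeholder consistency score (the actual LMM call and similarity metric are not specified in this record).

```python
import random

def sdra_sample(retrieved, k=3, n_contexts=4, seed=0):
    """Sketch of Stochastic Diversified Retrieval Augmentation:
    draw several distinct k-recipe subsets from the retrieved pool,
    so each generation run is prompted with a different, diverse context."""
    rng = random.Random(seed)
    return [rng.sample(retrieved, k) for _ in range(n_contexts)]

def jaccard(a, b):
    """Token-level Jaccard overlap, used here as a stand-in
    consistency score between two generated recipes."""
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def self_consistency_vote(candidates):
    """Sketch of Self-Consistency Ensemble Voting: return the
    candidate recipe most consistent with all the other candidates."""
    def score(c):
        return sum(jaccard(c, other) for other in candidates if other is not c)
    return max(candidates, key=score)
```

Under this reading, the pipeline would be: `candidates = [generate(image, ctx) for ctx in sdra_sample(pool)]` (where `generate` is the hypothetical LMM call conditioned on a retrieved-recipe context), followed by `self_consistency_vote(candidates)` to pick the final output.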
Main Authors: LIU, Guoshan; YIN, Hailong; ZHU, Bin; CHEN, Jingjing; NGO, Chong-wah; JIANG, Yu-Gang
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University, 2025
Subjects: Retrieval augmented generation; recipe generation; Large Multi-modal Model; Databases and Information Systems
Online Access: https://ink.library.smu.edu.sg/sis_research/9824 https://ink.library.smu.edu.sg/context/sis_research/article/10824/viewcontent/WACV_2025_Author_Kit_RARG.pdf
Institution: Singapore Management University
Collection: Research Collection School Of Computing and Information Systems, InK@SMU
License: http://creativecommons.org/licenses/by-nc-nd/4.0/