Decomposing generation networks with structure prediction for recipe generation
Recipe generation from food images and ingredients is a challenging task, which requires the interpretation of the information from another modality. Different from the image captioning task, where the captions usually have one sentence, cooking instructions contain multiple sentences and have obvio...
Main Authors: | Wang, Hao; Lin, Guosheng; Hoi, Steven C. H.; Miao, Chunyan |
---|---|
Other Authors: | School of Computer Science and Engineering |
Format: | Article |
Language: | English |
Published: | 2022 |
Subjects: | Engineering::Computer science and engineering; Text Generation; Vision-and-Language |
Online Access: | https://hdl.handle.net/10356/156089 |
Institution: | Nanyang Technological University |
id |
sg-ntu-dr.10356-156089 |
---|---|
record_format |
dspace |
spelling |
sg-ntu-dr.10356-156089 2022-04-06T08:57:15Z Decomposing generation networks with structure prediction for recipe generation Wang, Hao Lin, Guosheng Hoi, Steven C. H. Miao, Chunyan School of Computer Science and Engineering Joint NTU-UBC Research Centre of Excellence in Active Living for the Elderly (LILY) Engineering::Computer science and engineering Text Generation Vision-and-Language Recipe generation from food images and ingredients is a challenging task, which requires the interpretation of the information from another modality. Different from the image captioning task, where the captions usually have one sentence, cooking instructions contain multiple sentences and have obvious structures. To help the model capture the recipe structure and avoid missing some cooking details, we propose a novel framework: Decomposing Generation Networks (DGN) with structure prediction, to get more structured and complete recipe generation outputs. Specifically, we split each cooking instruction into several phases, and assign different sub-generators to each phase. Our approach includes two novel ideas: (i) learning the recipe structures with the global structure prediction component and (ii) producing recipe phases in the sub-generator output component based on the predicted structure. Extensive experiments on the challenging large-scale Recipe1M dataset validate the effectiveness of our proposed model, which improves the performance over the state-of-the-art results. AI Singapore Ministry of Education (MOE) Ministry of Health (MOH) National Research Foundation (NRF) Submitted/Accepted version This research is supported, in part, by the National Research Foundation (NRF), Singapore under its AI Singapore Programme (AISG Award No: AISG-GC-2019-003) and under its NRF Investigatorship Programme (NRFI Award No. NRF-NRFI05-2019-0002).
This research is also supported, in part, by the Singapore Ministry of Health under its National Innovation Challenge on Active and Confident Ageing (NIC Project No. MOH/NIC/COG04/2017 and MOH/NIC/HAIG03/2017), and the MOE Tier-1 research grants: RG28/18 (S) and RG22/19 (S). 2022-04-06T08:57:15Z 2022-04-06T08:57:15Z 2022 Journal Article Wang, H., Lin, G., Hoi, S. C. H. & Miao, C. (2022). Decomposing generation networks with structure prediction for recipe generation. Pattern Recognition, 126, 108578-. https://dx.doi.org/10.1016/j.patcog.2022.108578 0031-3203 https://hdl.handle.net/10356/156089 10.1016/j.patcog.2022.108578 2-s2.0-85124796277 126 108578 en AISG-GC-2019-003 NRF-NRFI05-2019-0002 MOH/NIC/COG04/2017 MOH/NIC/HAIG03/2017 RG28/18 (S) RG22/19 (S) Pattern Recognition © 2022 Elsevier Ltd. All rights reserved. This paper was published in Pattern Recognition and is made available with permission of Elsevier Ltd. application/pdf |
institution |
Nanyang Technological University |
building |
NTU Library |
continent |
Asia |
country |
Singapore |
content_provider |
NTU Library |
collection |
DR-NTU |
language |
English |
topic |
Engineering::Computer science and engineering Text Generation Vision-and-Language |
description |
Recipe generation from food images and ingredients is a challenging task, which requires the interpretation of the information from another modality. Different from the image captioning task, where the captions usually have one sentence, cooking instructions contain multiple sentences and have obvious structures. To help the model capture the recipe structure and avoid missing some cooking details, we propose a novel framework: Decomposing Generation Networks (DGN) with structure prediction, to get more structured and complete recipe generation outputs. Specifically, we split each cooking instruction into several phases, and assign different sub-generators to each phase. Our approach includes two novel ideas: (i) learning the recipe structures with the global structure prediction component and (ii) producing recipe phases in the sub-generator output component based on the predicted structure. Extensive experiments on the challenging large-scale Recipe1M dataset validate the effectiveness of our proposed model, which improves the performance over the state-of-the-art results. |
author2 |
School of Computer Science and Engineering |
author_facet |
School of Computer Science and Engineering Wang, Hao Lin, Guosheng Hoi, Steven C. H. Miao, Chunyan |
format |
Article |
author |
Wang, Hao Lin, Guosheng Hoi, Steven C. H. Miao, Chunyan |
author_sort |
Wang, Hao |
title |
Decomposing generation networks with structure prediction for recipe generation |
publishDate |
2022 |
url |
https://hdl.handle.net/10356/156089 |
_version_ |
1729789484516507648 |
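
The description field in this record outlines a two-stage design: a global component predicts the recipe structure, and a sub-generator is assigned to each phase of that structure. The control flow can be sketched in Python as below, purely as an illustration. The phase labels, the rule inside `predict_structure`, and the template sub-generators are all hypothetical stand-ins; the record gives no detail on the actual DGN components, which are learned models operating on image and ingredient features.

```python
from typing import Callable, Dict, List


def predict_structure(ingredients: List[str]) -> List[str]:
    """Toy stand-in for the global structure prediction component.

    Assumption: a recipe is a sequence of phase labels. The labels and the
    counting rule here are invented for illustration only.
    """
    n_cook = max(1, len(ingredients) // 3)  # longer lists get more "cook" phases
    return ["prepare"] + ["cook"] * n_cook + ["finish"]


# One sub-generator per phase label. Real sub-generators would be neural
# text decoders; these are fixed templates so the sketch stays runnable.
SUB_GENERATORS: Dict[str, Callable[[List[str]], str]] = {
    "prepare": lambda ings: "Prepare " + ", ".join(ings) + ".",
    "cook": lambda ings: "Cook the " + ings[0] + " mixture.",
    "finish": lambda ings: "Serve.",
}


def generate_recipe(ingredients: List[str]) -> List[str]:
    """Predict the structure first, then dispatch each phase to its sub-generator."""
    if not ingredients:
        raise ValueError("at least one ingredient is required")
    structure = predict_structure(ingredients)
    return [SUB_GENERATORS[phase](ingredients) for phase in structure]


if __name__ == "__main__":
    for step in generate_recipe(["egg", "flour"]):
        print(step)
```

The point of the decomposition is that the number and order of phases is decided once, up front, so each sub-generator only has to produce text for its own phase.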