Decomposing generation networks with structure prediction for recipe generation

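Recipe generation from food images and ingredients is a challenging task, which requires the interpretation of the information from another modality. Different from the image captioning task, where the captions usually have one sentence, cooking instructions contain multiple sentences and have obvious structures. To help the model capture the recipe structure and avoid missing some cooking details, we propose a novel framework: Decomposing Generation Networks (DGN) with structure prediction, to get more structured and complete recipe generation outputs. Specifically, we split each cooking instruction into several phases, and assign different sub-generators to each phase. Our approach includes two novel ideas: (i) learning the recipe structures with the global structure prediction component and (ii) producing recipe phases in the sub-generator output component based on the predicted structure. Extensive experiments on the challenging large-scale Recipe1M dataset validate the effectiveness of our proposed model, which improves the performance over the state-of-the-art results.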
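
The description in this record outlines a two-stage design: a global component predicts the recipe structure, and a separate sub-generator produces the text for each phase of that structure. The Python sketch below only illustrates that control flow under loud assumptions. The phase labels, the rule inside predict_structure, the text templates, and every function name are hypothetical and do not come from the paper, whose components are learned networks rather than hand-written rules.

```python
from typing import Callable, Dict, List

# Hypothetical phase labels; the record does not say how phases are defined.
PHASES = ("prepare", "cook", "serve")


def predict_structure(ingredients: List[str]) -> List[str]:
    """Toy stand-in for the global structure prediction component.

    A learned model would predict the phase sequence from image and
    ingredient features; this fixed rule only illustrates the interface.
    """
    structure = ["prepare", "cook"]
    if len(ingredients) > 3:
        # Assumption for illustration: longer lists get a second cook phase.
        structure.append("cook")
    structure.append("serve")
    return structure


def make_sub_generator(phase: str) -> Callable[[List[str]], str]:
    """Build a toy sub-generator responsible for a single phase."""
    templates = {
        "prepare": "Prepare the {items}.",
        "cook": "Cook the {items} until done.",
        "serve": "Serve the {items} warm.",
    }

    def generate(ingredients: List[str]) -> str:
        return templates[phase].format(items=", ".join(ingredients))

    return generate


# One sub-generator per phase, selected according to the predicted structure.
SUB_GENERATORS: Dict[str, Callable[[List[str]], str]] = {
    phase: make_sub_generator(phase) for phase in PHASES
}


def generate_recipe(ingredients: List[str]) -> List[str]:
    """Predict the structure first, then let each phase's generator write."""
    structure = predict_structure(ingredients)
    return [SUB_GENERATORS[phase](ingredients) for phase in structure]


if __name__ == "__main__":
    for step in generate_recipe(["rice", "egg"]):
        print(step)
```

The point of the decomposition is that the phase sequence is decided once, up front, so each sub-generator only has to handle its own part of the recipe.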

Bibliographic Details
Main Authors: Wang, Hao, Lin, Guosheng, Hoi, Steven C. H., Miao, Chunyan
Other Authors: School of Computer Science and Engineering
Format: Article
Language: English
Published: 2022
Subjects: Engineering::Computer science and engineering; Text Generation; Vision-and-Language
Online Access: https://hdl.handle.net/10356/156089
Institution: Nanyang Technological University
id sg-ntu-dr.10356-156089
record_format dspace
spelling sg-ntu-dr.10356-156089
2022-04-06T08:57:15Z
Decomposing generation networks with structure prediction for recipe generation
Wang, Hao
Lin, Guosheng
Hoi, Steven C. H.
Miao, Chunyan
School of Computer Science and Engineering
Joint NTU-UBC Research Centre of Excellence in Active Living for the Elderly (LILY)
Engineering::Computer science and engineering
Text Generation
Vision-and-Language
Recipe generation from food images and ingredients is a challenging task, which requires the interpretation of the information from another modality. Different from the image captioning task, where the captions usually have one sentence, cooking instructions contain multiple sentences and have obvious structures. To help the model capture the recipe structure and avoid missing some cooking details, we propose a novel framework: Decomposing Generation Networks (DGN) with structure prediction, to get more structured and complete recipe generation outputs. Specifically, we split each cooking instruction into several phases, and assign different sub-generators to each phase. Our approach includes two novel ideas: (i) learning the recipe structures with the global structure prediction component and (ii) producing recipe phases in the sub-generator output component based on the predicted structure. Extensive experiments on the challenging large-scale Recipe1M dataset validate the effectiveness of our proposed model, which improves the performance over the state-of-the-art results.
AI Singapore
Ministry of Education (MOE)
Ministry of Health (MOH)
National Research Foundation (NRF)
Submitted/Accepted version
This research is supported, in part, by the National Research Foundation (NRF), Singapore under its AI Singapore Programme (AISG Award No: AISG-GC-2019-003) and under its NRF Investigatorship Programme (NRFI Award No. NRF-NRFI05-2019-0002). This research is also supported, in part, by the Singapore Ministry of Health under its National Innovation Challenge on Active and Confident Ageing (NIC Project No. MOH/NIC/COG04/2017 and MOH/NIC/HAIG03/2017), and the MOE Tier-1 research grants: RG28/18 (S) and RG22/19 (S).
2022-04-06T08:57:15Z
2022-04-06T08:57:15Z
2022
Journal Article
Wang, H., Lin, G., Hoi, S. C. H. & Miao, C. (2022). Decomposing generation networks with structure prediction for recipe generation. Pattern Recognition, 126, 108578. https://dx.doi.org/10.1016/j.patcog.2022.108578
0031-3203
https://hdl.handle.net/10356/156089
10.1016/j.patcog.2022.108578
2-s2.0-85124796277
126
108578
en
AISG-GC-2019-003
NRF-NRFI05-2019-0002
MOH/NIC/COG04/2017
MOH/NIC/HAIG03/2017
RG28/18 (S)
RG22/19 (S)
Pattern Recognition
© 2022 Elsevier Ltd. All rights reserved. This paper was published in Pattern Recognition and is made available with permission of Elsevier Ltd.
application/pdf
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
Singapore
content_provider NTU Library
collection DR-NTU
language English
topic Engineering::Computer science and engineering
Text Generation
Vision-and-Language
spellingShingle Engineering::Computer science and engineering
Text Generation
Vision-and-Language
Wang, Hao
Lin, Guosheng
Hoi, Steven C. H.
Miao, Chunyan
Decomposing generation networks with structure prediction for recipe generation
description Recipe generation from food images and ingredients is a challenging task, which requires the interpretation of the information from another modality. Different from the image captioning task, where the captions usually have one sentence, cooking instructions contain multiple sentences and have obvious structures. To help the model capture the recipe structure and avoid missing some cooking details, we propose a novel framework: Decomposing Generation Networks (DGN) with structure prediction, to get more structured and complete recipe generation outputs. Specifically, we split each cooking instruction into several phases, and assign different sub-generators to each phase. Our approach includes two novel ideas: (i) learning the recipe structures with the global structure prediction component and (ii) producing recipe phases in the sub-generator output component based on the predicted structure. Extensive experiments on the challenging large-scale Recipe1M dataset validate the effectiveness of our proposed model, which improves the performance over the state-of-the-art results.
author2 School of Computer Science and Engineering
author_facet School of Computer Science and Engineering
Wang, Hao
Lin, Guosheng
Hoi, Steven C. H.
Miao, Chunyan
format Article
author Wang, Hao
Lin, Guosheng
Hoi, Steven C. H.
Miao, Chunyan
author_sort Wang, Hao
title Decomposing generation networks with structure prediction for recipe generation
title_short Decomposing generation networks with structure prediction for recipe generation
title_full Decomposing generation networks with structure prediction for recipe generation
title_fullStr Decomposing generation networks with structure prediction for recipe generation
title_full_unstemmed Decomposing generation networks with structure prediction for recipe generation
title_sort decomposing generation networks with structure prediction for recipe generation
publishDate 2022
url https://hdl.handle.net/10356/156089
_version_ 1729789484516507648