Learning structural representations for recipe generation and food retrieval

Food is significant to human daily life. In this paper, we are interested in learning structural representations for lengthy recipes, that can benefit the recipe generation and food cross-modal retrieval tasks. Different from the common vision-language data, here the food images contain mixed ingred...

Bibliographic Details
Main Authors:	Wang, Hao, Lin, Guosheng, Hoi, Steven C. H., Miao, Chunyan
Other Authors:	School of Computer Science and Engineering
Format:	Article
Language:	English
Published:	2022
Subjects:	Engineering::Computer science and engineering Text Generation Vision-and-Language
Online Access:	https://hdl.handle.net/10356/162545
Institution:	Nanyang Technological University

id	sg-ntu-dr.10356-162545
record_format	dspace
spelling	sg-ntu-dr.10356-162545 2023-05-26T15:36:31Z Learning structural representations for recipe generation and food retrieval Wang, Hao Lin, Guosheng Hoi, Steven C. H. Miao, Chunyan School of Computer Science and Engineering Engineering::Computer science and engineering Text Generation Vision-and-Language Food is significant to human daily life. In this paper, we are interested in learning structural representations for lengthy recipes, that can benefit the recipe generation and food cross-modal retrieval tasks. Different from the common vision-language data, here the food images contain mixed ingredients and target recipes are lengthy paragraphs, where we do not have annotations on structure information. To address the above limitations, we propose a novel method to unsupervisedly learn the sentence-level tree structures for the cooking recipes. Our approach brings together several novel ideas in a systematic framework: (1) exploiting an unsupervised learning approach to obtain the sentence-level tree structure labels before training; (2) generating trees of target recipes from images with the supervision of tree structure labels learned from (1); and (3) integrating the learned tree structures into the recipe generation and food cross-modal retrieval procedure. Our proposed model can produce good-quality sentence-level tree structures and coherent recipes. We achieve the state-of-the-art recipe generation and food cross-modal retrieval performance on the benchmark Recipe1M dataset. AI Singapore National Research Foundation (NRF) Submitted/Accepted version This research is supported, in part, by the National Research Foundation (NRF), Singapore under its AI Singapore Programme (AISG Award No: AISG-GC-2019-003) and under its NRF Investigatorship Programme (NRFI Award No. NRF-NRFI05-2019-0002). 2022-10-31T05:26:40Z 2022-10-31T05:26:40Z 2022 Journal Article Wang, H., Lin, G., Hoi, S. C. H. & Miao, C. (2022). Learning structural representations for recipe generation and food retrieval. IEEE Transactions On Pattern Analysis and Machine Intelligence, 45(3), 3363-3377. https://dx.doi.org/10.1109/TPAMI.2022.3181294 0162-8828 https://hdl.handle.net/10356/162545 10.1109/TPAMI.2022.3181294 35687622 2-s2.0-85132791075 3 45 3363 3377 en AISG-GC-2019-003 NRF-NRFI05-2019-0002 IEEE Transactions on Pattern Analysis and Machine Intelligence © 2022 IEEE. All rights reserved. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. The published version is available at: https://doi.org/10.1109/TPAMI.2022.3181294. application/pdf
institution	Nanyang Technological University
building	NTU Library
continent	Asia
country	Singapore
content_provider	NTU Library
collection	DR-NTU
language	English
topic	Engineering::Computer science and engineering Text Generation Vision-and-Language
spellingShingle	Engineering::Computer science and engineering Text Generation Vision-and-Language Wang, Hao Lin, Guosheng Hoi, Steven C. H. Miao, Chunyan Learning structural representations for recipe generation and food retrieval
description	Food is significant to human daily life. In this paper, we are interested in learning structural representations for lengthy recipes, that can benefit the recipe generation and food cross-modal retrieval tasks. Different from the common vision-language data, here the food images contain mixed ingredients and target recipes are lengthy paragraphs, where we do not have annotations on structure information. To address the above limitations, we propose a novel method to unsupervisedly learn the sentence-level tree structures for the cooking recipes. Our approach brings together several novel ideas in a systematic framework: (1) exploiting an unsupervised learning approach to obtain the sentence-level tree structure labels before training; (2) generating trees of target recipes from images with the supervision of tree structure labels learned from (1); and (3) integrating the learned tree structures into the recipe generation and food cross-modal retrieval procedure. Our proposed model can produce good-quality sentence-level tree structures and coherent recipes. We achieve the state-of-the-art recipe generation and food cross-modal retrieval performance on the benchmark Recipe1M dataset.
author2	School of Computer Science and Engineering
author_facet	School of Computer Science and Engineering Wang, Hao Lin, Guosheng Hoi, Steven C. H. Miao, Chunyan
format	Article
author	Wang, Hao Lin, Guosheng Hoi, Steven C. H. Miao, Chunyan
author_sort	Wang, Hao
title	Learning structural representations for recipe generation and food retrieval
title_short	Learning structural representations for recipe generation and food retrieval
title_full	Learning structural representations for recipe generation and food retrieval
title_fullStr	Learning structural representations for recipe generation and food retrieval
title_full_unstemmed	Learning structural representations for recipe generation and food retrieval
title_sort	learning structural representations for recipe generation and food retrieval
publishDate	2022
url	https://hdl.handle.net/10356/162545
_version_	1772827441279008768
