R2GAN: Cross-modal recipe retrieval with generative adversarial network

Representing procedural text such as recipes for cross-modal retrieval is inherently difficult, let alone generating images from recipes for visualization. This paper studies a new variant of GAN, named Recipe Retrieval Generative Adversarial Network (R2GAN), to explore the feasibility of...

Bibliographic Details
Main Authors: ZHU, Bin, NGO, Chong-wah, CHEN, Jingjing, HAO, Yanbin
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2019
Online Access: https://ink.library.smu.edu.sg/sis_research/6456
https://ink.library.smu.edu.sg/context/sis_research/article/7459/viewcontent/Zhu_R2GAN_Cross_Modal_Recipe_Retrieval_With_Generative_Adversarial_Network_CVPR_2019_paper.pdf
Institution: Singapore Management University
id sg-smu-ink.sis_research-7459
record_format dspace
spelling sg-smu-ink.sis_research-7459 2022-01-10T06:12:07Z R2GAN: Cross-modal recipe retrieval with generative adversarial network ZHU, Bin NGO, Chong-wah CHEN, Jingjing HAO, Yanbin Representing procedural text such as recipes for cross-modal retrieval is inherently difficult, let alone generating images from recipes for visualization. This paper studies a new variant of GAN, named Recipe Retrieval Generative Adversarial Network (R2GAN), to explore the feasibility of generating images from procedural text for the retrieval problem. The motivation for using a GAN is twofold: learning compatible cross-modal features in an adversarial way, and explaining search results by showing the images generated from recipes. The novelty of R2GAN lies in its architecture design: a GAN with one generator and dual discriminators, which makes generating images from recipes feasible. Furthermore, empowered by the generated images, a two-level ranking loss in both the embedding and image spaces is considered. These add-ons not only result in excellent retrieval performance but also generate close-to-realistic food images useful for explaining the ranking of recipes. On the Recipe1M dataset, R2GAN demonstrates high scalability to data size, outperforms all existing approaches, and generates images that make the search results intuitive for humans to interpret.
2019-06-01T07:00:00Z text application/pdf https://ink.library.smu.edu.sg/sis_research/6456 info:doi/10.1109/CVPR.2019.01174 https://ink.library.smu.edu.sg/context/sis_research/article/7459/viewcontent/Zhu_R2GAN_Cross_Modal_Recipe_Retrieval_With_Generative_Adversarial_Network_CVPR_2019_paper.pdf http://creativecommons.org/licenses/by-nc-nd/4.0/ Research Collection School Of Computing and Information Systems eng Institutional Knowledge at Singapore Management University Categorization Image and Video Synthesis Recognition: Detection Representation Learning Retrieval; Vision + Language Data Storage Systems Graphics and Human Computer Interfaces OS and Networks
institution Singapore Management University
building SMU Libraries
continent Asia
country Singapore
content_provider SMU Libraries
collection InK@SMU
language English
topic Categorization
Image and Video Synthesis
Recognition: Detection
Representation Learning
Retrieval; Vision + Language
Data Storage Systems
Graphics and Human Computer Interfaces
OS and Networks
description Representing procedural text such as recipes for cross-modal retrieval is inherently difficult, let alone generating images from recipes for visualization. This paper studies a new variant of GAN, named Recipe Retrieval Generative Adversarial Network (R2GAN), to explore the feasibility of generating images from procedural text for the retrieval problem. The motivation for using a GAN is twofold: learning compatible cross-modal features in an adversarial way, and explaining search results by showing the images generated from recipes. The novelty of R2GAN lies in its architecture design: a GAN with one generator and dual discriminators, which makes generating images from recipes feasible. Furthermore, empowered by the generated images, a two-level ranking loss in both the embedding and image spaces is considered. These add-ons not only result in excellent retrieval performance but also generate close-to-realistic food images useful for explaining the ranking of recipes. On the Recipe1M dataset, R2GAN demonstrates high scalability to data size, outperforms all existing approaches, and generates images that make the search results intuitive for humans to interpret.
format text
author ZHU, Bin
NGO, Chong-wah
CHEN, Jingjing
HAO, Yanbin
title R2GAN: Cross-modal recipe retrieval with generative adversarial network
publisher Institutional Knowledge at Singapore Management University
publishDate 2019
url https://ink.library.smu.edu.sg/sis_research/6456
https://ink.library.smu.edu.sg/context/sis_research/article/7459/viewcontent/Zhu_R2GAN_Cross_Modal_Recipe_Retrieval_With_Generative_Adversarial_Network_CVPR_2019_paper.pdf
_version_ 1770575963858403328
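
The description mentions a two-level ranking loss applied in both the embedding space and the image space, but this record does not give its exact form. The following is only a minimal sketch under stated assumptions: it assumes a hinge-style triplet loss on cosine similarity at each level, and the names `triplet_ranking_loss`, `two_level_loss`, `margin`, and `weight` are hypothetical, not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def triplet_ranking_loss(anchor, positive, negative, margin=0.3):
    """Hinge loss: the matching pair should score higher than the
    mismatched pair by at least `margin` (an assumed value)."""
    return max(0.0, margin - cosine(anchor, positive) + cosine(anchor, negative))


def two_level_loss(embed_triplet, image_triplet, weight=1.0, margin=0.3):
    """Assumed combination: ranking loss on embedding-space features plus a
    weighted ranking loss on features of real and generated images."""
    embed_term = triplet_ranking_loss(*embed_triplet, margin=margin)
    image_term = triplet_ranking_loss(*image_triplet, margin=margin)
    return embed_term + weight * image_term


# Toy usage with 2-D vectors: (anchor, positive, negative) at each level.
embed_triplet = ([1.0, 0.0], [1.0, 0.0], [0.0, 1.0])  # already well ranked
image_triplet = ([1.0, 0.0], [0.0, 1.0], [1.0, 0.0])  # ranked the wrong way
loss = two_level_loss(embed_triplet, image_triplet)
```

In this toy case the embedding-level term is zero because the matching pair already wins by more than the margin, so the total comes entirely from the image-level term.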