Multi-modal cooking workflow construction for food recipes
Understanding a food recipe requires anticipating the implicit causal effects of cooking actions, such that the recipe can be converted into a graph describing the temporal workflow of the recipe. This is a non-trivial task that involves common-sense reasoning. However, existing efforts rely on hand-c...
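Main Authors: | PAN, Liangming; CHEN, Jingjing; WU, Jianlong; LIU, Shaoteng; NGO, Chong-wah; KAN, Min-Yen; JIANG, Yugang; CHUA, Tat-Seng |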
---|---|
Format: | text |
Language: | English |
Published: | Institutional Knowledge at Singapore Management University, 2020 |
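Subjects: | cause-and-effect reasoning; cooking workflow; deep learning; food recipes; mm-res dataset; multi-modal fusion; Databases and Information Systems; Graphics and Human Computer Interfaces |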
Online Access: | https://ink.library.smu.edu.sg/sis_research/6464 https://ink.library.smu.edu.sg/context/sis_research/article/7467/viewcontent/3394171.3413765.pdf |
Institution: | Singapore Management University |
id |
sg-smu-ink.sis_research-7467 |
---|---|
record_format |
dspace |
spelling |
sg-smu-ink.sis_research-7467; 2022-01-10T06:06:54Z; Multi-modal cooking workflow construction for food recipes; PAN, Liangming; CHEN, Jingjing; WU, Jianlong; LIU, Shaoteng; NGO, Chong-wah; KAN, Min-Yen; JIANG, Yugang; CHUA, Tat-Seng; 2020-10-01T07:00:00Z; text; application/pdf; https://ink.library.smu.edu.sg/sis_research/6464; info:doi/10.1145/3394171.3413765; https://ink.library.smu.edu.sg/context/sis_research/article/7467/viewcontent/3394171.3413765.pdf; http://creativecommons.org/licenses/by-nc-nd/4.0/; Research Collection School Of Computing and Information Systems; eng; Institutional Knowledge at Singapore Management University; cause-and-effect reasoning; cooking workflow; deep learning; food recipes; mm-res dataset; multi-modal fusion; Databases and Information Systems; Graphics and Human Computer Interfaces |
institution |
Singapore Management University |
building |
SMU Libraries |
continent |
Asia |
country |
Singapore |
content_provider |
SMU Libraries |
collection |
InK@SMU |
language |
English |
topic |
cause-and-effect reasoning; cooking workflow; deep learning; food recipes; mm-res dataset; multi-modal fusion; Databases and Information Systems; Graphics and Human Computer Interfaces |
description |
Understanding a food recipe requires anticipating the implicit causal effects of cooking actions, such that the recipe can be converted into a graph describing the temporal workflow of the recipe. This is a non-trivial task that involves common-sense reasoning. However, existing efforts rely on hand-crafted features to extract the workflow graph from recipes due to the lack of large-scale labeled datasets. Moreover, they fail to utilize the cooking images, which constitute an important part of food recipes. In this paper, we build MM-ReS, the first large-scale dataset for cooking workflow construction, consisting of 9,850 recipes with human-labeled workflow graphs. Cooking steps are multi-modal, featuring both text instructions and cooking images. We then propose a neural encoder–decoder model that utilizes both visual and textual information to construct the cooking workflow, which achieves a performance gain of over 20% relative to existing hand-crafted baselines. |
format |
text |
author |
PAN, Liangming; CHEN, Jingjing; WU, Jianlong; LIU, Shaoteng; NGO, Chong-wah; KAN, Min-Yen; JIANG, Yugang; CHUA, Tat-Seng |
author_sort |
PAN, Liangming |
title |
Multi-modal cooking workflow construction for food recipes |
publisher |
Institutional Knowledge at Singapore Management University |
publishDate |
2020 |
url |
https://ink.library.smu.edu.sg/sis_research/6464 https://ink.library.smu.edu.sg/context/sis_research/article/7467/viewcontent/3394171.3413765.pdf |
_version_ |
1770575967560925184 |