Embodied object hunt
Main Author: | Kam, Rainer I-Wen |
---|---|
Other Authors: | Cham Tat Jen |
Format: | Final Year Project |
Language: | English |
Published: | Nanyang Technological University, 2024 |
Subjects: | Computer and Information Science |
Online Access: | https://hdl.handle.net/10356/175084 |
Institution: | Nanyang Technological University |
id |
sg-ntu-dr.10356-175084 |
---|---|
school |
School of Computer Science and Engineering |
supervisor_email |
ASTJCham@ntu.edu.sg |
degree |
Bachelor's degree |
date_available |
2024-04-19T04:33:05Z |
last_modified |
2024-04-19T15:45:51Z |
project_code |
SCSE23-0037 |
file_type |
application/pdf |
citation |
Kam, R. I. (2024). Embodied object hunt. Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/175084 |
institution |
Nanyang Technological University |
building |
NTU Library |
continent |
Asia |
country |
Singapore |
content_provider |
NTU Library |
collection |
DR-NTU |
language |
English |
topic |
Computer and Information Science |
description |
This study investigates the use of multimodal encoders in the Embodied Object Hunt task. The
approach is motivated by recent developments in joint multimodal encoders, such as CLIP, that
can extract features shared between images and text. This ability makes them well suited to
tasks that combine imagery and text, such as the Embodied Object Hunt, which pairs visual
observations with textual input prompts. This study also explores using intrinsic curiosity
rewards to supplement agent learning by encouraging agents to explore their environment. It
compares agents trained with CLIP embeddings and intrinsic curiosity against agents trained
without them, and analyzes the key differences in their training results. As an exploratory
study, its results help gauge the effectiveness and feasibility of different approaches to
training embodied agents and can serve as a basis for future improvements. |
author2 |
Cham Tat Jen |
format |
Final Year Project |
author |
Kam, Rainer I-Wen |
title |
Embodied object hunt |
publisher |
Nanyang Technological University |
publishDate |
2024 |
url |
https://hdl.handle.net/10356/175084 |
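The abstract names two techniques without detailing them: a joint image-text embedding space (CLIP) and an intrinsic curiosity reward. The Python sketch below illustrates each under stated assumptions; it is not the project's implementation. The record does not say which curiosity formulation was used, so `curiosity_bonus` assumes the forward-model prediction error of the Intrinsic Curiosity Module (Pathak et al., 2017), r_int = (eta / 2) * ||phi_hat(s_{t+1}) - phi(s_{t+1})||^2, with a hypothetical scale `eta`. All function names are hypothetical, and the embedding vectors are toy stand-ins rather than real CLIP outputs.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two embedding vectors.

    In a joint image-text space such as CLIP's, an image embedding and a text
    embedding describing the same content are trained to score higher here.
    """
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cannot compare a zero vector")
    return dot / (norm_a * norm_b)


def curiosity_bonus(predicted_next: Sequence[float],
                    actual_next: Sequence[float],
                    eta: float = 0.01) -> float:
    """ICM-style intrinsic reward (assumed formulation, hypothetical eta).

    predicted_next: forward-model prediction of the next state's features.
    actual_next: encoder features of the next state actually observed.
    Returns (eta / 2) * squared L2 prediction error, so poorly predicted
    (novel) transitions earn a larger bonus.
    """
    if len(predicted_next) != len(actual_next):
        raise ValueError("feature vectors must have the same dimension")
    sq_err = sum((p - q) ** 2 for p, q in zip(predicted_next, actual_next))
    return 0.5 * eta * sq_err


def total_reward(extrinsic: float, intrinsic: float) -> float:
    # The curiosity bonus is added to, not substituted for, the task reward.
    return extrinsic + intrinsic


if __name__ == "__main__":
    # Hypothetical 3-d vectors standing in for CLIP image and prompt embeddings.
    observation_emb = [0.9, 0.1, 0.0]
    prompt_emb = [1.0, 0.0, 0.0]
    print(cosine_similarity(observation_emb, prompt_emb))

    # Hypothetical forward-model prediction vs. observed next-state features.
    r_int = curiosity_bonus([0.2, 0.4], [0.5, 0.0], eta=1.0)
    print(total_reward(extrinsic=0.0, intrinsic=r_int))
```

A complete ICM also trains an inverse-dynamics model so that the feature encoder ignores parts of the observation the agent cannot influence; that component is omitted here to keep the sketch short.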