Object-aware vision and language navigation for domestic robots
The Vision and Language Navigation (VLN) problem requires a robot to navigate accurately by combining natural language instructions with visual perception of the surrounding environment. Seamlessly combining and matching textual instructions with visual features is challenging because both modalities contain various entity clues, such as scene, object, and direction. Building on previous work \cite{entity}, we enrich the input features of the LSTM network by adding object features with different strategies to infer the robot's state, and propose the OVLN (Object-aware Vision and Language Navigation) model. In OVLN, the added object features make the robot object-aware and minimize the loss of visual information. An attention mechanism extracts the specialized and relational contexts of object, scene, and direction from the language. A visual attention graph is then constructed to obtain the corresponding entity aspects from vision and derive the navigation action. The model is trained on the Room-to-Room (R2R) dataset with a hierarchical scheme: after a first stage of imitation and reinforcement learning, augmented data is leveraged to fine-tune the model in a second stage to improve generalizability. Experimental results show that OVLN improves both success rate (SR) and success rate weighted by path length (SPL) compared with previous methods. Benefiting from its object awareness, OVLN also alleviates the overshoot problem of existing works.
Main Author: Zhao, Weiyi
Other Authors: Wang Dan Wei, School of Electrical and Electronic Engineering
Format: Thesis-Master by Coursework
Degree: Master of Science (Computer Control and Automation)
Language: English
Published: Nanyang Technological University, 2022
Subjects: Engineering::Electrical and electronic engineering
Online Access: https://hdl.handle.net/10356/163793
Institution: Nanyang Technological University
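The abstract describes the method only at a high level: an attention mechanism pools the instruction into specialized contexts for object, scene, and direction. As an illustrative note, here is a minimal sketch of one plausible way to do that with per-entity soft attention over encoded instruction tokens. All class, parameter, and variable names are hypothetical, not taken from the thesis:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EntityContextAttention(nn.Module):
    """One learned attention query per entity type (object/scene/direction),
    each pooling instruction word features into a specialized context vector."""

    def __init__(self, hidden_dim: int, entity_types=("object", "scene", "direction")):
        super().__init__()
        # One learned query vector per entity type (illustrative design choice).
        self.queries = nn.ParameterDict(
            {t: nn.Parameter(torch.randn(hidden_dim)) for t in entity_types}
        )
        self.scale = hidden_dim ** 0.5  # scaled dot-product attention

    def forward(self, word_feats: torch.Tensor) -> dict[str, torch.Tensor]:
        # word_feats: (batch, seq_len, hidden_dim) encoded instruction tokens
        contexts = {}
        for name, q in self.queries.items():
            scores = word_feats @ q / self.scale           # (batch, seq_len)
            weights = F.softmax(scores, dim=-1)            # attention over words
            contexts[name] = (weights.unsqueeze(-1) * word_feats).sum(dim=1)
        return contexts

# Usage: pool a toy instruction encoding into three entity contexts.
ctx = EntityContextAttention(hidden_dim=64)(torch.randn(2, 12, 64))
print({k: v.shape for k, v in ctx.items()})  # each context is (2, 64)
```

The abstract's relational contexts and visual attention graph are not specified in enough detail to sketch, so the above should be read as a generic pattern rather than OVLN's actual architecture.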
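The reported metrics, success rate (SR) and success rate weighted by path length (SPL), have standard definitions in VLN; SPL follows Anderson et al. (2018), where each successful episode contributes ℓ/max(p, ℓ), with ℓ the shortest-path length and p the path actually taken. A minimal sketch of how they are computed, assuming per-episode logs; the Episode fields are illustrative:

```python
from dataclasses import dataclass

@dataclass
class Episode:
    success: bool           # did the agent stop within the success radius?
    path_length: float      # length of the path the agent actually took (metres)
    shortest_length: float  # geodesic shortest-path length to the goal (metres)

def success_rate(episodes: list[Episode]) -> float:
    """SR: fraction of episodes where the agent stops near the goal."""
    return sum(e.success for e in episodes) / len(episodes)

def spl(episodes: list[Episode]) -> float:
    """SPL: success weighted by path efficiency (Anderson et al., 2018).
    Wandering past the goal (the 'overshoot' the abstract mentions)
    inflates path_length and therefore lowers SPL even on success."""
    return sum(
        e.success * e.shortest_length / max(e.path_length, e.shortest_length)
        for e in episodes
    ) / len(episodes)

# Example: one efficient success, one overshooting success, one failure.
eps = [Episode(True, 10.0, 10.0), Episode(True, 20.0, 10.0), Episode(False, 8.0, 10.0)]
print(success_rate(eps))  # 0.667
print(spl(eps))           # (1.0 + 0.5 + 0.0) / 3 = 0.5
```

This pairing is why the abstract reports both numbers: SR alone cannot distinguish an efficient success from one that overshoots and backtracks, while SPL penalizes the detour.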