Object-aware vision and language navigation for domestic robots

The Vision and Language Navigation (VLN) problem requires a robot to navigate accurately by combining a natural language instruction with visual perception of the surrounding environment. Seamlessly matching textual instructions with visual features is challenging because both modalities contain varied entity clues, such as scene, object, and direction. Building on previous work \cite{entity}, we enrich the input features of the LSTM network by adding object features with different strategies to infer the robot's state, and propose the OVLN (Object-aware Vision and Language Navigation) model. In OVLN, the added object features allow the robot to be object-aware and minimize the loss of visual information. An attention mechanism extracts the specialized and relational contexts of object, scene, and direction from the language. A visual attention graph is then constructed to obtain the entity aspects from vision and derive the navigation action. The model is trained on the Room-to-Room (R2R) dataset with a hierarchical, two-stage scheme: after a first stage of imitation and reinforcement learning, augmented data is used to fine-tune the model in the second stage to improve generalizability. Experimental results show that OVLN improves both success rate (SR) and success rate weighted by path length (SPL) over previous methods, and, benefiting from object awareness, alleviates the overshoot problem seen in existing work.
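As a rough illustration only (not the thesis's actual implementation, whose architecture details are not given here), the abstract's idea of attending over separate object, scene, and direction feature banks and fusing the resulting contexts to score navigation actions might be sketched as follows; all names, dimensions, and the random features are hypothetical:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attend(query, keys, values):
    """Scaled dot-product attention: weight `values` by query-key similarity."""
    scores = keys @ query / np.sqrt(query.shape[-1])   # (n,)
    weights = softmax(scores)
    return weights @ values, weights                   # context (d,), weights (n,)

rng = np.random.default_rng(0)
d = 8                       # feature dimension (illustrative)
state = rng.normal(size=d)  # agent state, e.g. an LSTM hidden state

# One feature bank per entity type named in the abstract.
entity_feats = {
    "object":    rng.normal(size=(5, d)),  # detected object features
    "scene":     rng.normal(size=(3, d)),  # scene-level features
    "direction": rng.normal(size=(4, d)),  # candidate heading features
}

# Attend over each entity type separately, then fuse the per-entity contexts.
contexts = [attend(state, f, f)[0] for f in entity_feats.values()]
fused = np.concatenate(contexts)           # (3*d,) fused entity context

# Score candidate navigation actions against the fused context.
actions = rng.normal(size=(4, 3 * d))      # 4 hypothetical candidate viewpoints
action_probs = softmax(actions @ fused)
print(action_probs.argmax())               # index of the chosen action
```

The per-entity attention keeps object, scene, and direction clues separable before fusion, which is one plausible reading of how the model stays "object-aware" rather than collapsing everything into a single visual vector.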

Bibliographic Details
Main Author: Zhao, Weiyi
Other Authors: Wang Dan Wei
Format: Thesis-Master by Coursework
Language:English
Published: Nanyang Technological University 2022
Subjects: Engineering::Electrical and electronic engineering
Online Access:https://hdl.handle.net/10356/163793
Institution: Nanyang Technological University
Citation: Zhao, W. (2022). Object-aware vision and language navigation for domestic robots. Master's thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/163793
Degree: Master of Science (Computer Control and Automation)
School: School of Electrical and Electronic Engineering