Object-aware vision and language navigation for domestic robots

Vision and Language Navigation (VLN) problem demands a robot to navigate accurately by combining the natural language instruction and the visual perception of surrounding environment. Seamlessly combining and matching of textual instructions with visual features is challenging due to various entity...

全面介紹

Saved in:
書目詳細資料
主要作者: Zhao, Weiyi
其他作者: Wang Dan Wei
格式: Thesis-Master by Coursework
語言:English
出版: Nanyang Technological University 2022
主題:
在線閱讀:https://hdl.handle.net/10356/163793
標簽: 添加標簽
沒有標簽, 成為第一個標記此記錄!
機構: Nanyang Technological University
語言: English
實物特徵
總結:Vision and Language Navigation (VLN) problem demands a robot to navigate accurately by combining the natural language instruction and the visual perception of surrounding environment. Seamlessly combining and matching of textual instructions with visual features is challenging due to various entity clues, such as scene, object and direction, contained in both modal features. Based on the previous work \cite{entity}, we enrich the input feature information of the LSTM network by adding object features with different strategies to infer the state of the robot and propose OVLN (Object-aware Vision and Language Navigation) model. In OVLN, the addition of object features allow robot to be object aware and minimize the loss of visual information. The attention mechanism has been used to extract the specialized contexts and relational contexts of object, scene and direction for the language. Then a visual attention graph is constructed to obtain the entity aspects from vision to derive the navigation action. The model is trained on the Room-to-Room (R2R) dataset with a hierarchical structure. After the first stage training with imitation and reinforcement learning, the augmentation data is leveraged to fine-tune the model in the second stage to improve the generalizability. Experimental results show that OVLN improves both the successful rate (SR) and the successful rate weighted by path length (SPL) compared with previous methods. Meanwhile, OVLN alleviates the overshoot problem for the existing works, benefiting from the object awareness.