Curriculum learning improves compositionality of reinforcement learning agent across concept classes


Bibliographic Details
Main Author: Lin, Zijun
Other Authors: Wen, Bihan
Format: Final Year Project
Language: English
Published: Nanyang Technological University 2024
Subjects:
Online Access:https://hdl.handle.net/10356/176294
Institution: Nanyang Technological University
Description
Summary: The compositional structure afforded by language allows humans to decompose complex phrases and map them to novel visual concepts, demonstrating flexible intelligence. Although several algorithms demonstrate compositionality, they offer little insight into how humans learn to compose concept classes to ground visual cues. To study this multi-modal learning problem, we created a 3-dimensional environment in which a reinforcement learning agent must navigate to a location specified by a natural language phrase (instruction). Each instruction is composed of nouns, attributes, and, additionally, determiners or prepositions. This visual grounding task increases the compositional complexity for reinforcement learning agents: navigating to the blue cubes above some red spheres is not rewarded when the instruction is to navigate to “some blue cubes below the red sphere”. We first demonstrate that reinforcement learning agents can ground determiner concepts in visual scenes but struggle to ground the more complex preposition concepts. Secondly, we show that curriculum learning, a strategy employed by humans, improves concept-learning efficiency, reducing the total number of training episodes needed to reach a fixed performance criterion by 15% in the determiner environment; moreover, it enables the agents to learn the preposition concepts. Lastly, we establish that agents trained on determiner or preposition concepts can decompose held-out test instructions and rapidly map their navigation policies to unseen visual object combinations. We also compared various text encoders to assess whether they facilitate the agents’ training. In conclusion, our results show that multi-modal reinforcement learning agents can achieve compositional understanding of complex concept classes, and demonstrate that human-like learning strategies improve learning efficiency in artificial systems.
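The curriculum strategy described in the summary (train on simpler concept classes first, advancing once a performance criterion is met) can be sketched as follows. This is a minimal illustration, not the thesis's actual implementation: the stage names, success criterion, and window size are all assumed for the example.

```python
class Curriculum:
    """Advance through concept stages once a success-rate criterion is met.

    Illustrative sketch only; stage names and thresholds are assumptions,
    not taken from the thesis.
    """

    def __init__(self, stages, criterion=0.8, window=100):
        self.stages = stages          # ordered simple -> complex
        self.criterion = criterion    # success rate required to advance
        self.window = window          # episodes averaged per check
        self.stage_idx = 0
        self.results = []             # 1 = success, 0 = failure

    @property
    def current_stage(self):
        return self.stages[self.stage_idx]

    def record_episode(self, success):
        """Log one episode outcome and advance the stage if warranted."""
        self.results.append(1 if success else 0)
        recent = self.results[-self.window:]
        if (len(recent) == self.window
                and sum(recent) / self.window >= self.criterion
                and self.stage_idx < len(self.stages) - 1):
            self.stage_idx += 1
            self.results = []         # reset statistics for the new stage


# Usage: concept classes ordered from simple to complex, mirroring the
# noun/attribute -> determiner -> preposition progression in the summary.
cur = Curriculum(["noun+attribute", "determiner", "preposition"],
                 criterion=0.8, window=100)
for _ in range(100):
    cur.record_episode(True)          # pretend the agent always succeeds
print(cur.current_stage)              # advanced past the first stage
```

The agent here is stubbed out; in practice `record_episode` would be fed the outcome of each navigation episode, and the environment's instruction generator would be switched whenever `current_stage` changes.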