Neural representation of future states in the mouse cortex
| Format: | Thesis-Doctor of Philosophy |
|---|---|
| Language: | English |
| Published: | Nanyang Technological University, 2023 |
| Online Access: | https://hdl.handle.net/10356/171355 |
| Institution: | Nanyang Technological University |
Summary:

The ability to predict long-term future rewards is crucial for survival. Some animals must endure long periods without receiving any reward, yet they can plan and predict the future. How does the brain predict possible future states?
We can frame this question within a reinforcement learning framework. Reinforcement learning is the problem of mapping states to actions so as to maximize cumulative reward. It can be solved by deriving the value function, that is, the lasting appeal of being in each state. Two prominent families of algorithms are commonly used to estimate value: model-based and model-free methods. Model-based methods compute value by incorporating an internal model of the environment; they demand substantial computational resources but are flexible and adaptable to environmental changes. Model-free algorithms, on the other hand, estimate value directly from experience; they are computationally cheaper but inflexible and less adaptable to environmental changes.
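For readers unfamiliar with the formalism, the value function referred to above has the standard textbook definition below (general notation, not specific to this thesis): model-based methods evaluate the right-hand expectation with an explicit model of the transition and reward probabilities $p(s', r \mid s, a)$, whereas model-free methods estimate the left-hand expectation directly from sampled experience.

```latex
V^{\pi}(s)
  \;=\; \mathbb{E}_{\pi}\!\left[\,\sum_{t=0}^{\infty}\gamma^{t} r_{t+1}\;\middle|\;s_{0}=s\right]
  \;=\; \sum_{a}\pi(a\mid s)\sum_{s',\,r} p(s',r\mid s,a)\,\bigl[r+\gamma V^{\pi}(s')\bigr],
  \qquad 0 \le \gamma < 1 .
```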
Neural correlates of model-free algorithms have been studied extensively. The phasic activity of dopaminergic neurons in the basal ganglia has been directly linked to the reward prediction error computed in temporal difference learning algorithms. The neural signatures of model-based methods, however, remain elusive. Yet several lesion studies have suggested that the dopamine pathway implicated in model-free methods may also play a role in model-based behaviours. This has led to the hypothesis that the brain might use a third approach, namely the successor representation.
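To make the reward prediction error concrete, here is a minimal tabular TD(0) sketch (a toy illustration of the standard algorithm, not code from this thesis); the quantity `delta` below is the prediction error that phasic dopaminergic activity has been proposed to report.

```python
import numpy as np

def td0_update(V, s, r, s_next, alpha=0.1, gamma=0.95):
    """One model-free TD(0) update on a tabular value function V."""
    # Reward prediction error: (observed reward + discounted value of the
    # next state) minus the current estimate for the present state.
    delta = r + gamma * V[s_next] - V[s]
    V[s] += alpha * delta  # move the estimate a small step toward the target
    return delta

# Toy example: a 5-state chain with reward only on reaching the last state.
V = np.zeros(5)
delta = td0_update(V, s=3, r=1.0, s_next=4)  # positive surprise -> delta > 0
```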
The successor representation is a tabular, discriminative representation of discounted future state occupancies conditioned on a current state. In other words, it is a predictive map that gives the probability of visiting a future state, given the current state. The successor representation is learned through a model-free method yet has behavioural flexibility akin to a model-based approach. Recent studies have proposed neural substrates for the successor representation algorithm in the hippocampus, theorizing that place cells do not necessarily encode place per se but rather a retrodictive map derived from the successor representation matrix. However, the tabular nature of the successor representation makes it unlikely that the brain would adopt such an algorithm, because the cost of building the table grows exponentially with the dimensionality of the state space. Here, I propose using the γ-model, a recently developed generative and continuous analogue of the successor representation that outputs a probability map of future states given a current state and action. Unlike the successor representation, the γ-model does not rely on a tabular representation.
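For concreteness, the tabular successor representation criticized above can be written down in a few lines (standard formulation, a toy example of my own rather than the thesis code): for a fixed policy with state-transition matrix $T^{\pi}$, the SR matrix is the geometric series $M=\sum_{t\ge0}\gamma^{t}(T^{\pi})^{t}=(I-\gamma T^{\pi})^{-1}$, and values follow as $V=Mr$.

```python
import numpy as np

def successor_representation(T, gamma=0.95):
    """Tabular SR: M[s, s_next] is the expected discounted number of visits
    to s_next when starting in s and following the policy that produced T."""
    n = T.shape[0]
    # Closed form of the geometric series  I + gamma*T + gamma^2*T^2 + ...
    return np.linalg.inv(np.eye(n) - gamma * T)

# Toy example: four states on a ring, each step moves one state clockwise.
T = np.roll(np.eye(4), shift=1, axis=1)
M = successor_representation(T)

r = np.array([0.0, 0.0, 0.0, 1.0])  # reward available only in the last state
V = M @ r                            # values follow directly from the SR
```

Changing the reward vector `r` re-derives the values without relearning `M`, which is the kind of flexibility the γ-model is meant to preserve without maintaining a table whose size grows with the square of the number of states.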
The γ-model combines the advantages of model-based and model-free reinforcement learning algorithms. Like a model-based method, it is flexible in the face of environmental changes (e.g., a change in the reward function). Like a model-free method, it is conditioned on the policy and contains information about the long-term future. This makes the γ-model an excellent candidate for studying how the brain predicts future states.
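As I understand the published γ-model formulation (summarized here from the original γ-model literature; the notation is mine and should be checked against that source), the model $\mu_{\theta}(s_{e}\mid s,a)$ is a generative network trained to match the normalized discounted occupancy over future states, learned with a TD-like bootstrapped target instead of a table, with values for any reward function $r(\cdot)$ recoverable by sampling:

```latex
% Discounted occupancy that the gamma-model is trained to approximate
\mu_{\theta}(s_{e}\mid s_{t},a_{t})
  \;\approx\; (1-\gamma)\sum_{\Delta t=1}^{\infty}\gamma^{\Delta t-1}\,
  p\bigl(s_{t+\Delta t}=s_{e}\mid s_{t},a_{t}\bigr)

% Bootstrapped (TD-like) training target: mostly one-step dynamics,
% occasionally the model's own prediction from the next state
\text{target}(s_{e}) \;=\;
  (1-\gamma)\,p\bigl(s_{t+1}=s_{e}\mid s_{t},a_{t}\bigr)
  \;+\; \gamma\,\mu_{\theta}\bigl(s_{e}\mid s_{t+1},a_{t+1}\bigr)

% Values for any reward function follow from samples of the model,
% which is what makes the approach robust to changes in reward
Q^{\pi}(s_{t},a_{t}) \;=\; \frac{1}{1-\gamma}\,
  \mathbb{E}_{s_{e}\sim\mu_{\theta}(\cdot\mid s_{t},a_{t})}\bigl[r(s_{e})\bigr]
```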
In Chapter 1, I review the literature on reinforcement learning and the various methods for solving reinforcement learning problems by estimating value, outline the literature on the neural correlates of reinforcement learning in the brain and the open problems in the field, and describe how I aim to address these issues with my hypothesis and specific aims. Chapter 2 provides a detailed account of the methods and materials used in this study. Chapter 3 details the findings. I first introduce a novel behavioural paradigm compatible with the γ-model in the context of reinforcement learning. Next, I show that the γ-model can be developed and used to predict future states. Lastly, I attempt to identify neural correlates in the mouse cortex that correspond to the predictive features of the γ-model, and I show that spatial representations in the mouse cortex correlate with these predictive features, similar to the theories postulated for the hippocampus. I discuss the findings in Chapter 4 and conclude the thesis in Chapter 5. The conclusions of this thesis will shed light on the neural mechanisms of prediction in the mammalian brain.