Neural representation of future states in the mouse cortex

Bibliographic Details
Main Author: Ahmad Suhaimi Bin Ahmad Ishak
Other Authors: Hiroshi Makino
Format: Thesis-Doctor of Philosophy
Language: English
Published: Nanyang Technological University 2023
Online Access:https://hdl.handle.net/10356/171355
Institution: Nanyang Technological University
Description
Summary: The ability to predict long-term future rewards is crucial for survival. Animals may have to endure long periods without reward and must therefore plan for and predict the future. How does the brain predict possible future states? This question can be framed in terms of reinforcement learning, the problem of mapping states to actions that maximize cumulative reward. Reinforcement learning problems can be solved by deriving the value function, that is, the long-term desirability of being in each state. Two prominent families of algorithms are commonly used to estimate value: model-based and model-free methods. Model-based methods estimate value by incorporating an internal model of the environment; they are computationally expensive but flexible and adaptable to environmental changes. Model-free algorithms, in contrast, estimate value directly from experience; they are computationally cheaper but less flexible and less adaptable to environmental changes.

The neural correlates of model-free algorithms have been studied extensively. The phasic activity of dopaminergic neurons in the basal ganglia has been directly linked to the reward prediction error computed in temporal difference learning algorithms. The neural signatures of model-based methods, however, remain elusive. Yet several lesion studies have suggested that the dopamine pathway implicated in model-free methods may also play a role in model-based behaviours. This has led to the hypothesis that the brain might use a third approach, the successor representation. The successor representation is a tabular, discriminative representation of discounted future state occupancies conditioned on a current state. In other words, it is a predictive map giving the probability of visiting each future state from the current state. The successor representation is learned through a model-free method yet has behavioural flexibility akin to a model-based approach. Recent studies have proposed neural substrates for the successor representation in the hippocampus, theorizing that place cells do not necessarily encode place per se but rather a retrodictive map derived from the successor representation matrix. However, the tabular nature of the successor representation makes it unlikely that the brain adopts such an algorithm, as the cost of building the table grows exponentially with the dimensionality of the state space.

Here, I propose using the γ-model, a recently developed generative, continuous analogue of the successor representation that outputs a probability map of future states given a current state and action. Unlike the successor representation, the γ-model does not rely on a tabular representation. It combines the advantages of model-based and model-free reinforcement learning: like a model-based method, it is flexible to environmental changes (e.g., a change in the reward function); like a model-free method, it is conditioned on the policy and contains information about the long-term future. This makes the γ-model an excellent candidate for studying how the brain predicts future states.
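To make the successor representation described above concrete, the following minimal sketch shows how an SR matrix can be learned with a model-free temporal-difference update and how state values follow from combining it with a reward vector. This is an illustrative toy example only; the state-space size, parameter values, and function names are assumptions introduced for exposition and are not taken from the thesis or from the γ-model implementation.

    import numpy as np

    # Minimal sketch of a tabular successor representation (SR) over a small,
    # discrete state space. All sizes, parameters, and names here are
    # illustrative assumptions, not details of the thesis or the gamma-model.

    n_states = 25     # e.g., a 5x5 grid flattened into 25 states (assumed)
    gamma = 0.95      # discount factor on future state occupancies
    alpha = 0.1       # learning rate for the TD-style update

    # M[s, s'] estimates the discounted expected number of future visits to
    # state s' when starting in state s and following the current policy.
    M = np.zeros((n_states, n_states))

    def sr_td_update(s, s_next):
        """Model-free TD update of the SR after observing a transition s -> s_next."""
        onehot = np.zeros(n_states)
        onehot[s] = 1.0
        td_error = onehot + gamma * M[s_next] - M[s]
        M[s] += alpha * td_error

    def state_values(reward_vector):
        """Values under the learned SR: V = M @ R. Re-evaluating values after a
        change in the reward vector requires no relearning of M, which is the
        behavioural flexibility the summary attributes to the SR."""
        return M @ reward_vector

The table has one entry per pair of states, so its cost blows up as the state space grows with dimensionality; the γ-model studied in this thesis instead learns a generative model that outputs a probability map of future states directly, avoiding the explicit table.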
In Chapter 1, I review the literature on reinforcement learning and the various methods for solving reinforcement learning problems by estimating value, outline what is known about the neural correlates of reinforcement learning in the brain and the open problems in the field, and describe how I aim to address these issues with my hypothesis and specific aims. Chapter 2 gives a detailed account of the methods and materials used in this study. Chapter 3 presents the findings. I first introduce a novel behavioural paradigm compatible with the γ-model in a reinforcement learning setting. Next, I show that the γ-model can be trained and used to predict future states. I then identify neural correlates in the mouse cortex that correspond to the predictive features of the γ-model, showing that spatial representations in the mouse cortex correlate with these predictive features, similar to the theories postulated for the hippocampus. Lastly, I discuss the findings in Chapter 4 and conclude the thesis in Chapter 5. The conclusions of this thesis shed light on the neural mechanisms of prediction in the mammalian brain.