Profit-maximizing sequential task allocation to a team of selfish agents with deep reinforcement learning

We study the problem of sequential task allocation among selfish agents through the lens of dynamic mechanism design framework. In this game, the manager has to maximize its own utility in face of a random team of selfish agents.The problem assumes a discrete-time setting in which each time step com...

Full description

Saved in:
Bibliographic Details
Main Author: Zhang, Shizhuo
Other Authors: Pun Chi Seng
Format: Final Year Project
Language:English
Published: Nanyang Technological University 2022
Subjects:
Online Access:https://hdl.handle.net/10356/157056
Tags: Add Tag
No Tags, Be the first to tag this record!
Institution: Nanyang Technological University
Language: English
Description
Summary:We study the problem of sequential task allocation among selfish agents through the lens of dynamic mechanism design framework. In this game, the manager has to maximize its own utility in face of a random team of selfish agents.The problem assumes a discrete-time setting in which each time step comprises of two sub-procedures: 1) contracting, where the manager offers payments to ask each agent to pursue certain goals and agents decide on whether they are satisfied; and 2) acting. The complication of this set-up lies in that reporting is involved as in traditional mechanism design settings, and truthful revelation of hidden information is impossible. Meanwhile, the agents act in a high-dimensional space, adding to the difficulty of making proper assumptions and devising optimization algorithms. To this end, we leverage the power of deep reinforcement learning. It is necessary to model the agents’ hidden information for the manager to make correct decisions, while this makes the learning problem non-Markovian, causing complications in applying reinforcement learning algorithms. We proposed a framework to tackle historical dependency leveraging the strong representation learning capability of deep learning methods and gradient-based multi-task updates, allowing the RL-based manager to act in a Markov latent space. We proposed the use of successor-representation based intrinsic reward to encourage strategic exploration. We performed empirical studies in various game settings to demonstrate the power of our proposed framework.