Profit-maximizing sequential task allocation to a team of selfish agents with deep reinforcement learning

We study the problem of sequential task allocation among selfish agents through the lens of the dynamic mechanism design framework. In this game, the manager has to maximize its own utility in the face of a random team of selfish agents. The problem assumes a discrete-time setting in which each time step com...

Bibliographic Details
Main Author: Zhang, Shizhuo
Other Authors: Pun Chi Seng
Format: Final Year Project
Language: English
Published: Nanyang Technological University 2022
Subjects: Science::Mathematics
Online Access: https://hdl.handle.net/10356/157056
Institution: Nanyang Technological University
id sg-ntu-dr.10356-157056
record_format dspace
spelling sg-ntu-dr.10356-157056 2023-02-28T23:11:23Z Profit-maximizing sequential task allocation to a team of selfish agents with deep reinforcement learning Zhang, Shizhuo Pun Chi Seng School of Physical and Mathematical Sciences Nixie Sapphira Lesmana cspun@ntu.edu.sg Science::Mathematics Bachelor of Science in Mathematical Sciences 2022-05-08T12:16:29Z 2022-05-08T12:16:29Z 2022 Final Year Project (FYP) Zhang, S. (2022). Profit-maximizing sequential task allocation to a team of selfish agents with deep reinforcement learning. Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/157056 https://hdl.handle.net/10356/157056 en application/pdf Nanyang Technological University
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
language English
topic Science::Mathematics
spellingShingle Science::Mathematics
Zhang, Shizhuo
Profit-maximizing sequential task allocation to a team of selfish agents with deep reinforcement learning
description We study the problem of sequential task allocation among selfish agents through the lens of the dynamic mechanism design framework. In this game, the manager has to maximize its own utility in the face of a random team of selfish agents. The problem assumes a discrete-time setting in which each time step comprises two sub-procedures: 1) contracting, where the manager offers payments to ask each agent to pursue certain goals and the agents decide whether they are satisfied; and 2) acting. The complication of this set-up is that reporting is involved, as in traditional mechanism design settings, and truthful revelation of hidden information is impossible. Meanwhile, the agents act in a high-dimensional space, which adds to the difficulty of making proper assumptions and devising optimization algorithms. To this end, we leverage the power of deep reinforcement learning. The manager must model the agents’ hidden information to make correct decisions, but doing so makes the learning problem non-Markovian, which complicates the application of reinforcement learning algorithms. We propose a framework that tackles this historical dependency by leveraging the strong representation learning capability of deep learning methods and gradient-based multi-task updates, allowing the RL-based manager to act in a Markov latent space. We also propose the use of a successor-representation-based intrinsic reward to encourage strategic exploration. We perform empirical studies in various game settings to demonstrate the power of the proposed framework.
author2 Pun Chi Seng
author_facet Pun Chi Seng
Zhang, Shizhuo
format Final Year Project
author Zhang, Shizhuo
author_sort Zhang, Shizhuo
title Profit-maximizing sequential task allocation to a team of selfish agents with deep reinforcement learning
title_sort profit-maximizing sequential task allocation to a team of selfish agents with deep reinforcement learning
publisher Nanyang Technological University
publishDate 2022
url https://hdl.handle.net/10356/157056
_version_ 1759853302036561920