Towards improving system performance in large scale multi-agent systems with selfish agents

Intelligent agents are becoming increasingly prevalent in a wide variety of domains including but not limited to transportation, safety and security. To better utilize the intelligence, there has been increasing focus on frameworks and methods for coordinating these intelligent agents. This thesis i...

Bibliographic Details
Main Author: KUMAR, Rajiv Ranjan
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2022
Subjects: Artificial Intelligence and Robotics; Graphics and Human Computer Interfaces
Online Access: https://ink.library.smu.edu.sg/etd_coll/428
https://ink.library.smu.edu.sg/context/etd_coll/article/1426/viewcontent/GPIS_AY2017_PhD_Rajiv_Ranjan_Kumar.pdf
Institution: Singapore Management University
id sg-smu-ink.etd_coll-1426
record_format dspace
institution Singapore Management University
building SMU Libraries
continent Asia
country Singapore
content_provider SMU Libraries
collection InK@SMU
language English
topic Artificial Intelligence and Robotics
Graphics and Human Computer Interfaces
description Intelligent agents are becoming increasingly prevalent in a wide variety of domains, including but not limited to transportation, safety, and security. To better utilize this intelligence, there has been increasing focus on frameworks and methods for coordinating these agents. This thesis provides solution approaches for improving large-scale multi-agent systems with selfish intelligent agents. In such systems, the performance of an agent depends not just on his or her own efforts but also on other agents' decisions. The complexity of interactions among multiple agents, coupled with the large scale of the problem domains and the uncertainties associated with the environment, makes decision making very challenging. In this work, we study the problem from the perspective of a centralized aggregator that needs to maximize the revenue of the entire system. To that end, we study the problem from strategic and operational points of view. With regard to strategic decision making, we propose planning and deep reinforcement learning based solution algorithms that improve system performance by optimizing the adaptive operating hours of selfish agents and by providing flexible work schedules to them. From an operational point of view, we propose a novel mechanism to incentivise selfish agents so that the performance of all the agents and of the overall system improves. In short, through strategic and operational decision making, we assist selfish agents in making intelligent decisions that result in improved system performance. In the first part of this thesis, we focus on making strategic decisions for workers in the digital gig economy. To provide a concrete context, we focus on taxi drivers in the transport gig economy. Taxi fleets and car aggregation systems are an important component of the urban public transportation system.
Taxi fleets and car aggregation systems (e.g., Uber) depend on a large number of self-controlled and profit-driven drivers, which introduces inefficiencies in the system. There are two ways in which taxi fleet performance can be optimized: (i) operational decision making: improve the assignment of taxis/cars to customers while accounting for future demand; (ii) strategic decision making: optimize the operating hours of (taxi and car) drivers. Existing research has primarily focused on the operational decisions in (i); we focus on the strategic decisions in (ii). We first model this complex real-world decision making problem (with thousands of taxi drivers) as a multi-stage stochastic congestion game with a non-dedicated set of agents (i.e., agents start operation at a random stage and exit the game after a fixed time), where there is a dynamic population of agents (constrained by the maximum number of drivers). We provide planning and learning methods for computing the ideal operating hours in such a game, so as to improve the efficiency of the overall fleet. In our experimental results, we demonstrate that our planning based approach provides up to 16% improvement in revenue over the existing method on a real-world taxi dataset. The learning based approach further improves performance and achieves up to 10% more revenue than the planning approach. In the second part of this thesis, we focus on: a) handling the schedule constraints of individual agents (e.g., breaks during work hours) to provide a flexible work schedule for them; and b) providing a scalable solution approach for such large-scale problem settings. We introduce a simulation based (faster) equilibrium computation method that relies on policy imputation.
We study and analyze different imputation methods and show that a good imputation method, coupled with a well-designed simulation based best response computation, can help in achieving a better symmetric equilibrium for large-scale systems in a time-efficient manner. We demonstrate that our methods provide significantly better policies than the previous approach in terms of improving individual agent revenue and overall agent availability. In the third and final part of the thesis, we focus on operational decision making, where we improve system performance by inducing cooperation among selfish agents. Here we focus on the principal-agent problem setting. Principal-agent relationships, where a principal employs several agents to accomplish tasks on its behalf, are prevalent in many domains (e.g., manufacturers and distributors for product distribution, Uber and taxi drivers for transportation, FoodPanda and delivery personnel for food delivery). The principal has a global observation of all the tasks, while agents only have local observations of local tasks. This limited observability, coupled with the selfish interests of agents, results in a misalignment between the objectives of the principal and those of the agents. We provide Multi-Agent Reinforcement Learning (MARL) approaches for sequentially designing incentives that improve the objectives of both the principal and the agents. We demonstrate that our approaches outperform state-of-the-art approaches for sequential incentive design on Escape-Room and adapted StarCraft-2 environments.
format text
author KUMAR, Rajiv Ranjan
title Towards improving system performance in large scale multi-agent systems with selfish agents
publisher Institutional Knowledge at Singapore Management University
publishDate 2022
url https://ink.library.smu.edu.sg/etd_coll/428
https://ink.library.smu.edu.sg/context/etd_coll/article/1426/viewcontent/GPIS_AY2017_PhD_Rajiv_Ranjan_Kumar.pdf
_version_ 1770567783115915264
spelling sg-smu-ink.etd_coll-1426 2022-09-22T09:33:37Z Towards improving system performance in large scale multi-agent systems with selfish agents KUMAR, Rajiv Ranjan 2022-07-01T07:00:00Z text application/pdf https://ink.library.smu.edu.sg/etd_coll/428 https://ink.library.smu.edu.sg/context/etd_coll/article/1426/viewcontent/GPIS_AY2017_PhD_Rajiv_Ranjan_Kumar.pdf http://creativecommons.org/licenses/by-nc-nd/4.0/ Dissertations and Theses Collection (Open Access) eng Institutional Knowledge at Singapore Management University Artificial Intelligence and Robotics Graphics and Human Computer Interfaces