Coping with uncertainties in agent cooperation

Bibliographic Details
Main Author: Chen, Shuo
Other Authors: [Supervisor not in the list]
Format: Thesis-Doctor of Philosophy
Language: English
Published: Nanyang Technological University 2020
Subjects:
Online Access:https://hdl.handle.net/10356/137074
Institution: Nanyang Technological University
Description
Summary: In multi-agent systems, intelligent agents interact with one another to achieve either individual or shared goals. This thesis focuses on scenarios where agents cooperate to achieve their goals. In particular, agents can help one another by sharing information or by collaborating, so cooperation usually improves agent performance. However, uncertainties exist that can jeopardise the successful achievement of goals. This thesis tackles the problems that arise from those uncertainties.

The first problem emerges when agents pursue individual goals. Agents can share information about system states with one another to make more informed decisions. However, agents may send false information to mislead others and thereby increase their own benefit, so there is uncertainty about whether the information shared by other agents is trustworthy. Existing approaches use trust management schemes to compute the trustworthiness of shared information, but they focus only on the accuracy of the trustworthiness estimates, without considering the cost and delay incurred during trust computation. This thesis proposes a partially observable Markov decision process (POMDP) model that queries neighbouring agents for information about uncertain states while taking their potential malicious behaviour into account. We also propose an algorithm to learn the model parameters in dynamic scenarios where malicious agents change their behaviour over time. Experimental results demonstrate that our model can effectively balance decision quality and response time while remaining robust to sophisticated malicious attacks.

When goals are too complex for a single agent, agents can form teams and coordinate their actions to achieve them. Here, uncertainty arises when agents cannot communicate or share team strategies with their teammates.
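The trust-aware querying idea can be sketched in miniature as follows. This is an illustrative assumption, not the thesis's actual POMDP: the `TrustTracker` class, the Beta-distribution trust update, and the `should_query` cost/value rule are all hypothetical stand-ins for the full model.

```python
class TrustTracker:
    """Hypothetical per-neighbour trust estimate via Beta-distribution counts."""

    def __init__(self):
        self.alpha = 1.0  # pseudo-count of truthful reports observed so far
        self.beta = 1.0   # pseudo-count of false reports observed so far

    def trust(self):
        # Posterior mean probability that the neighbour reports truthfully.
        return self.alpha / (self.alpha + self.beta)

    def update(self, report_was_truthful):
        # Update counts once the report has been verified against reality.
        if report_was_truthful:
            self.alpha += 1.0
        else:
            self.beta += 1.0


def should_query(tracker, query_cost, value_of_info):
    """Query a neighbour only if the expected value of (possibly truthful)
    information exceeds the cost/delay of asking — a crude stand-in for the
    cost-aware trade-off the POMDP model makes."""
    return tracker.trust() * value_of_info > query_cost
```

A full POMDP would also reason over belief states and plan sequences of queries; this sketch only shows the cost-versus-trust trade-off in isolation.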
Specifically, an agent has to understand the behaviour of its teammates and plan its actions accordingly. Behaviour here amounts to a function that takes a state as input and outputs an action that can in turn change the state. The agent can only observe its teammates' actions or the resulting state changes, not their underlying behaviours, so there is uncertainty about teammates' behaviour. Teamwork performed without relying on communication or shared team strategies is called ad hoc teamwork. We refer to domains where the agent can fully observe its teammates' actions as simple domains, and to domains where teammates' actions are only partially observable, so that the agent must rely on their state changes, as complex domains.

For ad hoc teamwork in simple domains, existing approaches use behaviour models of the teammates to predict their actions and choose the ad hoc agent's action accordingly. However, these behaviour models may be inaccurate, which can compromise the teamwork. In this thesis, we propose the Ad Hoc Teamwork by Sub-task Inference and Selection (ATSIS) algorithm, which uses sub-task inference instead of relying on teammate models. First, the ad hoc agent observes its teammates to infer which sub-tasks they are handling. Based on that, it selects its own sub-task using a POMDP model that handles the uncertainty of the sub-task inference. Last, the ad hoc agent uses Monte Carlo tree search (MCTS) to find the actions that perform the chosen sub-task. Experiments demonstrate that ATSIS achieves robust teamwork and makes much faster decisions than state-of-the-art schemes, which is significant for time-sensitive tasks. Moreover, ATSIS can further improve its performance by integrating a learned model.

For ad hoc teamwork in complex domains, the most advanced approach learns policies from previous experiences and reuses one of them to interact with new teammates.
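The observe–infer–select loop at the heart of ATSIS can be sketched as follows. The domain, the sub-task names, and the action signatures are invented for illustration; the actual algorithm performs the inference inside a POMDP and plans the chosen sub-task with MCTS, neither of which is shown here.

```python
from collections import Counter

# Hypothetical domain: each sub-task has a signature set of observable actions.
SUBTASK_ACTIONS = {
    "defend": {"block", "guard"},
    "attack": {"shoot", "dribble"},
    "support": {"pass", "move"},
}


def infer_subtask(observed_actions):
    """Infer a teammate's sub-task as the one whose action signature
    best matches the actions observed so far."""
    scores = {
        task: sum(1 for a in observed_actions if a in actions)
        for task, actions in SUBTASK_ACTIONS.items()
    }
    return max(scores, key=scores.get)


def select_own_subtask(teammate_subtasks):
    """Pick the least-covered sub-task so the team covers as many
    sub-tasks as possible — a simplification of the POMDP-based
    selection that ATSIS actually performs."""
    covered = Counter(teammate_subtasks)
    return min(SUBTASK_ACTIONS, key=lambda task: covered[task])
```

For example, a teammate observed shooting and dribbling is inferred to be attacking, and if both attack and defence are covered the ad hoc agent takes the support role.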
However, the selected policy is in many cases sub-optimal, and switching between policies to adapt to new teammates' behaviour takes time, which threatens the successful performance of a task. In this thesis, we propose the Achieving the Ad Hoc Teamwork by Employing the Attention Mechanism (AATEAM) algorithm, which uses attention-based neural networks to cope with new teammates' behaviour in real time. We train one attention network per teammate type. The attention networks learn both to extract temporal correlations from the sequence of states (i.e. contexts) and to map contexts to actions. Each attention network also learns to predict the next state given the current context and its output action; the prediction accuracies help determine which actions the ad hoc agent should take. Experimental results indicate that our algorithm outperforms the most advanced approach in most cases when working with both known and unknown teammates, demonstrating that AATEAM can adapt to new teammates' changing behaviour faster than the state-of-the-art.
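Two of the ingredients above can be sketched in plain Python: attending over a sequence of state encodings, and using prediction error to arbitrate between per-type models. Everything here is a toy assumption — AATEAM's real attention networks are trained neural networks, and the vectors, type names, and `pick_best_type` helper are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(query, keys, values):
    """Scaled dot-product attention: weight each value vector by how
    well its key matches the query, then return the weighted sum."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values))
            for i in range(dim)]


def pick_best_type(predicted_next_states, observed_state):
    """Choose the teammate type whose model predicted the observed next
    state most accurately — mirroring how prediction accuracy guides
    which network's output action to trust."""
    def squared_error(item):
        _, predicted = item
        return sum((p - o) ** 2 for p, o in zip(predicted, observed_state))
    return min(predicted_next_states.items(), key=squared_error)[0]
```

Here `attention` pulls the output toward the values whose keys align with the query, and `pick_best_type` selects among per-type predictions by squared error against the state actually observed.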