Imitating opponent to win: Adversarial policy imitation learning in two-player competitive games

Recent research on vulnerabilities of deep reinforcement learning (RL) has shown that adversarial policies adopted by an adversary agent can influence a target RL agent (the victim agent) to perform poorly in a multi-agent environment. In existing studies, adversarial policies are trained directly on experiences of interacting with the victim agent. This approach has a key shortcoming: knowledge derived from historical interactions may not generalize properly to unexplored policy regions of the victim agent, making the trained adversarial policy significantly less effective. In this work, we design a new, effective adversarial policy learning algorithm that overcomes this shortcoming. The core idea of our algorithm is to create an imitator that learns to imitate the victim agent's policy, while the adversarial policy is trained not only on interactions with the victim agent but also on feedback from the imitator, which forecasts the victim's intentions. By doing so, we leverage the capability of imitation learning to capture the underlying characteristics of the victim policy from sample trajectories of the victim alone. Our victim imitation learning model differs from prior models in that the environment's dynamics are driven by the adversary's policy and keep changing during adversarial policy training. We provide a provable bound that guarantees a desired imitating policy once the adversary's policy becomes stable. We further strengthen our adversarial policy learning by making the imitator a stronger version of the victim: we incorporate the opposite of the adversary's value function into the imitation objective, leading the imitator not only to learn the victim policy but also to be adversarial to the adversary. Finally, extensive experiments on four competitive MuJoCo game environments show that our proposed adversarial policy learning algorithm outperforms state-of-the-art algorithms.
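The abstract names two technical ingredients: an imitator trained to mimic the victim, and an imitation objective augmented with the opposite of the adversary's value function. As a rough reader's sketch (not the authors' implementation), that combined objective for an imitator policy pi_theta could take the form min_theta E[-log pi_theta(a_victim | s)] + lambda * E_{a ~ pi_theta}[Q_adv(s, a)]; the toy PyTorch snippet below illustrates only that shape. Every name and size here (PolicyNet, QNet, OBS_DIM, LAMBDA, the discrete action space, the synthetic data) is an assumption for illustration; the paper's actual method trains against continuous-control MuJoCo agents.

    # A reader's toy sketch of "behavioural cloning plus the opposite of the
    # adversary's value" from the abstract. Hypothetical names and sizes
    # throughout; NOT the authors' code.
    import torch
    import torch.nn as nn

    OBS_DIM, ACT_DIM, LAMBDA = 8, 4, 0.1  # assumed toy sizes and trade-off weight

    class PolicyNet(nn.Module):
        """Small categorical policy standing in for the imitator."""
        def __init__(self):
            super().__init__()
            self.body = nn.Sequential(nn.Linear(OBS_DIM, 64), nn.Tanh(),
                                      nn.Linear(64, ACT_DIM))

        def forward(self, obs):
            return torch.distributions.Categorical(logits=self.body(obs))

    class QNet(nn.Module):
        """Stand-in for the adversary's action-value estimate Q_adv(s, a)."""
        def __init__(self):
            super().__init__()
            self.body = nn.Sequential(nn.Linear(OBS_DIM, 64), nn.Tanh(),
                                      nn.Linear(64, ACT_DIM))

        def forward(self, obs):
            return self.body(obs)  # one value per discrete action

    def imitation_loss(imitator, q_adv, obs, victim_actions):
        dist = imitator(obs)
        # Behavioural cloning: match the victim's actions on its trajectories.
        bc = -dist.log_prob(victim_actions).mean()
        # Expected adversary value under the imitator's own policy; minimising
        # it pushes the imitator to also be hostile to the adversary.
        expected_adv_value = (dist.probs * q_adv(obs)).sum(dim=-1).mean()
        return bc + LAMBDA * expected_adv_value

    imitator, q_adv = PolicyNet(), QNet()
    opt = torch.optim.Adam(imitator.parameters(), lr=3e-4)
    for _ in range(200):  # synthetic stand-ins for victim rollout data
        obs = torch.randn(32, OBS_DIM)
        victim_actions = torch.randint(0, ACT_DIM, (32,))
        loss = imitation_loss(imitator, q_adv, obs, victim_actions)
        opt.zero_grad()
        loss.backward()
        opt.step()

In the full algorithm as the abstract describes it, the adversarial policy is additionally trained from interactions with the victim and from the imitator's feedback, and the imitator must track the changing dynamics induced by the adversary's policy; the loop above covers only the imitator's side under a fixed adversary value estimate.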

Bibliographic Details
Main Authors: BUI, The Viet, MAI, Tien, NGUYEN, Thanh H.
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University, 2023
Subjects: Reinforcement Learning; Non-zero-sum Multi-agent Competition; Adversarial Policy; Imitation Learning; Artificial Intelligence and Robotics; Numerical Analysis and Scientific Computing
Online Access: https://ink.library.smu.edu.sg/sis_research/8332
https://ink.library.smu.edu.sg/context/sis_research/article/9335/viewcontent/AAMAS23.pdf
Institution: Singapore Management University
id sg-smu-ink.sis_research-9335
record_format dspace
spelling sg-smu-ink.sis_research-9335 2023-12-05T02:56:22Z
date 2023-06-01T07:00:00Z
format text application/pdf
doi info:doi/10.48550/arXiv.2210.16915
license http://creativecommons.org/licenses/by-nc-nd/4.0/
series Research Collection School Of Computing and Information Systems
institution Singapore Management University
building SMU Libraries
continent Asia
country Singapore
content_provider SMU Libraries
collection InK@SMU
language English
topic Reinforcement Learning
Non-zero-sum Multi-agent Competition
Adversarial Policy
Imitation Learning
Artificial Intelligence and Robotics
Numerical Analysis and Scientific Computing
description Recent research on vulnerabilities of deep reinforcement learning (RL) has shown that adversarial policies adopted by an adversary agent can influence a target RL agent (the victim agent) to perform poorly in a multi-agent environment. In existing studies, adversarial policies are trained directly on experiences of interacting with the victim agent. This approach has a key shortcoming: knowledge derived from historical interactions may not generalize properly to unexplored policy regions of the victim agent, making the trained adversarial policy significantly less effective. In this work, we design a new, effective adversarial policy learning algorithm that overcomes this shortcoming. The core idea of our algorithm is to create an imitator that learns to imitate the victim agent's policy, while the adversarial policy is trained not only on interactions with the victim agent but also on feedback from the imitator, which forecasts the victim's intentions. By doing so, we leverage the capability of imitation learning to capture the underlying characteristics of the victim policy from sample trajectories of the victim alone. Our victim imitation learning model differs from prior models in that the environment's dynamics are driven by the adversary's policy and keep changing during adversarial policy training. We provide a provable bound that guarantees a desired imitating policy once the adversary's policy becomes stable. We further strengthen our adversarial policy learning by making the imitator a stronger version of the victim: we incorporate the opposite of the adversary's value function into the imitation objective, leading the imitator not only to learn the victim policy but also to be adversarial to the adversary. Finally, extensive experiments on four competitive MuJoCo game environments show that our proposed adversarial policy learning algorithm outperforms state-of-the-art algorithms.
format text
author BUI, The Viet
MAI, Tien
NGUYEN, Thanh H.
title Imitating opponent to win: Adversarial policy imitation learning in two-player competitive games
publisher Institutional Knowledge at Singapore Management University
publishDate 2023
url https://ink.library.smu.edu.sg/sis_research/8332
https://ink.library.smu.edu.sg/context/sis_research/article/9335/viewcontent/AAMAS23.pdf