Improving sample efficiency using attention in deep reinforcement learning
Reinforcement learning is becoming increasingly popular due to its notable feats in mainstream games such as DOTA 2 and Go, as well as its applicability to many fields. It has displayed potential to exceed human-level performance in complicated environments and sequential decision-making problems.
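The full record description below characterizes the Cross-Attending Network (CAN) studied in this project as a combination of channel-column-wise and channel-row-wise attention applied to a PPO agent's convolutional features. As a rough, hypothetical illustration of that idea only (the axis definitions and module names here are assumptions, not the thesis's actual SAN/C-SAN/CAN code), a minimal PyTorch sketch of per-channel attention along each spatial axis might look like this:

```python
# Hypothetical sketch (not the thesis's actual code): per-channel self-attention
# along each spatial axis of a CNN feature map, combined as a residual block,
# in the spirit of the Cross-Attending Network (CAN) described in the abstract.
import torch
import torch.nn as nn
import torch.nn.functional as F


def axis_attention(x: torch.Tensor, axis: str) -> torch.Tensor:
    """Self-attention within each channel of a (B, C, H, W) feature map.

    axis="row": rows attend to rows (the W positions act as features).
    axis="col": columns attend to columns (the H positions act as features).
    """
    if axis == "col":
        x = x.transpose(-1, -2)                        # swap H and W, reuse row path
    scores = torch.matmul(x, x.transpose(-1, -2)) / (x.shape[-1] ** 0.5)
    out = torch.matmul(F.softmax(scores, dim=-1), x)   # weighted mix along the axis
    if axis == "col":
        out = out.transpose(-1, -2)
    return out


class CrossAttendingBlock(nn.Module):
    """Adds row-wise and column-wise attention back onto the input features."""

    def __init__(self, channels: int):
        super().__init__()
        # A 1x1 convolution mixes channels before attention; purely illustrative.
        self.project = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.project(x)
        return x + axis_attention(h, "row") + axis_attention(h, "col")


if __name__ == "__main__":
    # Example: a batch of 64-channel 7x7 feature maps, as a Nature-CNN-style
    # Atari encoder might produce.
    features = torch.randn(4, 64, 7, 7)
    print(CrossAttendingBlock(64)(features).shape)     # torch.Size([4, 64, 7, 7])
```

A sketch of how such a block could be wired into a Stable Baselines3 PPO agent follows the record details at the end of this page; both sketches are illustrative guesses at the approach, not the project's released code.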
Main Author: | Ong, Dorvin Poh Jie |
---|---|
Other Authors: | Lee Bu Sung, Francis |
Format: | Final Year Project |
Language: | English |
Published: | Nanyang Technological University, 2021 |
Subjects: | Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence |
Online Access: | https://hdl.handle.net/10356/150563 |
Institution: | Nanyang Technological University |
id |
sg-ntu-dr.10356-150563 |
---|---|
record_format |
dspace |
school |
School of Computer Science and Engineering |
contact |
EBSLEE@ntu.edu.sg |
degree |
Bachelor of Engineering (Computer Engineering) |
citation |
Ong, D. P. J. (2021). Improving sample efficiency using attention in deep reinforcement learning. Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/150563 |
project code |
SCSE20-0527 |
file format |
application/pdf |
institution |
Nanyang Technological University |
building |
NTU Library |
continent |
Asia |
country |
Singapore |
content_provider |
NTU Library |
collection |
DR-NTU |
language |
English |
topic |
Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence |
description |
Reinforcement learning is becoming increasingly popular due to its notable feats in mainstream games such as DOTA 2 and Go, as well as its applicability to many fields. It has displayed potential to exceed human-level performance in complicated environments and sequential decision-making problems. However, one limitation that has plagued reinforcement learning is its poor sample efficiency: of the three paradigms of machine learning, reinforcement learning requires the most samples to produce a useful result, and the more samples needed, the more energy and time must be spent to train a useful model, which is expensive. In this report, we conducted a rigorous study of the reinforcement learning field, implemented Proximal Policy Optimization (PPO), and attempted to improve the sample efficiency of reinforcement learning algorithms using self-attention models. Borrowing ideas from previous implementations of self-attention models, we experimented with variants of the Self-Attending Network (SAN), namely the Channel-wise Self-Attending Network (C-SAN) and the Cross-Attending Network (CAN), the latter combining channel-column-wise and channel-row-wise attention. Our results showed that CAN was distinctly more sample efficient than the original SAN and the vanilla PPO (No Attention) model in the game of Pong. However, shifting the implementation to Stable Baselines3 returned results that differ from our earlier findings; we attribute this discrepancy to implementation differences in the PPO algorithm. In the next experiment, we tested SAN, C-SAN and CAN on 49 Atari 2600 games. C-SAN was found to be better than the No Attention model by 15.36% on average, while CAN and SAN were found to be worse by 14.44% and 1.47% respectively. Based on these results, we hypothesize that self-attention models could perform better in complex environments, because a better state representation could facilitate learning a better policy. Further re-evaluation on more complex environments over a longer training duration showed potential in CAN, which managed to outperform the other models. However, a preliminary investigation into why self-attention works was inconclusive. Nevertheless, we offer some hypotheses to explain the effect of self-attention models. |
author2 |
Lee Bu Sung, Francis |
format |
Final Year Project |
author |
Ong, Dorvin Poh Jie |
title |
Improving sample efficiency using attention in deep reinforcement learning |
publisher |
Nanyang Technological University |
publishDate |
2021 |
url |
https://hdl.handle.net/10356/150563 |
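For context on the Stable Baselines3 experiments mentioned in the description, the following is a hypothetical sketch (not the project's actual code) of how an attention block could sit between a PPO agent's CNN encoder and its policy/value heads using Stable Baselines3's custom features-extractor mechanism. The attention module here is a stand-in (`nn.Identity`), to be replaced by something like the cross-attending sketch above; all names and hyperparameters are assumptions.

```python
# Hypothetical integration sketch: a custom Stable Baselines3 features extractor
# with a slot for an attention block after the convolutional encoder.
import torch as th
import torch.nn as nn
from stable_baselines3 import PPO
from stable_baselines3.common.torch_layers import BaseFeaturesExtractor


class AttentionCNN(BaseFeaturesExtractor):
    """Nature-CNN-style encoder followed by a placeholder attention block."""

    def __init__(self, observation_space, features_dim: int = 512):
        super().__init__(observation_space, features_dim)
        n_channels = observation_space.shape[0]          # channel-first image input
        self.cnn = nn.Sequential(
            nn.Conv2d(n_channels, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
        )
        # Stand-in for an attention module (e.g. the cross-attending sketch above).
        self.attention = nn.Identity()
        self.flatten = nn.Flatten()
        # Infer the flattened feature size with a dummy forward pass.
        with th.no_grad():
            sample = th.as_tensor(observation_space.sample()[None]).float()
            n_flat = self.flatten(self.attention(self.cnn(sample))).shape[1]
        self.linear = nn.Sequential(nn.Linear(n_flat, features_dim), nn.ReLU())

    def forward(self, observations: th.Tensor) -> th.Tensor:
        return self.linear(self.flatten(self.attention(self.cnn(observations))))


# Example usage (assumes `env` is an Atari-preprocessed, channel-first image env):
# model = PPO(
#     "CnnPolicy", env,
#     policy_kwargs=dict(
#         features_extractor_class=AttentionCNN,
#         features_extractor_kwargs=dict(features_dim=512),
#     ),
# )
# model.learn(total_timesteps=1_000_000)
```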