Recommendation via reinforcement learning methods


Bibliographic Details
Main Author: Xu, He
Other Authors: Bo An
Format: Thesis-Doctor of Philosophy
Language: English
Published: Nanyang Technological University 2021
Subjects: Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence
Online Access:https://hdl.handle.net/10356/152271
Description:
Recommender systems have been a persistent research goal for decades, aiming to recommend suitable items, such as movies, to users. Supervised learning methods are widely adopted by modeling recommendation problems as prediction tasks. However, with the rise of online e-commerce platforms, various scenarios have appeared that allow users to make sequential decisions rather than one-time decisions. Therefore, reinforcement learning methods have attracted increasing attention in recent years for solving these problems. This doctoral thesis investigates several recommendation settings that can be solved by reinforcement learning methods, including multi-armed bandits and multi-agent reinforcement learning. In the recommendation domain, most scenarios involve only a single agent that recommends items to users, aiming to maximize metrics such as click-through rate (CTR). Since candidate items change all the time in many online recommendation scenarios, one crucial issue is the trade-off between exploration and exploitation. Thus, we consider multi-armed bandit problems, a topic in online learning and reinforcement learning that balances exploration and exploitation. We propose two methods to alleviate issues in recommendation problems. Firstly, we consider how users give feedback on items or actions chosen by an agent. Previous works rarely consider the uncertainty when humans provide feedback, especially in cases where the optimal actions are not obvious to the users. For example, when similar items are recommended to a user, the user may provide positive feedback on suboptimal items, negative feedback on the optimal item, or even no feedback at all in confusing situations. To incorporate uncertainties in the learning environment and human feedback, we introduce a feedback model. Moreover, a novel method is proposed to find the optimal policy and a proper feedback model simultaneously.
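The exploration–exploitation trade-off described above can be illustrated with a minimal epsilon-greedy multi-armed bandit sketch. This is a textbook baseline, not the thesis's algorithm; the click-through rates, epsilon value, and round count below are illustrative assumptions:

```python
import random

def epsilon_greedy(true_ctrs, epsilon=0.1, rounds=10000, seed=0):
    """Minimal epsilon-greedy multi-armed bandit.

    Each arm is a candidate item; pulling an arm simulates showing it
    to a user and observing a click (reward 1) or no click (reward 0).
    """
    rng = random.Random(seed)
    n_arms = len(true_ctrs)
    counts = [0] * n_arms      # times each arm has been shown
    values = [0.0] * n_arms    # running average reward per arm
    total_reward = 0
    for _ in range(rounds):
        if rng.random() < epsilon:                      # explore: random arm
            arm = rng.randrange(n_arms)
        else:                                           # exploit: best estimate
            arm = max(range(n_arms), key=lambda a: values[a])
        reward = 1 if rng.random() < true_ctrs[arm] else 0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]  # incremental mean
        total_reward += reward
    return values, total_reward

# Hypothetical click-through rates for three candidate items.
est, reward = epsilon_greedy([0.02, 0.05, 0.08])
```

With a small constant epsilon, the agent mostly shows the item it currently believes is best, yet keeps sampling the others so that a newly added or underestimated item can still be discovered.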
Secondly, for online recommendation on mobile devices, the positions of items have a significant influence on clicks due to the limited screen size: 1) higher positions lead to more clicks for the same commodity; 2) the 'pseudo-exposure' issue: only a few recommended items are shown at first glance, and users need to slide the screen to browse the others. Therefore, some recommended items ranked lower are never viewed by users, and it is not proper to treat these items as negative samples. To address these two issues, we model online recommendation as a contextual combinatorial bandit problem and define the reward of a recommended set. Then, we propose a novel contextual combinatorial bandit method and provide a formal regret analysis. An online experiment is implemented on Taobao, one of the most popular e-commerce platforms in the world. Results on two metrics show that our algorithm outperforms other contextual bandit algorithms. For the multi-agent reinforcement learning setting, we focus on a recommendation scenario in online e-commerce platforms that involves multiple modules recommending items with different properties, such as huge discounts. A web page often consists of several independent modules. The ranking policies of these modules are decided by different teams and optimized individually without cooperation, which can result in competition between modules. Thus, the global policy of the whole page could be sub-optimal. To address this issue, we propose a novel multi-agent cooperative reinforcement learning approach under the restriction that modules cannot communicate with each other. Experiments based on real-world e-commerce data demonstrate that our algorithm obtains superior performance over baselines.
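The pseudo-exposure issue above can be sketched with a toy set-reward function that counts clicks only on slots the user actually scrolled into view, weighted by position. This is an illustrative assumption, not the reward definition used in the thesis; the weights and slot layout are made up:

```python
def set_reward(clicks, exposed, position_weights):
    """Position-weighted reward of a recommended set.

    Only slots the user actually saw contribute, so unexposed items
    are never treated as (implicit) negative samples.

    clicks:           0/1 click indicator per recommended slot
    exposed:          whether the slot was scrolled into view
    position_weights: higher weight for higher on-screen positions
    """
    reward = 0.0
    for click, seen, weight in zip(clicks, exposed, position_weights):
        if seen:                       # unexposed slots contribute nothing
            reward += weight * click
    return reward

# A 4-slot recommendation: the user scrolled far enough to see the
# first three items and clicked the second one.
r = set_reward(clicks=[0, 1, 0, 0],
               exposed=[True, True, True, False],
               position_weights=[1.0, 0.8, 0.6, 0.4])
# r == 0.8
```

In a contextual combinatorial bandit, a reward of this shape is observed for the whole recommended set at once, rather than independently per item, which is what motivates the set-level reward definition and regret analysis mentioned above.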
Citation: Xu, H. (2021). Recommendation via reinforcement learning methods. Doctoral thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/152271
DOI: 10.32657/10356/152271
Supervisor: Bo An, School of Computer Science and Engineering
License: Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0)