Burst-induced Multi-Armed Bandit for learning recommendation
In this paper, we introduce a non-stationary and context-free Multi-Armed Bandit (MAB) problem and a novel algorithm (which we refer to as BMAB) to solve it. The problem is context-free in the sense that no side information about users or items is needed. We work in a continuous-time setting where each timestamp corresponds to a visit by a user and a corresponding decision regarding recommendation. The main novelty is that we model the reward distribution as a consequence of variations in the intensity of the activity, and thereby we assist the exploration/exploitation dilemma by exploring the temporal dynamics of the audience. To achieve this, we assume that the recommendation procedure can be split into two different states: the loyal and the curious state. We identify the current state by modelling the events as a mixture of two Poisson processes, one for each of the possible states. We further assume that the loyal audience is associated with a single stationary reward distribution, but each bursty period comes with its own reward distribution. We test our algorithm and compare it to several baselines in two strands of experiments: synthetic data simulations and real-world datasets. The results demonstrate that BMAB achieves competitive results when compared to state-of-the-art methods.
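To make the two-state idea in the abstract concrete, here is a minimal, illustrative sketch of a burst-aware bandit. It is not the authors' BMAB algorithm: the class name, the fixed Poisson intensities (`loyal_rate`, `burst_rate`), the sliding-window likelihood test, and the Beta-Bernoulli Thompson-sampling bookkeeping are all our own assumptions chosen to keep the example short and self-contained. The paper itself learns the mixture of Poisson processes from the event stream.

```python
import numpy as np

class TwoStatePoissonBandit:
    """Illustrative sketch (not the paper's BMAB): events arrive in
    continuous time, and inter-arrival gaps are assumed to come from one
    of two Poisson processes, a slow 'loyal' one and a fast 'curious'
    (bursty) one. The state is inferred from the likelihood of recent
    gaps under each rate; rewards are tracked separately per state, with
    a fresh reward model for each newly detected burst."""

    def __init__(self, n_arms, loyal_rate=1.0, burst_rate=10.0, window=20):
        self.n_arms = n_arms
        self.loyal_rate = loyal_rate   # assumed intensity of the loyal process
        self.burst_rate = burst_rate   # assumed intensity of bursty periods
        self.window = window           # number of recent gaps used for inference
        self.gaps = []                 # recent inter-arrival times
        self.last_t = None
        self.in_burst = False
        # Beta(1, 1) posteriors per arm: column 0 = alpha, column 1 = beta.
        self.loyal_stats = np.ones((n_arms, 2))
        self.burst_stats = np.ones((n_arms, 2))  # reset at each new burst

    def _infer_state(self):
        """Compare log-likelihoods of recent gaps under the two
        exponential (Poisson inter-arrival) models."""
        if len(self.gaps) < 3:
            return False
        g = np.array(self.gaps[-self.window:])
        ll_loyal = np.sum(np.log(self.loyal_rate) - self.loyal_rate * g)
        ll_burst = np.sum(np.log(self.burst_rate) - self.burst_rate * g)
        return ll_burst > ll_loyal

    def select_arm(self, t):
        """Record the visit at time t, update the state estimate, and
        pick an arm by Thompson sampling from that state's posteriors."""
        if self.last_t is not None:
            self.gaps.append(t - self.last_t)
        self.last_t = t
        burst_now = self._infer_state()
        if burst_now and not self.in_burst:
            # New burst: its audience gets its own fresh reward model.
            self.burst_stats = np.ones((self.n_arms, 2))
        self.in_burst = burst_now
        stats = self.burst_stats if self.in_burst else self.loyal_stats
        samples = np.random.beta(stats[:, 0], stats[:, 1])
        return int(np.argmax(samples))

    def update(self, arm, reward):
        """Update the active state's posterior with a binary reward."""
        stats = self.burst_stats if self.in_burst else self.loyal_stats
        stats[arm, 0] += reward
        stats[arm, 1] += 1 - reward
```

The key design point mirrored from the abstract is the asymmetry between states: the loyal posterior accumulates evidence across the whole stream (a single stationary reward distribution), while the bursty posterior is discarded and re-initialised whenever a new burst begins, since each burst is assumed to bring its own reward distribution.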
Main Authors: ALVES, Rodrigo; LEDENT, Antoine; KLOFT, Marius
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University, 2021
Subjects: Recommender Systems; Reinforcement Learning; Online learning; Poisson processes; Time Series Analysis; bursty methods; audience dynamics; Artificial Intelligence and Robotics; Numerical Analysis and Scientific Computing
DOI: 10.1145/3460231.3474250
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Collection: Research Collection School Of Computing and Information Systems
Online Access: https://ink.library.smu.edu.sg/sis_research/7209 https://ink.library.smu.edu.sg/context/sis_research/article/8212/viewcontent/3460231.3474250.pdf
Institution: Singapore Management University