Burst-induced Multi-Armed Bandit for learning recommendation

In this paper, we introduce a non-stationary and context-free Multi-Armed Bandit (MAB) problem and a novel algorithm (which we refer to as BMAB) to solve it. The problem is context-free in the sense that no side information about users or items is needed. We work in a continuous-time setting where each timestamp corresponds to a visit by a user and a corresponding decision regarding recommendation. The main novelty is that we model the reward distribution as a consequence of variations in the intensity of the activity, and thereby address the exploration/exploitation dilemma by exploiting the temporal dynamics of the audience. To achieve this, we assume that the recommendation procedure can be split into two different states: the loyal and the curious state. We identify the current state by modelling the events as a mixture of two Poisson processes, one for each of the possible states. We further assume that the loyal audience is associated with a single stationary reward distribution, whereas each bursty period comes with its own reward distribution. We test our algorithm against several baselines in two strands of experiments: synthetic data simulations and real-world datasets. The results demonstrate that BMAB achieves competitive performance compared to state-of-the-art methods.
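
The abstract describes the mechanism only at a high level, so the listing below is a minimal illustrative reconstruction, not the authors' published algorithm. It assumes known Poisson intensities for the two states (lam_loyal, lam_curious), Bernoulli rewards, and Thompson sampling with Beta posteriors; the class name BMABSketch and all parameter names are hypothetical. Following the abstract, the loyal state keeps one stationary posterior per arm, while each newly detected burst resets a separate set of posteriors.

    import numpy as np

    class BMABSketch:
        """Illustrative burst-aware bandit sketch (not the authors' code).

        Inter-arrival gaps are scored under two Poisson processes; the
        likelier rate determines the current state. The loyal state uses
        one stationary Beta posterior per arm, while each detected burst
        starts from fresh posteriors. Arms are drawn by Thompson sampling.
        """

        def __init__(self, n_arms, lam_loyal=0.1, lam_curious=1.0):
            self.n_arms = n_arms
            self.lam_loyal = lam_loyal      # assumed intensity of the loyal audience
            self.lam_curious = lam_curious  # assumed intensity during bursts
            self.loyal = np.ones((n_arms, 2))  # Beta(alpha, beta) counts, stationary
            self.burst = np.ones((n_arms, 2))  # reset at the start of each burst
            self.last_t = 0.0
            self.in_burst = False

        def _update_state(self, t):
            # Exponential inter-arrival log-likelihood under each Poisson rate.
            gap = t - self.last_t
            ll_loyal = np.log(self.lam_loyal) - self.lam_loyal * gap
            ll_curious = np.log(self.lam_curious) - self.lam_curious * gap
            now_burst = ll_curious > ll_loyal
            if now_burst and not self.in_burst:
                # New burst detected: its rewards get their own distribution.
                self.burst = np.ones((self.n_arms, 2))
            self.in_burst = now_burst
            self.last_t = t

        def select(self, t):
            """Classify the visit at time t, then Thompson-sample an arm."""
            self._update_state(t)
            post = self.burst if self.in_burst else self.loyal
            return int(np.argmax(np.random.beta(post[:, 0], post[:, 1])))

        def update(self, arm, reward):
            """Record a Bernoulli reward (0 or 1) under the current state."""
            post = self.burst if self.in_burst else self.loyal
            post[arm, 0] += reward
            post[arm, 1] += 1 - reward

    # Hypothetical usage: replay a stream of (timestamp, feedback) visits.
    bandit = BMABSketch(n_arms=5)
    arm = bandit.select(t=12.3)
    bandit.update(arm, reward=1)

Note that in the paper the two processes form a mixture whose state must be inferred; the fixed intensities and the hard likelihood threshold above are simplifications made for the sake of a compact example.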

Bibliographic Details
Main Authors: ALVES, Rodrigo; LEDENT, Antoine; KLOFT, Marius
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University, 2021
DOI: 10.1145/3460231.3474250
License: CC BY-NC-ND 4.0 (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Collection: Research Collection School Of Computing and Information Systems, InK@SMU
Subjects: Recommender Systems; Reinforcement Learning; Online learning; Poisson processes; Time Series Analysis; bursty methods; audience dynamics; Artificial Intelligence and Robotics; Numerical Analysis and Scientific Computing
Online Access: https://ink.library.smu.edu.sg/sis_research/7209
https://ink.library.smu.edu.sg/context/sis_research/article/8212/viewcontent/3460231.3474250.pdf
Institution: Singapore Management University