Burst-induced Multi-Armed Bandit for learning recommendation

In this paper, we introduce a non-stationary and context-free Multi-Armed Bandit (MAB) problem and a novel algorithm (which we refer to as BMAB) to solve it. The problem is context-free in the sense that no side information about users or items is needed. We work in a continuous-time setting where each timestamp corresponds to a visit by a user and a corresponding decision regarding recommendation. The main novelty is that we model the reward distribution as a consequence of variations in the intensity of the activity, and thereby address the exploration/exploitation dilemma by exploiting the temporal dynamics of the audience. To achieve this, we assume that the recommendation procedure can be split into two different states: the loyal and the curious state. We identify the current state by modelling the events as a mixture of two Poisson processes, one for each of the possible states. We further assume that the loyal audience is associated with a single stationary reward distribution, whereas each bursty period comes with its own reward distribution. We test our algorithm against several baselines in two strands of experiments: synthetic data simulations and real-world datasets. The results demonstrate that BMAB achieves competitive performance compared to state-of-the-art methods.
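
The abstract describes the mechanism only at a high level, so the listing below is a minimal illustrative reconstruction, not the authors' published algorithm. It assumes known Poisson intensities for the two states (lam_loyal, lam_curious), Bernoulli rewards, and Thompson sampling with Beta posteriors; the class name BMABSketch and all parameter names are hypothetical. Following the abstract, the loyal state keeps one stationary posterior per arm, while each newly detected burst resets a separate set of posteriors.

    import numpy as np

    class BMABSketch:
        """Illustrative burst-aware bandit sketch (not the authors' code).

        Inter-arrival gaps are scored under two Poisson processes; the
        likelier rate determines the current state. The loyal state uses
        one stationary Beta posterior per arm, while each detected burst
        starts from fresh posteriors. Arms are drawn by Thompson sampling.
        """

        def __init__(self, n_arms, lam_loyal=0.1, lam_curious=1.0):
            self.n_arms = n_arms
            self.lam_loyal = lam_loyal      # assumed intensity of the loyal audience
            self.lam_curious = lam_curious  # assumed intensity during bursts
            self.loyal = np.ones((n_arms, 2))  # Beta(alpha, beta) counts, stationary
            self.burst = np.ones((n_arms, 2))  # reset at the start of each burst
            self.last_t = 0.0
            self.in_burst = False

        def _update_state(self, t):
            # Exponential inter-arrival log-likelihood under each Poisson rate.
            gap = t - self.last_t
            ll_loyal = np.log(self.lam_loyal) - self.lam_loyal * gap
            ll_curious = np.log(self.lam_curious) - self.lam_curious * gap
            now_burst = ll_curious > ll_loyal
            if now_burst and not self.in_burst:
                # New burst detected: its rewards get their own distribution.
                self.burst = np.ones((self.n_arms, 2))
            self.in_burst = now_burst
            self.last_t = t

        def select(self, t):
            """Classify the visit at time t, then Thompson-sample an arm."""
            self._update_state(t)
            post = self.burst if self.in_burst else self.loyal
            return int(np.argmax(np.random.beta(post[:, 0], post[:, 1])))

        def update(self, arm, reward):
            """Record a Bernoulli reward (0 or 1) under the current state."""
            post = self.burst if self.in_burst else self.loyal
            post[arm, 0] += reward
            post[arm, 1] += 1 - reward

    # Hypothetical usage: replay a stream of (timestamp, feedback) visits.
    bandit = BMABSketch(n_arms=5)
    arm = bandit.select(t=12.3)
    bandit.update(arm, reward=1)

Note that in the paper the two processes form a mixture whose state must be inferred; the fixed intensities and the hard likelihood threshold above are simplifications made for the sake of a compact example.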

Bibliographic Details
Main Authors: ALVES, Rodrigo; LEDENT, Antoine; KLOFT, Marius
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University, 2021
DOI: 10.1145/3460231.3474250
License: CC BY-NC-ND 4.0 (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Collection: Research Collection School Of Computing and Information Systems, InK@SMU
Subjects: Recommender Systems; Reinforcement Learning; Online learning; Poisson processes; Time Series Analysis; bursty methods; audience dynamics; Artificial Intelligence and Robotics; Numerical Analysis and Scientific Computing
Online Access: https://ink.library.smu.edu.sg/sis_research/7209
https://ink.library.smu.edu.sg/context/sis_research/article/8212/viewcontent/3460231.3474250.pdf
Institution: Singapore Management University