Burst-induced Multi-Armed Bandit for learning recommendation

Bibliographic Details
Main Authors:	ALVES, Rodrigo, LEDENT, Antoine, KLOFT, Marius
Format:	text
Language:	English
Published:	Institutional Knowledge at Singapore Management University 2021
Subjects:	Recommender Systems; Reinforcement Learning; Online learning; Poisson processes; Time Series Analysis; bursty methods; audience dynamics; Artificial Intelligence and Robotics; Numerical Analysis and Scientific Computing
Online Access:	https://ink.library.smu.edu.sg/sis_research/7209 https://ink.library.smu.edu.sg/context/sis_research/article/8212/viewcontent/3460231.3474250.pdf
Institution:	Singapore Management University

Description
Summary:	In this paper, we introduce a non-stationary and context-free Multi-Armed Bandit (MAB) problem and a novel algorithm (which we refer to as BMAB) to solve it. The problem is context-free in the sense that no side information about users or items is needed. We work in a continuous-time setting where each timestamp corresponds to a visit by a user and a corresponding decision regarding recommendation. The main novelty is that we model the reward distribution as a consequence of variations in the intensity of the activity, and thereby we assist the exploration/exploitation dilemma by exploring the temporal dynamics of the audience. To achieve this, we assume that the recommendation procedure can be split into two different states: the loyal and the curious state. We identify the current state by modelling the events as a mixture of two Poisson processes, one for each of the possible states. We further assume that the loyal audience is associated with a single stationary reward distribution, but each bursty period comes with its own reward distribution. We test our algorithm and compare it to several baselines in two strands of experiments: synthetic data simulations and real-world datasets. The results demonstrate that BMAB achieves competitive results when compared to state-of-the-art methods.
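
Illustrative note: the summary says the current state is identified by modelling visit events as a mixture of two Poisson processes. The sketch below is not the BMAB algorithm from the paper. It only illustrates the general idea, assuming both rates are already known and each gap between visits is classified independently with Bayes' rule. The names `burst_posterior` and `label_states`, the state label strings, the rates, and the prior are hypothetical choices made for illustration.

```python
import math


def burst_posterior(gap, rate_loyal, rate_burst, prior_burst=0.5):
    """Posterior probability that one inter-visit gap came from the bursty process.

    Assumption (not from the record): both rates are known in advance.
    Gaps of a Poisson process with rate r are exponential: r * exp(-r * gap).
    """
    weight_burst = prior_burst * rate_burst * math.exp(-rate_burst * gap)
    weight_loyal = (1.0 - prior_burst) * rate_loyal * math.exp(-rate_loyal * gap)
    return weight_burst / (weight_burst + weight_loyal)


def label_states(timestamps, rate_loyal, rate_burst, prior_burst=0.5):
    """Label each gap between consecutive visits as 'curious' (bursty) or 'loyal'."""
    gaps = [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]
    return [
        "curious"
        if burst_posterior(gap, rate_loyal, rate_burst, prior_burst) > 0.5
        else "loyal"
        for gap in gaps
    ]


if __name__ == "__main__":
    # Hypothetical visit times: sparse traffic, a short burst, then sparse again.
    visits = [0.0, 10.0, 20.0, 20.1, 20.2, 20.3, 30.0]
    print(label_states(visits, rate_loyal=0.1, rate_burst=10.0))
```

With these assumed rates, short gaps are attributed to the bursty (curious) state and long gaps to the loyal state. A bandit in the spirit of the summary could then keep one set of reward estimates for the loyal state and start a fresh set for each detected burst, but how the paper estimates the rates and handles state transitions is not specified in this record.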