A near-optimal change-detection based algorithm for piecewise-stationary combinatorial semi-bandits

We investigate the piecewise-stationary combinatorial semi-bandit problem. Compared to the original combinatorial semi-bandit problem, our setting assumes the reward distributions of base arms may change in a piecewise-stationary manner at unknown time steps. We propose an algorithm, GLR-CUCB, which...

Full description

Saved in:

Bibliographic Details
Main Authors:	ZHOU, Huozhi, WANG, Lingda, VARSHNEY, Lav N., LIM, Ee-Peng
Format:	text
Language:	English
Published:	Institutional Knowledge at Singapore Management University 2020
Subjects:	Databases and Information Systems Theory and Algorithms
Online Access:	https://ink.library.smu.edu.sg/sis_research/4968 https://ink.library.smu.edu.sg/context/sis_research/article/5971/viewcontent/Near_optimal_change_detection_av.pdf
Tags:	Add Tag No Tags, Be the first to tag this record!
Institution:	Singapore Management University
Language:	English

Description
Summary:	We investigate the piecewise-stationary combinatorial semi-bandit problem. Compared to the original combinatorial semi-bandit problem, our setting assumes the reward distributions of base arms may change in a piecewise-stationary manner at unknown time steps. We propose an algorithm, GLR-CUCB, which incorporates an eﬃcient combinatorial semi-bandit algorithm, CUCB, with an almost parameter-free change-point detector, the Generalized Likelihood Ratio Test (GLRT). Our analysis shows that the regret of GLR-CUCB is upper bounded by O(√NKT logT), where N is the number of piecewise-stationary segments, K is the number of base arms, and T is the number of time steps. As a complement, we also derive a nearly matching regret lower bound on the order of Ω(√NKT), for both piecewise-stationary multi-armed bandits and combinatorial semi-bandits, using information-theoretic techniques and judiciously constructed piecewisestationary bandit instances. Our lower bound is tighter than the best available regret lower bound, which is Ω(√T). Numerical experiments on both synthetic and real-world datasets demonstrate the superiority of GLR-CUCB compared to other state-of-the-art algorithms.

A near-optimal change-detection based algorithm for piecewise-stationary combinatorial semi-bandits

Similar Items