Multi-armed linear bandits with latent biases
In a linear stochastic bandit model, each arm corresponds to a vector in Euclidean space, and the expected return observed at each time step is determined by an unknown linear function of the selected arm. This paper addresses the challenge of identifying the optimal arm in a linear stochastic bandit model in which latent biases corrupt each arm's expected reward. Unlike traditional linear bandit problems, where the observed return directly represents the reward, this paper considers a scenario where the unbiased reward at each time step remains unobservable. This model is particularly relevant in situations where the observed return is influenced by latent biases that need to be carefully excluded from the learning model. For example, in recommendation systems designed to prevent racially discriminatory suggestions, it is crucial to ensure that users' race does not influence the system; however, the observed return, such as the click-through rate, may already have been influenced by racial attributes. In the case where there are finitely many arms, we develop a strategy that achieves O(|D| log n) regret, where |D| is the number of arms and n is the number of time steps. In the case where each arm is chosen from an infinite compact set, our strategy achieves O(n^{2/3} (log n)^{1/2}) regret. Experiments verify the efficiency of our strategy.
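To make the setup concrete, the following is a minimal simulation sketch of the problem described in the abstract, not the authors' algorithm. All parameter values and names (theta, biases, pull) are illustrative assumptions: each arm x in a finite set D yields an observed return theta @ x plus a latent per-arm bias plus noise, and the learner never sees the unbiased reward separately.

```python
import numpy as np

# Minimal sketch of the latent-bias linear bandit setup (illustrative
# parameters, not taken from the paper). The observed return mixes the
# true linear reward theta @ x with a latent per-arm bias b_x.
rng = np.random.default_rng(0)
d, n_arms, n_steps = 3, 5, 10_000

theta = rng.normal(size=d)                    # unknown reward parameter
arms = rng.normal(size=(n_arms, d))           # finite decision set D
biases = rng.normal(scale=0.5, size=n_arms)   # latent biases, one per arm

true_rewards = arms @ theta                   # unbiased expected rewards
best_arm = int(np.argmax(true_rewards))

def pull(k):
    """Observed return: unbiased reward + latent bias + noise."""
    return true_rewards[k] + biases[k] + rng.normal(scale=0.1)

# Naive baseline: explore round-robin, then pick the arm with the best
# *observed* average return. Because the bias is folded into every
# observation, this can converge to an arm that is suboptimal in terms
# of the unbiased reward.
counts = np.zeros(n_arms)
sums = np.zeros(n_arms)
for t in range(n_steps):
    k = t % n_arms if t < 5 * n_arms else int(np.argmax(sums / counts))
    sums[k] += pull(k)
    counts[k] += 1

naive_pick = int(np.argmax(sums / np.maximum(counts, 1)))
print("best unbiased arm:", best_arm, "| naive pick:", naive_pick)
```

Run with different seeds, the naive strategy can lock onto an arm whose observed return is inflated by its bias; this is the failure mode that the paper's strategy, with its O(|D| log n) regret guarantee with respect to the unbiased reward, is designed to avoid.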
Saved in:
Main Authors: Kang, Qiyu; Tay, Wee Peng; She, Rui; Wang, Sijie; Liu, Xiaoqian; Yang, Yuan-Rui
Other Authors: School of Electrical and Electronic Engineering
Format: Article
Language: English
Published: 2024
Subjects: Computer and Information Science; Linear bandit; Multi-armed bandit
Online Access: https://hdl.handle.net/10356/175416
Institution: Nanyang Technological University
Record ID: sg-ntu-dr.10356-175416
Citation: Kang, Q., Tay, W. P., She, R., Wang, S., Liu, X. & Yang, Y. (2024). Multi-armed linear bandits with latent biases. Information Sciences, 660, 120103. https://dx.doi.org/10.1016/j.ins.2024.120103
ISSN: 0020-0255
DOI: 10.1016/j.ins.2024.120103
Scopus ID: 2-s2.0-85183462419
Type: Journal Article
Version: Submitted/Accepted version
Funding: Ministry of Education (MOE). This research is supported by the Singapore Ministry of Education Academic Research Fund Tier 2 grant MOE-T2EP20220-0002.
Rights: © 2024 Elsevier Inc. All rights reserved. This article may be downloaded for personal use only. Any other use requires prior permission of the copyright holder. The Version of Record is available online at http://doi.org/10.1016/j.ins.2024.120103.