Variational learning from implicit bandit feedback
Recommendations are prevalent in Web applications (e.g., search ranking, item recommendation, advertisement placement). Learning from bandit feedback is challenging because the feedback is sparse and limited to system-provided actions. In this work, we focus on batch learning from logs of recommender systems involving both bandit and organic feedback. We develop a probabilistic framework with a likelihood function for estimating not only explicit positive observations but also implicit negative observations inferred from the data. Moreover, we introduce a latent variable model for organic-bandit feedback to robustly capture user preference distributions. Next, we analyze the behavior of the new likelihood under two scenarios, i.e., with and without counterfactual re-weighting. For speedier item ranking, we further investigate the possibility of using the Maximum-a-Posteriori (MAP) estimate instead of Monte Carlo (MC)-based approximation for prediction. Experiments on both real datasets and data from a simulation environment show substantial performance improvements over comparable baselines.
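The abstract's last modeling point — ranking items with a single MAP point estimate of the user's latent preference vector instead of averaging scores over Monte Carlo samples from the posterior — can be sketched as follows. This is an illustrative sketch only: the names (`V`, `mu`, `sigma`), the Gaussian posterior, and the linear scoring function are assumptions for exposition, not the paper's actual model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: a learned variational posterior q(z_u) = N(mu, sigma^2)
# over one user's latent preference vector, and item embeddings V.
n_items, dim = 5, 8
V = rng.normal(size=(n_items, dim))          # item embeddings (illustrative)
mu = rng.normal(size=dim)                    # posterior mean of z_u
sigma = np.abs(rng.normal(size=dim)) * 0.1   # posterior std of z_u

def rank_mc(V, mu, sigma, n_samples=100):
    """MC prediction: average item scores over samples z ~ q(z_u)."""
    z = rng.normal(mu, sigma, size=(n_samples, len(mu)))
    scores = (z @ V.T).mean(axis=0)   # Monte Carlo estimate of E_q[score]
    return np.argsort(-scores)        # items ranked best-first

def rank_map(V, mu):
    """MAP-style prediction: score items with the single posterior mode/mean."""
    scores = V @ mu
    return np.argsort(-scores)
```

Under these particular assumptions (Gaussian posterior, linear scores), the MC average converges to the MAP-based scores as the sample count grows, which is one intuition for why a MAP shortcut can serve as a faster stand-in for ranking; in general the two predictions need not coincide.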
Main Authors: | TRUONG, Quoc Tuan; LAUW, Hady W. |
---|---|
Format: | text |
Language: | English |
Published: | Institutional Knowledge at Singapore Management University, 2021 |
Subjects: | Variational learning; Bandit feedback; Recommender systems; Computational advertising; Databases and Information Systems; Data Science |
Online Access: | https://ink.library.smu.edu.sg/sis_research/6431 https://ink.library.smu.edu.sg/context/sis_research/article/7434/viewcontent/ml21.pdf |
Institution: | Singapore Management University |
id | sg-smu-ink.sis_research-7434 |
---|---|
record_format | dspace |
title | Variational learning from implicit bandit feedback |
author | TRUONG, Quoc Tuan; LAUW, Hady W. |
description | Recommendations are prevalent in Web applications (e.g., search ranking, item recommendation, advertisement placement). Learning from bandit feedback is challenging because the feedback is sparse and limited to system-provided actions. In this work, we focus on batch learning from logs of recommender systems involving both bandit and organic feedback. We develop a probabilistic framework with a likelihood function for estimating not only explicit positive observations but also implicit negative observations inferred from the data. Moreover, we introduce a latent variable model for organic-bandit feedback to robustly capture user preference distributions. Next, we analyze the behavior of the new likelihood under two scenarios, i.e., with and without counterfactual re-weighting. For speedier item ranking, we further investigate the possibility of using the Maximum-a-Posteriori (MAP) estimate instead of Monte Carlo (MC)-based approximation for prediction. Experiments on both real datasets and data from a simulation environment show substantial performance improvements over comparable baselines. |
topic | Variational learning; Bandit feedback; Recommender systems; Computational advertising; Databases and Information Systems; Data Science |
format | text (application/pdf) |
language | English |
publisher | Institutional Knowledge at Singapore Management University |
publishDate | 2021-07-01 |
doi | 10.1007/s10994-021-06028-0 |
license | http://creativecommons.org/licenses/by-nc-nd/4.0/ |
collection | Research Collection School Of Computing and Information Systems; InK@SMU |
institution | Singapore Management University |
building | SMU Libraries |
country | Singapore |
continent | Asia |
content_provider | SMU Libraries |
url | https://ink.library.smu.edu.sg/sis_research/6431 https://ink.library.smu.edu.sg/context/sis_research/article/7434/viewcontent/ml21.pdf |
record_updated | 2021-12-14T05:34:48Z |
_version_ | 1770575959283466240 |