A critical study on data leakage in recommender system offline evaluation

Recommender models are hard to evaluate, particularly in the offline setting. In this paper, we provide a comprehensive and critical analysis of the data leakage issue in recommender system offline evaluation. Data leakage is caused by not observing the global timeline when evaluating recommenders, e.g....

Bibliographic Details
Main Authors:	Ji, Yitong; Sun, Aixin; Zhang, Jie; Li, Chenliang
Other Authors:	School of Computer Science and Engineering
Format:	Article
Language:	English
Published:	2023
Subjects:	Engineering::Computer science and engineering; Data Mining; Collaborative Filtering
Online Access:	https://hdl.handle.net/10356/170569
Institution:	Nanyang Technological University

id	sg-ntu-dr.10356-170569
record_format	dspace
spelling	sg-ntu-dr.10356-170569; 2023-09-19T06:44:36Z; Journal Article. Ji, Y., Sun, A., Zhang, J. & Li, C. (2023). A critical study on data leakage in recommender system offline evaluation. ACM Trans. Inf. Syst. 41, 3 (2023), 75:1-75:27. DOI: https://dx.doi.org/10.1145/3569930. ISSN: 1046-8188. Handle: https://hdl.handle.net/10356/170569. Scopus: 2-s2.0-85159681430. Language: en. © 2023 Association for Computing Machinery. All rights reserved.
institution	Nanyang Technological University
building	NTU Library
continent	Asia
country	Singapore
content_provider	NTU Library
collection	DR-NTU
language	English
topic	Engineering::Computer science and engineering; Data Mining; Collaborative Filtering
description	Recommender models are hard to evaluate, particularly in the offline setting. In this paper, we provide a comprehensive and critical analysis of the data leakage issue in recommender system offline evaluation. Data leakage is caused by not observing the global timeline when evaluating recommenders, e.g., when the train/test data split does not follow the global timeline. As a result, a model learns from user-item interactions that are not expected to be available at prediction time. We first show the temporal dynamics of user-item interactions along the global timeline, then explain why data leakage exists for collaborative filtering models. Through carefully designed experiments, we show that all models indeed recommend future items that are not available at the time point of a test instance, as a result of data leakage. The experiments are conducted with four widely used baseline models (BPR, NeuMF, SASRec, and LightGCN) on four popular offline datasets (MovieLens-25M, Yelp, Amazon-music, and Amazon-electronic), adopting the leave-last-one-out data split. We further show that data leakage does impact models' recommendation accuracy. Their relative performance orders thus become unpredictable with different amounts of leaked future data in training. To evaluate recommendation systems in a realistic manner in the offline setting, we propose a timeline scheme, which calls for a revisit of recommendation model design.
author2	School of Computer Science and Engineering
author_facet	School of Computer Science and Engineering; Ji, Yitong; Sun, Aixin; Zhang, Jie; Li, Chenliang
format	Article
author	Ji, Yitong; Sun, Aixin; Zhang, Jie; Li, Chenliang
author_sort	Ji, Yitong
title	A critical study on data leakage in recommender system offline evaluation
publishDate	2023
url	https://hdl.handle.net/10356/170569
_version_	1779156271266529280
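
The description above attributes data leakage to train/test splits that ignore the global timeline. The following is a minimal Python sketch of that idea, not the authors' implementation: it compares a leave-last-one-out split with a split at a single global cutoff on a toy interaction log and counts, for each test instance, how many training interactions happened after it. The function names, the toy data, and the cutoff value are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict

def leave_last_one_out(interactions):
    """Hold out each user's most recent interaction as that user's test instance."""
    by_user = defaultdict(list)
    for user, item, ts in interactions:
        by_user[user].append((ts, item))
    train, test = [], []
    for user, events in by_user.items():
        events.sort()  # chronological order within one user
        for ts, item in events[:-1]:
            train.append((user, item, ts))
        ts, item = events[-1]
        test.append((user, item, ts))
    return train, test

def timeline_split(interactions, cutoff):
    """Split at one global time point: train strictly before it, test at or after it."""
    train = [r for r in interactions if r[2] < cutoff]
    test = [r for r in interactions if r[2] >= cutoff]
    return train, test

def leaked_counts(train, test):
    """For each test instance, count training interactions that occur after it."""
    return {
        (user, item, ts): sum(1 for _, _, t in train if t > ts)
        for user, item, ts in test
    }

# Hypothetical toy log of (user, item, timestamp) triples.
interactions = [
    ("alice", "item_a", 1),
    ("alice", "item_b", 2),
    ("bob", "item_a", 3),
    ("bob", "item_c", 5),
    ("carol", "item_c", 6),
    ("carol", "item_d", 9),
]

loo_train, loo_test = leave_last_one_out(interactions)
loo_leaks = leaked_counts(loo_train, loo_test)

tl_train, tl_test = timeline_split(interactions, cutoff=5)
tl_leaks = leaked_counts(tl_train, tl_test)

if __name__ == "__main__":
    print("leave-last-one-out:", loo_leaks)
    print("global timeline:   ", tl_leaks)
```

In this toy log, the leave-last-one-out split trains on interactions at timestamps 3 and 6 while testing alice at timestamp 2, so a model could learn from data that lies in that test instance's future. With the global cutoff, no training interaction postdates any test instance.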

Similar Items