Exploiting ratings and trust to resolve the data sparsity and cold start of recommender systems

Bibliographic Details
Main Author: Guo, Guibing
Other Authors: Jie Zhang
Format: Theses and Dissertations
Language: English
Published: 2015
Subjects:
Online Access: https://hdl.handle.net/10356/64555
Institution: Nanyang Technological University
Description
Summary: Collaborative filtering (CF) is a widely used technique for recommender systems. Its essential principle is that users with similar preferences in the past are likely to give similar ratings to items of interest in the future. However, collaborative filtering inherently suffers from two severe issues: data sparsity and cold start. The former refers to the difficulty of finding sufficient and reliable similar users, given that users generally rate only a small portion of items, while the latter refers to the difficulty presented by cold-start users, who have rated no items or only a few. Both issues stem from a lack of user ratings, and both severely prevent recommender systems from generating accurate and personalized recommendations. To help resolve these issues, this thesis pursues two lines of research that exploit the value of both ratings and trust. In the first line of research, we propose two approaches to leverage user ratings for recommender systems. The first approach is a Bayesian similarity measure, based on Bayesian inference, that takes into consideration both the direction and the length of rating vectors. We posit that not all rating pairs should be counted equally if user correlation is to be modelled accurately. Three different evidence factors are designed to compute the importance weights of rating pairs. Further, our principled method reduces the correlation due to chance and potential system bias. Experimental results on six real-world data sets show that our approach achieves better accuracy than its counterparts. This method aims to make better use of existing user ratings. In the second approach, we propose a new information source for recommender systems, called prior ratings. Prior ratings are based on users' experiences of virtual products represented in a mediated environment, and they can be submitted prior to purchase. 
A conceptual model of prior ratings is proposed, integrating the environmental factor of presence, whose effects on product evaluation have not been studied previously. A user study conducted in website and virtual store modalities demonstrates the validity of the conceptual model, in that users are more willing to provide prior ratings in virtual environments and more confident in doing so. A method is proposed to show how prior ratings can be leveraged in collaborative filtering. Experimental results indicate the effectiveness of prior ratings for recommender systems. By eliciting more kinds of user ratings, user preferences can be better modelled and recommendations are thus improved. The second line of research adopts additional trust information to help model user preferences. In this thesis, we propose two such approaches, one memory-based and one model-based. In the memory-based approach, the ratings of trusted neighbors are merged to generate a new and more complete rating profile for the active users (those who seek recommendations). Based on the new rating profile, a CF technique can be applied to find more reliable similar users, so that recommendations can be generated with higher accuracy and coverage. This strategy is especially useful for cold-start users, as their preferences are approximated by those of their trusted neighbors. The underlying assumption is that trust and similarity are strongly and positively correlated, which has been shown to be generally true in the literature. This strategy is applied and evaluated on three real-world data sets. Experimental results show that our approach copes with data sparsity and cold start more effectively than its counterparts, in both accuracy and coverage. The main strength of this memory-based approach is that user preferences can be complemented by, and derived from, the ratings of trusted neighbors. In the model-based approach, we focus on how to take better advantage of social trust in a matrix factorization model. 
Although a number of trust-based recommendation models have been proposed in the literature, even the state-of-the-art trust-based models can be inferior to other well-performing ratings-only recommendation methods. By analyzing the social trust data from four real-world data sets, we conclude that not only the explicit but also the implicit influence of both ratings and trust should be taken into consideration in a recommendation model. Hence, we build on SVD++, a state-of-the-art recommendation algorithm that inherently involves the explicit and implicit influence of rated items, and further incorporate both the explicit and implicit influence of trusted users on the prediction of items for an active user. To our knowledge, the work reported is the first to extend SVD++ with social trust information. Experimental results on the four real-world data sets demonstrate that our approach, TrustSVD, achieves better accuracy than both trust-based and ratings-only counterparts (ten in total), and handles data sparsity and cold start better. To summarize, we have proposed four different approaches that exploit the value of ratings and trust in order to cope with the problems of data sparsity and cold start, improving recommendation performance in terms of predictive accuracy and coverage.
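
As an illustration of the memory-based idea described in the summary (merging the ratings of trusted neighbors into a more complete profile for the active user), the following is a minimal sketch only. It assumes a plain unweighted average over the active user and their trusted neighbors; the thesis may weight neighbors differently, and the function name `merge_trusted_ratings` and the toy data are hypothetical, not taken from the record.

```python
from collections import defaultdict


def merge_trusted_ratings(ratings, trust, user):
    """Return a merged rating profile for `user`.

    ratings: dict mapping user -> {item: rating}
    trust:   dict mapping user -> list of trusted users
    Assumption: each source (the user and every trusted neighbor)
    contributes equally; this is a simplification for illustration.
    """
    sources = [user] + list(trust.get(user, []))
    totals = defaultdict(float)
    counts = defaultdict(int)
    for source in sources:
        for item, rating in ratings.get(source, {}).items():
            totals[item] += rating
            counts[item] += 1
    # Average the ratings collected for each item.
    return {item: totals[item] / counts[item] for item in totals}


# Toy data: "alice" is a cold-start user with no ratings of her own.
ratings = {
    "alice": {},
    "bob": {"i1": 4.0, "i2": 2.0},
    "carol": {"i1": 5.0, "i3": 3.0},
}
trust = {"alice": ["bob", "carol"]}

profile = merge_trusted_ratings(ratings, trust, "alice")
print(profile)
```

In this toy case the cold-start user ends up with a profile covering three items, which a conventional CF similarity computation could then use in place of her empty profile.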