A holistic approach to trust and reputation management in big data

Bibliographic Details
Main Author: Leonit Zeynalvand
Other Authors: Zhang, Jie
Format: Thesis-Doctor of Philosophy
Language: English
Published: Nanyang Technological University 2021
Subjects:
Online Access:https://hdl.handle.net/10356/146956
Institution: Nanyang Technological University
Description
Summary: Big Data is becoming an increasingly important part of everyday life. In this dissertation we consider two types of Big Data: user-generated data and IoT-generated data. We refer to the Big Data generated by users of online services as user-generated data; ratings submitted by users for online purchases are one example. We refer to the Big Data generated by devices in the Internet of Things (IoT) as IoT-generated data. The key difference between these two types of Big Data is the density of the data, that is, the frequency with which one entity perceives information about another entity. User-generated data is sparse, whereas IoT-generated data is dense, because IoT entities interact far more frequently than social entities do. On the one hand, a sparse dataset yields less accurate knowledge extraction than a dense one; on the other hand, a dense dataset risks greater privacy exposure. Regardless of sparsity, a dataset may also contain subjective information that affects knowledge extraction. Trust and reputation management (TRM) systems are a common example of knowledge extraction. This dissertation addresses issues within TRM systems in emerging Big Data environments; in particular, it identifies and addresses the subjectivity of the evidence space, the density of IoT-generated data, and the sparsity of user-generated data.

First, we address subjectivity as an important issue in TRM for Big Data. Subjectivity means that the information provided by each user, represented by an agent, is influenced by the user's individual preferences, which can be misleading in trust evaluation. In this dissertation, we seek to align the potentially subjective information with the information seeker's own subjectivity so that the acquired second-hand information is more useful and personalized.
Accordingly, we propose a multi-agent subjectivity alignment (MASA) mechanism, which models each agent's subjectivity using a regression technique and exchanges these models among agents as the input to an alignment process. As our simulations demonstrate, this mechanism substantially counteracts the biases introduced by different agents and improves the accuracy of second-hand information fusion. In addition, experiments on a real-world dataset further validate the efficacy of MASA.

Second, we explain how TRM plays an increasingly important role in large-scale online environments with a dense evidence space, such as multi-agent systems (MAS) and the IoT. One main objective of TRM is accurate trust assessment of entities such as agents or IoT service providers. However, as we identify in this dissertation, this encounters an accuracy-privacy dilemma when the evidence space is dense. We therefore propose a framework called Context-aware Bernoulli Neural Network based Reputation Assessment (COBRA) to address this challenge. COBRA encapsulates agent interactions or transactions, which are prone to privacy leakage, in machine learning models, and aggregates multiple such models using a Bernoulli neural network to predict a trust score for an agent. COBRA thereby preserves agent privacy and retains interaction contexts, and achieves more accurate trust prediction than a fully-connected neural network alternative. COBRA is also robust to security attacks by agents who inject fake machine learning models; notably, it is resistant to the 51-percent attack. The performance of COBRA is validated by experiments on a real dataset and by simulations, in which COBRA also outperforms other state-of-the-art TRM systems.
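The regression-based alignment idea can be sketched in a few lines. The following toy is purely illustrative (a simple least-squares fit of one advisor's rating scale onto the information seeker's own scale, using items both have rated); it is not the thesis's actual MASA algorithm, and all names and numbers are invented:

```python
# Illustrative subjectivity alignment via linear regression (hypothetical,
# not MASA's exact mechanism): agent A learns a mapping from advisor B's
# ratings to its own ratings on commonly rated items, then uses that model
# to reinterpret B's ratings of items A has not seen.
from statistics import mean

def fit_linear(xs, ys):
    """Ordinary least squares fit of y = a*x + b on paired ratings."""
    mx, my = mean(xs), mean(ys)
    var = sum((x - mx) ** 2 for x in xs)
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / var
    return a, my - a * mx

def align(model, advisor_rating):
    """Translate the advisor's rating into the seeker's subjective scale."""
    a, b = model
    return a * advisor_rating + b

# Items rated by both agents: advisor B is systematically harsher.
b_ratings = [2.0, 3.0, 4.0, 5.0]
a_ratings = [3.0, 3.5, 4.0, 4.5]   # agent A's own view of the same items

model = fit_linear(b_ratings, a_ratings)
# B's rating of 2.5 on a new item, read through A's learned alignment.
print(round(align(model, 2.5), 2))  # → 3.25
```

The key point of the sketch is that only the fitted model (two coefficients here), not the raw rating history, needs to be exchanged between agents.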
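The model-sharing idea behind COBRA can be illustrated with a deliberately simplified stand-in: each agent publishes only a trained per-context predictor (here a smoothed success-rate estimator in place of the thesis's machine learning models), and the querier averages the predictions instead of feeding them to a Bernoulli neural network. Everything below is an assumption made for illustration, not COBRA's actual architecture:

```python
# Simplified illustration of the privacy idea: agents share trained models
# instead of raw transaction logs, and a querier aggregates the models'
# per-context predictions into a trust score.

def train_local_model(transactions):
    """transactions: list of (context, success_bool). Returns a closure
    that predicts P(success | context) without exposing the raw log."""
    counts = {}
    for ctx, ok in transactions:
        s, n = counts.get(ctx, (0, 0))
        counts[ctx] = (s + int(ok), n + 1)
    def predict(ctx):
        s, n = counts.get(ctx, (0, 0))
        return (s + 1) / (n + 2)        # Laplace-smoothed success rate
    return predict

def trust_score(models, ctx):
    """Aggregate the shared models' predictions (a plain mean, standing in
    for the Bernoulli-neural-network aggregation named in the abstract)."""
    return sum(m(ctx) for m in models) / len(models)

m1 = train_local_model([("delivery", True), ("delivery", True)])
m2 = train_local_model([("delivery", True), ("delivery", False)])
print(trust_score([m1, m2], "delivery"))  # → 0.625
```

Note that the querier only ever calls the closures; the transaction lists themselves never leave their owners, which is the property the abstract describes as preserving agent privacy.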
Third, we elaborate on how, with the prevalence of e-commerce applications, user-generated data poses new security challenges that render traditional TRM approaches ineffective, especially in mitigating and discouraging trust attacks such as whitewashing and Sybil attacks. Specifically, there are three challenges. First, user-generated data is increasingly sparse, making the derived trust models not robust to attacks. Second, the cost of attacks has decreased significantly over time due to the widespread presence of bots in e-commerce applications, so the traditional assumption that the majority of users are honest often no longer holds; this further exacerbates the first challenge. Third, e-commerce applications involve publicity (user-generated data originating from influencers and paid users), which existing trust models do not formulate. In this dissertation, we propose a new TRM framework called BEQA. It uses Blockchain to transform multiple disjoint and sparse sets of user-generated data into a single, sufficiently dense dataset, and formulates the cost of Sybil attacks in terms of Blockchain transaction fees. In particular, publicity is formulated as a whitewashing deposit, such that a higher level of publicity imposes a higher cost on a Sybil attack. To evaluate the performance of BEQA, we conduct experiments on three real datasets, plus additional simulations covering more extensive scenarios. Our results show that BEQA outperforms state-of-the-art TRM models, yielding more accurate trustworthiness assessment and more rapid mitigation of Sybil attacks.

To summarize, addressing these three issues enables us to take a holistic approach towards trustworthiness in Big Data. Issues concerning the subjectivity, density, and sparsity of the evidence space do not necessarily co-exist in one Big Data environment; hence, a one-size-fits-all TRM solution is often not viable. Instead, this dissertation proposes a holistic collection of effective solutions that address these issues.
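The transaction-fee and whitewashing-deposit idea can be made concrete with back-of-the-envelope arithmetic. The formula, parameter names, and figures below are hypothetical, chosen only to show how publicity can scale the attacker's cost; none of them are taken from the thesis:

```python
# Illustrative cost model: every rating is a blockchain transaction with a
# fee, and re-entering after whitewashing requires a deposit that grows
# with the target's publicity level.

def sybil_attack_cost(n_fake_accounts, ratings_per_account, tx_fee,
                      base_deposit, publicity_level):
    """Total cost an attacker pays to flood a target with fake ratings."""
    fee_cost = n_fake_accounts * ratings_per_account * tx_fee
    deposit_cost = n_fake_accounts * base_deposit * (1 + publicity_level)
    return fee_cost + deposit_cost

# 100 Sybil accounts, 10 ratings each, 0.05 per transaction,
# a base deposit of 2.0, and a target with publicity level 3.
print(sybil_attack_cost(100, 10, 0.05, 2.0, 3))  # → 850.0
```

Under a model of this shape, higher publicity multiplies the deposit term, so attacking a heavily promoted seller becomes proportionally more expensive, which is the direction of the effect the abstract describes.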