Towards characterization and detection of collusive reputation fraud in crowdsourced opinions

Bibliographic Details
Main Author: Xu, Chang
Other Authors: Zhang Jie
Format: Theses and Dissertations
Language: English
Published: 2017
Online Access: http://hdl.handle.net/10356/69708
Institution: Nanyang Technological University
Description
Summary: Crowdsourced opinion portals allow people to share experiences and assessments of products or services by posting reviews online, and they have become an informative source for consumers making purchase decisions. This, however, gives crowdsourced opinions the power to affect the profitability of the businesses involved: positive reviews build up a business's reputation and increase sales, while negative reviews damage its public image and can cause business failure. Unfortunately, this monetary incentive has spurred the rise of an underground market in which spammers are hired to manipulate (e.g., promote or degrade) a business's reputation by crafting fake, misleading reviews. Furthermore, this blackhat practice is growing in sophistication: spammers launch coordinated campaigns to produce reviews in collusion, which is considered much more harmful and difficult to detect because colluders can quickly dominate the overall sentiment while diluting suspicious footprints by distributing the workload.

To address the collusive reputation fraud problem, the first objective of this thesis is to understand the behaviors of collusive spammers, whom we call colluders, by characterizing their inner structures in spam campaigns. The second objective, aimed at eventually diminishing the adversarial impact, is to design effective and efficient algorithms for collusion detection, so that the generated fake opinions can be removed from the system accurately and promptly.

Firstly, a novel collusion pattern is disclosed in which multiple tiny spammer groups cooperate with each other on partially overlapping targets. To handle this new attack vector, two graph-based classification algorithms are proposed to spot colluders in supervised settings. These algorithms are designed to capture colluders' correlations by defining neighborhoods for users, so that detection can benefit from the information collected from each neighborhood. Experimental results have shown that both algorithms outperform existing detectors that do not utilize relations between colluders.

Secondly, to further study the underlying colluder correlations, a suite of pairwise collusive behavior measures is proposed and leveraged for detection. Compared to existing pointwise measures, pairwise measures are more fine-grained and directly reflect the relations between colluders. Moreover, a novel random walk-based detection framework, FraudInformer, is proposed to work with the pairwise measures; it operates in unsupervised settings where no prior knowledge of collusion instances is needed. Extensive experiments have been conducted to evaluate the effectiveness of the proposed pairwise measures and detection framework.

Thirdly, to remedy the deficiency of existing unsupervised detectors in collusion prediction, a novel statistical framework is proposed to perform collusion inference and prediction together. The key idea is to take a hybrid generative and discriminative probabilistic approach in which the generative and discriminative processes benefit from each other when performing collusion inference and prediction in unsupervised settings. In addition, a suite of homogeneity-based collusive behavior measures is proposed to capture the connections between colluders by measuring their intra-group similarities. Experiments on two real-world datasets have demonstrated the effectiveness of the proposed method and its improvements in learning and predictive abilities.

Finally, an incremental detection framework, FraudScan, is proposed to counter collusive reputation fraud along the temporal dimension, so as to respond quickly to newly emerging campaign activities. The task of online reputation fraud campaign detection is formally defined. Then, a unified and scalable optimization framework is proposed that can adapt the built detection model to emerging fraud patterns over time. Empirical analysis on real data has shown that such a temporally oriented detection paradigm can significantly increase the accuracy of fraud campaign detection.

In summary, the collusive behavior measures proposed in this thesis aim to characterize the behaviors of colluders in reputation fraud campaigns, providing insights towards the development of algorithms for filtering collusive reputation fraud in crowdsourced opinions. The proposed detection frameworks can perform both collusion prediction and inference in supervised and unsupervised settings, and are designed to work in batch and incremental fashion for different detection scenarios.
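
The general idea of combining a pairwise behavior measure with a random walk over users can be illustrated with a small sketch. This record does not specify the measures or the walk used in the thesis, so everything below is an assumption made for illustration: the Jaccard overlap of reviewed targets stands in for a pairwise collusive behavior measure, a PageRank-style walk with uniform restart stands in for the random-walk step, and the user and target names are invented. It is not the FraudInformer algorithm.

```python
from itertools import combinations


def jaccard(a, b):
    """Overlap of two users' target sets (a hypothetical pairwise measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pairwise_graph(reviews):
    """Build a weighted user-user graph from {user: set of reviewed targets}."""
    graph = {u: {} for u in reviews}
    for u, v in combinations(sorted(reviews), 2):
        w = jaccard(reviews[u], reviews[v])
        if w > 0:
            graph[u][v] = w
            graph[v][u] = w
    return graph


def random_walk_scores(graph, restart=0.15, iters=50):
    """Visit probabilities of a walk with uniform restart (PageRank-style)."""
    n = len(graph)
    scores = {u: 1.0 / n for u in graph}
    for _ in range(iters):
        nxt = {u: restart / n for u in graph}
        for u, nbrs in graph.items():
            total = sum(nbrs.values())
            if total == 0:
                # Isolated user: spread its probability mass uniformly.
                for v in graph:
                    nxt[v] += (1 - restart) * scores[u] / n
            else:
                # Move to each neighbor in proportion to the pairwise weight.
                for v, w in nbrs.items():
                    nxt[v] += (1 - restart) * scores[u] * w / total
        scores = nxt
    return scores


# Invented toy data: u1-u3 review overlapping targets, u4 and u5 do not.
reviews = {
    "u1": {"hotelA", "hotelB", "hotelC"},
    "u2": {"hotelA", "hotelB"},
    "u3": {"hotelB", "hotelC"},
    "u4": {"cafeX"},
    "u5": {"barY"},
}
graph = pairwise_graph(reviews)
scores = random_walk_scores(graph)
ranked = sorted(scores, key=scores.get, reverse=True)
```

In this toy setting, users whose targets overlap heavily with others accumulate more of the walk's probability mass than isolated users, so `ranked` lists them first. A practical detector would need richer pairwise signals than target overlap alone, since genuine reviewers also share targets.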