Fraudulent click detection

Bibliographic Details
Main Author: Kang, Eileen Mun Yee.
Other Authors: Chan Syin
Format: Final Year Project
Language: English
Published: 2013
Subjects:
Online Access: http://hdl.handle.net/10356/52310
Institution: Nanyang Technological University
Description
Summary: Online advertising has grown in popularity with the emergence of the Internet, and pay-per-click is one of the most popular advertising models. However, this model poses problems for advertisers because of fraudulent clicks, and identifying fraudulent clicks manually is a difficult and time-consuming task. In this project, we try to solve this problem using machine learning techniques. Because neural networks are capable of solving classification problems, we study the features used by a multilayer perceptron for fraudulent click detection and observe patterns in the recorded clicks. Three experiments were conducted and their results recorded. The first experiment used the raw input data as features, the second constructed new features from the given set of features, and the third investigated which features are suitable. The results of the first and second experiments show the importance of using more representative features. In Experiment 1, we investigated the correlation between the raw features and found that most of them are almost uncorrelated with one another, which makes it difficult to see patterns in the unprocessed data. In Experiment 2, by contrast, features were created to capture the characteristics of each publisher, and we observed that fraudulent publishers tend to produce more clicks within certain time intervals; as a result, Experiment 2 improved noticeably on Experiment 1. Finally, the third experiment investigated the importance of feature selection in obtaining a subset of more representative features, and its results improved on those of Experiment 2. The findings of this project could serve as a guide to the kinds of features that can be used for fraudulent click detection, and they suggest ideas on how other features could be useful in building a fraudulent click detection system.
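The per-publisher features described for Experiment 2 can be illustrated with a short sketch. This is not the project's code. The record schema (pairs of publisher ID and Unix timestamp), the 60-second window, and the feature names are hypothetical assumptions chosen for illustration. The sketch groups each publisher's clicks into fixed time windows and summarises how many clicks fall in each window, so a publisher that concentrates many clicks in a short interval stands out.

```python
from collections import Counter, defaultdict
from statistics import mean


def publisher_window_features(clicks, window_seconds=60):
    """Aggregate raw click records into per-publisher features.

    Hypothetical schema: `clicks` is an iterable of (publisher_id, unix_timestamp)
    pairs. The window size and feature names are illustrative assumptions.
    """
    # Count clicks per publisher in each fixed, non-overlapping time window.
    buckets = defaultdict(Counter)
    for publisher, ts in clicks:
        buckets[publisher][int(ts // window_seconds)] += 1

    features = {}
    for publisher, counts in buckets.items():
        # Only windows that contain at least one click are counted here.
        per_window = list(counts.values())
        features[publisher] = {
            "total_clicks": sum(per_window),
            "max_clicks_per_window": max(per_window),
            "mean_clicks_per_active_window": mean(per_window),
        }
    return features


if __name__ == "__main__":
    # Toy data: pubA clicks four times within one minute; pubB clicks twice, five minutes apart.
    sample = [("pubA", 0), ("pubA", 10), ("pubA", 20), ("pubA", 30),
              ("pubB", 0), ("pubB", 300)]
    for pub, feats in publisher_window_features(sample).items():
        print(pub, feats)
```

In a setup like the one described, per-publisher values of this kind, rather than the raw click fields, could form the input rows for a multilayer perceptron. The window size is a tunable choice, and features for several window sizes could be computed side by side before feature selection.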