Positive and unlabeled learning for anomaly detection

Bibliographic Details
Main Author: Zhang, Jiaqi
Other Authors: Tan Yap Peng
Format: Theses and Dissertations
Language: English
Published: 2018
Online Access: http://hdl.handle.net/10356/75883
Institution: Nanyang Technological University
Description
Summary: Anomaly detection is of great interest to big data applications but remains a challenging problem for machine learning-based methods. Unsupervised learning may not perform satisfactorily because it lacks label information, while supervised learning requires labeled anomaly data for training, which is difficult to acquire because anomalies are usually rare and diversely distributed. To address this challenge, we propose a hybrid solution that applies Positive and Unlabeled (PU) Learning to the anomaly detection problem. As a semi-supervised method, the proposed approach requires only normal (positive) data and unlabeled data (which may be positive or negative). We start by using a linear model to extract the most reliable negative instances, followed by an iterative self-learning process that updates the classifier at different speeds based on the estimated positive class prior. The proposed method is verified on several benchmark datasets and outperforms existing methods under different experimental settings.
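
The two-stage procedure outlined in the summary can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the record does not specify the linear model, the update rule, or how the class prior is estimated. The sketch assumes a plain logistic regression fitted by gradient descent, takes the positive class prior as a given input (`prior`) rather than estimating it, and uses a hypothetical rule in which the prior fixes how many unlabeled points may become negatives and therefore how many are added per iteration. All function and parameter names (`fit_linear`, `score`, `pu_self_learning`, `init_frac`, `max_iter`) are illustrative.

```python
import numpy as np


def _sigmoid(z):
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30.0, 30.0)))


def _with_bias(X):
    return np.hstack([X, np.ones((X.shape[0], 1))])


def fit_linear(X, y, lr=0.1, epochs=300):
    """Fit a logistic-regression model by batch gradient descent."""
    Xb = _with_bias(X)
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        w -= lr * Xb.T @ (_sigmoid(Xb @ w) - y) / len(y)
    return w


def score(w, X):
    """Estimated probability that each row is positive (normal)."""
    return _sigmoid(_with_bias(X) @ w)


def pu_self_learning(X_pos, X_unl, prior, init_frac=0.05, max_iter=10):
    """Seed reliable negatives with a linear model, then grow the set.

    `prior` is the assumed fraction of positives among the unlabeled data.
    """
    n_unl = len(X_unl)
    # Assumption: at most (1 - prior) of the unlabeled points are negatives.
    budget = max(1, int(round((1.0 - prior) * n_unl)))

    # Stage 1: train positive vs. unlabeled and keep the lowest-scoring
    # unlabeled points as the initial reliable negatives.
    X = np.vstack([X_pos, X_unl])
    y = np.concatenate([np.ones(len(X_pos)), np.zeros(n_unl)])
    order = np.argsort(score(fit_linear(X, y), X_unl)).tolist()
    n_init = max(1, min(int(init_frac * n_unl), budget))
    neg = set(order[:n_init])

    def train():
        idx = sorted(neg)
        Xt = np.vstack([X_pos, X_unl[idx]])
        yt = np.concatenate([np.ones(len(X_pos)), np.zeros(len(idx))])
        return fit_linear(Xt, yt)

    # Hypothetical speed rule: spread the remaining budget over max_iter
    # rounds (ceiling division), so a lower prior gives larger steps.
    step = max(1, -(-(budget - n_init) // max_iter))

    # Stage 2: retrain on positives vs. current negatives, then promote the
    # lowest-scoring remaining unlabeled points.
    w = train()
    for _ in range(max_iter):
        if len(neg) >= budget:
            break
        ranked = [i for i in np.argsort(score(w, X_unl)).tolist() if i not in neg]
        neg.update(ranked[:min(step, budget - len(neg))])
        w = train()
    return w, sorted(neg)


# Synthetic example: normal points near (0, 0), anomalies near (4, 4).
rng = np.random.default_rng(0)
normal = rng.normal(0.0, 0.5, size=(180, 2))
anomalies = rng.normal(4.0, 0.5, size=(20, 2))
X_pos = normal[:100]
X_unl = np.vstack([normal[100:], anomalies])  # rows 80..99 are anomalies
w, neg_idx = pu_self_learning(X_pos, X_unl, prior=0.8)
```

Here `neg_idx` holds the indices of unlabeled rows that the sketch ends up treating as anomalies, and `score(w, X_unl)` gives a normality score for each unlabeled row. The synthetic clusters are deliberately well separated; real benchmark data would likely need a stronger classifier and a properly estimated prior.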