Outlier detection

Bibliographic Details
Main Author: Li, Shukai
Other Authors: Ng Wee Keong
Format: Thesis-Doctor of Philosophy
Language: English
Published: Nanyang Technological University 2013
Online Access:http://hdl.handle.net/10356/52515
Institution: Nanyang Technological University
Description
Summary: Outlier detection aims to capture or identify uncommon events or instances. The technique is widely used in applications such as fraud detection, image processing and bioinformatics, and this diversity of use has made outlier detection an active research topic in data mining, machine learning and statistics. In this thesis, we investigate four different kinds of outlier detection problems. Among them, unsupervised outlier detection has been the most popular, while relative outlier detection has attracted increasing attention in recent years; our research therefore focuses on these two classes of problems. Unsupervised outlier detection methods are used when no labeled patterns are available. For this kind of problem, we propose a Maximum Margin Criterion to separate the unknown outliers from the normal patterns in a given set of samples. However, the corresponding learning task is formulated as a Mixed Integer Programming (MIP) problem, which is computationally hard. To address this issue, we adopt a recently developed label generating technique to efficiently solve a convex relaxation of the MIP problem for outlier detection. Specifically, we propose an effective successive approximation procedure to find a largely violated labeling vector that separates the outliers from the normal patterns, and we establish the convergence of this procedure. A set of largely violated labeling vectors is then combined via multiple kernel learning to detect the outliers robustly. To further improve our outlier detector, we also explore the Maximum Volume Criterion to measure the quality of separation between the outliers and the normal patterns. This criterion can be easily incorporated into the proposed model by introducing an additional regularization term.
These efforts culminate in two novel outlier detection models: Maximum Margin Outlier Detection (MMOD) and Maximum Volume Outlier Detection (MVOD).
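The summary does not spell out the formulation, so the following is only a minimal NumPy sketch of the label-generation idea under stated assumptions, not the thesis's MMOD implementation. It assumes the "largely violated labeling vector" step takes the form used in label-generating maximum margin methods: for given dual weights alpha and a kernel matrix K, find y in {+1, -1}^n with a fixed number of outliers (entries equal to -1) that maximizes y^T Q y, where Q = (alpha alpha^T) * K (elementwise product). The successive approximation step repeatedly maximizes the linearization y^T (Q y_prev), which is solved exactly by sorting. Because Q is positive semidefinite, the objective never decreases. The helper names rbf_kernel, most_violated_labels and combined_kernel, the RBF kernel, the fixed outlier count, the placeholder dual weights and the hand-picked kernel weights are all hypothetical choices made for illustration.

```python
import numpy as np


def rbf_kernel(X, gamma=1.0):
    """Gaussian (RBF) kernel matrix; gamma is an illustrative choice."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    return np.exp(-gamma * d2)


def most_violated_labels(K, alpha, n_outliers, max_iter=100):
    """Hypothetical successive approximation for  max_y  y^T Q y,
    Q = (alpha alpha^T) * K, over y in {+1, -1}^n with exactly
    n_outliers entries equal to -1.

    Each step maximizes the linearization y^T (Q y_prev); that subproblem is
    solved exactly by labeling the n_outliers smallest entries of Q y_prev
    as -1. Q is positive semidefinite (Schur product theorem), so the
    objective never decreases; max_iter caps the loop in case of ties.
    """
    Q = np.outer(alpha, alpha) * K
    n = K.shape[0]
    # Heuristic start (an assumption): lowest mean similarity -> outlier.
    y = np.ones(n)
    y[np.argsort(K.mean(axis=1))[:n_outliers]] = -1.0
    history = [float(y @ Q @ y)]
    for _ in range(max_iter):
        g = Q @ y
        y_new = np.ones(n)
        y_new[np.argsort(g)[:n_outliers]] = -1.0
        history.append(float(y_new @ Q @ y_new))
        if np.array_equal(y_new, y):
            break  # fixed point: the labeling no longer changes
        y = y_new
    return y, history


def combined_kernel(K, labelings, weights):
    """Convex combination of base kernels (y y^T) * K, one per labeling.

    Here the caller supplies the weights; a multiple kernel learning
    solver would learn them instead.
    """
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return sum(wt * np.outer(y, y) * K for wt, y in zip(w, labelings))


# Toy data: 40 inliers around the origin plus 3 points planted far away.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, size=(40, 2)),
               rng.normal(6.0, 0.5, size=(3, 2))])
K = rbf_kernel(X, gamma=0.5)
alpha = np.full(len(X), 1.0 / len(X))  # placeholder dual weights, not learned
y, history = most_violated_labels(K, alpha, n_outliers=3)
y4, _ = most_violated_labels(K, alpha, n_outliers=4)
print("flagged as outliers:", np.where(y < 0)[0])
K_mix = combined_kernel(K, [y, y4], [0.5, 0.5])
```

Sorting makes each inner step exact and cheap (O(n log n) after the matrix-vector product), which is why the relaxation can be attacked one labeling at a time. Each base kernel (y y^T) * K is itself positive semidefinite, so any convex combination of them is a valid kernel for a downstream max-margin classifier.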