Detecting Extreme Rank Anomalous Collections

Bibliographic Details
Main Authors: DAI, Hanbo, ZHU, Feida, LIM, Ee-Peng, PANG, Hwee Hwa
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2012
Subjects: Computer Sciences; Databases and Information Systems
Online Access: https://ink.library.smu.edu.sg/sis_research/2870
https://ink.library.smu.edu.sg/context/sis_research/article/3870/viewcontent/ERAC_SDM_cr.pdf
Institution: Singapore Management University
id sg-smu-ink.sis_research-3870
record_format dspace
spelling sg-smu-ink.sis_research-3870
2017-07-11T07:10:18Z
2012-04-01T07:00:00Z
application/pdf
info:doi/10.1137/1.9781611972825.76
http://creativecommons.org/licenses/by-nc-nd/4.0/
Research Collection School Of Computing and Information Systems
eng
building SMU Libraries
continent Asia
country Singapore
content_provider SMU Libraries
collection InK@SMU
topic Computer Sciences
Databases and Information Systems
description Anomaly or outlier detection has a wide range of applications, including fraud and spam detection. Most existing studies focus on detecting point anomalies, i.e., individual, isolated entities. However, there is an increasing number of applications in which anomalies do not occur individually, but in small collections. Unlike the majority, entities in an anomalous collection tend to share certain extreme behavioral traits. The knowledge essential to understanding why and how the set of entities become outliers is revealed only by examining them at the collection level. A good example is web spammers adopting common spamming techniques. To discover this kind of anomalous collection, we introduce a novel definition of anomaly, called Extreme Rank Anomalous Collection. We propose a statistical model to quantify the anomalousness of such a collection, and present both an exact and a heuristic algorithm for finding top-K extreme rank anomalous collections. We apply the algorithms to real Web spam data to detect spamming sites, and to IMDB data to detect unusual actor groups. Our algorithms achieve higher precision than existing spam and anomaly detection methods. More importantly, our approach succeeds in finding meaningful anomalous collections in both datasets.