Detecting Extreme Rank Anomalous Collections
Main Authors: | DAI, Hanbo; ZHU, Feida; LIM, Ee-Peng; PANG, Hwee Hwa |
---|---|
Format: | text |
Language: | English |
Published: | Institutional Knowledge at Singapore Management University, 2012 |
Subjects: | Computer Sciences; Databases and Information Systems |
Online Access: | https://ink.library.smu.edu.sg/sis_research/2870 https://ink.library.smu.edu.sg/context/sis_research/article/3870/viewcontent/ERAC_SDM_cr.pdf |
Institution: | Singapore Management University |
id | sg-smu-ink.sis_research-3870 |
---|---|
record_format | dspace |
institution | Singapore Management University |
building | SMU Libraries |
continent | Asia |
country | Singapore |
content_provider | SMU Libraries |
collection | InK@SMU (Research Collection School Of Computing and Information Systems) |
language | English |
topic | Computer Sciences; Databases and Information Systems |
author | DAI, Hanbo; ZHU, Feida; LIM, Ee-Peng; PANG, Hwee Hwa |
title | Detecting Extreme Rank Anomalous Collections |
description | Anomaly or outlier detection has a wide range of applications, including fraud and spam detection. Most existing studies focus on detecting point anomalies, i.e., individual, isolated entities. However, in a growing number of applications anomalies do not occur individually but in small collections. Unlike the majority, entities in an anomalous collection tend to share certain extreme behavioral traits. The knowledge needed to understand why and how such a set of entities becomes outliers is revealed only by examining them at the collection level. A good example is Web spammers adopting common spamming techniques. To discover such anomalous collections, we introduce a novel definition of anomaly, called Extreme Rank Anomalous Collection. We propose a statistical model to quantify the anomalousness of such a collection, and present both an exact and a heuristic algorithm for finding the top-K extreme rank anomalous collections. We apply the algorithms to real Web spam data to detect spamming sites, and to IMDB data to detect unusual actor groups. Our algorithms achieve higher precision than existing spam and anomaly detection methods. More importantly, our approach finds meaningful anomalous collections in both datasets. |
format | text (application/pdf) |
publisher | Institutional Knowledge at Singapore Management University |
publishDate | 2012 (2012-04-01) |
doi | 10.1137/1.9781611972825.76 |
license | http://creativecommons.org/licenses/by-nc-nd/4.0/ |
url | https://ink.library.smu.edu.sg/sis_research/2870 https://ink.library.smu.edu.sg/context/sis_research/article/3870/viewcontent/ERAC_SDM_cr.pdf |
last_modified | 2017-07-11T07:10:18Z |