Detecting Anomaly Collections using Extreme Feature Ranks

Detecting anomaly collections is an important task with many applications, including spam and fraud detection. In an anomaly collection, entities often operate in collusion and hold different agendas from normal entities. As a result, they usually manifest collective extreme traits, i.e., members of a...

Bibliographic Details
Main Authors: DAI, Hanbo, ZHU, Feida, LIM, Ee Peng, PANG, Hwee Hwa
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2014
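Subjects: Anomaly collection; Extreme feature rank; Anomaly cluster; Outlier group; Spam detection; Spam cluster; Computer Sciences; Databases and Information Systems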
Online Access: https://ink.library.smu.edu.sg/sis_research/2534
https://ink.library.smu.edu.sg/context/sis_research/article/3534/viewcontent/Detecting_Anomaly_Collections_using_Extreme_Feature_Ranks__edited_.pdf
Institution: Singapore Management University
id sg-smu-ink.sis_research-3534
record_format dspace
spelling sg-smu-ink.sis_research-3534
2017-12-07T06:25:32Z
2014-05-01T07:00:00Z
text
application/pdf
https://ink.library.smu.edu.sg/sis_research/2534
info:doi/10.1007/s10618-014-0360-3
https://ink.library.smu.edu.sg/context/sis_research/article/3534/viewcontent/Detecting_Anomaly_Collections_using_Extreme_Feature_Ranks__edited_.pdf
http://creativecommons.org/licenses/by-nc-nd/4.0/
Research Collection School Of Computing and Information Systems
eng
Institutional Knowledge at Singapore Management University
institution Singapore Management University
building SMU Libraries
continent Asia
country Singapore
content_provider SMU Libraries
collection InK@SMU
language English
topic Anomaly collection
Extreme feature rank
Anomaly cluster
Outlier group
Spam detection
Spam cluster
Computer Sciences
Databases and Information Systems
description Detecting anomaly collections is an important task with many applications, including spam and fraud detection. In an anomaly collection, entities often operate in collusion and hold different agendas from normal entities. As a result, they usually manifest collective extreme traits, i.e., members of an anomaly collection are consistently clustered toward the top or bottom ranks on certain features. We therefore propose to detect these anomaly collections by extreme feature ranks. We introduce a novel anomaly definition called Extreme Rank Anomalous Collection, or ERAC. We propose a new measure of anomalousness that captures collective extreme traits based on a statistical model. As there can be a large number of ERACs of various sizes, for simplicity, we first investigate the ERAC detection problem of finding the top-K ERACs within a predefined size limit. We then tackle the follow-up ERAC expansion problem of uncovering supersets of the detected ERACs that are more anomalous, without any size constraint. Algorithms are proposed for both the ERAC detection and expansion problems, followed by studies of their performance on four datasets. Specifically, on synthetic datasets, both the ERAC detection and expansion algorithms demonstrate high precision and recall. On a web spam dataset, both algorithms discover web spammers with higher precision than existing approaches. On an IMDB dataset, both algorithms identify unusual actor collections that are not easily identified by clustering-based methods. On a Chinese online forum dataset, our ERAC detection algorithm identifies suspicious “water army” spammer collections, as agreed by human evaluators, and the ERAC expansion algorithm reveals two larger spammer collections with different spamming behaviors.
format text
author DAI, Hanbo
ZHU, Feida
LIM, Ee Peng
PANG, Hwee Hwa
author_sort DAI, Hanbo
title Detecting Anomaly Collections using Extreme Feature Ranks
publisher Institutional Knowledge at Singapore Management University
publishDate 2014
url https://ink.library.smu.edu.sg/sis_research/2534
https://ink.library.smu.edu.sg/context/sis_research/article/3534/viewcontent/Detecting_Anomaly_Collections_using_Extreme_Feature_Ranks__edited_.pdf
_version_ 1770572516741349376
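
The description above says that members of an anomaly collection cluster toward the top or bottom ranks on certain features, and that anomalousness is measured with a statistical model. The record does not give the paper's actual ERAC definition, so the following is only a minimal sketch of the general idea under a stand-in assumption: a collection is scored by the binomial tail probability of seeing at least as many of its members in the top fraction of one feature's ranking if members were placed at random. The function names, the `top_fraction` parameter, and the binomial null model are all hypothetical choices made for illustration.

```python
from math import comb


def rank_fractions(values):
    """Map each entity to its rank position in (0, 1].

    `values` is a dict of entity -> feature value. The entity with the
    largest value gets the smallest fraction (1/n).
    """
    order = sorted(values, key=values.get, reverse=True)
    n = len(order)
    return {entity: (i + 1) / n for i, entity in enumerate(order)}


def binomial_tail(n, k, p):
    """Return P[X >= k] for X ~ Binomial(n, p)."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))


def extreme_rank_pvalue(collection, values, top_fraction=0.25):
    """Score how surprising it is that `collection` sits in the top ranks.

    Assumption (not from the source): under the null model each member
    independently lands in the top `top_fraction` of the ranking with
    probability `top_fraction`. Smaller return values mean the collection
    is more concentrated in the extreme ranks.
    """
    fractions = rank_fractions(values)
    hits = sum(1 for entity in collection if fractions[entity] <= top_fraction)
    return binomial_tail(len(collection), hits, top_fraction)
```

To score concentration in the bottom ranks instead, the same function can be called with the feature values negated. Ranking candidate collections by this score and keeping the K smallest would give a rough analogue of a top-K search, though the record does not say how the authors' detection algorithm enumerates candidates.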