Mining coherent anomaly collections on web data
The recent boom of weblogs and social media has attached increasing importance to the identification of suspicious users with unusual behavior, such as spammers or fraudulent reviewers. A typical spamming strategy is to employ multiple dummy accounts to collectively promote a target, be it a URL or...
Main Authors: | DAI, Hanbo; ZHU, Feida; LIM, Ee-peng; PANG, Hwee Hwa |
---|---|
Format: | text |
Language: | English |
Published: | Institutional Knowledge at Singapore Management University, 2012 |
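Subjects: | Anomaly/outlier detection; Anomaly collection/cluster; Computer Sciences; Databases and Information Systems |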
Online Access: | https://ink.library.smu.edu.sg/sis_research/2869 https://ink.library.smu.edu.sg/context/sis_research/article/3869/viewcontent/MiningCoherentAnomalyCollections_2012_CIKM.pdf |
Institution: | Singapore Management University |
id |
sg-smu-ink.sis_research-3869 |
---|---|
record_format |
dspace |
spelling |
sg-smu-ink.sis_research-3869; 2018-06-19T06:26:05Z; Mining coherent anomaly collections on web data; DAI, Hanbo; ZHU, Feida; LIM, Ee-peng; PANG, Hwee Hwa; 2012-11-01T07:00:00Z; text; application/pdf; https://ink.library.smu.edu.sg/sis_research/2869; info:doi/10.1145/2396761.2398472; https://ink.library.smu.edu.sg/context/sis_research/article/3869/viewcontent/MiningCoherentAnomalyCollections_2012_CIKM.pdf; http://creativecommons.org/licenses/by-nc-nd/4.0/; Research Collection School Of Computing and Information Systems; eng; Institutional Knowledge at Singapore Management University; Anomaly/outlier detection; Anomaly collection/cluster; Computer Sciences; Databases and Information Systems |
institution |
Singapore Management University |
building |
SMU Libraries |
continent |
Asia |
country |
Singapore |
content_provider |
SMU Libraries |
collection |
InK@SMU |
language |
English |
topic |
Anomaly/outlier detection; Anomaly collection/cluster; Computer Sciences; Databases and Information Systems |
description |
The recent boom of weblogs and social media has attached increasing importance to the identification of suspicious users with unusual behavior, such as spammers or fraudulent reviewers. A typical spamming strategy is to employ multiple dummy accounts to collectively promote a target, be it a URL or a product. Consequently, these suspicious accounts exhibit certain coherent anomalous behavior identifiable as a collection. In this paper, we propose the concept of Coherent Anomaly Collection (CAC) to capture this kind of collections, and put forward an efficient algorithm to simultaneously find the top-K disjoint CACs together with their anomalous behavior patterns. Compared with existing approaches, our new algorithm can find disjoint anomaly collections with coherent extreme behavior without having to specify either their number or sizes. Results on real Twitter data show that our approach discovers meaningful and informative hashtag spammer groups of various sizes which are hard to detect by clustering-based methods. |
format |
text |
author |
DAI, Hanbo; ZHU, Feida; LIM, Ee-peng; PANG, Hwee Hwa |
author_sort |
DAI, Hanbo |
title |
Mining coherent anomaly collections on web data |
publisher |
Institutional Knowledge at Singapore Management University |
publishDate |
2012 |
url |
https://ink.library.smu.edu.sg/sis_research/2869 https://ink.library.smu.edu.sg/context/sis_research/article/3869/viewcontent/MiningCoherentAnomalyCollections_2012_CIKM.pdf |
_version_ |
1770572659686375424 |