On Challenges in Evaluating Malware Clustering

Malware clustering and classification are important tools that enable analysts to prioritize their malware analysis efforts. The recent emergence of fully automated methods for malware clustering and classification that report high accuracy suggests that this problem may largely be solved.

Bibliographic Details
Main Authors:	LI, Peng; LIU, Limin; GAO, Debin; Reiter, Michael K
Format:	text
Language:	English
Published:	Institutional Knowledge at Singapore Management University 2010
Subjects:	malware clustering and classification; plagiarism detection; Information Security
Online Access:	https://ink.library.smu.edu.sg/sis_research/1319 https://ink.library.smu.edu.sg/context/sis_research/article/2318/viewcontent/1439313b3296c24da7869145991e73fe3b81.pdf
Institution:	Singapore Management University

id	sg-smu-ink.sis_research-2318
record_format	dspace
datestamp	2020-07-22T07:29:04Z
date	2010-09-01T07:00:00Z
media_type	application/pdf
doi	info:doi/10.1007/978-3-642-15512-3_13
license	http://creativecommons.org/licenses/by-nc-nd/4.0/
series	Research Collection School Of Computing and Information Systems
institution	Singapore Management University
building	SMU Libraries
continent	Asia
country	Singapore
content_provider	SMU Libraries
collection	InK@SMU
language	English
topic	malware clustering and classification; plagiarism detection; Information Security
description	Malware clustering and classification are important tools that enable analysts to prioritize their malware analysis efforts. The recent emergence of fully automated methods for malware clustering and classification that report high accuracy suggests that this problem may largely be solved. In this paper, we report the results of our attempt to confirm our conjecture that the method of selecting ground-truth data in prior evaluations biases their results toward high accuracy. To examine this conjecture, we apply clustering algorithms from a different domain (plagiarism detection), first to the dataset used in a prior work's evaluation and then to a wholly new malware dataset, to see if clustering algorithms developed without attention to subtleties of malware obfuscation are nevertheless successful. While these studies provide conflicting signals as to the correctness of our conjecture, our investigation of possible reasons uncovers, we believe, a cautionary note regarding the significance of highly accurate clustering results, as can be impacted by testing on a dataset with a biased cluster-size distribution.
format	text
author	LI, Peng; LIU, Limin; GAO, Debin; Reiter, Michael K
title	On Challenges in Evaluating Malware Clustering
publisher	Institutional Knowledge at Singapore Management University
publishDate	2010
url	https://ink.library.smu.edu.sg/sis_research/1319 https://ink.library.smu.edu.sg/context/sis_research/article/2318/viewcontent/1439313b3296c24da7869145991e73fe3b81.pdf

Similar Items