Towards Ground Truthing Observations in Gray-Box Anomaly Detection

Bibliographic Details
Main Authors: MING, Jiang, ZHANG, Haibin, GAO, Debin
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2011
Subjects:
Online Access: https://ink.library.smu.edu.sg/sis_research/2006
https://ink.library.smu.edu.sg/context/sis_research/article/3005/viewcontent/nss11.pdf
Institution: Singapore Management University
Description
Summary: Anomaly detection has attracted interest from researchers because it can detect zero-day exploits. A gray-box anomaly detector first observes benign executions of a computer program and then extracts reliable rules that govern the normal execution of the program. However, such observations from benign executions are not necessarily true evidence supporting the rules learned. For example, the observation that a file descriptor is equal to a socket descriptor should not be considered as supporting a rule that requires the two values to be the same. Ground truthing such observations is a difficult problem, since it is not practical to analyze the semantics of every instruction in every program to be protected. In this paper, we propose using taint analysis to automatically help with the ground truthing. Intuitively, two values sharing the same taint source provides ground truth of their data dependence. We implement a host-based anomaly detector with our proposed taint tracking and evaluate the accuracy of the rules learned. Results show that we not only manage to filter out incorrect rules that would otherwise be learned (with high support and confidence), but also manage to recover good rules that were previously believed to be unreliable. We also present the overheads of our system and the time needed for training.
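
The idea of using a shared taint source as ground truth can be illustrated with a small Python sketch. This is not the authors' implementation, which this record does not describe. The `Observation` record, the taint labels such as `"open#1"`, and the `rule_confidence` function are all hypothetical names chosen for illustration. The sketch assumes that each observed value carries the set of taint sources it derives from, and that an equality observation counts as evidence only when the two sets overlap.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple


@dataclass(frozen=True)
class Observation:
    """One observed pair of values (hypothetical record format)."""
    value_a: int
    value_b: int
    taint_a: FrozenSet[str]  # taint sources that value_a derives from
    taint_b: FrozenSet[str]  # taint sources that value_b derives from


def rule_confidence(observations: List[Observation]) -> Tuple[float, float]:
    """Return (naive, grounded) confidence for the candidate rule 'a == b'.

    naive:    fraction of observations in which the two values are equal.
    grounded: fraction in which they are equal AND share a taint source,
              i.e. the equality is backed by a data dependence.
    """
    total = len(observations)
    if total == 0:
        return 0.0, 0.0
    equal = [o for o in observations if o.value_a == o.value_b]
    grounded = [o for o in equal if o.taint_a & o.taint_b]
    return len(equal) / total, len(grounded) / total


# A file descriptor and a socket descriptor that happen to share the value 3.
# They come from unrelated sources, so the equality is coincidental.
coincidental = [
    Observation(3, 3, frozenset({"open#1"}), frozenset({"socket#1"}))
    for _ in range(3)
]

# A descriptor returned by open() and later passed to read().
# Both values derive from the same source, so the equality is real evidence.
dependent = [
    Observation(4, 4, frozenset({"open#1"}), frozenset({"open#1"}))
    for _ in range(3)
]

naive_c, grounded_c = rule_confidence(coincidental)
naive_d, grounded_d = rule_confidence(dependent)
```

In this toy setting both candidate rules look equally reliable when only values are compared, while the taint check keeps the rule backed by a data dependence and rejects the coincidental one. A real detector would obtain the taint sets from dynamic taint tracking rather than from hand-written labels.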