Feature selection using information gain for improved structural-based alert correlation

Bibliographic Details
Main Authors: Alhaj, T. A., Siraj, M. M., Zainal, A., Elshoush, H. T., Elhaj, F.
Format: Article
Language: English
Published: Public Library of Science 2016
Online Access: http://eprints.utm.my/id/eprint/71959/7/AnazidaZainal2016_FeatureSelectionusingInformationGain.pdf
http://eprints.utm.my/id/eprint/71959/
https://www.scopus.com/inward/record.uri?eid=2-s2.0-84998705814&doi=10.1371%2fjournal.pone.0166017&partnerID=40&md5=9ac511beaa64f2471387c37e3f9855c1
Institution: Universiti Teknologi Malaysia
Description
Summary: Grouping and clustering intrusion detection alerts based on the similarity of their features is referred to as structural-based alert correlation, and it can discover a list of attack steps. Previous researchers selected different features and data sources manually based on their knowledge and experience, which led to less accurate identification of attack steps and inconsistent clustering accuracy. Furthermore, existing alert correlation systems deal with a huge amount of data that contains null values, incomplete information, and irrelevant features, making the analysis of alerts tedious, time-consuming, and error-prone. Therefore, this paper focuses on selecting accurate and significant alert features that are appropriate to represent the attack steps, thus enhancing the structural-based alert correlation model. A two-tier feature selection method is proposed to obtain the significant features. The first tier ranks the subset of features by entropy-based information gain, in decreasing order. The second tier extends the selection with additional features that have a better discriminative ability than the initially ranked features. Performance analysis results show the significance of the selected features in terms of clustering accuracy on the DARPA 2000 intrusion detection scenario-specific dataset.
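
The first-tier ranking described in the summary can be illustrated with a minimal sketch. It is not the authors' implementation: the summary does not say how alerts are labelled or which features are candidates, so the feature names (`signature`, `protocol`, `sensor`), the attack-step labels, and the four toy alerts below are all hypothetical. The sketch only shows the standard definition of information gain, IG(Y; X) = H(Y) - H(Y | X), applied to categorical features and sorted in decreasing order.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(values, labels):
    """IG(Y; X) = H(Y) - H(Y | X) for one categorical feature X."""
    total = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    # Weighted entropy of the labels within each feature value.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def rank_features(rows, labels):
    """Return (feature, gain) pairs sorted by gain, highest first."""
    gains = {
        name: information_gain([row[name] for row in rows], labels)
        for name in rows[0]
    }
    return sorted(gains.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy alerts; not taken from the DARPA 2000 dataset.
ALERTS = [
    {"signature": "ping_sweep", "protocol": "icmp", "sensor": "s1"},
    {"signature": "ping_sweep", "protocol": "tcp", "sensor": "s1"},
    {"signature": "rpc_overflow", "protocol": "tcp", "sensor": "s1"},
    {"signature": "rpc_overflow", "protocol": "tcp", "sensor": "s1"},
]
# Hypothetical attack-step label for each alert above.
STEPS = ["probe", "probe", "break_in", "break_in"]

ranking = rank_features(ALERTS, STEPS)
for name, gain in ranking:
    print(f"{name}: {gain:.3f}")
```

In this toy data the constant `sensor` feature carries no information about the attack step and ranks last, while `signature` separates the two steps completely and ranks first. Plain information gain tends to favour features with many distinct values (identifiers, timestamps), which is one reason a ranking of this kind is usually followed by a further selection stage.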