DORA: Feature selection for network-based intrusion detection models

Intrusion Detection System (IDS) use models as a basis for detecting intrusions. To ensure that these models are comprehensive enough, a huge and highly-dimensional data must be fed to the system. In this study, the data set will contain a huge amount of normal traffic data and a sufficient number o...

Full description

Saved in:
Bibliographic Details
Main Authors: Acosta, Juan Carlos A., Diguangco, Wilma Patricia A., Obal, Dan Paolo B., Reforeal, Henri Frederic T.
Format: text
Language:English
Published: Animo Repository 2012
Online Access:https://animorepository.dlsu.edu.ph/etd_bachelors/14784
Tags: Add Tag
No Tags, Be the first to tag this record!
Institution: De La Salle University
Language: English
Description
Summary:Intrusion Detection System (IDS) use models as a basis for detecting intrusions. To ensure that these models are comprehensive enough, a huge and highly-dimensional data must be fed to the system. In this study, the data set will contain a huge amount of normal traffic data and a sufficient number of network intrusions data to ensure that the model will be able to correctly classify intrusions. Often, data set are noisy – meaning, it contains a lot of redundant data along with the irrelevant features that can only compromise the classification accuracy and performance of the generated model. To avoid this, the redundant data must be filtered and irrelevant features must be dropped. The goal of this study is to determine what the best features are for an intrusion detection model, which is highly dependent upon the feature selection algorithms that will be tested against the same data set. The findings of the study shows that the combined packet headers and n-grams s feature set can dramatically increase the classifications accuracy of the model being built. The results also proved that selecting only the best features from the entire feature set can increase the classification accuracy of the intrusion detection model even further. Based on the test results, the best performing algorithms are Decision Trees while the best feature selection algorithm is the N-Gram Information Gain, given the data set.