DORA: Feature selection for network-based intrusion detection models

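Intrusion Detection Systems (IDS) use models as a basis for detecting intrusions. To ensure that these models are sufficiently comprehensive, a large, high-dimensional data set must be fed to the system. In this study, the data set contains a large amount of normal traffic data and a sufficient number of network intrusion records for the model to classify intrusions correctly. Data sets are often noisy: they contain redundant data as well as irrelevant features that can only compromise the classification accuracy and performance of the generated model. To avoid this, redundant data must be filtered out and irrelevant features dropped. The goal of this study is to determine the best features for an intrusion detection model, a choice that depends heavily on the feature selection algorithms, all of which are tested against the same data set. The findings show that the combined packet-header and n-gram feature set can dramatically increase the classification accuracy of the resulting model. The results also show that selecting only the best features from the entire feature set increases the classification accuracy of the intrusion detection model even further. On this data set, Decision Trees performed best among the classification algorithms, and N-Gram Information Gain was the best feature selection algorithm.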
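The abstract does not spell out how the combined feature set is built or how N-Gram Information Gain scores features. The following is a minimal sketch, assuming the textbook definition of information gain, IG(Y; X) = H(Y) - Σ_v P(X = v) · H(Y | X = v), applied to binary "feature present / absent" indicators. The feature encoding (`hdr:` for header field values, `ng:` for hex-encoded payload byte n-grams), n = 2, `top_k`, every function name, and the toy packets are hypothetical illustrations, not the thesis's actual implementation or data.

```python
import math
from collections import Counter


def packet_features(header: dict[str, int], payload: bytes, n: int = 2) -> set[str]:
    """Binary features for one packet: header field values plus payload byte n-grams.

    The "hdr:" / "ng:" encoding is a hypothetical choice made for this sketch.
    """
    features = {f"hdr:{field}={value}" for field, value in header.items()}
    features |= {"ng:" + payload[i:i + n].hex() for i in range(len(payload) - n + 1)}
    return features


def entropy(labels: list[str]) -> float:
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(present: list[bool], labels: list[str]) -> float:
    """IG(Y; X) = H(Y) - sum over v in {present, absent} of P(X = v) * H(Y | X = v)."""
    total = len(labels)
    gain = entropy(labels)
    for value in (True, False):
        subset = [y for x, y in zip(present, labels) if x == value]
        gain -= len(subset) / total * entropy(subset)
    return gain


def select_features(records: list[set[str]], labels: list[str], top_k: int) -> list[tuple[str, float]]:
    """Score every observed feature by information gain and keep the top_k.

    Ties are broken alphabetically so the ranking is deterministic.
    """
    vocabulary = set().union(*records)
    scores = {
        feature: information_gain([feature in r for r in records], labels)
        for feature in vocabulary
    }
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]


def to_vector(record: set[str], selected: list[str]) -> list[int]:
    """Project a record onto the selected features as a 0/1 vector for a classifier."""
    return [int(f in record) for f in selected]


if __name__ == "__main__":
    # Made-up toy packets: two benign HTTP requests and two shellcode-like payloads.
    packets = [
        ({"dport": 80, "proto": 6}, b"GET /index.html"),
        ({"dport": 80, "proto": 6}, b"GET /about.html"),
        ({"dport": 4444, "proto": 6}, b"\x90\x90\x90\x90/bin/sh"),
        ({"dport": 4444, "proto": 6}, b"\x90\x90\x90AAAA/bin/sh"),
    ]
    labels = ["normal", "normal", "attack", "attack"]
    records = [packet_features(h, p) for h, p in packets]
    for feature, score in select_features(records, labels, top_k=5):
        print(f"{score:.3f}  {feature}")
```

In this sketch a feature that appears in every packet (here `hdr:proto=6`) scores zero and is never selected, while features that perfectly separate the two classes score one bit. The 0/1 vectors from `to_vector` could then be passed to a decision tree learner, the classifier family the thesis reports as best; that step is left out to keep the example dependency-free.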

Bibliographic Details
Main Authors: Acosta, Juan Carlos A., Diguangco, Wilma Patricia A., Obal, Dan Paolo B., Reforeal, Henri Frederic T.
Format: text
Language: English
Published: Animo Repository 2012
Online Access: https://animorepository.dlsu.edu.ph/etd_bachelors/14784
Institution: De La Salle University
id oai:animorepository.dlsu.edu.ph:etd_bachelors-15426
record_format eprints
institution De La Salle University
building De La Salle University Library
continent Asia
country Philippines
content_provider De La Salle University Library
collection DLSU Institutional Repository
language English