DORA: Feature selection for network-based intrusion detection models
Intrusion Detection Systems (IDSs) use models as a basis for detecting intrusions. To ensure that these models are comprehensive enough, large, high-dimensional data must be fed to the system. In this study, the data set will contain a large amount of normal traffic data and a sufficient number o...
Main Authors: | Acosta, Juan Carlos A.; Diguangco, Wilma Patricia A.; Obal, Dan Paolo B.; Reforeal, Henri Frederic T. |
Format: | text |
Language: | English |
Published: | Animo Repository, 2012 |
Online Access: | https://animorepository.dlsu.edu.ph/etd_bachelors/14784 |
Institution: | De La Salle University |
id |
oai:animorepository.dlsu.edu.ph:etd_bachelors-15426 |
record_format |
eprints |
spelling |
oai:animorepository.dlsu.edu.ph:etd_bachelors-15426 2021-11-24T02:32:11Z DORA: Feature selection for network-based intrusion detection models Acosta, Juan Carlos A. Diguangco, Wilma Patricia A. Obal, Dan Paolo B. Reforeal, Henri Frederic T. Intrusion Detection Systems (IDSs) use models as a basis for detecting intrusions. To ensure that these models are comprehensive enough, large, high-dimensional data must be fed to the system. In this study, the data set will contain a large amount of normal traffic data and a sufficient amount of network intrusion data to ensure that the model will be able to classify intrusions correctly. Data sets are often noisy: they contain a lot of redundant data along with irrelevant features, both of which can compromise the classification accuracy and performance of the generated model. To avoid this, redundant data must be filtered out and irrelevant features dropped. The goal of this study is to determine the best features for an intrusion detection model, a choice that depends heavily on the feature selection algorithms tested against the same data set. The findings of the study show that the combined packet-header and n-gram feature set can dramatically increase the classification accuracy of the model being built. The results also showed that selecting only the best features from the entire feature set can increase the classification accuracy of the intrusion detection model even further. Based on the test results for the given data set, the best-performing classifiers are Decision Trees, while the best feature selection algorithm is N-Gram Information Gain. 2012-01-01T08:00:00Z text https://animorepository.dlsu.edu.ph/etd_bachelors/14784 Bachelor's Theses English Animo Repository |
institution |
De La Salle University |
building |
De La Salle University Library |
continent |
Asia |
country |
Philippines |
content_provider |
De La Salle University Library |
collection |
DLSU Institutional Repository |
language |
English |
description |
Intrusion Detection Systems (IDSs) use models as a basis for detecting intrusions. To ensure that these models are comprehensive enough, large, high-dimensional data must be fed to the system. In this study, the data set will contain a large amount of normal traffic data and a sufficient amount of network intrusion data to ensure that the model will be able to classify intrusions correctly. Data sets are often noisy: they contain a lot of redundant data along with irrelevant features, both of which can compromise the classification accuracy and performance of the generated model. To avoid this, redundant data must be filtered out and irrelevant features dropped. The goal of this study is to determine the best features for an intrusion detection model, a choice that depends heavily on the feature selection algorithms tested against the same data set. The findings of the study show that the combined packet-header and n-gram feature set can dramatically increase the classification accuracy of the model being built. The results also showed that selecting only the best features from the entire feature set can increase the classification accuracy of the intrusion detection model even further. Based on the test results for the given data set, the best-performing classifiers are Decision Trees, while the best feature selection algorithm is N-Gram Information Gain. |
format |
text |
author |
Acosta, Juan Carlos A.; Diguangco, Wilma Patricia A.; Obal, Dan Paolo B.; Reforeal, Henri Frederic T. |
title |
DORA: Feature selection for network-based intrusion detection models |
publisher |
Animo Repository |
publishDate |
2012 |
url |
https://animorepository.dlsu.edu.ph/etd_bachelors/14784 |
_version_ |
1718383386882473984 |
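The abstract names N-Gram Information Gain as the best-performing feature selection method but does not say how it was computed. The following is a minimal sketch under stated assumptions, not the authors' implementation: each feature is the presence of a byte bigram in a packet payload, each packet is labelled "normal" or "attack", and features are ranked by information gain IG(Y; X) = H(Y) - sum over x of P(X=x) * H(Y | X=x), using base-2 entropy. The function names `byte_ngrams`, `entropy`, `information_gain`, and `rank_ngrams`, the parameters `n` and `top_k`, and the toy payloads are all hypothetical.

```python
from __future__ import annotations

import math
from collections import Counter


def byte_ngrams(payload: bytes, n: int = 2) -> set[bytes]:
    """Return the set of distinct byte n-grams occurring in one payload."""
    return {payload[i:i + n] for i in range(len(payload) - n + 1)}


def entropy(labels: list[str]) -> float:
    """Shannon entropy (base 2) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(has_feature: list[bool], labels: list[str]) -> float:
    """IG(Y; X) for a binary presence flag X: H(Y) minus the weighted H(Y | X=x)."""
    total = len(labels)
    gain = entropy(labels)
    for value in (True, False):
        subset = [y for flag, y in zip(has_feature, labels) if flag == value]
        if subset:  # an empty partition contributes nothing
            gain -= (len(subset) / total) * entropy(subset)
    return gain


def rank_ngrams(payloads: list[bytes], labels: list[str],
                n: int = 2, top_k: int = 10) -> list[tuple[bytes, float]]:
    """Score every n-gram seen in the payloads by information gain; keep the top_k."""
    grams_per_packet = [byte_ngrams(p, n) for p in payloads]
    vocabulary = set().union(*grams_per_packet)
    scores = {}
    for gram in vocabulary:
        presence = [gram in grams for grams in grams_per_packet]
        scores[gram] = information_gain(presence, labels)
    # Highest gain first; ties broken by byte order so the ranking is deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))[:top_k]


if __name__ == "__main__":
    # Toy payloads (hypothetical, not taken from the study's data set).
    payloads = [b"GET /index.html", b"GET /about.html",
                b"\x90\x90\x90\x90/bin/sh", b"\x90\x90\x90\x90\xcc"]
    labels = ["normal", "normal", "attack", "attack"]
    for gram, score in rank_ngrams(payloads, labels, n=2, top_k=3):
        print(gram, round(score, 3))
```

The highest-scoring n-grams would then be kept as input features for a classifier. The record reports that Decision Trees performed best on the study's data set; the bigram size, the cutoff `top_k`, and the presence-only encoding used here are illustrative choices, not details taken from the thesis.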