VULNERABILITY DETECTION IN PHP WEB APPLICATION USING LEXICAL ANALYSIS APPROACH WITH MACHINE LEARNING
Main Author: | RIZKI ANBIYA - NIM : 23515029, DHIKA |
---|---|
Format: | Theses |
Language: | Indonesia |
Online Access: | https://digilib.itb.ac.id/gdl/view/26584 |
Institution: | Institut Teknologi Bandung |
id | id-itb.:26584 |
---|---|
spelling | id-itb.:26584 2018-03-15T16:01:17Z VULNERABILITY DETECTION IN PHP WEB APPLICATION USING LEXICAL ANALYSIS APPROACH WITH MACHINE LEARNING RIZKI ANBIYA - NIM : 23515029, DHIKA Indonesia Theses INSTITUT TEKNOLOGI BANDUNG https://digilib.itb.ac.id/gdl/view/26584 |
institution | Institut Teknologi Bandung |
building | Institut Teknologi Bandung Library |
continent | Asia |
country | Indonesia |
content_provider | Institut Teknologi Bandung |
collection | Digital ITB |
language | Indonesia |
description |
One of the most important aspects of PHP web application development is security. Data breaches can be caused by vulnerabilities in web applications. One way to detect vulnerabilities is static analysis, a method of analyzing an application without executing the program. The advantage of static analysis is that it checks the source code in depth, so the root causes of security problems can be found, not just their symptoms. However, static analysis requires an expert and takes a great deal of time.

Security vulnerabilities can also be detected using lexical analysis and machine learning. Lexical analysis transforms the source code into a representation that is easy to process, such as tokens, which are then fed to machine learning algorithms for classification. The choice of features and classification algorithm affects the detection results. This research applies cross-project detection. The data were collected from the CVE Details website and comprise 264 SQL injection (SQLi), 80 cross-site scripting (XSS), 117 directory traversal, and 136,090 non-vulnerable samples. Because this distribution is highly imbalanced, it is handled with SMOTE oversampling and Cluster Centroids undersampling. The features are AST tokens and PHP tokens, as well as pruned ASTs and modified PHP tokens. Gaussian Naïve Bayes (GNB), Support Vector Machine (SVM), and Decision Tree are used for classification, and KMeans is used for clustering. The KMeans algorithm is weighted by assigning higher weights to features that appear frequently in the vulnerable classes.
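
As a rough illustration of the lexical-analysis and oversampling steps, the Python sketch below turns token sequences into bag-of-tokens count vectors and then applies SMOTE, which creates synthetic minority samples by interpolating between a minority sample and one of its k nearest minority-class neighbours. It is not the thesis's implementation. It assumes the token sequences were already produced by a PHP lexer (for example PHP's built-in `token_get_all()`), and the helper names `token_count_vector`, `squared_distance`, and `smote`, the four-token vocabulary, and the sample sequences are all hypothetical. With 264 + 80 + 117 = 461 vulnerable samples against 136,090 non-vulnerable ones, rebalancing of this kind is what the abstract motivates.

```python
import random
from collections import Counter


def token_count_vector(tokens, vocabulary):
    """Bag-of-tokens (hypothetical helper): count each vocabulary token."""
    counts = Counter(tokens)
    return [float(counts[t]) for t in vocabulary]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def smote(minority, n_new, k=5, seed=0):
    """From-scratch SMOTE sketch: each synthetic sample lies on the segment
    between a random minority sample and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the other minority samples, nearest first.
        others = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: squared_distance(base, minority[j]),
        )
        neighbour = minority[rng.choice(others[:k])]
        gap = rng.random()  # uniform in [0, 1): position along the segment
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical vocabulary of PHP token names and toy token sequences
# standing in for vulnerable files.
vocab = ["T_VARIABLE", "T_ECHO", "T_STRING", "T_CONSTANT_ENCAPSED_STRING"]
vulnerable = [
    token_count_vector(["T_ECHO", "T_VARIABLE"], vocab),
    token_count_vector(["T_ECHO", "T_VARIABLE", "T_VARIABLE"], vocab),
    token_count_vector(["T_STRING", "T_VARIABLE", "T_CONSTANT_ENCAPSED_STRING"], vocab),
]
new_rows = smote(vulnerable, n_new=4, k=2)
print(len(new_rows))  # 4
```

In practice, resampling is applied only to the training split so that synthetic samples do not leak into evaluation; the imbalanced-learn library also provides a ready-made `SMOTE` class.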
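The undersampling and clustering steps can be sketched in the same way. Cluster Centroids undersampling replaces the majority class with the centroids of a k-means clustering of that class, and a feature-weighted k-means multiplies each feature's squared difference by a weight inside the distance. The abstract does not state the exact weighting formula, so `vulnerable_feature_weights` below uses a hypothetical rule (1 plus the fraction of vulnerable samples in which a feature is non-zero). The function names and toy data are likewise illustrative assumptions, not taken from the thesis.

```python
import random


def weighted_sq_dist(x, c, weights):
    """Squared Euclidean distance with a per-feature weight."""
    return sum(w * (a - b) ** 2 for a, b, w in zip(x, c, weights))


def kmeans(points, k, weights=None, iters=100, seed=0):
    """Lloyd's k-means; with weights=None every feature counts equally."""
    if weights is None:
        weights = [1.0] * len(points[0])
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid under the weighted distance.
        labels = [
            min(range(k), key=lambda c: weighted_sq_dist(p, centroids[c], weights))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                updated.append([sum(col) / len(members) for col in zip(*members)])
            else:
                updated.append(centroids[c])  # keep an empty cluster's centroid
        if updated == centroids:
            break
        centroids = updated
    return centroids, labels


def cluster_centroids_undersample(majority, n_target, seed=0):
    """Shrink the majority class to n_target (unweighted) k-means centroids."""
    centroids, _ = kmeans(majority, n_target, seed=seed)
    return centroids


def vulnerable_feature_weights(vulnerable_rows):
    """Hypothetical rule: 1 + fraction of vulnerable rows where the feature is non-zero."""
    n = len(vulnerable_rows)
    return [
        1.0 + sum(1 for row in vulnerable_rows if row[j] > 0) / n
        for j in range(len(vulnerable_rows[0]))
    ]


# Toy two-feature data (illustrative only).
majority = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.2], [5.0, 5.0], [5.1, 4.9], [4.8, 5.2]]
vulnerable = [[1.0, 0.0], [2.0, 0.0], [1.0, 1.0]]

reduced = cluster_centroids_undersample(majority, n_target=len(vulnerable))
weights = vulnerable_feature_weights(vulnerable)  # roughly [2.0, 1.33]
centroids, labels = kmeans(majority + vulnerable, k=2, weights=weights)
```

The imbalanced-learn library's `ClusterCentroids` follows the same principle, using k-means to generate the centroids that replace the majority class.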
Based on the test results, GNB with modified PHP tokens as features achieves the highest recall for both two-class and four-class vulnerability classification, but its precision is very low.
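
The trade-off reported here follows from the standard definitions, precision = TP / (TP + FP) and recall = TP / (TP + FN). The Python sketch below computes both for one positive class. The counts in the example (10 vulnerable and 990 non-vulnerable files, with the classifier flagging 9 and 300 of them respectively) are invented for illustration and are not results from the thesis; the function name `precision_recall` is likewise hypothetical. A classifier that flags many files as vulnerable misses few real vulnerabilities (high recall) but raises many false alarms (low precision).

```python
def precision_recall(y_true, y_pred, positive=1):
    """Precision and recall for one class; call once per label
    (e.g. each of four classes) to evaluate a multi-class classifier."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical labels: 1 = vulnerable, 0 = not vulnerable.
y_true = [1] * 10 + [0] * 990
# The classifier catches 9 of 10 vulnerable files but also flags 300 safe ones.
y_pred = [1] * 9 + [0] * 1 + [1] * 300 + [0] * 690
p, r = precision_recall(y_true, y_pred)
print(f"precision={p:.3f} recall={r:.3f}")  # precision=0.029 recall=0.900
```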
|
format | Theses |
author | RIZKI ANBIYA - NIM : 23515029, DHIKA |
title | VULNERABILITY DETECTION IN PHP WEB APPLICATION USING LEXICAL ANALYSIS APPROACH WITH MACHINE LEARNING |
url | https://digilib.itb.ac.id/gdl/view/26584 |