VULNERABILITY DETECTION IN PHP WEB APPLICATION USING LEXICAL ANALYSIS APPROACH WITH MACHINE LEARNING

One important aspect of PHP web application development is security. Data breaches are caused by vulnerabilities in web applications. One method for detecting vulnerabilities is static analysis. Static analysis is a method in application...

Bibliographic Details
Main Author: RIZKI ANBIYA - NIM : 23515029, DHIKA
Format: Theses
Language: Indonesian
Online Access: https://digilib.itb.ac.id/gdl/view/26584
Institution: Institut Teknologi Bandung
id id-itb.:26584
spelling id-itb.:26584 2018-03-15T16:01:17Z
institution Institut Teknologi Bandung
building Institut Teknologi Bandung Library
continent Asia
country Indonesia
content_provider Institut Teknologi Bandung
collection Digital ITB
language Indonesian
description One important aspect of PHP web application development is security. Data breaches are caused by vulnerabilities in web applications. One method for detecting vulnerabilities is static analysis, which analyzes an application without executing it. The advantage of static analysis is that it checks the source code in depth, so the root cause of a security problem can be found, not just its symptoms. However, static analysis requires an expert and takes a large amount of time.

Security vulnerabilities can also be detected using lexical analysis and machine learning. Lexical analysis transforms the source code into a form that is easy to process, such as tokens, which are then passed to a machine learning algorithm for classification. The choice of features and classification algorithm affects the detection results. This research applies cross-project detection. The data come from the CVE Details website and comprise 264 SQL injection (SQLi), 80 cross-site scripting, 117 directory traversal, and 136,090 non-vulnerable instances. Because the class distribution is imbalanced, it is handled with SMOTE oversampling and Cluster Centroid undersampling. The features are AST tokens and PHP tokens, along with pruned ASTs and modified PHP tokens. Gaussian Naïve Bayes (GNB), Support Vector Machine (SVM), and Decision Tree are used for classification, and KMeans is used for clustering. The KMeans algorithm is weighted by giving more weight to features that appear often in the vulnerable classes.
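
The lexical step described above can be illustrated with a minimal sketch. It is not the thesis's tokenizer: it assumes a small, hypothetical set of token classes (`T_OPEN_TAG`, `T_VARIABLE`, and so on) matched by regular expressions, whereas PHP's own lexer distinguishes far more token types. The function names `tokenize` and `feature_vector` and the sample snippet are also illustrative assumptions. The sketch turns a PHP snippet into a fixed-length vector of token counts, the kind of input a classifier such as GNB could consume.

```python
import re
from collections import Counter

# Hypothetical, simplified token classes (an assumption for illustration).
# Order matters: earlier patterns win when several could match.
TOKEN_SPEC = [
    ("T_OPEN_TAG", r"<\?php"),
    ("T_CLOSE_TAG", r"\?>"),
    ("T_VARIABLE", r"\$[A-Za-z_]\w*"),
    ("T_STRING_LITERAL", r'"(?:\\.|[^"\\])*"' + r"|'(?:\\.|[^'\\])*'"),
    ("T_NUMBER", r"\d+(?:\.\d+)?"),
    ("T_IDENTIFIER", r"[A-Za-z_]\w*"),
    ("T_WHITESPACE", r"\s+"),
    ("T_SYMBOL", r"."),
]
TOKEN_RE = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC),
    re.DOTALL,
)

# Feature order used for every vector; whitespace is not a feature.
VOCABULARY = [name for name, _ in TOKEN_SPEC if name != "T_WHITESPACE"]


def tokenize(source):
    """Yield (token_class, text) pairs, skipping whitespace."""
    for match in TOKEN_RE.finditer(source):
        kind = match.lastgroup
        if kind != "T_WHITESPACE":
            yield kind, match.group()


def feature_vector(source, vocabulary=VOCABULARY):
    """Count how often each token class occurs, in vocabulary order."""
    counts = Counter(kind for kind, _ in tokenize(source))
    return [counts[name] for name in vocabulary]


# Illustrative snippet (not taken from the thesis data set).
SNIPPET = """<?php $id = $_GET['id']; mysql_query("SELECT * FROM users WHERE id=" . $id); ?>"""

if __name__ == "__main__":
    for name, count in zip(VOCABULARY, feature_vector(SNIPPET)):
        print(f"{name}: {count}")
```

Counting token classes discards ordering and data flow, which is what makes the approach cheap compared with full static analysis; it is also one plausible reason such features can over-flag code that merely looks similar to vulnerable code.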

Based on the test results, the GNB algorithm with modified PHP tokens as features has the highest recall for both two-class and four-class vulnerability classification, but very low precision.
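
High recall combined with very low precision is a common pattern on heavily imbalanced data, and a short calculation shows why. The sketch below uses the class sizes stated in the abstract (264 + 80 + 117 = 461 vulnerable and 136,090 non-vulnerable instances), but the confusion-matrix counts are purely hypothetical and are not results from the thesis. The helper name `precision_recall` is also an assumption.

```python
def precision_recall(tp, fp, fn):
    """Return (precision, recall) for the positive class."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Class sizes taken from the abstract.
VULNERABLE = 264 + 80 + 117
NOT_VULNERABLE = 136_090

# HYPOTHETICAL confusion counts, chosen only for illustration:
# the classifier finds most vulnerable instances but also flags
# about 5% of the non-vulnerable ones.
TP = 415
FN = VULNERABLE - TP
FP = 6_805
TN = NOT_VULNERABLE - FP

precision, recall = precision_recall(TP, FP, FN)

if __name__ == "__main__":
    print(f"recall    = {recall:.4f}")
    print(f"precision = {precision:.4f}")
```

Under these assumed counts, recall is roughly 0.90 while precision stays below 0.06: because non-vulnerable instances outnumber vulnerable ones by about 295 to 1, even a small false-positive rate produces far more false alarms than true detections.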
format Theses
author RIZKI ANBIYA - NIM : 23515029, DHIKA
title VULNERABILITY DETECTION IN PHP WEB APPLICATION USING LEXICAL ANALYSIS APPROACH WITH MACHINE LEARNING
url https://digilib.itb.ac.id/gdl/view/26584
_version_ 1822921959717994496