Solving real world security problems : hacking and protection (2)

Bibliographic Details
Main Author: Tan, Joshua Jun Ming
Other Authors: Liu Yang
Format: Final Year Project
Language: English
Published: 2018
Online Access: http://hdl.handle.net/10356/74867
Institution: Nanyang Technological University
Description
Summary: Researchers are always looking for better ways to improve their vulnerability detection and analysis workflow. This project explores one way of improving static and dynamic analysis: building highly contextualized databases of knowledge about a software codebase, covering everything from its code structure to its commit history to its function calls. The project contributes to this knowledge base by identifying which functions make many calls to other functions, which are called most frequently, and which are isolated. Armed with this information, researchers can readily identify which functions and files are affected when a single statement is modified. Precisely narrowing down the functions and files to analyse with static or dynamic analysis tools conserves time and computational resources. The project has three major components: the Code Parser, the Patch Analyser, and the Graph Builder. The Patch Analyser parses patch files and determines which files and functions have been modified. The Code Parser produces a full listing of function calls and highlights certain interesting statements (e.g. casting operations) that occur within these functions. The Graph Builder then translates this listing into a function call graph, which is added to the knowledge base. This graph can be queried manually or programmatically to draw new insights about the structure of the software. A total of 192 patch files for the Linux kernel were analysed. These patches are significant in that most are classified in the “Medium” to “High” severity range. They were used to generate the training and test sets for evaluating the neural network-driven prediction model. Future work could involve selecting other high-variance features from the code repository to improve the prediction model and contribute to the swifter discovery of genuine vulnerabilities in software.
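
The call-graph queries described in the summary can be illustrated with a minimal sketch. This is not the project's implementation: the function names, the sample call list, and the choice to treat "affected" as "every direct or transitive caller of the modified function" are all assumptions made here for illustration.

```python
from collections import defaultdict, deque


def build_call_graph(calls):
    """Build caller -> callees and callee -> callers maps from (caller, callee) pairs."""
    callees = defaultdict(set)
    callers = defaultdict(set)
    for caller, callee in calls:
        callees[caller].add(callee)
        callers[callee].add(caller)
    return callees, callers


def rank_by_degree(adjacency, functions):
    """Sort functions by neighbour count in the given map, highest first (ties by name)."""
    return sorted(functions, key=lambda f: (-len(adjacency.get(f, ())), f))


def isolated_functions(functions, callees, callers):
    """Functions that neither call nor are called by any other function."""
    return sorted(f for f in functions if not callees.get(f) and not callers.get(f))


def affected_by(changed, callers):
    """Return `changed` plus every function that calls it directly or transitively.

    Assumption: a change to a function may affect anything upstream of it.
    """
    seen = {changed}
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for caller in callers.get(current, ()):
            if caller not in seen:
                seen.add(caller)
                queue.append(caller)
    return seen


# Hypothetical function names and calls, standing in for Code Parser output.
FUNCTIONS = ["main", "parse_args", "read_file", "copy_buffer", "log_error", "unused_helper"]
CALLS = [
    ("main", "parse_args"),
    ("main", "read_file"),
    ("main", "log_error"),
    ("read_file", "copy_buffer"),
    ("read_file", "log_error"),
    ("parse_args", "log_error"),
]

callees, callers = build_call_graph(CALLS)
print("Most outgoing calls:", rank_by_degree(callees, FUNCTIONS)[0])
print("Most frequently called:", rank_by_degree(callers, FUNCTIONS)[0])
print("Isolated:", isolated_functions(FUNCTIONS, callees, callers))
print("Affected by a change in copy_buffer:", sorted(affected_by("copy_buffer", callers)))
```

Keeping both a forward map (callees) and a reverse map (callers) makes each of the three queries a simple lookup, and lets the affected set be found with a single breadth-first walk over the reverse map.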