Solving real world security problems : hacking and protection

Bibliographic Details
Main Author: Ng, Shi Kai
Other Authors: Liu Yang
Format: Final Year Project
Language: English
Published: 2018
Subjects:
Online Access:http://hdl.handle.net/10356/74026
Institution: Nanyang Technological University
Description
Summary: This report examines the security of using open-source libraries. Although open-source libraries are designed to be secure through transparency, they are only as secure as their weakest link, the users, who must update them promptly whenever new vulnerabilities are discovered. We also found that a large number of commercial applications use open-source libraries. In this project, we therefore aim to identify which open-source projects, and which versions of them, commercial applications are using. From there, we can check whether any known vulnerabilities are associated with the commercial applications analyzed. To identify the projects used, we apply language processing models: we extract n-grams from the collected data and perform Term Frequency–Inverse Document Frequency (tf-idf) analysis on them. N-grams are contiguous sequences of words extracted from a document, where n is the sequence length. For instance, the bigrams (n = 2) of 'My first project' are 'my first' and 'first project'. Tf-idf is a popular metric for judging whether an n-gram uniquely identifies a document. It weighs how often the n-gram occurs within that document (term frequency) against how many other documents also contain it (inverse document frequency). A high tf-idf score indicates that the n-gram is a strong identifier of the document. This report covers only one component of the project: building a database of bigrams and trigrams mapped to the projects analyzed.
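The pipeline described in the summary can be sketched as follows. This is a minimal illustration, not the report's implementation. It assumes whitespace tokenization, the plain tf × log(N/df) weighting, and a rule that each n-gram maps to the project where it scores highest. The report does not specify its tokenizer, tf-idf variant, or database schema. A Python dict stands in for the database, and the function names (`ngrams`, `extract`, `tfidf`, `build_index`) are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return contiguous n-token sequences, e.g. bigrams when n == 2."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def extract(text, sizes=(2, 3)):
    """Count the bigrams and trigrams of a text (assumed: lowercase, whitespace tokens)."""
    tokens = text.lower().split()
    grams = []
    for n in sizes:
        grams.extend(ngrams(tokens, n))
    return Counter(grams)


def tfidf(docs):
    """docs maps project name -> Counter of n-gram counts.

    Returns project name -> {n-gram: tf-idf score}, using
    tf = count / total n-grams in the project and idf = log(N / df).
    """
    n_docs = len(docs)
    df = Counter()  # number of projects containing each n-gram
    for counts in docs.values():
        df.update(counts.keys())
    scores = {}
    for name, counts in docs.items():
        total = sum(counts.values())
        scores[name] = {
            g: (c / total) * math.log(n_docs / df[g]) for g, c in counts.items()
        }
    return scores


def build_index(scores, threshold=0.0):
    """Map each n-gram to (project, score) for the project where it scores highest.

    N-grams shared by every project get idf = 0, so the threshold drops them
    as non-identifying.
    """
    index = {}
    for name, table in scores.items():
        for g, s in table.items():
            if s > threshold and (g not in index or s > index[g][1]):
                index[g] = (name, s)
    return index
```

For example, with two hypothetical projects `libfoo-1.2` and `libbar-0.9`, `build_index(tfidf({k: extract(v) for k, v in projects.items()}))` keeps n-grams unique to one project, such as `"foo header"`, and discards n-grams that every project shares, such as `"context then"`. Keeping only the top-scoring project per n-gram is one simple design choice. A real database would likely also store the version and all candidate projects.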