Detection and analysis of web-based malware and vulnerability

Since the dawn of the Internet, all of us have been swept up by the Niagara of information that fills our daily lives. Browsers play an extremely important role in this process. Modern browsers have evolved from simple text displays into complex software that supports rich user interfaces and...

Bibliographic Details
Main Author: Wang, Junjie
Other Authors: Liu Yang
Format: Theses and Dissertations
Language: English
Published: 2019
Online Access: https://hdl.handle.net/10356/89049
http://hdl.handle.net/10220/47659
Institution: Nanyang Technological University
id sg-ntu-dr.10356-89049
record_format dspace
institution Nanyang Technological University
building NTU Library
country Singapore
collection DR-NTU
language English
topic DRNTU::Engineering::Computer science and engineering
description Since the dawn of the Internet, all of us have been swept up by the Niagara of information that fills our daily lives. Browsers play an extremely important role in this process. Modern browsers have evolved from simple text displays into complex software that supports rich user interfaces and a variety of file formats and protocols. This enlarges the attack surface and makes browsers one of the main targets of cyber attacks. Within Internet security, JavaScript malware is one of the major threats: it exploits vulnerabilities in browsers to launch attacks remotely. To protect end users from these threats, this thesis makes two main contributions, identifying JavaScript malware and detecting vulnerabilities in browsers, which together aim at a complete solution for Internet security.
In identifying JavaScript malware, we first propose to classify JavaScript malware using a machine learning approach combined with dynamic confirmation. Static and dynamic approaches both have merits and drawbacks: dynamic approaches are effective but not scalable, while static approaches are efficient but normally suffer from a high false negative ratio. To identify JavaScript malware both effectively and efficiently, we propose a two-phase approach. The first phase uses lightweight classification to separate JavaScript malware from benign web pages. The second phase then subdivides the malware by attack behavior. We implement our approach as an online tool and conduct a large-scale experiment to show its effectiveness.
Towards an insightful analysis of the evolution trend of JavaScript malware, it is desirable to further classify malware samples according to the exploited attack vector and the corresponding attack behaviors. Considering the emergence of numerous new JavaScript malware samples and their variants, such automated classification can significantly speed up the overall response to JavaScript malware and even shorten the time to discover zero-day attacks. We propose to use a Deterministic Finite Automaton (DFA) to summarize patterns of malware. Our approach automatically learns a DFA from the dynamic execution traces of JavaScript malware. The experimental results demonstrate that our approach is more scalable and effective in JavaScript malware detection and classification than commercial anti-virus tools.
Through these two works, we realized that the root cause of the prevalence of JavaScript malware is the existence of vulnerabilities in browsers. Therefore, finding vulnerabilities in browsers and improving mitigation are of significant importance. We propose a novel data-driven seed generation approach to test the core components of browsers, especially XML engines and XSLT engines. We first learn a Probabilistic Context-Sensitive Grammar (PCSG) from a large number of samples of one specific grammar. The learned PCSG helps us generate samples whose syntax and semantics are correct with high probability. The experimental results demonstrate that both the bug-finding capability and the code coverage of fuzzing are improved.
We further improve coverage-based greybox fuzzing by proposing a new grammar-aware approach for programs that process structured inputs. In detail, our approach requires the grammar of the test inputs, which is often publicly available. Based on the grammar, we propose a grammar-aware trimming strategy that trims test inputs at the tree level. In addition, we introduce two grammar-aware mutation strategies: enhanced dictionary-based mutation and tree-based mutation. Tree-based mutation works by replacing sub-trees of the Abstract Syntax Tree (AST) of parsed test inputs. With grammar awareness, we can effectively mutate test inputs while keeping the input structure valid, quickly carrying the fuzzing exploration into width and depth. We conduct experiments to evaluate its effectiveness on one XML engine, libplist, and two JavaScript engines, WebKit and JerryScript. The results demonstrate that our approach outperforms other fuzzing tools in both code coverage and bug-finding capability.
author2 Liu Yang
author_facet Liu Yang
Wang, Junjie
format Theses and Dissertations
author Wang, Junjie
author_sort Wang, Junjie
title Detection and analysis of web-based malware and vulnerability
publishDate 2019
url https://hdl.handle.net/10356/89049
http://hdl.handle.net/10220/47659
_version_ 1681034607746613248
spelling sg-ntu-dr.10356-89049 2020-03-07T11:50:52Z Detection and analysis of web-based malware and vulnerability Wang, Junjie Liu Yang School of Computer Science and Engineering DRNTU::Engineering::Computer science and engineering Doctor of Philosophy 2019-02-13T06:19:48Z 2019-12-06T17:16:44Z 2019-02-13T06:19:48Z 2019-12-06T17:16:44Z 2018 Thesis Wang, J. (2018). Detection and analysis of web-based malware and vulnerability. Doctoral thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/89049 http://hdl.handle.net/10220/47659 10.32657/10220/47659 en 190 p. application/pdf
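
The abstract describes tree-based mutation as replacing sub-trees of the parsed input so that the result stays structurally valid. The following is only a minimal sketch of that general idea on a toy XML-like tree, not the thesis's implementation: the `Node` class, the `elem`/`txt` helpers, and the `tree_mutate` function are hypothetical names introduced here for illustration, and the "grammar" is reduced to two symbols (`element` and `text`).

```python
import copy
import random
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Toy syntax-tree node (hypothetical, for illustration only).

    symbol is "element" (text holds the tag name) or "text"
    (text holds character data).
    """
    symbol: str
    text: str = ""
    children: List["Node"] = field(default_factory=list)

    def render(self) -> str:
        # Serialise the tree back into an XML-like string.
        if self.symbol == "element":
            inner = "".join(child.render() for child in self.children)
            return f"<{self.text}>{inner}</{self.text}>"
        return self.text


def elem(name: str, *children: Node) -> Node:
    return Node("element", name, list(children))


def txt(data: str) -> Node:
    return Node("text", data)


def descendants(root: Node) -> List[Node]:
    """All nodes strictly below root, in pre-order."""
    out: List[Node] = []
    stack = list(reversed(root.children))
    while stack:
        node = stack.pop()
        out.append(node)
        stack.extend(reversed(node.children))
    return out


def tree_mutate(target: Node, donor: Node, rng: random.Random) -> bool:
    """Overwrite one random sub-tree of target with a copy of a donor
    sub-tree that has the same symbol. Returns False if no compatible
    pair exists, in which case target is left unchanged."""
    sites = descendants(target)
    rng.shuffle(sites)
    for site in sites:
        pool = [d for d in [donor] + descendants(donor)
                if d.symbol == site.symbol]
        if pool:
            pick = copy.deepcopy(rng.choice(pool))
            site.text, site.children = pick.text, pick.children
            return True
    return False


def is_well_formed(document: str) -> bool:
    """Check the rendered input with a real XML parser."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    rng = random.Random(0)
    seed = elem("root", elem("a", txt("x")), txt("y"))
    donor = elem("doc", elem("b", elem("c", txt("z"))))
    tree_mutate(seed, donor, rng)
    print(seed.render(), is_well_formed(seed.render()))
```

Because a sub-tree is only ever swapped for another sub-tree with the same symbol, every mutated input in this toy setting still parses, which is the property the abstract attributes to grammar awareness. A byte-level mutation of the same seed could easily unbalance the tags instead.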