Frequency analysis and online learning in malware detection

Traditional antivirus products are signature-based solutions, which rely on a static database to perform detection. The weakness of this design is that the signatures may become outdated, resulting in the failure to detect new samples. The other method is behavior-based detection, which aims to iden...

Bibliographic Details
Main Author: Huynh, Ngoc Anh
Other Authors: Ng Wee Keong
Format: Theses and Dissertations
Language: English
Published: 2019
Subjects: Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence
Online Access: https://hdl.handle.net/10356/93574
http://hdl.handle.net/10220/49944
Institution: Nanyang Technological University
id sg-ntu-dr.10356-93574
record_format dspace
spelling sg-ntu-dr.10356-93574; 2020-10-28T08:40:50Z; Frequency analysis and online learning in malware detection; Huynh, Ngoc Anh; Ng Wee Keong; School of Computer Science and Engineering; Fraunhofer Singapore; Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence; Doctor of Philosophy; 2019-09-17T01:37:30Z; 2019-12-06T18:41:44Z; 2019; Thesis; Huynh, N. A. (2019). Frequency analysis and online learning in malware detection. Doctoral thesis, Nanyang Technological University, Singapore; https://hdl.handle.net/10356/93574; http://hdl.handle.net/10220/49944; 10.32657/10356/93574; en; 158 p.; application/pdf
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
topic Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence
description Traditional antivirus products are signature-based solutions that rely on a static database to perform detection. The weakness of this design is that signatures can become outdated, so new samples go undetected. The alternative is behavior-based detection, which aims to identify malware from its dynamic behavior. Behavior-based detection comes in two approaches. The first leverages commonly known malware behaviors such as random domain name generation and periodicity. The second aims to learn the behavior of malware directly from data, using tools such as graph analytics and machine learning. Behavior-based detection is difficult because we have to deal with intelligent and highly motivated attackers, who can change their strategy to maximize their chance of gaining access to computer networks. We narrow our research to the domain of Windows malware detection, and we are particularly interested in two approaches to behavior-based detection: periodic behavior and behavior evolution. Periodic behavior refers to the regular activities programmed by attackers, such as periodic polling for a server connection or periodic updates of the victim machine's status. Behavior evolution refers to the change in malware behavior over time. In the first approach, we aim to exploit periodic behavior for malware detection. The main analysis tool in this direction is the Fourier transform, which converts time-domain signals into frequency-domain signals. This idea is motivated by the fact that periodic signals are often easier to analyze in the frequency domain than in the original time domain. Using the Fourier transform, we propose a novel frequency-based periodicity measure to evaluate the regularity of network traffic. Another challenge in this direction is that, besides malware, most automatic services of operating systems also generate periodic signals.
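To make the frequency-domain idea concrete, here is a minimal Python sketch. It is not the periodicity measure proposed in the thesis, which this record does not specify. As an assumption for illustration, it scores a traffic signal by the share of non-DC spectral power that falls in the strongest frequency bin. The function names `power_spectrum`, `periodicity_score`, and `dominant_period`, and the synthetic `beacon` and `noise` signals, are all hypothetical.

```python
import cmath
import math
import random


def power_spectrum(signal):
    """Power of each non-DC frequency bin up to Nyquist (naive O(n^2) DFT)."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [x - mean for x in signal]  # remove the constant (DC) component
    powers = []
    for k in range(1, n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(centered)
        )
        powers.append(abs(coeff) ** 2)
    return powers


def periodicity_score(signal):
    """Hypothetical score in [0, 1]: share of power held by the strongest bin."""
    powers = power_spectrum(signal)
    total = sum(powers)
    return max(powers) / total if total > 0 else 0.0


def dominant_period(signal):
    """Period, in samples, of the strongest frequency bin."""
    powers = power_spectrum(signal)
    k = powers.index(max(powers)) + 1  # bins start at k = 1
    return len(signal) / k


# Synthetic traffic: a two-sample burst every 8 samples versus uniform noise.
beacon = [1.0 if t % 8 < 2 else 0.0 for t in range(128)]
random.seed(0)
noise = [random.random() for _ in range(128)]

print("beacon:", periodicity_score(beacon), dominant_period(beacon))
print("noise: ", periodicity_score(noise))
```

For the synthetic beacon, the strongest bin corresponds to a period of 8 samples, and its score is well above that of the noise signal. Benign automatic operating-system services would score highly under such a measure too, so a score of this kind can only flag candidates for verification.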
To address this challenge, we propose a new visual analytics solution for effective alert verification. In the second approach, we aim to develop adaptive learning algorithms that capture malware samples whose behavior changes over time. We build on the well-known online machine learning framework of Follow the Regularized Leader (FTRL). Our main contribution in this direction is the use of an adaptive decaying factor that allows FTRL algorithms to perform better in environments with concept drift. The decaying factor increasingly discounts the contribution of past examples, thereby alleviating the problem of concept drift. We advance the state of the art in this direction by proposing a new adaptive online algorithm to handle concept drift in malware detection. Our improved algorithm has also been successfully applied to other, non-security domains.
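The discounting idea can be sketched in Python as follows. This is not the thesis's algorithm: the record does not give the adaptive rule, so the sketch substitutes a fixed decay factor, applied to the accumulated state of the standard per-coordinate FTRL-Proximal update for logistic regression. The class name `DecayedFTRL`, the `decay` parameter, and the toy one-feature stream are assumptions made for illustration.

```python
import math


class DecayedFTRL:
    """FTRL-Proximal logistic regression with a fixed decay on its state."""

    def __init__(self, dim, alpha=0.5, beta=1.0, l1=0.0, l2=0.0, decay=0.99):
        self.alpha, self.beta, self.l1, self.l2 = alpha, beta, l1, l2
        self.decay = decay    # hypothetical fixed factor; 1.0 disables discounting
        self.z = [0.0] * dim  # discounted, adjusted gradient sums
        self.n = [0.0] * dim  # discounted sums of squared gradients

    def _weight(self, i):
        """Closed-form FTRL-Proximal weight for coordinate i."""
        z = self.z[i]
        if abs(z) <= self.l1:
            return 0.0  # L1 regularization keeps this weight at zero
        shrunk = z - math.copysign(self.l1, z)
        return -shrunk / ((self.beta + math.sqrt(self.n[i])) / self.alpha + self.l2)

    def predict(self, x):
        score = sum(self._weight(i) * xi for i, xi in enumerate(x))
        score = max(-35.0, min(35.0, score))  # avoid overflow in exp
        return 1.0 / (1.0 + math.exp(-score))

    def update(self, x, y):
        # Discount everything learned so far, then take a standard FTRL step.
        self.z = [self.decay * z for z in self.z]
        self.n = [self.decay * n for n in self.n]
        p = self.predict(x)
        for i, xi in enumerate(x):
            g = (p - y) * xi  # gradient of the log loss for coordinate i
            w = self._weight(i)
            sigma = (math.sqrt(self.n[i] + g * g) - math.sqrt(self.n[i])) / self.alpha
            self.z[i] += g - sigma * w
            self.n[i] += g * g
        return p


# Toy drift: the label for the same input flips from 1 to 0 halfway through.
model = DecayedFTRL(dim=1)
for _ in range(300):
    model.update([1.0], 1)
before_drift = model.predict([1.0])
for _ in range(300):
    model.update([1.0], 0)
after_drift = model.predict([1.0])
print(before_drift, after_drift)
```

Setting `decay=1.0` recovers the undiscounted update. An adaptive scheme, as described above, would adjust the factor during learning rather than fix it in advance.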