Malware data collection and analysis

Malicious software, referred to as malware, is one of the main threats on the Internet in the present day. Millions of hosts on the Internet are infected with malware, ranging from classic computer viruses to Internet worms and bot networks. A huge increase in the number of malware samples are colle...

Full description

Saved in:
Bibliographic Details
Main Author: Low, Song Chuan.
Other Authors: Chen Lihui
Format: Final Year Project
Language:English
Published: 2013
Subjects:
Online Access:http://hdl.handle.net/10356/54413
Tags: Add Tag
No Tags, Be the first to tag this record!
Institution: Nanyang Technological University
Language: English
Description
Summary:Malicious software, referred to as malware, is one of the main threats on the Internet in the present day. Millions of hosts on the Internet are infected with malware, ranging from classic computer viruses to Internet worms and bot networks. A huge increase in the number of malware samples are collected by anti-virus vendors. In this project, malware data collection and analysis tools had been reviewed. A malware data report collection procedure has been successfully automated with CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart) evading technique when submitting malware data set to various malware analysis tools. The details design and implementation of the evading CAPTCHA technique for various malware analysis tools were presented in the report. Simulations on data collections were conducted to demonstrate the success of the technique implemented. After the reports are collected, pre-processing of the reports are needed to clean the data which is an important for data representation. The process of pre-processing reports includes junk characters removal such as hash code, long string of symbol and numbers, while keeping the rest of the information in each report. Other than report collection and pre-processing of reports, separation of the dataset into training and testing dataset is needed for building machine learning classifier in malware data analysis.