Malware data collection and analysis

Bibliographic details
Main author: Low, Song Chuan.
Other authors: Chen Lihui
Format: Final Year Project
Language: English
Published: 2013
Online access: http://hdl.handle.net/10356/54413
Description
Abstract: Malicious software, referred to as malware, is one of the main threats on the Internet today. Millions of hosts on the Internet are infected with malware, ranging from classic computer viruses to Internet worms and bot networks. Anti-virus vendors are collecting a rapidly growing number of malware samples. In this project, malware data collection and analysis tools were reviewed. The collection of malware analysis reports was successfully automated using a CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart) evasion technique applied when submitting the malware dataset to various malware analysis tools. The detailed design and implementation of the CAPTCHA evasion technique for these tools are presented in the report. Simulations of data collection were conducted to demonstrate the success of the implemented technique. After the reports are collected, they must be pre-processed to clean the data, which is an important step for data representation. Pre-processing removes junk characters such as hash codes and long strings of symbols and numbers, while keeping the rest of the information in each report. In addition to report collection and pre-processing, the dataset must be split into training and testing sets in order to build a machine learning classifier for malware data analysis.
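
The pre-processing and dataset-splitting steps mentioned in the abstract could look roughly like the Python sketch below. It is only an illustration and not the project's implementation: the record does not give the actual rules, so the function names (clean_report, split_dataset), the regular expressions, the 12-character threshold, and the 80/20 split are all assumptions.

```python
import random
import re

# Assumption: a "hash code" is a run of 32 or more hexadecimal digits
# (long enough to cover MD5, SHA-1 and SHA-256 digests).
HASH_RE = re.compile(r"\b[0-9a-fA-F]{32,}\b")

# Assumption: a "long string of symbols and numbers" is a whitespace-delimited
# token of 12 or more characters that contains no letters.
LONG_JUNK_RE = re.compile(r"(?<!\S)[^A-Za-z\s]{12,}(?!\S)")


def clean_report(text):
    """Remove junk tokens from one report and keep everything else."""
    text = HASH_RE.sub(" ", text)
    text = LONG_JUNK_RE.sub(" ", text)
    # Collapse the whitespace left behind by the removals.
    return " ".join(text.split())


def split_dataset(items, test_fraction=0.2, seed=0):
    """Shuffle the items reproducibly and return (training, testing) lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


if __name__ == "__main__":
    raw = "file d41d8cd98f00b204e9800998ecf8427e created 0000000000000000 in C:\\Temp"
    print(clean_report(raw))
    train, test = split_dataset(range(10), test_fraction=0.2, seed=1)
    print(len(train), len(test))
```

In practice the split would normally be stratified by malware family so that both sets contain every class; that refinement is left out here to keep the sketch short.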