MALWARE CLASSIFICATION USING SUPPORT VECTOR MACHINE ALGORITHM WITH LINEARSVC APPROACH

Bibliographic Details
Main Author: Maryam, Zahrina
Format: Theses
Language: Indonesian
Online Access: https://digilib.itb.ac.id/gdl/view/57191
Institution: Institut Teknologi Bandung
Description
Summary: Malware, or malicious software, is designed to damage systems, steal important information or data, disrupt computer performance, and carry out other criminal acts on computers or devices, harming everyone from individual computer owners to large companies. Malware can infect computers through flash drives, links distributed by email, pirated applications, pirated operating systems, advertisements, fake download buttons, and so on. Malware types, distinguished by how they spread and the impact they have, include viruses, trojans, spyware, worms, adware, scareware, and ransomware. The amount of malware grows every day. The National Cybersecurity Operations Center of the National Cyber and Crypto Agency (BSSN) recorded 88,414,296 cyberattacks between January 1, 2020, and April 12, 2020. This volume greatly complicates malware analysis and detection, so a system that can detect malware automatically is needed. One technique that can be used is machine learning (ML). The purpose of this thesis is to build such a system. The classification system uses the Support Vector Machine (SVM) algorithm with the LinearSVC approach and is tested on the EMBER dataset. The first test scenario compares the accuracy of three SVM approaches: SVC, NuSVC, and LinearSVC. LinearSVC achieves the highest accuracy, 84.91%, using 14,710 training samples and 10,000 test samples. The second and third scenarios show that the amount of data used and the LinearSVC parameter settings affect accuracy, precision, recall, and F1-score. Performance improves as more data is used.
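The three test scenarios described in the summary can be sketched with scikit-learn, whose `SVC`, `NuSVC`, and `LinearSVC` classes match the names used in the abstract. This is only an illustrative sketch, not the thesis's actual code: the synthetic data from `make_classification` stands in for EMBER feature vectors, and the `evaluate` helper, the feature scaling step, the sample counts, and the `C` values are all assumptions chosen to keep the example small.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import (accuracy_score, f1_score,
                             precision_score, recall_score)
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC, NuSVC, LinearSVC

# Assumption: synthetic binary data standing in for EMBER feature vectors.
X, y = make_classification(n_samples=2000, n_features=40,
                           n_informative=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=0, stratify=y)


def evaluate(model, X_tr, y_tr):
    """Hypothetical helper: fit on (X_tr, y_tr), score on the fixed test split."""
    clf = make_pipeline(StandardScaler(), model)  # scaling is an assumption
    clf.fit(X_tr, y_tr)
    pred = clf.predict(X_test)
    return {
        "accuracy": accuracy_score(y_test, pred),
        "precision": precision_score(y_test, pred),
        "recall": recall_score(y_test, pred),
        "f1": f1_score(y_test, pred),
    }


# Scenario 1: compare the three SVM variants on the same split.
models = {
    "SVC": SVC(),
    "NuSVC": NuSVC(),
    "LinearSVC": LinearSVC(C=1.0, max_iter=10000),
}
results = {name: evaluate(m, X_train, y_train) for name, m in models.items()}

# Scenarios 2 and 3: vary the training-set size and the LinearSVC C parameter.
sweep = {}
for n_train in (200, 600, 1200):
    for C in (0.01, 1.0):
        model = LinearSVC(C=C, max_iter=10000)
        sweep[(n_train, C)] = evaluate(model, X_train[:n_train], y_train[:n_train])

for name, scores in results.items():
    print(name, {k: round(v, 4) for k, v in scores.items()})
for key, scores in sweep.items():
    print(key, {k: round(v, 4) for k, v in scores.items()})
```

The numbers printed on synthetic data say nothing about the 84.91% figure reported for EMBER; the sketch only shows how such a comparison and parameter sweep might be organized.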