CLASSIFICATION OF MALWARE USING MACHINE LEARNING BASED ON IMAGE PROCESSING

Bibliographic Details
Main Author: Akbar Abhesa, Radifa
Format: Thesis
Language: Indonesian
Online Access: https://digilib.itb.ac.id/gdl/view/57190
Institution: Institut Teknologi Bandung
Description
Summary: Malware (malicious software) is software designed to damage systems, steal important information or data, interfere with computer performance, or carry out other criminal acts on computers or devices that can harm their users. The National Cyber and Crypto Agency (BSSN) reported that technical cyber-attacks in Indonesia reached 495,337,202 in 2020, more than double the 228,277,875 recorded in 2019. Various methods exist to prevent the spread of malware and the harm it causes, including using machine learning to detect and classify software suspected of being malware. Malware analysis methods are either static, in which the suspected malware is not executed, or dynamic, in which the software is run so that its behavior can be observed and analyzed. However, these approaches are still time-consuming because they require analyzing many kinds of features extracted from different types of malware (feature extraction). This thesis proposes a different malware analysis method based on program visualization and image processing. The method is expected to make analysis faster because the process is uniform and based entirely on visual features. This thesis explains how malware is classified using machine learning methods based on image processing. A program suspected of being malware is converted into binary bits, then into strings, then into 8-bit vectors, and finally into a grayscale image. A Convolutional Neural Network (CNN) is used to process the malware visualization dataset so that visual patterns shared among samples can be found. The final model is expected to assign each malware sample to one of the malware categories/families of an operating system. The model is evaluated by measuring its accuracy, error, precision, and sensitivity using a confusion matrix. The experiments produced a machine learning model with an accuracy of 94%.
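The conversion from program to grayscale image described in the summary can be sketched in Python as shown below. This is a minimal sketch and not the thesis's code. Several details are assumptions made for illustration: the function names, the fixed row width of 256 pixels, the zero-padding of the last row, and the use of the binary PGM format for output. Each byte is written out as an 8-character bit string. The bit string is then regrouped into 8-bit values from 0 to 255, which are laid out row by row as grayscale pixels.

```python
"""Hypothetical sketch: program bytes -> bit string -> 8-bit vector -> grayscale image."""


def to_bit_string(data: bytes) -> str:
    """Write every byte as 8 binary digits, e.g. b"\x05" -> "00000101"."""
    return "".join(format(byte, "08b") for byte in data)


def bit_string_to_vector(bits: str) -> list[int]:
    """Regroup the bit string into 8-bit unsigned values (0..255), one per pixel."""
    return [int(bits[i:i + 8], 2) for i in range(0, len(bits), 8)]


def bytes_to_grayscale(data: bytes, width: int = 256) -> list[list[int]]:
    """Lay the 8-bit vector out as rows of `width` pixels (0 = black, 255 = white).

    The width of 256 is an assumed value; the last row is zero-padded (also an assumption).
    """
    if width <= 0:
        raise ValueError("width must be positive")
    pixels = bit_string_to_vector(to_bit_string(data))
    rows = []
    for start in range(0, len(pixels), width):
        row = pixels[start:start + width]
        row.extend([0] * (width - len(row)))  # pad the final, shorter row
        rows.append(row)
    return rows


def grayscale_to_pgm(rows: list[list[int]]) -> bytes:
    """Encode the rows as a binary PGM (P5) image so the result can be viewed."""
    height = len(rows)
    width = len(rows[0]) if rows else 0
    header = f"P5\n{width} {height}\n255\n".encode("ascii")
    return header + bytes(value for row in rows for value in row)
```

As a usage example, `Path("sample.pgm").write_bytes(grayscale_to_pgm(bytes_to_grayscale(Path("suspect.exe").read_bytes())))` writes an image for a file named `suspect.exe` (a hypothetical path), using `pathlib.Path`. The route through a bit string mirrors the steps listed in the summary. In Python, `list(data)` gives the same pixel values directly.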
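The summary does not describe the CNN architecture. The NumPy sketch below only illustrates the general idea of a CNN forward pass on one grayscale malware image. It runs a convolution layer with ReLU, then 2x2 max pooling, then a fully connected layer with softmax over malware families. Several parts are hypothetical: the layer sizes, the 32x32 input size, the number of families (5), and the random, untrained weights. Images from programs of different lengths would first need to be resized to one fixed size. A real model would be trained, typically with a deep learning framework, rather than written by hand like this.

```python
import numpy as np


def conv2d(image: np.ndarray, kernels: np.ndarray) -> np.ndarray:
    """'Valid' 2-D cross-correlation, which is what CNN convolution layers compute.

    image: (H, W), kernels: (K, kh, kw) -> feature maps (K, H - kh + 1, W - kw + 1).
    """
    k, kh, kw = kernels.shape
    h, w = image.shape
    out = np.zeros((k, h - kh + 1, w - kw + 1))
    for i in range(h - kh + 1):
        for j in range(w - kw + 1):
            patch = image[i:i + kh, j:j + kw]
            # Dot product of every kernel with the same patch: one value per kernel.
            out[:, i, j] = np.tensordot(kernels, patch, axes=([1, 2], [0, 1]))
    return out


def max_pool(maps: np.ndarray, size: int = 2) -> np.ndarray:
    """Non-overlapping max pooling; rows/columns that do not fill a window are dropped."""
    k, h, w = maps.shape
    h2, w2 = h // size, w // size
    trimmed = maps[:, :h2 * size, :w2 * size]
    return trimmed.reshape(k, h2, size, w2, size).max(axis=(2, 4))


def softmax(z: np.ndarray) -> np.ndarray:
    """Turn raw class scores into probabilities (shifted by the max for stability)."""
    e = np.exp(z - z.max())
    return e / e.sum()


def forward(image, kernels, weights, bias):
    """Conv -> ReLU -> max pool -> flatten -> dense -> softmax."""
    maps = np.maximum(conv2d(image, kernels), 0.0)  # ReLU
    features = max_pool(maps).reshape(-1)           # flatten to one vector
    return softmax(weights @ features + bias)       # one probability per family


# Hypothetical sizes: 32x32 input, 4 filters of 3x3, 5 malware families, untrained weights.
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(32, 32)) / 255.0  # scale pixels to [0, 1]
kernels = rng.normal(size=(4, 3, 3))
n_features = 4 * 15 * 15                              # (32 - 3 + 1) // 2 = 15 per side
weights = rng.normal(scale=0.01, size=(5, n_features))
bias = np.zeros(5)

probs = forward(image, kernels, weights, bias)
predicted_family = int(np.argmax(probs))
```

The predicted family is the index with the highest probability. Because the weights here are random, the prediction carries no meaning until the model is trained on a labeled malware image dataset.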
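The evaluation step can be illustrated with a small pure-Python sketch. It builds a multi-class confusion matrix and derives the four measures named in the summary from it. The summary does not give exact formulas, so the common definitions below are assumptions:

- Rows are actual families and columns are predicted families.
- Error is taken as 1 - accuracy.
- Precision for a family is the number of correct predictions divided by everything predicted as that family.
- Sensitivity (recall) is the number of correct predictions divided by all actual samples of that family.

The family labels "A", "B", "C" in the usage are hypothetical.

```python
def confusion_matrix(actual, predicted, labels):
    """matrix[i][j] counts samples whose actual family is labels[i] and prediction is labels[j]."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix


def evaluate(matrix):
    """Accuracy, error (assumed to be 1 - accuracy), and per-family precision and sensitivity."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(n))  # diagonal = correct predictions
    accuracy = correct / total
    precision, sensitivity = [], []
    for k in range(n):
        tp = matrix[k][k]
        predicted_k = sum(matrix[i][k] for i in range(n))  # column sum
        actual_k = sum(matrix[k])                           # row sum
        # A family that never occurs gets 0.0 instead of a division by zero (a convention).
        precision.append(tp / predicted_k if predicted_k else 0.0)
        sensitivity.append(tp / actual_k if actual_k else 0.0)
    return {"accuracy": accuracy, "error": 1 - accuracy,
            "precision": precision, "sensitivity": sensitivity}


actual = ["A", "A", "B", "B", "C"]
predicted = ["A", "B", "B", "B", "C"]
m = confusion_matrix(actual, predicted, labels=["A", "B", "C"])
# m == [[1, 1, 0], [0, 2, 0], [0, 0, 1]]
scores = evaluate(m)
# scores["accuracy"] == 0.8
```

Keeping precision and sensitivity per family shows which malware families the model confuses. A single summary figure such as the 94% accuracy reported in the thesis would hide this.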