MALWARE DETECTION SYSTEM WITH SELFSUPERVISED LEARNING AND MULTIMODAL

Malware has become a serious threat to the internet. According to antivirus company McAfee, an average of 588 malware attacks occurs every minute. The LockBit ransomware infected Indonesia National Data Center and caused the downtime of 282 Indonesian government institutions’ services for more th...

Bibliographic Details
Main Author:	Juli Irzal Ismail, Setia
Format:	Dissertations
Language:	Indonesia
Online Access:	https://digilib.itb.ac.id/gdl/view/87121
Institution:	Institut Teknologi Bandung

id	id-itb.:87121
institution	Institut Teknologi Bandung
building	Institut Teknologi Bandung Library
continent	Asia
country	Indonesia
content_provider	Institut Teknologi Bandung
collection	Digital ITB
language	Indonesia
description	Malware has become a serious threat to the internet. According to antivirus company McAfee, an average of 588 malware attacks occur every minute. The LockBit ransomware infected Indonesia's National Data Center and disrupted the services of 282 Indonesian government institutions for more than a week. Malware also causes economic losses: during 2023, $1.1 trillion in ransomware payments were reported. To detect malware, antivirus software still relies on signature-based and heuristic-based detection techniques. These techniques are effective, but malware signatures and heuristic rules are compiled manually by malware analysts, and it takes time and special skills to analyze malware and create a signature. With the growing volume of malware, an automatic detection process is required. For this reason, machine learning is applied to malware detection so that the process can be carried out automatically. However, applying machine learning to malware detection still faces several problems. First, the dataset labeling process takes a significant amount of time. Second, machine learning is not yet capable of detecting new malware. This highlights the need for a new malware detection method. In this study, a new malware detection method using machine learning is proposed to address these problems. The approach involves developing new detection techniques based on self-supervised learning and a multimodal architecture. Self-supervised learning techniques, which have been successfully applied in computer vision, achieve results competitive with supervised learning but do not require an extensive labeling process. A novel malware detection method based on self-supervised learning has been developed, eliminating the need for a large labeling process. New malware was detected using a multimodal method.
Malware files were converted into images, and their patterns were analyzed. Assuming that new malware reuses code from known malware, the multimodal architecture identified new malware by recognizing patterns from previously identified malware. The multimodal architecture combines two malware detection methods: one using image representations and the other using audio representations. The development of the proposed malware detection method is divided into three stages. The first stage involves developing a malware detection method using an image-based representation with self-supervised learning (SSL). The second stage focuses on developing a malware detection method using an audio representation with a convolutional neural network (CNN). Finally, the third stage involves developing the multimodal architecture. All three stages follow an experimental approach. The novelty of this research lies in the development of MalSSL, a malware detection method that does not require an extensive labeling process, and of a multimodal approach to recognizing new malware. MalSSL, the proposed method based on self-supervised learning and image representation, achieves a malware classification accuracy of 98.4% without the need for labeling. The multimodal architecture, which combines image and audio representations using a late fusion approach, can detect new malware variants with an accuracy of 95.1%. Additionally, it achieves an accuracy of 99.7% in classifying known malware.
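The abstract mentions two ideas that are easy to illustrate: representing a malware file as an image, and combining two detectors by late fusion. The following is a minimal sketch under stated assumptions, not the dissertation's implementation. It assumes the common byte-to-grayscale mapping (one byte per pixel, fixed row width) and a weighted average of per-class probabilities as the fusion rule; the record does not specify either. The function names, the `width` parameter, and `image_weight` are hypothetical.

```python
from typing import List, Sequence


def bytes_to_grayscale_rows(data: bytes, width: int = 16) -> List[List[int]]:
    """Lay out raw file bytes as rows of pixel intensities (0-255).

    Assumption: one byte maps to one grayscale pixel, and the last row
    is zero-padded to the chosen width.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    padding = (-len(data)) % width          # bytes needed to fill the last row
    padded = data + bytes(padding)          # bytes(n) yields n zero bytes
    return [list(padded[i:i + width]) for i in range(0, len(padded), width)]


def late_fusion(image_probs: Sequence[float],
                audio_probs: Sequence[float],
                image_weight: float = 0.5) -> List[float]:
    """Combine per-class probabilities from two independently trained models.

    Assumption: fusion is a weighted average of the two probability vectors;
    other late-fusion rules (voting, max, a learned combiner) are possible.
    """
    if len(image_probs) != len(audio_probs):
        raise ValueError("both models must score the same set of classes")
    if not 0.0 <= image_weight <= 1.0:
        raise ValueError("image_weight must lie in [0, 1]")
    audio_weight = 1.0 - image_weight
    return [image_weight * p + audio_weight * q
            for p, q in zip(image_probs, audio_probs)]


def predicted_class(fused_probs: Sequence[float]) -> int:
    """Return the index of the highest fused probability."""
    return max(range(len(fused_probs)), key=lambda i: fused_probs[i])
```

For example, `bytes_to_grayscale_rows(b"\x00\x01\x02\x03\x04", width=4)` returns `[[0, 1, 2, 3], [4, 0, 0, 0]]`. In late fusion each modality is scored by its own model first, and only the output scores are merged, so either model can be retrained or replaced without touching the other.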
format	Dissertations
author	Juli Irzal Ismail, Setia
title	MALWARE DETECTION SYSTEM WITH SELFSUPERVISED LEARNING AND MULTIMODAL
url	https://digilib.itb.ac.id/gdl/view/87121


Similar Items