Calculating distances between Windows malware using siamese neural network embeddings

Bibliographic Details
Main author: Sison, Marc Oliver Tan
Format: text
Language: English
Published: Animo Repository, 2021
Online access: https://animorepository.dlsu.edu.ph/etdm_comsci/12
https://animorepository.dlsu.edu.ph/cgi/viewcontent.cgi?article=1014&context=etdm_comsci
Summary: In recent years, the number of unique Windows malware samples has grown at a sharply increasing rate. This rapid growth has made manual inspection of every malware sample an impossible task. One way to mitigate this problem is to automatically cluster unknown malware samples into clusters of similar files. Clustering done in this way would allow malware researchers to identify large clusters and to analyze entire clusters using only a few representatives from each. Much work has been done in machine learning on the problem of clustering malware samples. However, previous work has mostly focused on clustering into known malware families, or has required dynamic features, which are prohibitively slow to extract given the volume of new malware samples. This paper proposes training a siamese neural network on engineered static features to generate embeddings that can be used to calculate the distances between malware files. The engineered features would be chosen carefully so that the distances calculated from the resulting embeddings are resistant to a certain degree of malware metamorphism and generalize well to Windows files as a whole rather than to specific malware families. This would also enable a form of one-shot detection, in which multiple unknown malware samples can be detected by their distance from a known malicious file.
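The summary describes the approach only at a high level and does not name the network architecture, loss function, feature set, or distance threshold. As an illustration of how a siamese embedding turns into a distance and then into one-shot detection, here is a minimal sketch under stated assumptions: the shared embedding branch is a single hypothetical linear layer with tanh, training is assumed to minimize the standard contrastive loss (the thesis may use a different objective), and the feature vectors, weights, and threshold are hypothetical values chosen for illustration. The helper names `embed`, `contrastive_loss`, `nearest_known_distance`, and `flag_as_malicious` are illustrative, not from the thesis, and the training loop itself is omitted.

```python
import math
from typing import List, Sequence

Vector = List[float]
Matrix = Sequence[Sequence[float]]


def embed(features: Sequence[float], weights: Matrix) -> Vector:
    """Shared embedding branch (hypothetical): one linear layer followed by tanh.

    Both files in a pair pass through this same function with the same
    weights, which is what makes the network "siamese".
    """
    return [math.tanh(sum(w * x for w, x in zip(row, features))) for row in weights]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two embeddings."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(distance: float, similar: bool, margin: float = 1.0) -> float:
    """Contrastive loss for one pair (an assumed training objective).

    Similar pairs are penalised for any distance; dissimilar pairs are
    penalised only while they are closer than `margin`.
    """
    if similar:
        return 0.5 * distance ** 2
    return 0.5 * max(0.0, margin - distance) ** 2


def nearest_known_distance(sample, known_malicious, weights) -> float:
    """Distance from a sample's embedding to the closest known malicious file."""
    e = embed(sample, weights)
    return min(euclidean(e, embed(k, weights)) for k in known_malicious)


def flag_as_malicious(sample, known_malicious, weights, threshold: float) -> bool:
    """One-shot style detection: flag a sample if it lies within `threshold`
    of any single known malicious reference."""
    return nearest_known_distance(sample, known_malicious, weights) <= threshold


if __name__ == "__main__":
    # Hypothetical 3-dimensional static feature vectors and hand-picked weights;
    # in practice the weights would be learned by minimising contrastive_loss
    # over labelled pairs. These weights ignore the third feature, so changes
    # to it do not move the embedding.
    weights = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
    known_malicious = [[0.9, 0.1, 0.0]]
    variant = [0.85, 0.12, 0.3]    # slightly altered copy of the known sample
    unrelated = [-0.8, 0.9, 0.0]
    print(flag_as_malicious(variant, known_malicious, weights, threshold=0.1))    # → True
    print(flag_as_malicious(unrelated, known_malicious, weights, threshold=0.1))  # → False
```

Taking the minimum distance to any reference means a single known malicious file is enough to flag its close variants, which is the one-shot property the summary describes. In a real system the threshold would have to be tuned on held-out data rather than fixed by hand.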