Scenario-based insider threat detection from cyber activities

An insider threat scenario refers to the outcome of a set of malicious activities caused by intentional or unintentional misuse of the organization's systems, networks, data, and resources. Prevention of insider threat is difficult, since trusted partners of the organization are involved in it,...

Bibliographic Details
Main Authors: Chattopadhyay, Pratik, Wang, Lipo, Tan, Yap-Peng
Other Authors: School of Electrical and Electronic Engineering
Format: Article
Language: English
Published: 2020
Subjects: Engineering::Electrical and electronic engineering; Cost-sensitive Learning; Imbalanced Data
Online Access: https://hdl.handle.net/10356/140631
Institution: Nanyang Technological University
id sg-ntu-dr.10356-140631
record_format dspace
spelling sg-ntu-dr.10356-140631, 2020-06-01T02:51:37Z
Scenario-based insider threat detection from cyber activities
Chattopadhyay, Pratik; Wang, Lipo; Tan, Yap-Peng
School of Electrical and Electronic Engineering
Engineering::Electrical and electronic engineering; Cost-sensitive Learning; Imbalanced Data
2018 Journal Article
Chattopadhyay, P., Wang, L., & Tan, Y.-P. (2018). Scenario-based insider threat detection from cyber activities. IEEE Transactions on Computational Social Systems, 5(3), 660-675. doi:10.1109/tcss.2018.2857473
ISSN 2329-924X
https://hdl.handle.net/10356/140631
DOI 10.1109/TCSS.2018.2857473
Scopus 2-s2.0-85052700825
Volume 5, issue 3, pages 660-675
en
IEEE Transactions on Computational Social Systems
© 2018 IEEE. All rights reserved.
institution Nanyang Technological University
building NTU Library
country Singapore
collection DR-NTU
language English
topic Engineering::Electrical and electronic engineering
Cost-sensitive Learning
Imbalanced Data
description An insider threat scenario refers to the outcome of a set of malicious activities caused by intentional or unintentional misuse of the organization's systems, networks, data, and resources. Preventing insider threats is difficult because they involve trusted partners of the organization who have authorized access to these confidential or sensitive resources. State-of-the-art research on insider threat detection mostly focuses on developing unsupervised behavioral anomaly detection techniques that aim to identify anomalous or abnormal changes in user behavior over time. However, an anomalous activity is not necessarily a malicious one that can lead to an insider threat scenario. As an improvement to the existing approaches, we propose a technique for insider threat detection from time-series classification of user activities. Initially, a set of single-day features is computed from the user activity logs. A time-series feature vector is next constructed from the statistics of each single-day feature over a period of time. The label of each time-series feature vector (malicious or nonmalicious) is extracted from the ground truth. To classify the imbalanced ground-truth insider threat data, which contains only a small number of malicious instances, we employ a cost-sensitive data adjustment technique that randomly undersamples the nonmalicious class instances. As a classifier, we employ a two-layered deep autoencoder neural network and compare its performance with other popularly used classifiers: random forest and multilayer perceptron. Encouraging results are obtained by evaluating our approach using the CMU Insider Threat Data, which is the only publicly available insider threat data set, consisting of about 14 GB of web-browsing logs, along with logon, device connection, file transfer, and e-mail log files. We observe that both the deep autoencoder and random forest classifiers classify the data-adjusted time-series feature set with high precision, recall, and F-score. Although the multilayer perceptron has a high recall, it suffers from lower precision and F-score than the other two classifiers.
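The pipeline outlined in the description (single-day features summarised over a period, random undersampling of the nonmalicious class, and evaluation by precision, recall, and F-score) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names, the choice of mean, standard deviation, minimum, and maximum as window statistics, the non-overlapping windows, and the `ratio` parameter are all assumptions made for illustration; the record does not specify them.

```python
import random
from statistics import mean, pstdev


def window_stats(daily_values, window):
    """Summarise one single-day feature over consecutive periods.

    Assumption: non-overlapping windows and the statistics
    (mean, population std, min, max); the record names neither.
    """
    vectors = []
    for start in range(0, len(daily_values) - window + 1, window):
        chunk = daily_values[start:start + window]
        vectors.append((mean(chunk), pstdev(chunk), min(chunk), max(chunk)))
    return vectors


def undersample(samples, labels, ratio=1.0, seed=0):
    """Randomly drop nonmalicious (label 0) samples.

    Keeps every malicious (label 1) sample and at most
    ratio * (number of malicious samples) nonmalicious ones.
    `ratio` is a hypothetical knob, not a value from the paper.
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    keep_neg = rng.sample(neg, min(len(neg), int(ratio * len(pos))))
    keep = sorted(pos + keep_neg)
    return [samples[i] for i in keep], [labels[i] for i in keep]


def precision_recall_f1(y_true, y_pred):
    """Precision, recall, and F-score for the malicious class (label 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1
```

In such a sketch, a classifier would be trained on the output of `undersample`; the deep autoencoder, random forest, and multilayer perceptron compared in the article are left out here because the record gives no architectural or hyperparameter details.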
author2 School of Electrical and Electronic Engineering
format Article
author Chattopadhyay, Pratik
Wang, Lipo
Tan, Yap-Peng
author_sort Chattopadhyay, Pratik
title Scenario-based insider threat detection from cyber activities
publishDate 2020
url https://hdl.handle.net/10356/140631