Scenario-based insider threat detection from cyber activities

An insider threat scenario refers to the outcome of a set of malicious activities caused by intentional or unintentional misuse of the organization's systems, networks, data, and resources. Prevention of insider threat is difficult, since trusted partners of the organization are involved in it,...

Bibliographic Details
Main Authors:	Chattopadhyay, Pratik; Wang, Lipo; Tan, Yap-Peng
Other Authors:	School of Electrical and Electronic Engineering
Format:	Article
Language:	English
Published:	2020
Subjects:	Engineering::Electrical and electronic engineering; Cost-sensitive Learning; Imbalanced Data
Online Access:	https://hdl.handle.net/10356/140631
Institution:	Nanyang Technological University

id	sg-ntu-dr.10356-140631
record_format	dspace
spelling	sg-ntu-dr.10356-140631 2020-06-01T02:51:37Z 2018 Journal Article Chattopadhyay, P., Wang, L., & Tan, Y.-P. (2018). Scenario-based insider threat detection from cyber activities. IEEE Transactions on Computational Social Systems, 5(3), 660-675. doi:10.1109/tcss.2018.2857473 2329-924X https://hdl.handle.net/10356/140631 10.1109/TCSS.2018.2857473 2-s2.0-85052700825 3 5 660 675 en IEEE Transactions on Computational Social Systems © 2018 IEEE. All rights reserved.
institution	Nanyang Technological University
building	NTU Library
country	Singapore
collection	DR-NTU
language	English
topic	Engineering::Electrical and electronic engineering; Cost-sensitive Learning; Imbalanced Data
description	An insider threat scenario refers to the outcome of a set of malicious activities caused by intentional or unintentional misuse of the organization's systems, networks, data, and resources. Preventing insider threats is difficult because they involve trusted partners of the organization who have authorized access to these confidential or sensitive resources. State-of-the-art research on insider threat detection mostly focuses on unsupervised behavioral anomaly detection techniques that look for anomalies or abnormal changes in user behavior over time. However, an anomalous activity is not necessarily a malicious one that can lead to an insider threat scenario. As an improvement to the existing approaches, we propose a technique for insider threat detection from time-series classification of user activities. Initially, a set of single-day features is computed from the user activity logs. A time-series feature vector is next constructed from the statistics of each single-day feature over a period of time. The label of each time-series feature vector (whether malicious or nonmalicious) is extracted from the ground truth. To classify the imbalanced ground-truth insider threat data, which contains only a small number of malicious instances, we employ a cost-sensitive data adjustment technique that randomly undersamples the nonmalicious class instances. As a classifier, we employ a two-layered deep autoencoder neural network and compare its performance with two other widely used classifiers: random forest and multilayer perceptron. Encouraging results are obtained by evaluating our approach using the CMU Insider Threat Data, which is the only publicly available insider threat data set and consists of about 14 GB of web-browsing logs, along with logon, device connection, file transfer, and e-mail log files. We observe that both the deep autoencoder and random forest classifiers classify the data-adjusted time-series feature set with high precision, recall, and f-score. Although the multilayer perceptron has a high recall, it suffers from lower precision and f-score than the other two classifiers.
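The feature construction and data adjustment steps in the description can be illustrated with a minimal sketch. It is not the authors' implementation: the window length, the choice of statistics (mean, standard deviation, minimum, maximum), the 1:1 class ratio, and the function names `window_statistics` and `undersample` are all assumptions made for illustration, and the sampling shown is plain random undersampling rather than the cost-sensitive variant the record mentions.

```python
import random
from statistics import mean, pstdev


def window_statistics(daily_features, window):
    """Build one statistics vector per sliding window of single-day features.

    daily_features: list of per-day feature lists (one list per day).
    window: number of consecutive days per vector (assumed value, not from the record).
    """
    vectors = []
    for start in range(len(daily_features) - window + 1):
        chunk = daily_features[start:start + window]
        vector = []
        # zip(*chunk) yields the values of each feature across the window's days.
        for column in zip(*chunk):
            vector.extend([mean(column), pstdev(column), min(column), max(column)])
        vectors.append(vector)
    return vectors


def undersample(vectors, labels, ratio=1.0, seed=0):
    """Randomly drop nonmalicious (label 0) instances.

    Keeps every malicious (label 1) instance and at most
    ratio * (number of malicious) nonmalicious ones.
    """
    rng = random.Random(seed)
    malicious = [i for i, y in enumerate(labels) if y == 1]
    benign = [i for i, y in enumerate(labels) if y == 0]
    kept_benign = rng.sample(benign, min(len(benign), int(ratio * len(malicious))))
    keep = sorted(malicious + kept_benign)
    return [vectors[i] for i in keep], [labels[i] for i in keep]
```

The balanced vectors returned by `undersample` would then be passed to whichever classifier is being compared; fixing `seed` keeps the random selection reproducible across runs.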
author2	School of Electrical and Electronic Engineering
author_facet	School of Electrical and Electronic Engineering Chattopadhyay, Pratik Wang, Lipo Tan, Yap-Peng
format	Article
author	Chattopadhyay, Pratik; Wang, Lipo; Tan, Yap-Peng
author_sort	Chattopadhyay, Pratik
title	Scenario-based insider threat detection from cyber activities
title_sort	scenario-based insider threat detection from cyber activities
publishDate	2020
url	https://hdl.handle.net/10356/140631
_version_	1681059029439217664

Similar Items