Data pre-processing and analysis for insider threat detection

Bibliographic Details
Main Author: Ho, See Cheng
Other Authors: Chen Lihui
Format: Final Year Project
Language: English
Published: 2019
Online Access: http://hdl.handle.net/10356/77551
Institution: Nanyang Technological University
Description
Summary: Insider threats are among the most prominent concerns for many companies. Here, insiders are people with authorized access to sensitive information within the company. Insider threats are difficult to detect, so technical means alone are not enough to address them. To help detect insider threats early, emotional and social factors must also be considered. The project analyses emails, one of the most common modes of communication in most organizations, and uses deep learning techniques to build user profiles that include sentiment and network information. These user profiles are updated at fixed intervals, and anomalous users are viewed as potential insider threats. In this project, the author was tasked with testing the TWOS dataset against the current insider threat detection system. The results determine the viability of the dataset as additional training data for the system on top of its existing dataset. The author was also involved in developing a general preprocessing tool that processes datasets containing email information. The report covers the methodology used for data processing and aspect extraction on the “The Wolves of SUTD” (TWOS) dataset. After these two processes, the dataset is fed into an existing framework that uses an ABSA model and a HIN-Skipgram user profiling model to evaluate the top 10 user profiles with anomalies. The evaluated results are then discussed and compared with the author’s manual analysis of the dataset.