EXPLORING COMPLEX NETWORK WITH TREE-BASED MODEL ON OPIUM EPIDEMIC AT USA

Bibliographic Details
Main Author: Gede Bagus Gigih Ferdian Bas, I
Format: Final Project
Language: Indonesia
Online Access: https://digilib.itb.ac.id/gdl/view/42384
Institution: Institut Teknologi Bandung
Description
Summary: This paper presents a tree-based model for analyzing the dynamic distribution of synthetic substances in the opioid epidemic in the USA. The exploratory data analysis uses data published by the National Forensic Laboratory Information System (NFLIS) of the USA for the states of Ohio, Kentucky, Pennsylvania, Virginia, and West Virginia from 2010 to 2017. Using an undirected multigraph framework, the paper first models every county in the system as a node and, for each year, connects it to the nodes in its adjacency neighborhood with which it shares a substance. The graphs signal the opioid crisis of 2015 to 2017 through consistent growth in the preceding years in graph size, density, node distribution, number of cliques, and maximal clique distribution. A classification decision tree was built as the base inference model for the dynamic distribution of synthetic substances in each year. The model aims to predict the probability that a new substance will be present in each county at the next time step by analyzing the influence of the nodes in its adjacency neighborhood. Drug report counts, distances between counties in the graph, and other factors are engineered into node strength, node acceptance, and drug strength features that help the proposed model explain the epidemic phenomenon. Finally, recall was used to evaluate the model, and the model was extended to a random forest to increase accuracy and reduce the risk of overfitting. The results show good recall for predicting the presence of new substances in each county in 2017.
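The graph construction described in the summary can be illustrated with a minimal sketch. It is not the author's implementation: the toy reports, the county names, the `adjacent` set, and the function names `build_multigraph` and `simple_density` are all hypothetical, and "adjacency" is assumed here to mean geographic neighbors. Density is computed as the share of county pairs joined by at least one edge, which is one possible definition among several.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy data: (year, county, substance) drug reports.
reports = [
    (2016, "A", "fentanyl"),
    (2016, "B", "fentanyl"),
    (2016, "B", "heroin"),
    (2016, "C", "heroin"),
    (2016, "C", "fentanyl"),
]

# Hypothetical geographic adjacency between counties (assumption).
adjacent = {frozenset(("A", "B")), frozenset(("B", "C"))}


def build_multigraph(reports, adjacent, year):
    """Return (nodes, edges) for one year.

    Each edge is a (county_u, county_v, substance) triple, so two adjacent
    counties that share several substances get several parallel edges.
    """
    substances = defaultdict(set)
    for report_year, county, substance in reports:
        if report_year == year:
            substances[county].add(substance)

    nodes = sorted(substances)
    edges = []
    for u, v in combinations(nodes, 2):
        if frozenset((u, v)) in adjacent:
            for shared in sorted(substances[u] & substances[v]):
                edges.append((u, v, shared))
    return nodes, edges


def simple_density(nodes, edges):
    """Share of county pairs joined by at least one edge (assumed definition)."""
    n = len(nodes)
    if n < 2:
        return 0.0
    linked_pairs = {frozenset((u, v)) for u, v, _ in edges}
    return len(linked_pairs) / (n * (n - 1) / 2)


nodes, edges = build_multigraph(reports, adjacent, 2016)
density = simple_density(nodes, edges)
```

Building one such graph per year and comparing edge counts and densities across years would give the kind of growth signal the summary refers to.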