EXPLORING COMPLEX NETWORK WITH TREE-BASED MODEL ON OPIUM EPIDEMIC AT USA
Main Author: | |
---|---|
Format: | Final Project |
Language: | Indonesia |
Online Access: | https://digilib.itb.ac.id/gdl/view/42384 |
Institution: | Institut Teknologi Bandung |
Summary: | This paper presents a tree-based model for analyzing the dynamic distribution of
synthetic substances in the opium epidemic in the USA. Exploratory data analysis
was carried out on data published by the National Forensic Laboratory Information
System (NFLIS) for the states of Ohio, Kentucky, Pennsylvania, Virginia, and
West Virginia from 2010 to 2017. Using an undirected multigraph framework,
the paper first models every county in the system as a node and, for every year,
connects it to the nodes in its adjacency neighborhood that share a substance with it.
The graphs signal the opium crisis of 2015-2017 through consistent growth in
size, density, node distribution, number of cliques, and maximal-clique distribution
over the preceding years. A classification decision tree
was built as the base model for the dynamic distribution of the synthetic
substances in each year. The model aims to predict the probability that a new
substance appears in each county at the next time step by analyzing the influence
of the nodes in its adjacency neighborhood. Drug report counts, distances
between counties in the graph, and other factors are engineered into
node strength, node acceptance, and drug strength features that could explain the
epidemic phenomenon under the proposed model. Finally, recall was used to
evaluate the model, and the model was extended to a random forest
to increase accuracy and reduce the risk of overfitting.
The results show good recall for predicting the presence of new
substances in each county in 2017. |
---|
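
The two most concrete steps in the summary, building a yearly county multigraph and scoring predictions with recall, can be illustrated with a minimal Python sketch. It is not the author's implementation. It assumes that two counties are linked only when they are geographic neighbours, with one parallel edge per substance reported in both during the year. The function names (`build_multigraph`, `density`, `recall`), the county labels, and the toy reports are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_multigraph(reports, adjacent):
    """Build one year's undirected multigraph (assumed construction).

    reports:  iterable of (county, substance) pairs for a single year.
    adjacent: set of frozenset({county_a, county_b}) for neighbouring counties.
    Returns (nodes, edges); each edge is (county_a, county_b, substance),
    so two counties sharing k substances get k parallel edges.
    """
    substances = defaultdict(set)
    for county, substance in reports:
        substances[county].add(substance)

    nodes = sorted(substances)
    edges = []
    for a, b in combinations(nodes, 2):
        if frozenset((a, b)) in adjacent:
            for shared in sorted(substances[a] & substances[b]):
                edges.append((a, b, shared))
    return nodes, edges


def density(nodes, edges):
    """Edges divided by the number of possible node pairs.

    Parallel edges are counted, so the value can exceed 1 in a multigraph.
    """
    n = len(nodes)
    if n < 2:
        return 0.0
    return len(edges) / (n * (n - 1) / 2)


def recall(y_true, y_pred):
    """Recall = true positives / (true positives + false negatives)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    return tp / (tp + fn) if (tp + fn) else 0.0


# Hypothetical toy data: three counties, two substances, one year.
reports = [
    ("A", "fentanyl"), ("A", "heroin"),
    ("B", "fentanyl"), ("B", "heroin"),
    ("C", "fentanyl"),
]
adjacent = {frozenset(("A", "B")), frozenset(("B", "C"))}

nodes, edges = build_multigraph(reports, adjacent)
graph_density = density(nodes, edges)

# Hypothetical labels: 1 means a new substance appeared in the county.
score = recall([1, 1, 0, 1], [1, 0, 0, 1])
```

Repeating `build_multigraph` for each year and tracking size and density over time would mirror the growth signal the summary describes. Recall is a natural choice when missing a county where a new substance appears is costlier than a false alarm.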