IDENTIFICATION OF HEALTH CARE PROVIDER FRAUD USING SUPPORT VECTOR MACHINE
Main Author: | |
---|---|
Format: | Final Project |
Language: | Indonesia |
Online Access: | https://digilib.itb.ac.id/gdl/view/81303 |
Institution: | Institut Teknologi Bandung |
Summary: | Fraud is one of the most serious problems in the insurance industry and causes
huge losses. In health insurance, deliberately hiding or omitting facts when
submitting a claim is a fraudulent act that causes large losses for insurers.
Fraudulent acts are increasingly diverse and the volume of data keeps growing,
which makes fraud difficult to recognize in large data sets. One way to address
this is to detect fraud with machine learning. This research compares the
performance of a linear support vector machine (SVM) and nonlinear SVMs with
radial basis function (RBF) and sigmoid kernels. Building an SVM model requires
several parameters to be set, so a hyperparameter optimization method is needed
to find good values; the methods used here are grid search, random search, and
Bayesian optimization. Data preparation also requires several steps, namely
normalization, oversampling, and feature selection, so that the resulting model
performs better. Normalization uses the robust scaler and oversampling uses
SMOTE. Feature selection is an important step in machine learning and is often
used for dimensionality reduction: it removes irrelevant and redundant
information from a data set to obtain an optimal feature subset. Features are
selected with recursive feature elimination (RFE). The best model obtained was
the linear SVM with 20 features selected by RFE and hyperparameters tuned by
random search. This model achieves an AUC of 0.93732 on the test data, which
shows that it classifies very well. |
---|
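
The summary names three building blocks that are easy to illustrate in isolation: robust scaling, SMOTE oversampling, and the AUC metric. The sketch below is a minimal, standard-library-only illustration of each idea. It is not the author's implementation, and the record does not say which software was used. The function names `robust_scale`, `smote_like`, and `roc_auc`, as well as the toy data, are assumptions made for this example.

```python
import math
import random
import statistics


def robust_scale(values):
    """Center one feature on its median and divide by its interquartile range."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = (q3 - q1) or 1.0  # fall back to 1.0 for a constant feature
    return [(v - median) / iqr for v in values]


def smote_like(minority, n_new, k=2, seed=0):
    """Create synthetic minority points by interpolating toward near neighbours.

    Simplified SMOTE-style sketch: pick a minority point, pick one of its k
    nearest minority neighbours, and sample a point on the segment between them.
    Expects at least two minority points.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        target = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(tuple(b + gap * (t - b) for b, t in zip(base, target)))
    return synthetic


def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy usage with made-up numbers (not data from the thesis).
scaled = robust_scale([1, 2, 3, 4, 100])  # the outlier 100 does not distort the scale
fraud_points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
extra_fraud = smote_like(fraud_points, n_new=5)
auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])  # 0.75
```

Because the median and interquartile range ignore extreme values, the outlier in the toy feature leaves the scaling of the other values unchanged, which is the usual reason for preferring a robust scaler on claim amounts. An AUC of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which puts the reported 0.93732 in context.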