IDENTIFICATION OF HEALTH CARE PROVIDER FRAUD USING SUPPORT VECTOR MACHINE
Main Author: | Jeremy |
---|---|
Format: | Final Project |
Language: | Indonesia |
Online Access: | https://digilib.itb.ac.id/gdl/view/81303 |
Institution: | Institut Teknologi Bandung |
id |
id-itb.:81303 |
---|---|
spelling |
id-itb.:81303 2024-06-12T11:21:21Z Fraud Detection, Support Vector Machine, Hyperparameter Optimization, Kernels |
institution |
Institut Teknologi Bandung |
building |
Institut Teknologi Bandung Library |
continent |
Asia |
country |
Indonesia |
content_provider |
Institut Teknologi Bandung |
collection |
Digital ITB |
language |
Indonesia |
description |
Fraud is one of the most serious problems in the insurance industry and causes
large losses. In health insurance, deliberately hiding or omitting facts when
submitting a claim is a fraudulent act that is costly for insurers. Fraud schemes
are increasingly diverse and the volume of claims data keeps growing, which makes
fraud hard to recognize in large data sets. One way to address this is to detect
fraud with machine learning. This study compares linear support vector machines
(SVMs) with nonlinear SVMs that use radial basis function (RBF) and sigmoid
kernels. Building an SVM model requires setting several hyperparameters, so a
hyperparameter optimization method is needed to find good values; the methods
used here are grid search, random search, and Bayesian optimization. Data
preparation consists of normalization, oversampling, and feature selection, which
are intended to improve the resulting model. Normalization uses a robust scaler
and oversampling uses SMOTE. Feature selection is widely used for dimensionality
reduction: it removes irrelevant and redundant information from a data set to
obtain a better feature subset. Features are selected with recursive feature
elimination (RFE). The best model was a linear SVM with 20 features selected by
RFE and hyperparameters tuned by random search. It achieves an AUC of 0.93732 on
the test data, which indicates strong classification performance. |
format |
Final Project |
author |
Jeremy |
title |
IDENTIFICATION OF HEALTH CARE PROVIDER FRAUD USING SUPPORT VECTOR MACHINE |
url |
https://digilib.itb.ac.id/gdl/view/81303 |
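
The description above names robust scaling, SMOTE oversampling, and AUC as parts of the modelling pipeline. The following is a minimal pure-Python sketch of the idea behind each step, not the author's code: the record does not say which implementations were used, and the function names (`robust_scale`, `smote`, `roc_auc`), the toy data, and the default `k=2` are all hypothetical. SVM training, RFE, and the hyperparameter searches are not reproduced here.

```python
import math
import random
import statistics


def robust_scale(column):
    """Center a feature on its median and divide by its interquartile range."""
    q1, median, q3 = statistics.quantiles(column, n=4, method="inclusive")
    iqr = (q3 - q1) or 1.0  # fall back to 1.0 for constant features
    return [(x - median) / iqr for x in column]


def smote(minority, n_new, k=2, seed=0):
    """SMOTE-style oversampling (illustrative): build n_new synthetic points,
    each on the segment between a random minority sample and one of its
    k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(p, base))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs in which the
    positive has the higher score; ties count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy values, invented for illustration only.
    print(robust_scale([1, 2, 3, 4, 100]))  # [-1.0, -0.5, 0.0, 0.5, 48.5]
    print(smote([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]], n_new=3))
    print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

In the first call the outlier 100 does not distort the scale of the other values, because the median and interquartile range ignore extremes; this is the usual reason for preferring a robust scaler on skewed claims data. An AUC of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the reported 0.93732 should be read.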