IDENTIFICATION OF HEALTH CARE PROVIDER FRAUD USING SUPPORT VECTOR MACHINE

Bibliographic Details
Main Author: Jeremy
Format: Final Project
Language: Indonesian
Online Access: https://digilib.itb.ac.id/gdl/view/81303
Institution: Institut Teknologi Bandung
Description
Summary: Fraud is one of the most serious problems in the insurance industry and causes large financial losses. In the health insurance sector, deliberately hiding or omitting facts when submitting a claim counts as fraud and is costly for insurance companies. Fraud schemes are increasingly diverse and the volume of data keeps growing, which makes fraud hard to recognize in large data sets. One way to address the problem is to detect fraud with machine learning. This research compares the performance of a linear support vector machine (SVM) with nonlinear SVMs that use radial basis function (RBF) and sigmoid kernels. Building an SVM model requires setting several hyperparameters, so a hyperparameter optimization method is needed to find good values; the methods used here are grid search, random search, and Bayesian optimization. Data preparation also involves normalization, oversampling, and feature selection to improve the resulting model. Normalization uses a robust scaler and oversampling uses SMOTE. Feature selection reduces dimensionality by removing irrelevant and redundant information from the data set to obtain an optimal feature subset; the method used is recursive feature elimination (RFE). The best model was the linear SVM with 20 features selected by RFE and hyperparameters tuned by random search. It achieved an AUC of 0.93732 on the test data, indicating that the model classifies very well.
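As a rough illustration of three building blocks named in the summary (robust scaling, SMOTE oversampling, and the AUC metric), the following standard-library Python sketch shows what each one computes. It is not the thesis code: the function names `robust_scale`, `smote`, and `auc`, the neighbour count `k`, and the toy data are all assumptions made for this example, and the SMOTE routine is a simplified version of the technique. The SVM training, RFE, and hyperparameter search steps are not covered here.

```python
import random
from statistics import median, quantiles


def robust_scale(values):
    """Center on the median and divide by the interquartile range (IQR)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = (q3 - q1) or 1.0  # avoid division by zero for constant features
    med = median(values)
    return [(v - med) / iqr for v in values]


def smote(minority, n_new, k=2, seed=0):
    """Simplified SMOTE: create n_new synthetic points by interpolating
    between a minority sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by squared Euclidean distance.
        others = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(a + gap * (b - a) for a, b in zip(base, neighbour))
        )
    return synthetic


def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs in which the
    positive example gets the higher score; ties count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): one feature with an outlier, three minority points.
scaled = robust_scale([1, 2, 3, 4, 100])
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
new_points = smote(minority, n_new=5)
score = auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

In the toy data the outlier 100 has little effect on the scaling of the other values, because the median and IQR ignore extremes; this is the usual motivation for a robust scaler. An AUC of 1.0 means every fraudulent case is scored above every legitimate one, and 0.5 corresponds to random ranking.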