PREDICTING HEALTH INSURANCE PREMIUM USING GENERALIZED LINEAR MODEL AND RANDOM FOREST

Bibliographic Details
Main Author: Hadinata Putra, Jason
Format: Final Project
Language: Indonesian
Online Access: https://digilib.itb.ac.id/gdl/view/76257
Institution: Institut Teknologi Bandung
Record ID: id-itb.:76257
Record date: 2023-08-14T09:48:02Z
Keywords: insurance premium, generalized linear model (GLM), random forest
Library: Institut Teknologi Bandung Library
Collection: Digital ITB
Country: Indonesia
Description: An insurance business sells promises of financial compensation if an unwanted event (risk) occurs; in return, the policyholder pays a premium to the insurer. Because the number of claims (frequency) and the amount of claims (severity) cannot be known before they occur, the premium is estimated from past claims data. This final project uses two methods to predict the premium: the Generalized Linear Model (GLM) and Random Forest. GLM assumes that the probability distribution of the response variable belongs to the exponential family, whereas Random Forest is a machine learning method that makes no distributional assumption about the response variable. The data are historical claims records from the United States, obtained from Kaggle.com. The response variable is “charges”, i.e. the premium. In the GLM, “charges” is modeled with a Tweedie distribution and the natural logarithm as the link function. For Random Forest, the hyperparameter N and the cost-complexity parameter are selected. The two methods are compared using the Root Mean Squared Error (RMSE). Random Forest achieves a smaller RMSE than GLM, indicating better predictive performance on this data. However, unlike GLM, Random Forest does not yield an interpretable model. Hence, GLM is still worth considering when setting insurance premiums.
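The GLM half of the comparison described above can be sketched in plain Python. This is a minimal illustration, not the author's implementation. It fits a Tweedie GLM with a log link by Fisher scoring, which is equivalent to iteratively reweighted least squares. It assumes a Tweedie power of p = 1.5, because the abstract does not state one. It also uses hypothetical synthetic data from the made-up helper `make_synthetic` in place of the Kaggle claims data. All function names (`solve`, `fit_tweedie_glm`, `predict`, `rmse`) are illustrative. For a log link with variance V(μ) = μ^p, each observation contributes x(y − μ)μ^(1−p) to the score and x xᵀ μ^(2−p) to the Fisher information. The dispersion parameter cancels out of the update, so it is not estimated.

```python
import math
import random


def solve(a, b):
    """Solve the small dense system a @ x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_tweedie_glm(X, y, p=1.5, max_iter=50, tol=1e-10):
    """Fisher scoring for a Tweedie GLM with log link, log(mu) = X @ beta.

    p = 1.5 is an assumed Tweedie power. Column 0 of X is assumed to be an
    intercept of ones. The dispersion cancels, so it is not estimated.
    """
    k = len(X[0])
    beta = [0.0] * k
    beta[0] = math.log(sum(y) / len(y))  # start from the overall mean
    for _ in range(max_iter):
        score = [0.0] * k
        info = [[0.0] * k for _ in range(k)]
        for xi, yi in zip(X, y):
            mu = math.exp(sum(b * v for b, v in zip(beta, xi)))
            s = (yi - mu) * mu ** (1 - p)  # score term for V(mu) = mu**p, log link
            w = mu ** (2 - p)              # Fisher weight for the same model
            for a in range(k):
                score[a] += xi[a] * s
                for c in range(k):
                    info[a][c] += xi[a] * xi[c] * w
        step = solve(info, score)
        beta = [b + d for b, d in zip(beta, step)]
        if max(abs(d) for d in step) < tol:
            break
    return beta


def predict(beta, X):
    """Mean prediction mu = exp(X @ beta) under the log link."""
    return [math.exp(sum(b * v for b, v in zip(beta, xi))) for xi in X]


def rmse(y_true, y_pred):
    """Root Mean Squared Error, the comparison metric named in the abstract."""
    return math.sqrt(sum((t - q) ** 2 for t, q in zip(y_true, y_pred)) / len(y_true))


def make_synthetic(n=2000, seed=0):
    """Hypothetical positive 'charges' with mean exp(0.5 + 0.8 * x); NOT the Kaggle data."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        x = rng.uniform(0.0, 1.0)
        mu = math.exp(0.5 + 0.8 * x)
        X.append([1.0, x])
        y.append(mu * rng.gammavariate(2.0, 0.5))  # multiplicative noise with mean 1
    return X, y


if __name__ == "__main__":
    X, y = make_synthetic()
    beta = fit_tweedie_glm(X[:1500], y[:1500])  # fit on the first 1500 rows
    print("beta:", beta)
    print("test RMSE:", rmse(y[1500:], predict(beta, X[1500:])))  # score the held-out rows
```

The Random Forest side is not reimplemented here. In scikit-learn, for instance, `RandomForestRegressor` exposes `n_estimators` and `ccp_alpha` (minimal cost-complexity pruning). These would be plausible counterparts if the project's N is the number of trees, although the abstract does not say so. That model's held-out predictions would then be scored with the same `rmse` function.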