PREDICTING HEALTH INSURANCE PREMIUM USING GENERALIZED LINEAR MODEL AND RANDOM FOREST
An insurance business sells promises of financial compensation if an unwanted event (risk) occurs. On the other hand, a policyholder needs to pay a premium to the insurer or insurance company. Since the number of claims (frequency) and the amount of claims (severity) cannot be known before they occu...
Main Author: | Hadinata Putra, Jason |
---|---|
Format: | Final Project |
Language: | Indonesian |
Online Access: | https://digilib.itb.ac.id/gdl/view/76257 |
Institution: | Institut Teknologi Bandung |
id |
id-itb.:76257 |
---|---|
spelling |
id-itb.:76257; 2023-08-14T09:48:02Z; PREDICTING HEALTH INSURANCE PREMIUM USING GENERALIZED LINEAR MODEL AND RANDOM FOREST; Hadinata Putra, Jason; Indonesia; Final Project; insurance premium, generalized linear model (GLM), random forest; INSTITUT TEKNOLOGI BANDUNG; https://digilib.itb.ac.id/gdl/view/76257; text |
institution |
Institut Teknologi Bandung |
building |
Institut Teknologi Bandung Library |
continent |
Asia |
country |
Indonesia |
content_provider |
Institut Teknologi Bandung |
collection |
Digital ITB |
language |
Indonesian |
description |
An insurance business sells promises of financial compensation if an unwanted event (risk) occurs. In return, the policyholder pays a premium to the insurer. Since the number of claims (frequency) and the amount of claims (severity) cannot be known before they occur, the premium is determined from past claims data. In this final project, two methods are used to predict the premium: the Generalized Linear Model (GLM) and Random Forest. GLM assumes that the probability distribution of the response variable belongs to the exponential family. Random Forest is a machine learning method that assumes no distribution for the response variable. The data used are historical claims data from the United States, taken from Kaggle.com. The response variable is “charges”, which is treated as the premium. The “charges” variable is modeled with a Tweedie probability distribution, and the natural logarithm is used as the link function. For the Random Forest method, the hyperparameter N and the cost-complexity parameter are determined. The two methods are compared using the Root Mean Squared Error (RMSE). The results show that the RMSE of the Random Forest method is smaller than that of the GLM, which indicates that Random Forest has better predictive performance. However, Random Forest cannot provide an interpretation of the fitted model, whereas GLM can. Hence, GLM is still considered when determining the premium of an insurance business. |
format |
Final Project |
author |
Hadinata Putra, Jason |
title |
PREDICTING HEALTH INSURANCE PREMIUM USING GENERALIZED LINEAR MODEL AND RANDOM FOREST |
url |
https://digilib.itb.ac.id/gdl/view/76257 |
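The abstract in this record mentions a GLM with a natural-logarithm link and a model comparison by RMSE. As a minimal sketch of those two ingredients only, the following uses the Python standard library and fits no model. The function names `log_link_mean` and `rmse`, the coefficients, and all numbers are illustrative assumptions; none of them come from the thesis or from the Kaggle data.

```python
import math
from typing import Sequence


def log_link_mean(intercept: float, coefs: Sequence[float], x: Sequence[float]) -> float:
    """Mean predicted by a GLM with a log link: mu = exp(b0 + sum(b_j * x_j))."""
    if len(coefs) != len(x):
        raise ValueError("coefs and x must have the same length")
    linear_predictor = intercept + sum(b * v for b, v in zip(coefs, x))
    return math.exp(linear_predictor)


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root Mean Squared Error between observed and predicted values."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


if __name__ == "__main__":
    # Hypothetical observed charges and one hypothetical feature (age).
    ages = [20.0, 40.0, 60.0]
    actual = [1500.0, 2500.0, 4000.0]

    # Hypothetical GLM coefficients; a fitted model would estimate these.
    glm_pred = [log_link_mean(7.0, [0.02], [age]) for age in ages]

    # Hypothetical predictions standing in for a Random Forest's output.
    rf_pred = [1600.0, 2400.0, 4200.0]

    print("GLM RMSE:", rmse(actual, glm_pred))
    print("RF RMSE: ", rmse(actual, rf_pred))
```

The model with the smaller RMSE on held-out data would be preferred on predictive grounds, which is the comparison the abstract describes. With a log link, each coefficient acts multiplicatively on the predicted mean, which is one reason a GLM remains easy to interpret.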