A PREDICTION OF A HEALTH INSURANCE PREMIUM USING A GENERALIZED LINEAR MODEL (GLM) AND GRADIENT BOOSTING MACHINE (GBM)
An insurance business is a business which handles a risk transfer from an insured (policyholder) to an insurer (insurance company). As a compensation for the transfer of risk, a policyholder is required to pay an insurance premium. However, there is a level of uncertainty in the amount of premium...
Main Author: | Satria Joel Manurung, Tito |
---|---|
Format: | Final Project |
Language: | Indonesia |
Online Access: | https://digilib.itb.ac.id/gdl/view/68974 |
Institution: | Institut Teknologi Bandung |
id |
id-itb.:68974 |
---|---|
timestamp |
2022-09-19T19:17:07Z |
keywords |
insurance premium, Generalized Linear Model (GLM), Gradient Boosting Machine (GBM) |
building |
Institut Teknologi Bandung Library |
continent |
Asia |
country |
Indonesia |
content_provider |
Institut Teknologi Bandung |
collection |
Digital ITB |
language |
Indonesia |
description |
An insurance business handles the transfer of risk from an insured
(policyholder) to an insurer (insurance company). As compensation for this
transfer of risk, the policyholder is required to pay an insurance premium. However,
the amount of premium that must be paid is uncertain, because the frequency and
severity of claims are not known with certainty at the time the premium needs to
be paid. In this final project, two methodologies are used to determine the
insurance premium that a policyholder must pay for a general insurance product.
The first methodology is a regression model called a Generalized Linear Model
(GLM). A GLM assumes that the response variable follows a distribution in the
exponential family. The second is the Gradient Boosting Machine (GBM), which does
not require any assumption about the probability distribution of the response
variable. The dataset used in this final project is a health insurance dataset from
the United States, obtained from Kaggle.com. The premium variable in the data, which
is the response variable, follows a Tweedie distribution. Based on that probability
model, the natural logarithm link function is used in the GLM. For the GBM, 4
hyperparameters are considered: shrinkage, interaction.depth, minobsinnode, and
n.trees. The RMSE on the test set is used to compare the two methodologies. The RMSE
produced by the GBM was found to be smaller than that produced by the GLM. This
means that, based on the data analyzed, the GBM predicts the amount of insurance
premiums better than the GLM. However, it should be noted that the results produced
by a GLM are more interpretable than those produced by a GBM. Hence, GLMs are still
widely used in modeling general insurance data. |
format |
Final Project |
author |
Satria Joel Manurung, Tito |
title |
A PREDICTION OF A HEALTH INSURANCE PREMIUM USING A GENERALIZED LINEAR MODEL (GLM) AND GRADIENT BOOSTING MACHINE (GBM) |
url |
https://digilib.itb.ac.id/gdl/view/68974 |
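
The abstract in this record describes two ideas that a short example may make more concrete: a GLM with a natural logarithm link predicts a mean of the form exp(b0 + b1*x1 + ...), and the two models are compared by their RMSE on a test set. The following is a minimal Python sketch of both, not the author's implementation. The function names, the coefficient values, the feature names (`age`, `smoker`), and all numbers are hypothetical and chosen only for illustration; they are not taken from the thesis or its dataset.

```python
import math


def predict_log_link(intercept, coefs, features):
    """Predicted mean under a log link: mu = exp(b0 + sum_j b_j * x_j)."""
    eta = intercept + sum(coefs[name] * value for name, value in features.items())
    return math.exp(eta)


def rmse(actual, predicted):
    """Root mean squared error between observed and predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(actual))


# Hypothetical coefficients (NOT estimates from the thesis).
intercept = 8.0
coefs = {"age": 0.02, "smoker": 0.7}

# With a log link, each coefficient acts multiplicatively on the prediction:
# switching smoker from 0 to 1 scales the predicted premium by exp(0.7).
non_smoker = predict_log_link(intercept, coefs, {"age": 40, "smoker": 0})
smoker = predict_log_link(intercept, coefs, {"age": 40, "smoker": 1})
smoker_ratio = smoker / non_smoker

# Made-up test-set values and predictions from two unnamed models.
actual = [100.0, 200.0, 300.0]
pred_a = [110.0, 190.0, 310.0]
pred_b = [100.0, 200.0, 330.0]

rmse_a = rmse(actual, pred_a)
rmse_b = rmse(actual, pred_b)
# The model with the smaller test-set RMSE is preferred under this criterion.
better = "model_a" if rmse_a < rmse_b else "model_b"
```

The multiplicative reading of the coefficients is one reason GLM output is often regarded as easy to interpret, whereas a boosted tree ensemble has no comparable closed-form summary.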