A PREDICTION OF A HEALTH INSURANCE PREMIUM USING A GENERALIZED LINEAR MODEL (GLM) AND GRADIENT BOOSTING MACHINE (GBM)
Main Author: | |
---|---|
Format: | Final Project |
Language: | Indonesian |
Online Access: | https://digilib.itb.ac.id/gdl/view/68974 |
Institution: | Institut Teknologi Bandung |
Summary: | An insurance business handles the transfer of risk from an insured
(policyholder) to an insurer (insurance company). As compensation for this
transfer of risk, the policyholder is required to pay an insurance premium.
However, the amount of premium the policyholder must pay is uncertain, because the
frequency and severity of claims are not known with certainty at the time the
premium needs to be paid. In this final project, two methodologies are used to
determine the insurance premium a policyholder must pay for a general insurance
product. The first is a regression model called the Generalized Linear Model
(GLM), which assumes that the response variable follows a distribution from the
exponential family. The second is the Gradient Boosting Machine (GBM), which
makes no assumption about the probability distribution of the response variable.
The analysis uses a United States health insurance dataset obtained from
Kaggle.com. The premium variable in the data, which is the response variable,
follows a Tweedie distribution; based on this probability model, the GLM uses a
natural logarithm (log) link function. The GBM considers four hyperparameters:
shrinkage, interaction.depth, n.minobsinnode, and n.trees. The two methodologies
are compared using the root mean squared error (RMSE) on the test set. The GBM
produced a smaller RMSE than the GLM, which means that, for the data analyzed,
the GBM predicts insurance premiums more accurately than the GLM. However, the
results produced by a GLM are easier to interpret than those produced by a GBM.
Hence, GLMs are still widely used in modeling general insurance data. |
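The GLM and the comparison metric described in the summary can be written out as follows. This is a sketch in our own notation: the summary does not report the Tweedie power p or the covariates x_{ij}, so the range 1 < p < 2 (the compound Poisson-gamma case, which allows both exact zeros and positive continuous amounts) is an assumption. The product form on the second line shows why log-link coefficients read as multiplicative premium relativities, which is the interpretability advantage the summary attributes to the GLM.

```latex
Y_i \sim \mathrm{Tweedie}_p(\mu_i, \phi), \qquad
\mathbb{E}[Y_i] = \mu_i, \qquad
\operatorname{Var}(Y_i) = \phi\,\mu_i^{p}, \qquad 1 < p < 2

\ln \mu_i = \beta_0 + \sum_{j=1}^{k} \beta_j x_{ij}
\quad\Longleftrightarrow\quad
\mu_i = e^{\beta_0} \prod_{j=1}^{k} e^{\beta_j x_{ij}}

\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2}
```

Here the RMSE is computed separately for each model over the n test-set observations, with \hat{y}_i the predicted premium; the model with the smaller RMSE predicts more accurately.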
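The summary does not include the fitting code. As a minimal, standard-library-only sketch (not the author's implementation), a Tweedie GLM with a log link can be fitted by iteratively reweighted least squares (IRLS). The function names `fit_tweedie_glm`, `predict_glm` and `solve_linear` are hypothetical, and the default power `p=1.5` is an assumed value because the summary does not report the Tweedie power used.

```python
import math


def solve_linear(a, b):
    """Solve the small dense system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_tweedie_glm(X, y, p=1.5, n_iter=25):
    """Fit ln(mu) = X @ beta by IRLS under the Tweedie variance Var(Y) = phi * mu**p.

    Each row of X must start with 1.0 (intercept). y must be >= 0 with mean(y) > 0.
    The dispersion phi cancels out of the IRLS update, so it does not affect beta.
    """
    k = len(X[0])
    beta = [math.log(sum(y) / len(y))] + [0.0] * (k - 1)  # start at the overall mean
    for _ in range(n_iter):
        eta = [sum(b * x for b, x in zip(beta, row)) for row in X]
        mu = [math.exp(e) for e in eta]  # inverse of the log link
        # Log link gives d(mu)/d(eta) = mu, so the working weight (dmu/deta)^2 / mu^p = mu^(2-p)
        w = [m ** (2 - p) for m in mu]
        z = [e + (yi - m) / m for e, yi, m in zip(eta, y, mu)]  # working response
        xtwx = [[sum(wi * row[i] * row[j] for wi, row in zip(w, X)) for j in range(k)]
                for i in range(k)]
        xtwz = [sum(wi * row[i] * zi for wi, row, zi in zip(w, X, z)) for i in range(k)]
        beta = solve_linear(xtwx, xtwz)  # weighted least-squares step
    return beta


def predict_glm(beta, X):
    """Predicted premium mu = exp(X @ beta)."""
    return [math.exp(sum(b * x for b, x in zip(beta, row))) for row in X]


# Toy data generated exactly from ln(mu) = 1.0 + 0.5 * x, so IRLS should recover it
X = [[1.0, float(x)] for x in range(5)]
y = [math.exp(1.0 + 0.5 * x) for x in range(5)]
beta = fit_tweedie_glm(X, y)
print([round(b, 4) for b in beta])  # [1.0, 0.5]
```

Because of the log link, exp(beta_j) is the multiplicative change in the predicted premium per unit increase in x_j; in the toy data exp(0.5) ≈ 1.65. Production code would normally use a library GLM routine rather than this hand-rolled solver.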
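The roles of the four GBM hyperparameters named in the summary can be illustrated with a small from-scratch boosting sketch. Those names match arguments of the R `gbm` package; the Python functions `fit_gbm`, `build_tree`, `predict_gbm` and `rmse` and their underscore-style parameter names are hypothetical stand-ins. The sketch assumes squared-error loss (the summary does not state the loss), treats `interaction_depth` as the maximum depth of each tree, and omits the row subsampling that `gbm` can apply.

```python
import math


def _sse(values):
    """Sum of squared deviations from the mean."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def build_tree(X, r, depth, min_obs):
    """Grow a least-squares regression tree on residuals r.

    Returns a leaf value (float) or a split node (dict). A split is accepted only
    if both children keep at least min_obs observations (cf. n.minobsinnode).
    """
    leaf = sum(r) / len(r)
    if depth == 0 or len(r) < 2 * min_obs:
        return leaf
    best = None  # (sse, feature index, threshold)
    for f in range(len(X[0])):
        values = sorted({row[f] for row in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [res for row, res in zip(X, r) if row[f] <= t]
            right = [res for row, res in zip(X, r) if row[f] > t]
            if len(left) < min_obs or len(right) < min_obs:
                continue
            sse = _sse(left) + _sse(right)
            if best is None or sse < best[0]:
                best = (sse, f, t)
    if best is None:
        return leaf
    _, f, t = best
    left_idx = [i for i, row in enumerate(X) if row[f] <= t]
    right_idx = [i for i, row in enumerate(X) if row[f] > t]
    return {
        "feature": f,
        "threshold": t,
        "left": build_tree([X[i] for i in left_idx], [r[i] for i in left_idx], depth - 1, min_obs),
        "right": build_tree([X[i] for i in right_idx], [r[i] for i in right_idx], depth - 1, min_obs),
    }


def predict_tree(node, row):
    """Walk from the root to a leaf and return the leaf value."""
    while isinstance(node, dict):
        node = node["left"] if row[node["feature"]] <= node["threshold"] else node["right"]
    return node


def fit_gbm(X, y, n_trees=100, shrinkage=0.1, interaction_depth=1, n_minobsinnode=10):
    """Gradient boosting with squared-error loss: each new tree fits the current residuals."""
    init = sum(y) / len(y)
    pred = [init] * len(y)
    trees = []
    for _ in range(n_trees):  # n_trees plays the role of n.trees
        residuals = [yi - pi for yi, pi in zip(y, pred)]  # negative gradient of (y - f)^2 / 2
        tree = build_tree(X, residuals, interaction_depth, n_minobsinnode)
        trees.append(tree)
        # shrinkage scales each tree's contribution (learning rate)
        pred = [pi + shrinkage * predict_tree(tree, row) for pi, row in zip(pred, X)]
    return {"init": init, "shrinkage": shrinkage, "trees": trees}


def predict_gbm(model, X):
    """Initial constant plus the shrunken sum of all tree predictions."""
    return [model["init"] + model["shrinkage"] * sum(predict_tree(t, row) for t in model["trees"])
            for row in X]


def rmse(y_true, y_pred):
    """Root mean squared error, the comparison metric used in the summary."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))


# Toy usage: a step-shaped response that a sequence of stumps can fit closely
X = [[float(x)] for x in range(20)]
y = [10.0 if x >= 10 else 0.0 for x in range(20)]
model = fit_gbm(X, y, n_trees=100, shrinkage=0.1, interaction_depth=1, n_minobsinnode=5)
print(rmse(y, predict_gbm(model, X)))  # training RMSE on noiseless toy data
```

With `interaction_depth=1` each tree is a stump, so the ensemble is additive in the predictors; larger depths let trees capture interactions between rating factors, which can lower the RMSE but makes the fitted model harder to read than GLM coefficients. In practice, the hyperparameters would be chosen by evaluating RMSE on held-out data, as the summary does with its test set.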