A PREDICTION OF A HEALTH INSURANCE PREMIUM USING A GENERALIZED LINEAR MODEL (GLM) AND GRADIENT BOOSTING MACHINE (GBM)

Bibliographic Details
Main Author: Satria Joel Manurung, Tito
Format: Final Project
Language: Indonesia
Online Access: https://digilib.itb.ac.id/gdl/view/68974
Institution: Institut Teknologi Bandung
Description
Summary: Insurance is a business that handles the transfer of risk from an insured (policyholder) to an insurer (insurance company). As compensation for the transfer of risk, the policyholder is required to pay an insurance premium. However, the amount of premium the policyholder must pay is uncertain, because the frequency and severity of claims are not known with certainty at the time the premium is due. In this final project, two methodologies are used to determine the premium that a policyholder must pay for a general insurance product. The first is a regression model called a Generalized Linear Model (GLM), which assumes that the response variable follows a distribution in the exponential family. The second is the Gradient Boosting Machine (GBM), which does not require any assumption about the probability distribution of the response variable. The dataset used in this final project covers health insurance in the United States and was obtained from Kaggle.com. The premium variable in the data, which is the response variable, follows a Tweedie distribution. Based on that probability model, the natural logarithm link function is used in the GLM. For the GBM, four hyperparameters are considered: shrinkage, interaction.depth, minobsinnode, and n.trees. The RMSE on the test set is used to compare the two methodologies. The RMSE produced by the GBM was found to be smaller than that produced by the GLM. This means that, for the data analyzed, the GBM predicts insurance premiums more accurately than the GLM. However, it should be noted that the results of a GLM are more interpretable than those of a GBM. Hence, the GLM is still widely used in modeling general insurance data.
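
The two mechanical steps named in the summary, a log-link prediction and a test-set RMSE comparison, can be sketched as follows. This is a minimal illustration, not the thesis's implementation: the function names, the two features, the coefficients, and every number in the toy test set are hypothetical placeholders, and the GBM predictions are hard-coded stand-ins for the output of a fitted model.

```python
import math


def predict_log_link(intercept, coefs, features):
    """Log link: log(mu) = intercept + sum(coef * x), so mu = exp(linear predictor)."""
    eta = intercept + sum(c * x for c, x in zip(coefs, features))
    return math.exp(eta)


def rmse(actual, predicted):
    """Root mean squared error over paired observations."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical toy test set: (age in decades, smoker flag) -> observed premium.
test_features = [(2.0, 0), (4.0, 1), (5.5, 0)]
test_premiums = [3000.0, 21000.0, 9000.0]

# Hypothetical GLM coefficients, NOT estimates from the thesis.
glm_predictions = [predict_log_link(7.0, (0.35, 1.5), x) for x in test_features]

# Hypothetical numbers standing in for a fitted GBM's test-set predictions.
gbm_predictions = [3100.0, 20500.0, 9200.0]

print("GLM test RMSE:", rmse(test_premiums, glm_predictions))
print("GBM test RMSE:", rmse(test_premiums, gbm_predictions))
```

Because the log link exponentiates the linear predictor, every predicted premium is strictly positive, and each coefficient acts multiplicatively on the prediction. That multiplicative reading is one reason a GLM's output is comparatively easy to interpret.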