PREDICTING HEALTH INSURANCE PREMIUM USING GENERALIZED LINEAR MODEL AND RANDOM FOREST
Main Author: | |
---|---|
Format: | Final Project |
Language: | Indonesian |
Online Access: | https://digilib.itb.ac.id/gdl/view/76257 |
Institution: | Institut Teknologi Bandung |
Summary: | An insurance business sells promises of financial compensation if an unwanted event (risk) occurs. In return, the policyholder pays a premium to the insurer. Since the number of claims (frequency) and the amount of claims (severity) cannot be known before they occur, the premium is determined using past claims data. In this final project, two methodologies are used to predict the premium: the Generalized Linear Model (GLM) and Random Forest. GLM assumes that the probability distribution of the response variable belongs to the exponential family of distributions. Random Forest is a machine learning method that does not assume any distribution for the response variable. The data used in this final project are historical claims data from the United States, taken from Kaggle.com. The response variable is “charges”, which is treated as the premium. In the GLM, “charges” is modeled with a Tweedie probability distribution, and the link function used is the natural logarithm. In the Random Forest method, the hyperparameter N and the cost-complexity parameter are determined. To compare the two methodologies, the Root Mean Squared Error (RMSE) metric is used. The results show that the RMSE from the Random Forest method is smaller than that obtained with GLM, which indicates that the Random Forest method has better predictive performance than GLM. However, the Random Forest cannot provide an interpretation of the fitted model, whereas GLM can. Hence, GLM is still considered in determining the premium of an insurance business. |
---|
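
The summary relies on two small calculations: the mean predicted by a GLM with a natural-logarithm link, and the RMSE used to compare the two models. The sketch below illustrates both in plain Python. It is not the project's code: the function names `rmse` and `log_link_mean` are hypothetical, and all numbers are invented for illustration rather than taken from the Kaggle data or the project's results.

```python
import math


def rmse(actual, predicted):
    """Root Mean Squared Error between two equal-length sequences."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("inputs must be non-empty")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def log_link_mean(intercept, coefficients, features):
    """GLM mean under a log link: mu = exp(intercept + sum(beta_j * x_j))."""
    linear_predictor = intercept + sum(
        b * x for b, x in zip(coefficients, features)
    )
    return math.exp(linear_predictor)


# Hypothetical held-out charges and predictions from two models.
# These values are made up; they are NOT results from the project.
actual_charges = [1000.0, 2000.0, 3000.0, 4000.0]
model_a_predictions = [1100.0, 1900.0, 3200.0, 3800.0]
model_b_predictions = [1050.0, 1950.0, 3100.0, 3900.0]

rmse_a = rmse(actual_charges, model_a_predictions)
rmse_b = rmse(actual_charges, model_b_predictions)

# The model with the smaller RMSE has the better predictive performance
# on this (invented) hold-out set.
better_model = "B" if rmse_b < rmse_a else "A"
print(f"RMSE A = {rmse_a:.2f}, RMSE B = {rmse_b:.2f}, better: {better_model}")
```

Under a log link, each coefficient acts multiplicatively on the predicted mean: raising feature j by one unit multiplies the mean by exp(beta_j). That is one reason a GLM remains easy to interpret even when a Random Forest achieves a lower RMSE.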