PREDICTING HEALTH INSURANCE PREMIUM USING GENERALIZED LINEAR MODEL AND RANDOM FOREST

An insurance business sells promises of financial compensation if an unwanted event (risk) occurs. On the other hand, a policyholder needs to pay a premium to the insurer or insurance company. Since the number of claims (frequency) and the amount of claims (severity) cannot be known before they occu...

Bibliographic Details
Main Author:	Hadinata Putra, Jason
Format:	Final Project
Language:	Indonesia
Online Access:	https://digilib.itb.ac.id/gdl/view/76257
Institution:	Institut Teknologi Bandung

id	id-itb.:76257
spelling	id-itb.:76257; 2023-08-14T09:48:02Z; PREDICTING HEALTH INSURANCE PREMIUM USING GENERALIZED LINEAR MODEL AND RANDOM FOREST; Hadinata Putra, Jason; Indonesia; Final Project; insurance premium, generalized linear model (GLM), random forest; INSTITUT TEKNOLOGI BANDUNG; https://digilib.itb.ac.id/gdl/view/76257; text
institution	Institut Teknologi Bandung
building	Institut Teknologi Bandung Library
continent	Asia
country	Indonesia
content_provider	Institut Teknologi Bandung
collection	Digital ITB
language	Indonesia
description	An insurance business sells promises of financial compensation if an unwanted event (risk) occurs. In return, the policyholder pays a premium to the insurer or insurance company. Since the number of claims (frequency) and the amount of claims (severity) cannot be known before they occur, the premium is determined using past claims data. In this final project, two methodologies are used to predict the premium, namely the Generalized Linear Model (GLM) and Random Forest. GLM assumes that the probability distribution of the response variable belongs to the exponential family of distributions. Random Forest is a machine learning method that does not assume any distribution for the response variable. The data used in this final project are historical claims data from the United States, taken from Kaggle.com. The response variable used is “charges”, or the premium. The “charges” response variable follows a Tweedie probability distribution; hence, the link function used is the natural logarithm. In the Random Forest method, the hyperparameter N and the cost-complexity parameter are determined. To compare the two methodologies, the Root Mean Squared Error (RMSE) metric is used. The results show that the RMSE from the Random Forest method is smaller than that obtained with GLM, which indicates that the Random Forest method has better predictive performance than GLM. However, Random Forest cannot provide an interpretation of the fitted model, whereas GLM can. Hence, GLM is still considered when determining the premium of an insurance business.
format	Final Project
author	Hadinata Putra, Jason
title	PREDICTING HEALTH INSURANCE PREMIUM USING GENERALIZED LINEAR MODEL AND RANDOM FOREST
url	https://digilib.itb.ac.id/gdl/view/76257
_version_	1823653011332792320
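
The description in this record mentions two computable ingredients: a GLM with a natural-logarithm link function and a comparison of models by RMSE. The following is a minimal Python sketch of both, not the author's implementation. The function names (predict_log_link, rmse) and all numbers are hypothetical and chosen only for illustration; no coefficients or results from the final project are reproduced here.

```python
import math


def predict_log_link(intercept, coefs, features):
    """Mean response of a GLM with a natural-log link: mu = exp(b0 + x . b).

    The coefficients are assumed to be already fitted; this sketch does
    not estimate them.
    """
    eta = intercept + sum(b * x for b, x in zip(coefs, features))
    return math.exp(eta)


def rmse(actual, predicted):
    """Root Mean Squared Error between observed and predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(actual))


# Hypothetical observed charges and predictions from two hypothetical models.
actual = [100.0, 200.0, 300.0]
model_a = [110.0, 190.0, 310.0]
model_b = [130.0, 170.0, 330.0]

rmse_a = rmse(actual, model_a)
rmse_b = rmse(actual, model_b)

# The model with the smaller RMSE has the better predictive performance
# on this data under this metric.
better = "model_a" if rmse_a < rmse_b else "model_b"
print(rmse_a, rmse_b, better)
```

Because the log link exponentiates the linear predictor, the predicted mean is always positive, which suits a response such as a premium. RMSE is on the same scale as the response, so the two values can be read directly as typical prediction errors in the units of "charges".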

Similar Items