CLAIM FREQUENCY MODEL SELECTION FOR A WORKERS' COMPENSATION INSURANCE USING GIBBS SAMPLING

Claims frequency data in an insurance business may have the following characteristics: they do not follow a normal distribution and may be observed over several periods. For example, the data observed in this thesis are the annual claims frequencies of a workers' compensation insurance, observed over a...

Bibliographic Details
Main Author: Stella Sunaryo, Josephine
Format: Theses
Language: Indonesia
Online Access: https://digilib.itb.ac.id/gdl/view/63400
Institution: Institut Teknologi Bandung
Language: Indonesia
id id-itb.:63400
spelling id-itb.:63400 2022-02-07T09:02:36Z CLAIM FREQUENCY MODEL SELECTION FOR A WORKERS' COMPENSATION INSURANCE USING GIBBS SAMPLING Stella Sunaryo, Josephine Indonesia Theses panel data, workers' compensation insurance, generalized estimating equation, quasi-likelihood information criterion, model selection, Markov chain, Gibbs sampler INSTITUT TEKNOLOGI BANDUNG https://digilib.itb.ac.id/gdl/view/63400 text
institution Institut Teknologi Bandung
building Institut Teknologi Bandung Library
continent Asia
country Indonesia
content_provider Institut Teknologi Bandung
collection Digital ITB
language Indonesia
description Claims frequency data in an insurance business may have the following characteristics: they do not follow a normal distribution and may be observed over several periods. For example, the data observed in this thesis are the annual claims frequencies of a workers' compensation insurance, observed over a seven-year period. The distribution of the claims frequency does not follow a normal distribution; it is skewed to the right. The claims frequency data are classified by occupation and state. Total payroll is also included in the data and is used as an exposure. The claims frequency for a particular state and occupation will be predicted. The data therefore have three explanatory variables: year, state, and occupation. The state variable has 10 levels, whereas the occupation variable has 25 levels. Hence, in total, there are 33 candidates for predictors. To model this panel data, a Generalized Estimating Equation (GEE) is used. A working correlation matrix $R(\alpha)$, with $\alpha$ as a parameter, measures the correlation between observations within one panel. In this thesis, the working correlation matrix is assumed to follow an AR(1) model with $\alpha = 0.3518$, and the panels are assumed to be independent of one another. With the 33 predictor variables, there are $2^{33}$ possible sub-models, where an intercept is always included in the regression model. For the author's final project (Tugas Akhir) in her Sarjana program in Mathematics at FMIPA ITB, the author modelled the same data with GEE but used a stepwise regression method to select the most appropriate model for the given data. One of the drawbacks of a stepwise regression method is that the resulting best model depends on the significance level used, and it gives only one appropriate model as the final result. In other words, a stepwise regression method does not compare all of the possible sub-models. To overcome this problem, a Gibbs sampling algorithm that compares all of the possible sub-models is used.
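The AR(1) working correlation and the size of the model space mentioned above can be made concrete with a short sketch. It assumes the standard AR(1) form, in which the correlation between observations i and j of a panel is alpha raised to |i - j|; the abstract does not state the form explicitly, and the function name `ar1_working_correlation` is hypothetical.

```python
def ar1_working_correlation(alpha, n_periods):
    """Return the n_periods x n_periods AR(1) matrix with entries alpha**|i - j|.

    Assumption: the usual AR(1) structure; the abstract only names the model.
    """
    return [
        [alpha ** abs(i - j) for j in range(n_periods)]
        for i in range(n_periods)
    ]


# Values taken from the abstract: alpha = 0.3518, seven annual observations.
R = ar1_working_correlation(0.3518, 7)

# Each of the 33 candidate predictors is either in or out of a sub-model,
# and the intercept is always kept, so the model space has 2**33 members.
n_submodels = 2 ** 33

for row in R:
    print(" ".join(f"{value:.4f}" for value in row))
print("number of sub-models:", n_submodels)
```

Under this assumed form, the correlation between observations decays geometrically with the number of years separating them, and the size of the model space shows why enumerating every sub-model is impractical.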
This algorithm is able to find the best model efficiently, based on the quasi-likelihood information criterion (QIC) value. The Gibbs sampling algorithm is modified so that it can be used to sample sub-models from the population of all possible sub-models. The analysis showed that the model obtained by the Gibbs sampling algorithm is better than that obtained by a stepwise regression method. Let "exp" be the total payroll, $x_1$ to $x_9$ denote the state variables, and $x_{10}$ to $x_{32}$ denote the occupation class variables. For the data used in this thesis, the best model given by the Gibbs sampling algorithm is:

\[
\begin{aligned}
g(\mu) = \ln(\text{exp}) &- 2.6810 + 0.7703x_1 + 1.2704x_2 + 0.5358x_3 - 0.1025x_4 + 0.6314x_5 \\
&+ 0.9760x_7 - 0.1803x_8 + 0.7693x_9 - 0.6805x_{11} - 0.9835x_{12} - 0.1958x_{13} \\
&+ 0.5321x_{14} - 1.0336x_{17} - 0.4470x_{18} - 0.7274x_{19} - 0.5322x_{20} - 0.4780x_{21} \\
&- 0.1906x_{22} - 0.5164x_{24} - 0.8818x_{25} - 1.1169x_{26} - 0.7494x_{27} - 0.9885x_{28} \\
&- 0.6448x_{29} - 1.4706x_{30} - 0.6036x_{31} - 1.4217x_{32}
\end{aligned}
\]

with a QIC value of -44,475.906, which is smaller than that of the model obtained by a stepwise regression method:

\[
\begin{aligned}
g(\mu) = \ln(\text{exp}) &- 2.7425 + 0.8306x_1 + 1.3314x_2 + 0.5388x_3 + 0.6184x_5 + 0.9099x_7 \\
&+ 0.7765x_9 - 0.5835x_{11} - 0.7891x_{12} + 0.6133x_{14} - 1.1759x_{17} - 0.3617x_{18} \\
&- 0.6901x_{19} - 0.5004x_{20} - 0.3978x_{21} - 0.4551x_{24} - 0.9023x_{25} - 0.9911x_{26} \\
&- 0.6926x_{27} - 0.9335x_{28} - 0.5894x_{29} - 1.4873x_{30} - 0.5429x_{31} - 1.3623x_{32}
\end{aligned}
\]

with a QIC value of 37,330.3338.
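The abstract does not describe how the Gibbs sampler was modified, so the following is only a minimal sketch of one common construction, not the thesis's algorithm. Each sub-model is encoded as a vector of 0/1 inclusion indicators, and each indicator is resampled in turn with probability proportional to exp(-criterion). The `criterion` argument stands in for a QIC computed from a GEE fit; `toy_criterion` and `TARGET` are hypothetical placeholders so that the example runs without data.

```python
import math
import random


def gibbs_model_search(n_predictors, criterion, n_sweeps=200, seed=0):
    """Sample inclusion vectors gamma with probability ~ exp(-criterion(gamma)).

    Returns the lowest criterion value visited and the corresponding model.
    """
    rng = random.Random(seed)
    gamma = [0] * n_predictors  # start from the intercept-only model
    best = (criterion(gamma), tuple(gamma))
    for _ in range(n_sweeps):
        for j in range(n_predictors):
            # Criterion with predictor j excluded and included, others fixed.
            gamma[j] = 0
            c_out = criterion(gamma)
            gamma[j] = 1
            c_in = criterion(gamma)
            # Conditional inclusion probability: exp(-c_in) / (exp(-c_in) + exp(-c_out)).
            # The difference is capped to avoid overflow in math.exp.
            p_include = 1.0 / (1.0 + math.exp(min(c_in - c_out, 700.0)))
            gamma[j] = 1 if rng.random() < p_include else 0
            current = c_in if gamma[j] else c_out
            if current < best[0]:
                best = (current, tuple(gamma))
    return best


# Hypothetical stand-in for QIC: penalise each disagreement with a fixed
# "true" model. A real criterion would refit the GEE for every gamma.
TARGET = (1, 0, 1, 1, 0, 0)


def toy_criterion(gamma):
    return 5.0 * sum(g != t for g, t in zip(gamma, TARGET))


best_score, best_model = gibbs_model_search(len(TARGET), toy_criterion)
print(best_score, best_model)
```

Because each step changes only one indicator, the chain needs two criterion evaluations per update, which is what makes a search over a very large model space feasible compared with full enumeration.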
format Theses
author Stella Sunaryo, Josephine
title CLAIM FREQUENCY MODEL SELECTION FOR A WORKERS' COMPENSATION INSURANCE USING GIBBS SAMPLING
url https://digilib.itb.ac.id/gdl/view/63400
_version_ 1822004311718100992