Types of covariate and distribution effects on parameter estimates and goodness-of-fit test using clustering partitioning strategy for multinomial logistic regression / Hamzah Abdul Hamid

This thesis presents a simulation study on parameter estimation for binary and multinomial logistic regression, and the extension of the clustering partitioning strategy for goodness-of-fit test to multinomial logistic regression model. The motivation behind this study is influenced by two main fact...

Bibliographic Details
Main Author: Abdul Hamid, Hamzah
Format: Thesis
Language: English
Published: 2017
Subjects: Regression analysis. Correlation analysis. Spatial analysis (Statistics)
Online Access: https://ir.uitm.edu.my/id/eprint/66514/1/66514.pdf
https://ir.uitm.edu.my/id/eprint/66514/
Institution: Universiti Teknologi Mara
id my.uitm.ir.66514
record_format eprints
spelling my.uitm.ir.66514 2023-01-27T02:50:08Z https://ir.uitm.edu.my/id/eprint/66514/ Types of covariate and distribution effects on parameter estimates and goodness-of-fit test using clustering partitioning strategy for multinomial logistic regression / Hamzah Abdul Hamid Abdul Hamid, Hamzah Regression analysis. Correlation analysis. Spatial analysis (Statistics) This thesis presents a simulation study on parameter estimation for binary and multinomial logistic regression, and an extension of the clustering partitioning strategy for goodness-of-fit testing to the multinomial logistic regression model. The motivation behind this study is influenced by two main factors. Firstly, parameter estimation is often sensitive to sample size and types of data. Simulation studies are useful for assessing how parameter estimates for binary and multinomial logistic regression behave under various conditions. The first phase of this study covers the effect of different types of covariate, distributions and sample size on parameter estimation for binary and multinomial logistic regression models. Data were simulated for different sample sizes, types of covariate (continuous, count, categorical) and distributions (normal or skewed for continuous variables). The simulation results show that the effect of skewed and categorical covariates diminishes as sample size increases. The parameter estimates for normally distributed covariates are apparently less affected by sample size. For a multinomial logistic regression model with a single covariate, a sample size of at least 300 is required to obtain unbiased estimates when the covariate is positively skewed or categorical. A much larger sample size is required when covariates are negatively skewed. In Phase 2, we investigate goodness-of-fit (GoF) tests for multinomial logistic regression. Goodness-of-fit tests are important for assessing whether the model fits the data.
We investigated the Type I error and power of two goodness-of-fit tests for multinomial logistic regression via a simulation study. The GoF test using a partitioning strategy (clustering) in the covariate space, XP*G, was compared with another test, Cg, which is based on grouping of predicted probabilities. The power of both tests was investigated when a quadratic term or an interaction term was omitted from the model. The proposed test XP*G shows good Type I error and ample power except for multinomial models with a highly skewed covariate distribution. Additionally, the proposed test XP*G has good power in detecting the omission of a continuous interaction term. Further simulation results showed that the partitioning strategy using hierarchical clustering with Canberra distance, %C,G, performs better than XP*G (hierarchical clustering with Euclidean distance) and XI*G (partitioning using k-medoids). The application to a real dataset confirmed the simulation results. The simulation and analyses were carried out using R, an open-source programming language for statistical computing and graphics. 2017 Thesis NonPeerReviewed text en https://ir.uitm.edu.my/id/eprint/66514/1/66514.pdf Types of covariate and distribution effects on parameter estimates and goodness-of-fit test using clustering partitioning strategy for multinomial logistic regression / Hamzah Abdul Hamid. (2017) PhD thesis, Universiti Teknologi MARA (UiTM).
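The first phase described in the abstract (simulate covariates of different shapes, fit the model, and see how estimation bias changes with sample size) can be illustrated with a minimal sketch. This is not the thesis's design or code: the thesis used R and covered multinomial models, whereas the sketch below is a binary-only Python illustration. The function names (`fit_logistic`, `draw_covariate`, `mean_bias`), the true coefficients, the number of replicates, and the choice of a shifted exponential as the "skewed" covariate are all assumptions made here for illustration.

```python
import math
import random


def expit(z):
    """Numerically stable logistic function 1 / (1 + exp(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(xs, ys, iters=25, tol=1e-8):
    """Maximum-likelihood fit of logit P(y=1) = b0 + b1*x by Newton-Raphson.

    Returns (b0, b1), or None if the fit fails (singular information
    matrix, e.g. under separation, or no convergence within `iters`).
    """
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = expit(b0 + b1 * x)
            w = p * (1.0 - p)
            g0 += y - p            # score for the intercept
            g1 += (y - p) * x      # score for the slope
            h00 += w               # entries of the information matrix X'WX
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        if det <= 1e-12:
            return None
        # Newton step: solve (X'WX) d = score using the 2x2 inverse
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            return b0, b1
    return None


def draw_covariate(kind, rng):
    """Draw one covariate value (illustrative distributions, assumed here)."""
    if kind == "normal":
        return rng.gauss(0.0, 1.0)
    if kind == "skewed":
        # Exponential(1) shifted to mean 0: a positively skewed covariate
        return rng.expovariate(1.0) - 1.0
    raise ValueError("unknown covariate kind: " + kind)


def mean_bias(n, kind, beta=(0.0, 1.0), reps=200, seed=1):
    """Average (estimate - truth) for b0 and b1 over `reps` simulated datasets.

    Returns (bias_b0, bias_b1, number_of_successful_fits).
    """
    rng = random.Random(seed)
    total0 = total1 = 0.0
    used = 0
    for _ in range(reps):
        xs = [draw_covariate(kind, rng) for _ in range(n)]
        ys = [1 if rng.random() < expit(beta[0] + beta[1] * x) else 0
              for x in xs]
        est = fit_logistic(xs, ys)
        if est is None:
            continue  # skip replicates where the fit failed
        total0 += est[0] - beta[0]
        total1 += est[1] - beta[1]
        used += 1
    if used == 0:
        raise RuntimeError("no replicate produced a usable fit")
    return total0 / used, total1 / used, used
```

Calling `mean_bias(n, kind)` for several sample sizes (for example 50, 100 and 300) and for both `"normal"` and `"skewed"` gives a small table of average bias per setting, which is the kind of comparison the abstract summarises. Failed fits are skipped rather than imputed, so the third return value should be checked before comparing settings.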
institution Universiti Teknologi Mara
building Tun Abdul Razak Library
collection Institutional Repository
continent Asia
country Malaysia
content_provider Universiti Teknologi Mara
content_source UiTM Institutional Repository
url_provider http://ir.uitm.edu.my/
language English
topic Regression analysis. Correlation analysis. Spatial analysis (Statistics)
format Thesis
author Abdul Hamid, Hamzah
title Types of covariate and distribution effects on parameter estimates and goodness-of-fit test using clustering partitioning strategy for multinomial logistic regression / Hamzah Abdul Hamid
publishDate 2017
url https://ir.uitm.edu.my/id/eprint/66514/1/66514.pdf
https://ir.uitm.edu.my/id/eprint/66514/