Model-based classification with predictors subjected to detection limit

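In the biomedical field, important factors such as biomarkers are often left-censored because of the detection limits of measuring instruments. Classification cannot be performed directly on a dataset whose predictors are subject to a detection limit, so the data must first be processed. Imputation, the process of assigning values to missing entries, retains as much of the information in the dataset as possible while producing a complete dataset. This paper aims to identify and evaluate state-of-the-art imputation methods that effectively improve classification performance. Methods such as Multiple Imputation by Chained Equations (MICE) and Generative Adversarial Imputation Nets (GAIN) are used to complete the dataset, and their performance is evaluated by fitting a logistic regression model to each imputed dataset. Metrics such as mean squared error and accuracy are used for comparison. GAIN performed best when the censoring rate was low, while MICE performed consistently well across different settings. A small subset of data from the Genetic and Inflammatory Markers of Sepsis (GenIMS) study is also used for a real-data case analysis.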
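The project abstract describes a pipeline with four steps. Biomarker predictors are left-censored at a detection limit, and the censored values are imputed. A logistic regression classifier is then fitted to each imputed dataset, and the imputations are compared by mean squared error and accuracy.

The sketch below illustrates that evaluation loop on simulated data using NumPy. It is not the project's code, and everything in it is an assumption made for illustration:

- the data-generating model;
- a detection limit placed at a quantile, so that a chosen fraction of values is censored;
- all function names (`simulate`, `left_censor`, `impute_substitution`, `impute_regression`, `fit_logistic`, `predict`, `evaluate`).

It does not implement MICE or GAIN. Two simpler imputers are swapped in instead: substitution by DL/√2, and a single regression imputation capped at the detection limit.

```python
import numpy as np


def simulate(n=2000, seed=0):
    """Hypothetical setup: two correlated log-biomarkers Z and a binary outcome y from a logistic model."""
    rng = np.random.default_rng(seed)
    cov = np.array([[1.0, 0.6], [0.6, 1.0]])
    Z = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov, size=n)
    logits = -0.5 + 1.5 * Z[:, 0] - 1.0 * Z[:, 1]
    y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logits))).astype(float)
    return Z, y


def left_censor(x, rate):
    """Treat values below a detection limit (DL) as missing; DL is set so that `rate` of x is censored."""
    dl = np.quantile(x, rate)
    censored = x < dl
    return np.where(censored, np.nan, x), censored, dl


def impute_substitution(x_obs, censored, dl, aux):
    """Ad hoc substitution: replace every censored value by DL / sqrt(2). `aux` is unused."""
    return np.where(censored, dl / np.sqrt(2.0), x_obs)


def impute_regression(x_obs, censored, dl, aux):
    """Regress log(x) on an auxiliary predictor using uncensored rows, predict censored rows, cap at log(DL)."""
    obs = ~censored
    coef = np.polyfit(aux[obs], np.log(x_obs[obs]), deg=1)
    pred = np.minimum(np.polyval(coef, aux[censored]), np.log(dl))
    x_imp = x_obs.copy()
    x_imp[censored] = np.exp(pred)
    return x_imp


def fit_logistic(X, y, n_iter=25):
    """Logistic regression with an intercept, fitted by Newton-Raphson (IRLS)."""
    A = np.column_stack([np.ones(len(X)), X])
    beta = np.zeros(A.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-A @ beta))
        hess = A.T @ (A * (p * (1.0 - p))[:, None])
        beta += np.linalg.solve(hess, A.T @ (y - p))
    return beta


def predict(beta, X):
    """Classify as 1 when the fitted log-odds are positive."""
    return (beta[0] + X @ beta[1:] > 0).astype(float)


def evaluate(rate=0.3, seed=0):
    """Return {method: (MSE on censored log-values, held-out accuracy)} for each imputer."""
    Z, y = simulate(seed=seed)
    x1 = np.exp(Z[:, 0])  # positive biomarker concentration subject to the detection limit
    x1_obs, censored, dl = left_censor(x1, rate)
    n_train = int(0.7 * len(y))
    results = {}
    for name, imputer in [("dl_over_sqrt2", impute_substitution), ("regression", impute_regression)]:
        # Imputation uses all rows but never the outcome y.
        x1_imp = imputer(x1_obs, censored, dl, Z[:, 1])
        mse = np.mean((np.log(x1_imp[censored]) - Z[censored, 0]) ** 2)
        feats = np.column_stack([np.log(x1_imp), Z[:, 1]])
        beta = fit_logistic(feats[:n_train], y[:n_train])
        acc = np.mean(predict(beta, feats[n_train:]) == y[n_train:])
        results[name] = (float(mse), float(acc))
    return results


if __name__ == "__main__":
    for rate in (0.1, 0.3, 0.5):
        print(rate, evaluate(rate=rate))
```

Running the file prints, for several censoring rates, each imputer's MSE on the censored log-values and its held-out accuracy. The regression imputer is fitted only on uncensored rows, so its predictions for censored rows tend to be biased upward before capping. It is a stand-in for illustration, not a recommended method.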

Bibliographic Details
Main Author: Chong, Ryan
Other Authors: Xiang Liming
Format: Final Year Project
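Language: English
Published: Nanyang Technological University 2024
Subjects: Mathematical Sciences; Detection limit; Left censoring; Missing data; Imputation; Logistic regression
Online Access: https://hdl.handle.net/10356/181352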
Institution: Nanyang Technological University
id sg-ntu-dr.10356-181352
record_format dspace
spelling sg-ntu-dr.10356-181352 2024-11-26T05:57:00Z Chong, Ryan Xiang Liming School of Physical and Mathematical Sciences LMXiang@ntu.edu.sg Bachelor's degree 2024-11-26T05:57:00Z 2024-11-26T05:57:00Z 2024 Final Year Project (FYP) Chong, R. (2024). Model-based classification with predictors subjected to detection limit. Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/181352 https://hdl.handle.net/10356/181352 en application/pdf Nanyang Technological University
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
language English
topic Mathematical Sciences
Detection limit
Left censoring
Missing data
Imputation
Logistic regression
description In the biomedical field, important factors such as biomarkers are often left-censored because of the detection limits of measuring instruments. Classification cannot be performed directly on a dataset whose predictors are subject to a detection limit, so the data must first be processed. Imputation, the process of assigning values to missing entries, retains as much of the information in the dataset as possible while producing a complete dataset. This paper aims to identify and evaluate state-of-the-art imputation methods that effectively improve classification performance. Methods such as Multiple Imputation by Chained Equations (MICE) and Generative Adversarial Imputation Nets (GAIN) are used to complete the dataset, and their performance is evaluated by fitting a logistic regression model to each imputed dataset. Metrics such as mean squared error and accuracy are used for comparison. GAIN performed best when the censoring rate was low, while MICE performed consistently well across different settings. A small subset of data from the Genetic and Inflammatory Markers of Sepsis (GenIMS) study is also used for a real-data case analysis.
author2 Xiang Liming
format Final Year Project
author Chong, Ryan
title Model-based classification with predictors subjected to detection limit
publisher Nanyang Technological University
publishDate 2024
url https://hdl.handle.net/10356/181352