Model-based classification with predictors subjected to detection limit
In the biomedical field, important factors such as biomarkers are often left-censored because of the limitations of measuring instruments. Classification cannot be performed directly on a dataset whose predictors are subject to a detection limit, so the data must first be processed. Imputation, the process of assignin...
Main Author: | Chong, Ryan |
---|---|
Other Authors: | Xiang Liming |
Format: | Final Year Project |
Language: | English |
Published: | Nanyang Technological University, 2024 |
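Subjects: | Mathematical Sciences; Detection limit; Left censoring; Missing data; Imputation; Logistic regression |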
Online Access: | https://hdl.handle.net/10356/181352 |
Institution: | Nanyang Technological University |
id |
sg-ntu-dr.10356-181352 |
---|---|
record_format |
dspace |
spelling |
sg-ntu-dr.10356-181352; 2024-11-26T05:57:00Z; Model-based classification with predictors subjected to detection limit; Chong, Ryan; Xiang Liming; School of Physical and Mathematical Sciences; LMXiang@ntu.edu.sg; Mathematical Sciences; Detection limit; Left censoring; Missing data; Imputation; Logistic regression; In the biomedical field, important factors such as biomarkers are often left-censored because of the limitations of measuring instruments. Classification cannot be performed directly on a dataset whose predictors are subject to a detection limit, so the data must first be processed. Imputation, the process of assigning values to the missing entries, is used to preserve as much of the information in the dataset as possible while producing a complete dataset. This paper aims to identify and evaluate state-of-the-art imputation methods that effectively improve classification performance. Methods such as Multiple Imputation by Chained Equations (MICE) and Generative Adversarial Imputation Nets (GAIN) are used to complete the dataset, and their performance is evaluated after fitting a logistic regression model to each imputed dataset. Metrics such as mean squared error and accuracy are used for comparison. GAIN was found to perform best when the censoring rate was low, while MICE performed consistently well across different situations. A small subset of data from the Genetic and Inflammatory Markers of Sepsis (GenIMS) study is also used for a real-data analysis.; Bachelor's degree; 2024-11-26T05:57:00Z; 2024-11-26T05:57:00Z; 2024; Final Year Project (FYP); Chong, R. (2024). Model-based classification with predictors subjected to detection limit. Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/181352; https://hdl.handle.net/10356/181352; en; application/pdf; Nanyang Technological University |
institution |
Nanyang Technological University |
building |
NTU Library |
continent |
Asia |
country |
Singapore |
content_provider |
NTU Library |
collection |
DR-NTU |
language |
English |
topic |
Mathematical Sciences; Detection limit; Left censoring; Missing data; Imputation; Logistic regression |
spellingShingle |
Mathematical Sciences; Detection limit; Left censoring; Missing data; Imputation; Logistic regression; Chong, Ryan; Model-based classification with predictors subjected to detection limit |
description |
In the biomedical field, important factors such as biomarkers are often left-censored because of the limitations of measuring instruments. Classification cannot be performed directly on a dataset whose predictors are subject to a detection limit, so the data must first be processed. Imputation, the process of assigning values to the missing entries, is used to preserve as much of the information in the dataset as possible while producing a complete dataset. This paper aims to identify and evaluate state-of-the-art imputation methods that effectively improve classification performance. Methods such as Multiple Imputation by Chained Equations (MICE) and Generative Adversarial Imputation Nets (GAIN) are used to complete the dataset, and their performance is evaluated after fitting a logistic regression model to each imputed dataset. Metrics such as mean squared error and accuracy are used for comparison. GAIN was found to perform best when the censoring rate was low, while MICE performed consistently well across different situations. A small subset of data from the Genetic and Inflammatory Markers of Sepsis (GenIMS) study is also used for a real-data analysis. |
author2 |
Xiang Liming |
author_facet |
Xiang Liming; Chong, Ryan |
format |
Final Year Project |
author |
Chong, Ryan |
author_sort |
Chong, Ryan |
title |
Model-based classification with predictors subjected to detection limit |
title_short |
Model-based classification with predictors subjected to detection limit |
title_full |
Model-based classification with predictors subjected to detection limit |
title_fullStr |
Model-based classification with predictors subjected to detection limit |
title_full_unstemmed |
Model-based classification with predictors subjected to detection limit |
title_sort |
model-based classification with predictors subjected to detection limit |
publisher |
Nanyang Technological University |
publishDate |
2024 |
url |
https://hdl.handle.net/10356/181352 |
_version_ |
1816859003496431616 |
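
The description in this record refers to predictors that are left-censored at a detection limit and to mean squared error as one comparison metric. The sketch below illustrates only those two ideas on simulated data. It is not the project's method: constant substitution is used as a simple stand-in for MICE and GAIN, and every name in it (`left_censor`, `impute_constant`, `mse_on_censored`, the log-normal biomarker, the 20% censoring rate) is a hypothetical choice made for illustration. Measuring the error only on the censored entries is likewise an assumption about how such a metric might be applied.

```python
import random


def left_censor(values, limit):
    """Replace entries below the detection limit with None.

    Returns the observed list and a boolean mask that is True
    where a value was censored.
    """
    observed = [v if v >= limit else None for v in values]
    mask = [v is None for v in observed]
    return observed, mask


def impute_constant(observed, fill):
    """Fill every censored entry with one constant (a stand-in, not MICE or GAIN)."""
    return [fill if v is None else v for v in observed]


def mse_on_censored(true_values, imputed, mask):
    """Mean squared error between true and imputed values on censored entries only."""
    errors = [(t - x) ** 2 for t, x, m in zip(true_values, imputed, mask) if m]
    return sum(errors) / len(errors) if errors else 0.0


# Simulated biomarker: log-normal values (an assumption, not GenIMS data).
rng = random.Random(0)
true_values = [rng.lognormvariate(0.0, 1.0) for _ in range(1000)]

# Place the detection limit so that the 200 smallest values are censored.
limit = sorted(true_values)[200]
observed, mask = left_censor(true_values, limit)
censored_rate = sum(mask) / len(mask)

# Two common substitution rules: half the limit, and the limit itself.
imputed_half = impute_constant(observed, limit / 2)
imputed_limit = impute_constant(observed, limit)

mse_half = mse_on_censored(true_values, imputed_half, mask)
mse_limit = mse_on_censored(true_values, imputed_limit, mask)

print(f"censored rate: {censored_rate:.2f}")
print(f"MSE, limit/2 substitution: {mse_half:.4f}")
print(f"MSE, limit substitution:   {mse_limit:.4f}")
```

In a fuller pipeline, the imputed predictor would be passed to a classifier such as logistic regression and the resulting accuracy compared across imputation methods. That step is omitted here to keep the sketch free of external dependencies.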