Model-based classification with predictors subjected to detection limit
Main Author: | |
---|---|
Other Authors: | |
Format: | Final Year Project |
Language: | English |
Published: | Nanyang Technological University, 2024 |
Subjects: | |
Online Access: | https://hdl.handle.net/10356/181352 |
Institution: | Nanyang Technological University |
---|---|
Summary: | In the biomedical field, important variables such as biomarkers are often left-censored because of the limitations of the measuring instruments. Standard classification methods cannot be applied directly to a dataset whose predictors are subject to a detection limit, so the data must first be processed. Imputation, the process of assigning values to the missing observations, preserves as much of the information in the dataset as possible while producing a complete dataset. This paper aims to identify and evaluate state-of-the-art imputation methods that effectively improve classification performance. Methods such as Multiple Imputation by Chained Equations (MICE) and Generative Adversarial Imputation Nets (GAIN) are used to complete the dataset, and their performance is evaluated after fitting a logistic regression model to each imputed dataset. Metrics such as mean squared error and accuracy are used for comparison. GAIN performed best when the censoring rate was low, while MICE performed consistently well across different scenarios. A small subset of data from the Genetic and Inflammatory Markers of Sepsis (GenIMS) study is also used for a real-data case analysis. |
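The evaluation procedure described in the summary (censor a predictor at a detection limit, impute, compare imputed values by mean squared error, then compare logistic regression accuracy) can be sketched as follows. This is not the project's code. The simulated log-normal data, the 30% censoring rate, the LOD/2 substitution baseline, and the helper names (`simulate`, `left_censor`, `impute_half_lod`, `impute_regression`, `fit_logistic`) are all hypothetical. `impute_regression` is only a single regression-imputation step, the building block that MICE iterates over columns and repeats with random draws to produce multiple datasets. GAIN is omitted because it requires training a neural network.

```python
import numpy as np

rng = np.random.default_rng(0)


def simulate(n=2000):
    """Hypothetical data: two correlated log-normal 'biomarkers' and a binary outcome."""
    z = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.7], [0.7, 1.0]], size=n)
    X = np.exp(z)  # positive, right-skewed, like many concentration measurements
    logit = -1.0 + 1.5 * z[:, 0] + 0.5 * z[:, 1]
    y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logit))).astype(int)
    return X, y


def left_censor(X, col, rate):
    """Treat the `rate` quantile of column `col` as the detection limit (LOD);
    values below it become NaN, as if the instrument reported '< LOD'."""
    lod = np.quantile(X[:, col], rate)
    Xc = X.copy()
    Xc[X[:, col] < lod, col] = np.nan
    return Xc, lod


def impute_half_lod(Xc, col, lod):
    """Hypothetical substitution baseline: replace every censored value with LOD / 2."""
    Xi = Xc.copy()
    Xi[np.isnan(Xi[:, col]), col] = lod / 2.0
    return Xi


def impute_regression(Xc, col, lod):
    """One chained-equation step (not full MICE): regress log(col) on the other
    log-predictors over uncensored rows, predict the censored rows, and cap the
    predictions at the LOD because the true values are known to lie below it."""
    miss = np.isnan(Xc[:, col])
    A = np.column_stack([np.ones(len(Xc)), np.log(np.delete(Xc, col, axis=1))])
    coef, *_ = np.linalg.lstsq(A[~miss], np.log(Xc[~miss, col]), rcond=None)
    Xi = Xc.copy()
    Xi[miss, col] = np.minimum(np.exp(A[miss] @ coef), lod)
    return Xi


def fit_logistic(F, y, lr=0.1, steps=2000):
    """Logistic regression fitted by plain batch gradient descent."""
    A = np.column_stack([np.ones(len(F)), F])
    w = np.zeros(A.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-A @ w))
        w -= lr * A.T @ (p - y) / len(y)
    return w


def accuracy(w, F, y):
    """Fraction of correct 0/1 predictions using a 0.5 probability threshold."""
    A = np.column_stack([np.ones(len(F)), F])
    return float(np.mean((A @ w > 0).astype(int) == y))


X, y = simulate()
train, test = slice(0, 1500), slice(1500, None)
Xc, lod = left_censor(X, col=0, rate=0.3)
miss = np.isnan(Xc[:, 0])

results = {}
for name, impute in [("half-LOD", impute_half_lod), ("regression", impute_regression)]:
    Xi = impute(Xc, 0, lod)
    mse = float(np.mean((Xi[miss, 0] - X[miss, 0]) ** 2))  # error on censored entries only
    F = np.log(Xi)  # log-scale features for the classifier
    w = fit_logistic(F[train], y[train])
    results[name] = (mse, accuracy(w, F[test], y[test]))
    print(f"{name}: MSE = {mse:.4f}, test accuracy = {results[name][1]:.3f}")
```

Varying `rate` in `left_censor` is one way to probe how each method behaves as the censoring rate grows, which is the dimension along which the summary reports GAIN and MICE differing.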