Efficient missing data technique for prediction of nasopharyngeal carcinoma recurrence
Main Authors: | , , |
---|---|
Other Authors: | |
Format: | Article |
Published: | 2018 |
Subjects: | |
Online Access: | https://repository.li.mahidol.ac.th/handle/123456789/31646 |
Institution: | Mahidol University |
Summary: | This study investigates efficient missing data techniques for predicting nasopharyngeal carcinoma (NPC) recurrence. Clinical data were first collected from patients with NPC who received treatment at Ramathibodi Hospital, Thailand. In total, 495 records were used for cancer recurrence prediction. Because these data contain various missing values, appropriate missing data techniques (MDTs) must be examined. This study focuses on complete-case analysis, mean imputation, k-nearest neighbor imputation and Expectation Maximization (EM) imputation. The completed data are then used to develop three predictive models: a single-point model, a multiple-point model and a sequential neural network. The experimental results showed that EM imputation was superior to the other missing data techniques, providing the highest predictive performance in all models. It achieved an average area under the receiver operating characteristic curve (AUC) of 0.72. The Hosmer and Lemeshow goodness-of-fit test was used to evaluate the fit of each model. The results confirmed that EM imputation was the best missing data technique. The sequential neural network outperformed the other models. It provided the best predictive performance in terms of the average AUC (0.73) and the Chi-square statistic (4.30). In addition, survival curves generated from these predictive models were compared with the Kaplan-Meier survival curve. The curves based on EM imputation were closest to the Kaplan-Meier curve. According to the log-rank test, however, these curves were significantly different (p-value < 0.05). © 2013 Asian Network for Scientific Information. |
---|
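
The summary names four missing data techniques but the record does not show how any of them were implemented. The following is a minimal sketch of three of them (complete-case analysis, mean imputation and k-nearest neighbor imputation) on a hypothetical toy table, not the study's data. The function names, the `None` marker for missing values, the choice of `k` and the distance measure (root mean squared difference over columns observed in both rows) are all assumptions made for illustration. EM imputation is left out because it needs a distributional model that the record does not describe.

```python
import math

# Hypothetical toy records (not the study's data); None marks a missing value.
records = [
    [1.0, 2.0, 3.0],
    [2.0, None, 4.0],
    [3.0, 6.0, None],
    [4.0, 8.0, 9.0],
]


def complete_case(rows):
    """Complete-case analysis: keep only rows with no missing values."""
    return [row[:] for row in rows if all(v is not None for v in row)]


def column_means(rows):
    """Mean of the observed (non-missing) values in each column."""
    means = []
    for j in range(len(rows[0])):
        observed = [row[j] for row in rows if row[j] is not None]
        means.append(sum(observed) / len(observed))
    return means


def mean_impute(rows):
    """Mean imputation: replace each missing value with its column mean."""
    means = column_means(rows)
    return [[means[j] if v is None else v for j, v in enumerate(row)]
            for row in rows]


def knn_impute(rows, k=2):
    """k-nearest neighbor imputation (assumed variant).

    Each missing value is replaced by the mean of that column over the k
    nearest rows that have the column observed. Distance is the root mean
    squared difference over the columns observed in both rows. If no donor
    row exists, the column mean is used as a fallback.
    """
    means = column_means(rows)
    filled = []
    for i, row in enumerate(rows):
        new_row = row[:]
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for m, other in enumerate(rows):
                if m == i or other[j] is None:
                    continue
                shared = [c for c in range(len(row))
                          if row[c] is not None and other[c] is not None]
                if not shared:
                    continue
                dist = math.sqrt(
                    sum((row[c] - other[c]) ** 2 for c in shared) / len(shared)
                )
                candidates.append((dist, other[j]))
            candidates.sort(key=lambda pair: pair[0])
            nearest = [val for _, val in candidates[:k]]
            new_row[j] = sum(nearest) / len(nearest) if nearest else means[j]
        filled.append(new_row)
    return filled


if __name__ == "__main__":
    print("complete-case:", complete_case(records))
    print("mean imputation:", mean_impute(records))
    print("kNN imputation:", knn_impute(records, k=2))
```

Complete-case analysis discards half of this toy table, which illustrates why imputation can matter when only a few hundred records are available. The donor rows in `knn_impute` are always read from the original table, so values imputed earlier never influence later ones.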