Assessing the external validity of machine learning-based detection of glaucoma

Bibliographic Details
Main Authors: Li, Chi, Chua, Jacqueline, Schwarzhans, Florian, Husain, Rahat, Girard, Michaël J. A., Majithia, Shivani, Tham, Yih-Chung, Cheng, Ching-Yu, Aung, Tin, Fischer, Georg, Vass, Clemens, Bujor, Inna, Kwoh, Chee Keong, Popa-Cherecheanu, Alina, Schmetterer, Leopold, Wong, Damon
Other Authors: School of Computer Science and Engineering
Format: Article
Language: English
Published: 2023
Online Access: https://hdl.handle.net/10356/168760
Institution: Nanyang Technological University
Description
Summary: Studies using machine learning (ML) approaches have reported high diagnostic accuracies for glaucoma detection. However, none has assessed model performance across ethnicities. The aim of this study was to externally validate ML models for glaucoma detection from optical coherence tomography (OCT) data. We performed a prospective, cross-sectional study in which 514 Asians (257 glaucoma/257 controls) were enrolled to construct ML models for glaucoma detection; the models were then tested on 356 Asians (183 glaucoma/173 controls) and 138 Caucasians (57 glaucoma/81 controls). Two classifiers were built: one using the retinal nerve fibre layer (RNFL) thickness values produced by the compensation model (a multiple regression model, fitted on healthy subjects, that corrects the RNFL profile for anatomical factors) and one using the original OCT data (measured). In the Asian dataset, both ML models (area under the receiver operating characteristic curve [AUC] = 0.96 and accuracy = 92%) outperformed the measured data (AUC = 0.93; P < 0.001) for glaucoma detection. However, in the Caucasian dataset, the ML model trained with compensated data (AUC = 0.93 and accuracy = 84%) outperformed both the ML model trained with original data (AUC = 0.83 and accuracy = 79%; P < 0.001) and the measured data (AUC = 0.82; P < 0.001). The performance of the ML model trained on measured data showed poor reproducibility across datasets, whereas the performance of the model trained on compensated data was maintained. Care must be taken when ML models are applied to patient cohorts of different ethnicities.
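The summary rests on two quantitative ideas: a regression-based compensation of RNFL thickness fitted on healthy eyes, and AUC/accuracy as the performance metrics compared across cohorts. The Python sketch below illustrates both under stated assumptions. It is not the authors' compensation model, whose anatomical factors and fitting details are not given in this record; the function names (`fit_ols`, `compensate`, `auc`, `accuracy`), the single hypothetical covariate, the use of the healthy-cohort mean as the reference profile, and all numbers are illustrative inventions. AUC is computed with the rank-based (Mann-Whitney) formulation, and accuracy uses a fixed decision threshold.

```python
from typing import List, Sequence

# Minimal sketch, NOT the authors' compensation model: a generic OLS fit on
# healthy eyes, with a hypothetical covariate and a hypothetical reference profile.


def fit_ols(X: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Ordinary least squares with an intercept; returns [b0, b1, ..., bk]."""
    rows = [[1.0, *map(float, row)] for row in X]  # prepend intercept column
    k = len(rows[0])
    # Augmented normal-equation matrix [X^T X | X^T y]
    a = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        p = a[col][col]
        if abs(p) < 1e-12:
            raise ValueError("singular design matrix")
        a[col] = [v / p for v in a[col]]
        for r in range(k):
            if r != col:
                f = a[r][col]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[col])]
    return [a[i][k] for i in range(k)]


def compensate(rnfl: float, covariates: Sequence[float],
               coef: Sequence[float], reference: Sequence[float]) -> float:
    """Remove the fitted covariate effect relative to a (hypothetical) reference profile."""
    adjustment = sum(b * (x - r) for b, x, r in zip(coef[1:], covariates, reference))
    return rnfl - adjustment


def auc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """Rank-based AUC: fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos_scores) * len(neg_scores))


def accuracy(pos_scores: Sequence[float], neg_scores: Sequence[float],
             threshold: float) -> float:
    """Fraction correctly classified when score >= threshold means 'glaucoma'."""
    correct = sum(s >= threshold for s in pos_scores) + sum(s < threshold for s in neg_scores)
    return correct / (len(pos_scores) + len(neg_scores))


if __name__ == "__main__":
    # Toy healthy cohort (hypothetical numbers): one anatomical covariate per eye
    # (arbitrary units) and RNFL thickness in micrometres.
    healthy_X = [[23.0], [24.0], [25.0], [26.0]]
    healthy_rnfl = [110.0, 106.0, 102.0, 98.0]
    coef = fit_ols(healthy_X, healthy_rnfl)
    reference = [sum(row[0] for row in healthy_X) / len(healthy_X)]  # cohort mean

    # Hypothetical test eyes: (covariates, measured RNFL, is_glaucoma)
    test_eyes = [([26.0], 85.0, True), ([23.0], 90.0, True),
                 ([26.0], 99.0, False), ([23.0], 109.0, False)]
    comp = [(compensate(t, x, coef, reference), g) for x, t, g in test_eyes]
    # Thinner RNFL suggests glaucoma, so use negative thickness as the score
    pos = [-v for v, g in comp if g]
    neg = [-v for v, g in comp if not g]
    print("coefficients:", coef)
    print("AUC on compensated RNFL:", auc(pos, neg))
```

Because the coefficients are fitted once on healthy subjects and then applied unchanged to every new eye, the same `coef` and `reference` would be reused when scoring an external cohort; computing `auc` separately on a development test set and an external test set is one simple way to see whether performance transfers.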