External Validation of Deep Learning Algorithms for Cardiothoracic Ratio Measurement

Bibliographic Details
Main Authors: Warasinee Chaisangmongkon, Isarun Chamveha, Tretap Promwiset, Pairash Saiviroonporn, Trongtum Tongdee
Other Authors: Siriraj Hospital
Format: Article
Published: 2022
Online Access: https://repository.li.mahidol.ac.th/handle/123456789/76735
Institution: Mahidol University
Description
Summary: Recent advances in machine learning have made it possible to create automated systems for medical image diagnosis. Cardiothoracic ratio (CTR) measurement, a common procedure for assessing cardiac abnormality in chest radiographs, has been investigated by several deep learning studies aiming to automate the process. A key question, however, is whether machine learning models can measure CTR on unseen data as accurately and consistently as trained human technicians, and can therefore be trusted in clinical practice. To assess this, we performed external validation of automated CTR algorithms on a dataset of 7,517 images, comparing four variants of the U-Net architecture (VGG-11, VGG-16, SegNet, and AlbuNet) for heart and lung segmentation and CTR calculation. We then benchmarked their performance against two human experts who manually measured CTR on the same images in a clinical setting, so that model-to-human variation could be compared fairly with human-to-human variation. Our analysis shows that AlbuNet achieves human-level performance in CTR measurement, with a mean absolute percentage error (MAPE) of 2.38%, on par with the inter-rater variability of the manual method (2.53%). The other three U-Net variants, particularly VGG-16, also performed similarly well. Additionally, we conducted an extreme outlier analysis for each architecture, measuring the percentage of samples whose measurement error exceeded the maximum error observed with the manual method. AlbuNet outperformed the other architectures, with extreme outliers in only 0.35% of samples, compared with 0.64% to 1.06% for the other three U-Net variants. Overall, the deep-learning-based algorithm proved as reliable as the manual method and shows strong potential for assisting radiologists with CTR measurement in clinical practice.
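The CTR calculation and the two error metrics described in the summary (MAPE and the extreme outlier rate) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline. It assumes binary heart and lung segmentation masks and approximates the cardiac and thoracic widths as the horizontal extents of those masks. It also treats one human rater as the reference reading and defines an extreme outlier as an image whose absolute percentage error exceeds the largest human-to-human error. The function names (`ctr_from_masks`, `mape`, `extreme_outlier_rate`) and all numeric values are hypothetical.

```python
import numpy as np


def horizontal_extent(mask: np.ndarray) -> int:
    """Return the width, in pixels, spanned by foreground columns of a 2-D mask (0 if empty)."""
    cols = np.flatnonzero(np.asarray(mask, dtype=bool).any(axis=0))
    if cols.size == 0:
        return 0
    return int(cols[-1] - cols[0] + 1)


def ctr_from_masks(heart_mask: np.ndarray, lung_mask: np.ndarray) -> float:
    """Approximate CTR as cardiac width / thoracic width.

    Assumption: thoracic width is the horizontal span of the combined lung
    mask (outer edge of one lung field to the outer edge of the other).
    """
    thorax_w = horizontal_extent(lung_mask)
    if thorax_w == 0:
        raise ValueError("lung mask is empty; CTR is undefined")
    return horizontal_extent(heart_mask) / thorax_w


def abs_pct_error(pred, ref) -> np.ndarray:
    """Per-image absolute percentage error of pred relative to ref."""
    pred = np.asarray(pred, dtype=float)
    ref = np.asarray(ref, dtype=float)
    return np.abs(pred - ref) / np.abs(ref) * 100.0


def mape(pred, ref) -> float:
    """Mean absolute percentage error, in percent."""
    return float(np.mean(abs_pct_error(pred, ref)))


def extreme_outlier_rate(model_ctr, ref_ctr, second_rater_ctr) -> float:
    """Percentage of images whose model error exceeds the largest human-to-human error."""
    threshold = abs_pct_error(second_rater_ctr, ref_ctr).max()
    return float(np.mean(abs_pct_error(model_ctr, ref_ctr) > threshold) * 100.0)


# Toy masks: the heart occupies columns 3-6, the lungs span columns 1-8.
heart = np.zeros((10, 10), dtype=bool)
heart[4:7, 3:7] = True
lungs = np.zeros((10, 10), dtype=bool)
lungs[2:8, 1:4] = True  # one lung field
lungs[2:8, 6:9] = True  # the other lung field
ctr = ctr_from_masks(heart, lungs)  # 4 / 8 = 0.5

# Made-up CTR readings for three images (not data from the study).
rater_a = [0.50, 0.45, 0.55]  # treated as the reference reading
rater_b = [0.51, 0.44, 0.55]  # second human rater
model = [0.50, 0.45, 0.60]    # hypothetical model output

inter_rater_mape = mape(rater_b, rater_a)
model_mape = mape(model, rater_a)
outlier_pct = extreme_outlier_rate(model, rater_a, rater_b)
```

Comparing `model_mape` with `inter_rater_mape` mirrors the model-to-human versus human-to-human comparison in the summary. In this toy case the model's third reading exceeds the worst human-to-human error, so one of three images (about 33%) counts as an extreme outlier.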