Forecasting length of stay: will it be clear or cloudy today?

Bibliographic Details
Main Authors: Deng, Charles; Reddy, Arjun; Kavitesh, Bali Kavitesh; Babu, Myoungmee; Babu, Benson A.
Other Authors: School of Computer Science and Engineering
Format: Article
Language: English
Published: 2023
Online Access: https://hdl.handle.net/10356/164592
Institution: Nanyang Technological University
Description
Summary:
Objective: Patient length of stay (LOS) is a vital metric for hospital operational efficiency, and shorter LOS is tied to better patient outcomes and improved financial performance. Models that provide accurate, real-time LOS forecasts can help hospitals manage their resources and bed capacity effectively. Forecasting LOS is well suited to modern machine learning methods. In this paper, we conduct a descriptive literature review of studies that use machine learning methods to predict LOS.
Methods: We searched the Embase, PubMed, DBLP, Google Scholar, IEEE Xplore, and Cochrane databases for articles published between 2008 and 2021 that use machine learning models to forecast patient LOS. From 87 articles identified through keyword search and two identified through the snowball method, we used pre-specified inclusion criteria to select the final 12 articles for the descriptive literature review. The selected studies are international, retrospective, and were carried out during the ML development lifecycle.
Results: Most studies approached LOS forecasting as a classification problem, with a minority opting to train regression models instead. The most frequently used models included support vector machines, random forests, gradient boosted trees, logistic regression, and neural networks. In general, tree-based models such as random forests and gradient boosted trees performed best; stacked methods that combined the predictions of multiple models also performed well. Several studies used natural language processing (NLP) methods and other techniques to extract features from unstructured electronic health record data and improve model performance. In addition to model and feature selection, data preprocessing decisions, such as careful handling of missing data and resampling to address class imbalance, significantly improved model performance.
Conclusion: Machine learning methods are capable of forecasting patient LOS with impressive accuracy. However, most studies were designed as pre-deployment experimental models. As AI applications advance, a systematic approach to high-quality data management and monitoring during real-time clinical ML production is essential to developing a precise prediction service. In production, data drift must be recognized, monitored, and corrected to maintain accurate model performance. Future longitudinal studies must validate these models during production to establish their real-world healthcare impact.
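
Illustrative note: the conclusion calls for recognizing data drift once an LOS model is in production. The record does not say how the reviewed studies would do this, so the sketch below swaps in one common technique, the Population Stability Index (PSI), which compares the binned distribution of a feature at training time with its distribution in live data. The function name `psi`, the feature (patient age), the sample values, and the 0.25 alert threshold (a widely used rule of thumb) are all assumptions for illustration, not taken from the paper.

```python
import math
from typing import List, Sequence


def psi(reference: Sequence[float], live: Sequence[float], n_bins: int = 5) -> float:
    """Population Stability Index of `live` relative to `reference`.

    Bins are equal-width over the range of the reference sample.
    """
    lo, hi = min(reference), max(reference)
    # Fall back to a width of 1.0 if the reference sample is constant.
    width = (hi - lo) / n_bins or 1.0

    def bin_fractions(values: Sequence[float]) -> List[float]:
        counts = [0] * n_bins
        for v in values:
            # Values outside the reference range are clamped into the edge bins.
            idx = int((v - lo) / width)
            counts[max(0, min(n_bins - 1, idx))] += 1
        # Floor each fraction so empty bins do not produce log(0).
        return [max(c / len(values), 1e-6) for c in counts]

    ref_frac = bin_fractions(reference)
    live_frac = bin_fractions(live)
    return sum((l - r) * math.log(l / r) for r, l in zip(ref_frac, live_frac))


# Hypothetical patient ages: training-time sample vs. an older live population.
train_ages = [30, 40, 50, 60, 70] * 20
live_ages = [60, 70, 80, 90] * 25

DRIFT_THRESHOLD = 0.25  # assumed rule-of-thumb cutoff, not from the paper
score = psi(train_ages, live_ages)
print("drift detected" if score > DRIFT_THRESHOLD else "no drift", round(score, 3))
```

In a monitoring job, a check like this would typically run per feature on a schedule, with a score above the chosen threshold prompting investigation or retraining. The binning scheme and threshold are design choices that would need tuning for a real clinical dataset.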