Can we reliably automate clinical prognostic modelling? A retrospective cohort study for ICU triage prediction of in-hospital mortality of COVID-19 patients in the Netherlands

Authors:	Vagliano, I.; Brinkman, S.; Abu-Hanna, A.; Arbous, M. S.; Dongelmans, D. A.; Elbers, P. W. G.; de Lange, D. W.; van der Schaar, M.; de Keizer, N. F.; Schut, M. C.
Document type:	Article
Publication date:	2022
Series/Journal:	Vagliano, I, Brinkman, S, Abu-Hanna, A, Arbous, M S, Dongelmans, D A, Elbers, P W G, de Lange, D W, van der Schaar, M, de Keizer, N F & Schut, M C 2022, 'Can we reliably automate clinical prognostic modelling? A retrospective cohort study for ICU triage prediction of in-hospital mortality of COVID-19 patients in the Netherlands', International Journal of Medical Informatics, vol. 160, 104688. https://doi.org/10.1016/j.ijmedinf.2022.104688
Language:	English
Permalink:	https://search.fid-benelux.de/Record/base-27232047
Data source:	BASE; original catalog
Link(s) :	https://research.vumc.nl/en/publications/3aa042c1-1c33-4230-aecd-875f1c2592dc

Background: Building machine learning (ML) models in healthcare may suffer from time-consuming and potentially biased manual pre-selection of predictors, which can result in a limited or trivial selection of suitable models. We aimed to assess the predictive performance of automated ML model building (AutoML) for predicting in-hospital mortality of COVID-19 patients at ICU admission triage, compared with expert-based predictor pre-selection followed by logistic regression.

Methods: We conducted an observational study of all COVID-19 patients admitted to Dutch ICUs between February and July 2020. We included 2,690 COVID-19 patients from 70 ICUs participating in the Dutch National Intensive Care Evaluation (NICE) registry. The main outcome measure was in-hospital mortality. We assessed the performance of AutoML models (at admission and after 24h, respectively) compared with the more traditional approach of predictor pre-selection and logistic regression.

Findings: The AutoML models using variables available at admission showed fair discrimination (average AUROC = 0·75-0·76 (sdev = 0·03), PPV = 0·70-0·76 (sdev = 0·1) at cut-off = 0·3, the observed mortality rate) and good calibration. This performance is on par with a logistic regression model with patient variables selected by three experts (average AUROC = 0·78 (sdev = 0·03) and PPV = 0·79 (sdev = 0·2)). Extending the models with variables available at 24h after admission resulted in higher predictive performance (average AUROC = 0·77-0·79 (sdev = 0·03) and PPV = 0·79-0·80 (sdev = 0·10-0·17)).

Conclusions: AutoML delivers prediction models with fair discriminatory performance, and good calibration and accuracy, which is as good as regression models with expert-based predictor pre-selection. In the context of the restricted availability of data in an ICU quality registry, extending the models with variables available at 24h after admission showed a small (but significant) increase in performance.
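
The abstract reports two metrics, AUROC and PPV at a probability cut-off of 0·3. As a minimal sketch of how such figures can be computed from predicted risks, the Python below defines both. It is not the authors' code: the function names, the toy labels and scores, and the choice to flag a patient when the predicted risk is greater than or equal to the cut-off are all assumptions made for illustration.

```python
from typing import Sequence


def auroc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """Probability that a random positive case outranks a random negative one.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def ppv(y_true: Sequence[int], y_score: Sequence[float], cutoff: float) -> float:
    """Fraction of true positives among cases with score >= cutoff (assumed rule)."""
    flagged = [y for y, s in zip(y_true, y_score) if s >= cutoff]
    if not flagged:
        raise ValueError("no case reaches the cut-off")
    return sum(flagged) / len(flagged)


# Hypothetical toy data: 1 = died in hospital, 0 = survived.
y_true = [1, 0, 1, 0, 0, 1, 0, 0]
y_score = [0.80, 0.20, 0.60, 0.40, 0.10, 0.25, 0.35, 0.05]

toy_auroc = auroc(y_true, y_score)
toy_ppv = ppv(y_true, y_score, cutoff=0.3)
print(f"AUROC = {toy_auroc:.2f}, PPV at 0.3 = {toy_ppv:.2f}")
```

The pairwise AUROC above is quadratic in the number of patients, which is fine for a toy example; for a cohort the size of the one studied, a rank-based implementation from an established library would be the more usual choice.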