Generalizability of an acute kidney injury prediction model across health systems

Cao, Jie; Zhang, Xiaosong; Shahinian, Vahakn; Yin, Huiying; Steffick, Diane; Saran, Rajiv; Crowley, Susan; Mathis, Michael; Nadkarni, Girish N.; Heung, Michael; Singh, Karandeep

doi:10.1038/s42256-022-00563-8

Article
Published: 01 December 2022

Jie Cao^1,
Xiaosong Zhang^2,
Vahakn Shahinian^2,3,
Huiying Yin^2,
Diane Steffick^2,3,
Rajiv Saran^2,3,4,
Susan Crowley^5,
Michael Mathis^6,7 (ORCID: orcid.org/0000-0002-9697-2212),
Girish N. Nadkarni^8,9 (ORCID: orcid.org/0000-0001-6319-4314),
Michael Heung^2,3,na1 &
Karandeep Singh^3,7,10,11,na1 (ORCID: orcid.org/0000-0001-8980-2330)

Nature Machine Intelligence volume 4, pages 1121–1129 (2022)

Abstract

Delays in the identification of acute kidney injury in hospitalized patients are a major barrier to the development of effective interventions for treatment. A recent study described a series of models that outperformed previously published models in predicting acute kidney injury up to 48 h in advance, including a recurrent neural network that achieved state-of-the-art performance (area under the curve 0.92) and a gradient-boosted decision tree model that was close behind (area under the curve 0.89). Because these models were trained in a population of US veterans that was 94% male, questions have arisen about their generalizability to other health systems where the populations are more sex balanced. In this study, we aimed to evaluate how well an acute kidney injury model trained in a population of US veterans performs in females at Veterans Affairs and the extent to which its performance generalizes to a large academic hospital setting. We found that the model performed worse in predicting acute kidney injury in females in both populations, with miscalibration in lower stages of acute kidney injury and worse discrimination (a lower area under the curve) in higher stages of acute kidney injury. We demonstrate that, while this discrepancy in performance can be largely corrected in non-veterans by updating the original model using data from a sex-balanced academic hospital cohort, the worse model performance persists in veterans. Our study sheds light on the importance of characterizing the generalizability of artificial intelligence studies, and on the complexity of discrepancies in model performance in subgroups that cannot be explained simply on the basis of sample size.

**Fig. 1: Representation of the EHR data for the proposed model.**

Data availability

This study used data from the national Veterans Health Administration’s Corporate Data Warehouse and the University of Michigan. Analyses were performed in secure locations within the VA and UM information systems, respectively. The data in this study are not publicly available because they contain protected health information, and restrictions apply to their use. A sample of processed data from six patients has been made available online¹⁹.

Researchers interested in obtaining deidentified Michigan Medicine patient data should contact PHDataHelp@umich.edu to obtain guidance on which regulatory and compliance requirements need to be fulfilled to obtain access to the Precision Health data resources. More details about the data and the access process are available at https://precisionhealth.umich.edu/. Source data are provided with this paper.

Code availability

Data preparation code, an example of prepared data, the original and extended models trained in this study, and code to generate predictions from the provided data are available online¹⁹. Data preparation requires the gpmodels R package²⁸.

References

1. Hoste, E. A. J. et al. Global epidemiology and outcomes of acute kidney injury. Nat. Rev. Nephrol. 14, 607–625 (2018).
2. Wilson, F. P. et al. Automated, electronic alerts for acute kidney injury: a single-blind, parallel-group, randomised controlled trial. Lancet 385, 1966–1974 (2015).
3. Koyner, J. L., Adhikari, R., Edelson, D. P. & Churpek, M. M. Development of a multicenter ward-based AKI prediction model. Clin. J. Am. Soc. Nephrol. 11, 1935–1943 (2016).
4. Koyner, J. L., Carey, K. A., Edelson, D. P. & Churpek, M. M. The development of a machine learning inpatient acute kidney injury prediction model. Crit. Care Med. 46, 1070–1077 (2018).
5. Peng, J.-C. et al. Development of mortality prediction model in the elderly hospitalized AKI patients. Sci. Rep. 11, 15157 (2021).
6. Haines, R. W. et al. Acute kidney injury in trauma patients admitted to critical care: development and validation of a diagnostic prediction model. Sci. Rep. 8, 3665 (2018).
7. Motwani, S. S. et al. Development and validation of a risk prediction model for acute kidney injury after the first course of cisplatin. J. Clin. Oncol. 36, 682 (2018).
8. Tomašev, N. et al. A clinically applicable approach to continuous prediction of future acute kidney injury. Nature 572, 116–119 (2019).
9. McCradden, M. D., Stephenson, E. A. & Anderson, J. A. Clinical research underlies ethical integration of healthcare artificial intelligence. Nat. Med. 26, 1325–1326 (2020).
10. Tomašev, N. et al. Use of deep learning to develop continuous-risk models for adverse event prediction from electronic health records. Nat. Protoc. 16, 2765–2787 (2021).
11. Google. EHR modeling framework. GitHub https://github.com/google/ehr-predictions (2021).
12. Haibe-Kains, B. et al. Transparency and reproducibility in artificial intelligence. Nature 586, E14–E16 (2020).
13. McDermott, M. B. A. et al. Reproducibility in machine learning for health research: still a ways to go. Sci. Transl. Med. 13, eabb1655 (2021).
14. Stupple, A., Singerman, D. & Celi, L. A. The reproducibility crisis in the age of digital medicine. npj Digit. Med. 2, 2 (2019).
15. Carter, R. E., Attia, Z. I., Lopez-Jimenez, F. & Friedman, P. A. Pragmatic considerations for fostering reproducible research in artificial intelligence. npj Digit. Med. 2, 42 (2019).
16. Singh, K., Beam, A. L. & Nallamothu, B. K. Machine learning in clinical journals: moving from inscrutable to informative. Circ. Cardiovasc. Qual. Outcomes 13, e007491 (2020).
17. Robbins, R. et al. AI systems are worse at diagnosing disease when training data is skewed by sex. STAT https://www.statnews.com/2020/05/25/ai-systems-training-data-sex-bias/ (2020).
18. Larrazabal, A. J., Nieto, N., Peterson, V., Milone, D. H. & Ferrante, E. Gender imbalance in medical imaging datasets produces biased classifiers for computer-aided diagnosis. Proc. Natl Acad. Sci. USA 117, 12592–12594 (2020).
19. Singh, K. ML4LHS/va-aki-model: initial release. Zenodo https://doi.org/10.5281/zenodo.7129945 (2022).
20. World Health Organization. International Classification of Diseases (ICD). https://www.who.int/standards/classifications/classification-of-diseases (2022).
21. Sundararajan, V. et al. New ICD-10 version of the Charlson comorbidity index predicted in-hospital mortality. J. Clin. Epidemiol. 57, 1288–1294 (2004).
22. Khwaja, A. KDIGO clinical practice guidelines for acute kidney injury. Nephron Clin. Pract. 120, c179–c184 (2012).
23. Hand, D. J. & Till, R. J. A simple generalisation of the area under the ROC curve for multiple class classification problems. Mach. Learn. 45, 171–186 (2001).
24. DeLong, E. R., DeLong, D. M. & Clarke-Pearson, D. L. Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics 44, 837–845 (1988).
25. Morris, N. tboot: Tilted bootstrap. R package version 0.2.1 (2020).
26. Friedman, J. H. Greedy function approximation: a gradient boosting machine. Ann. Stat. 29, 1189–1232 (2001).
27. R Core Team. R: A language and environment for statistical computing (R Foundation for Statistical Computing, 2022). https://www.R-project.org/
28. Singh, K. & Meyer, S. R. ML4LHS/gpmodels: initial release. Zenodo https://doi.org/10.5281/zenodo.7158501 (2022).
29. LeDell, E. h2o: R interface for the ‘H2O’ scalable machine learning platform. R package version 3.36.0.2 (2022).
30. Pafka, S. GBM performance. GitHub https://github.com/szilard/GBM-perf (2021).

Acknowledgements

This work was supported in part by the Veterans Health Administration Innovation Program contract number 36C10B18C2766 (received by X.Z., V.S., H.Y., D.S., R.S., M.H. and K.S.) and through NIDDK R01DK133226 (received by M.M. and K.S.).

Author information

These authors contributed equally: Michael Heung, Karandeep Singh.

Authors and Affiliations

Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, Ann Arbor, MI, USA
Jie Cao
Kidney Epidemiology and Cost Center, School of Public Health, University of Michigan, Ann Arbor, MI, USA
Xiaosong Zhang, Vahakn Shahinian, Huiying Yin, Diane Steffick, Rajiv Saran & Michael Heung
Division of Nephrology, Department of Internal Medicine, University of Michigan Medical School, Ann Arbor, MI, USA
Vahakn Shahinian, Diane Steffick, Rajiv Saran, Michael Heung & Karandeep Singh
Department of Epidemiology, School of Public Health, University of Michigan, Ann Arbor, MI, USA
Rajiv Saran
Renal Section, VA Connecticut Healthcare System, West Haven, CT, USA
Susan Crowley
Department of Anesthesiology, University of Michigan Medical School, Ann Arbor, MI, USA
Michael Mathis
Center of Computational Medicine and Bioinformatics, University of Michigan Medical School, Ann Arbor, MI, USA
Michael Mathis & Karandeep Singh
Mount Sinai Clinical Intelligence Center, Icahn School of Medicine at Mount Sinai, New York, NY, USA
Girish N. Nadkarni
Division of Data Driven and Digital Medicine (D3M), Department of Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
Girish N. Nadkarni
Department of Learning Health Sciences, University of Michigan Medical School, Ann Arbor, MI, USA
Karandeep Singh
School of Information, University of Michigan, Ann Arbor, MI, USA
Karandeep Singh

Contributions

V.S., R.S., S.C., M.H. and K.S. conceived and designed the study. J.C., X.Z., H.Y., D.S., M.H. and K.S. acquired, analysed and interpreted data. J.C., X.Z. and K.S. participated in the creation of the software used in this work. J.C. drafted the manuscript. X.Z., V.S., H.Y., D.S., R.S., S.C., M.M., G.N.N., M.H. and K.S. substantively revised the manuscript. All authors have approved the submitted version and have agreed both to be personally accountable for the author’s own contributions and to ensure that questions related to the accuracy or integrity of any part of the work, even ones in which the author was not personally involved, are appropriately investigated and resolved, and the resolution documented in the literature.

Corresponding author

Correspondence to Karandeep Singh.

Ethics declarations

Competing interests

K.S.’s institution receives grant funding from Teva Pharmaceuticals and Blue Cross Blue Shield of Michigan for unrelated work, and K.S. serves on an advisory board for Flatiron Health. M.M. has received research grants from the US National Institutes of Health (NHLBI K01HL141701). G.N.N. is also supported by R01DK108803, U01HG007278, U01HG009610 and 1U01DK116100. G.N.N. reports personal income and equity and stock options from Renalytix and pulseData. G.N.N. is a scientific cofounder of Renalytix, Verici Dx, Pensieve Health, Nexus Health Connect and Data2Wisdom and owns equity in these companies. G.N.N. has received personal income from Siemens Healthineers, Variant Bio, AstraZeneca, Reata, BioVie, Daiichi Sankyo, Cambridge Health Consulting, Qiming Capital and GLG Consulting in the past three years. M.H. receives research grant funding from Astute Medical Inc. and Spectral Medical Inc., and serves as a consultant for Wolters-Kluwer Inc., Potrero Inc. and CardioSounds Inc. The remaining authors declare no competing interests.

Peer review

Peer review information

Nature Machine Intelligence thanks Shalmali Joshi and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Model performance (AUC) of the original VA model at each VA hospital.

Model performance of the original model at each VA hospital in the test set, along with characteristics of each VA hospital. A. Model performance with respect to area under the curve (AUC) with 95% CI of the original VA model for predicting AKI-1+ at each VA hospital. The center dot represents the AUC when the original model is applied to the hospital, and the 95% CI is calculated using DeLong’s method²⁴. B. Number of predictions (after excluding those with AKI-1+ at baseline) at each VA hospital. C. Hospitalization-level AKI-1+ incidence in the test set (after excluding those with AKI-1+ at baseline) at each VA hospital. Five VA hospitals are not shown here due to small cohort sizes (<30 patients).
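
As an illustration of the kind of interval reported in panel A, the following is a minimal sketch of an AUC with a DeLong-style 95% CI for a single model at a single site. It is not the authors' code (their analysis used R); the function name `delong_auc_ci`, the argument layout and the symmetric normal-approximation interval clipped to [0, 1] are assumptions made here for illustration.

```python
import math
from typing import Sequence, Tuple


def _psi(pos: float, neg: float) -> float:
    """Mann-Whitney kernel: 1 if the positive case outranks the negative, 0.5 on ties."""
    if pos > neg:
        return 1.0
    if pos == neg:
        return 0.5
    return 0.0


def delong_auc_ci(
    pos_scores: Sequence[float],
    neg_scores: Sequence[float],
    z: float = 1.96,
) -> Tuple[float, float, float, float]:
    """Return (auc, standard_error, ci_low, ci_high) for one ROC curve.

    pos_scores: predicted risks for cases that developed the outcome.
    neg_scores: predicted risks for cases that did not.
    The variance uses DeLong's structural components; the interval is a
    normal approximation (an assumption of this sketch).
    """
    m, n = len(pos_scores), len(neg_scores)
    if m < 2 or n < 2:
        raise ValueError("need at least two positive and two negative cases")

    # Structural components: one value per positive case and per negative case.
    v10 = [sum(_psi(x, y) for y in neg_scores) / n for x in pos_scores]
    v01 = [sum(_psi(x, y) for x in pos_scores) / m for y in neg_scores]

    auc = sum(v10) / m
    s10 = sum((v - auc) ** 2 for v in v10) / (m - 1)
    s01 = sum((v - auc) ** 2 for v in v01) / (n - 1)
    se = math.sqrt(s10 / m + s01 / n)

    return auc, se, max(0.0, auc - z * se), min(1.0, auc + z * se)
```

Applied once per hospital, a function like this would give the centre dot and interval for each site; the pairwise loop is quadratic in the number of predictions, so a rank-based implementation would be preferable at the scale of a real test set.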

Extended Data Fig. 2 Calibration of the original VA model in a) the VA test set and b) the UM test set.

The calibration of the original model on the a) VA test set and b) UM test set. The predicted probabilities (deciles) are plotted against the observed probabilities with 95% confidence intervals. The diagonal line demonstrates the ideal calibration. The model calibration is examined for all patients (red), females only (green), and males only (blue).
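
The decile-based calibration described above can be sketched as follows. This is a hedged illustration, not the authors' implementation: the names `decile_calibration`, `wilson_interval` and `CalibrationBin` are hypothetical, and the Wilson score interval is an assumption, since the legend does not state how the 95% confidence intervals were computed.

```python
import math
from typing import List, NamedTuple, Sequence, Tuple


class CalibrationBin(NamedTuple):
    mean_predicted: float  # x-axis: average predicted probability in the bin
    observed_rate: float   # y-axis: fraction of cases with the outcome
    ci_low: float
    ci_high: float
    n: int


def wilson_interval(events: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (assumed CI method)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = events / n
    denom = 1.0 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def decile_calibration(
    predicted: Sequence[float],
    observed: Sequence[int],
    n_bins: int = 10,
) -> List[CalibrationBin]:
    """Split predictions into equal-sized bins by predicted risk and summarise each."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    pairs = sorted(zip(predicted, observed))
    total = len(pairs)
    bins: List[CalibrationBin] = []
    for b in range(n_bins):
        chunk = pairs[b * total // n_bins:(b + 1) * total // n_bins]
        if not chunk:
            continue  # fewer observations than bins
        events = sum(y for _, y in chunk)
        low, high = wilson_interval(events, len(chunk))
        bins.append(
            CalibrationBin(
                mean_predicted=sum(p for p, _ in chunk) / len(chunk),
                observed_rate=events / len(chunk),
                ci_low=low,
                ci_high=high,
                n=len(chunk),
            )
        )
    return bins
```

Calling the function separately on all patients, females only and males only would give the three curves in the figure; points lying on the diagonal (`mean_predicted` equal to `observed_rate`) indicate ideal calibration.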

Extended Data Fig. 3 Calibration of the extended VA model at UM.

The calibration of the extended model in the UM test set. The predicted probabilities (deciles) are plotted against the observed probabilities with 95% CI. The diagonal line demonstrates the ideal calibration. The model calibration is examined for all patients (red), females only (green), and males only (blue).

Extended Data Fig. 4 Predictor importance plot of the original and extended VA model.

Top 20 important predictors of the original VA model (top) and the extended VA model (bottom). Predictors are ranked by their relative importance and expressed as a percentage.

Extended Data Table 1 AKI Incidence in the VA and UM cohorts, by acute kidney injury stage, by sex

Extended Data Table 2 Model performance (AUC) of the extended VA models at VA, by outcome stage, by sex

Extended Data Table 3 Model Performance (AUC) of the original and extended VA models at VA, by outcome stage, by race

Extended Data Table 4 Model Performance (AUC) of the original and extended VA models at UM, by outcome stage, by race

Supplementary information

Supplementary Information

Supplementary Tables 1–5

Reporting Summary

Source data

Source Data Extended Data Fig. 1

Statistical source data for Extended Data Figure 1

Source Data Extended Data Fig. 2

Statistical source data for Extended Data Figure 2

Source Data Extended Data Fig. 3

Statistical source data for Extended Data Figure 3

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Cao, J., Zhang, X., Shahinian, V. et al. Generalizability of an acute kidney injury prediction model across health systems. Nat Mach Intell 4, 1121–1129 (2022). https://doi.org/10.1038/s42256-022-00563-8

Received: 10 May 2022
Accepted: 11 October 2022
Published: 01 December 2022
Issue Date: December 2022
DOI: https://doi.org/10.1038/s42256-022-00563-8
