SHAP based predictive modeling for 1 year all-cause readmission risk in elderly heart failure patients: feature selection and model interpretation

Luo, Hao; Xiang, Congyu; Zeng, Lang; Li, Shikang; Mei, Xue; Xiong, Lijuan; Liu, Yanxu; Wen, Cong; Cui, Yangyang; Du, Linqin; Zhou, Yang; Wang, Kun; Li, Lan; Liu, Zonglian; Wu, Qi; Pu, Jun; Yue, Rongchuan

doi:10.1038/s41598-024-67844-7


Open access article. Published: 31 July 2024



Scientific Reports volume 14, Article number: 17728 (2024)


Abstract

Heart failure (HF) is a significant global public health concern with a high readmission rate, posing a serious threat to the health of the elderly population. While several studies have used machine learning (ML) to develop all-cause readmission risk prediction models for elderly patients with HF, few have integrated ML-selected features with those chosen by human experts to assess readmission risk in HF patients. A retrospective analysis of 8396 elderly HF patients hospitalized at the Affiliated Hospital of North Sichuan Medical College from January 1, 2018 to December 31, 2021 was conducted. Variables selected by XGBoost, LASSO regression, and random forest constituted the machine group, while the human expert group comprised variables chosen by two experienced cardiovascular professors. The variables selected by both groups were combined to form a human–machine collaboration group. Model performance was evaluated using the area under the receiver operating characteristic curve (AUC). The SHapley Additive exPlanations (SHAP) method was used to quantify the importance of each predictive feature, explain the impact of individual features on the model, and visualize these effects. A total of 73 candidate features were included for model development. The human–machine collaboration model, built with CatBoost, achieved an AUC of 0.83617, an F1-score of 0.73521, and a Brier score of 0.16536 on the validation set, outperforming models built on features selected solely by human experts or solely by machine. SHAP plots of this model identified HGB, NT-proBNP, smoking history, NYHA classification, and LVEF as the five most important features. These results indicate that combining human expert and machine feature selection improves the prediction of all-cause readmission in elderly HF patients. The SHAP method enhanced the interpretability of the model's outputs, helping clinicians pinpoint risk factors associated with HF readmission and supporting tailored, more personalized treatment strategies.


Introduction

Heart failure (HF) represents the mid-to-late stage of various cardiovascular diseases and is characterized by high incidence, mortality, and hospitalization rates. It is a significant global public health issue that reduces patients' quality of life and increases mortality¹. HF predominantly affects the elderly, with incidence rising markedly after the age of 60². Aging induces degenerative changes in cardiac structure and function, including myocardial cell hypertrophy and increased cardiac afterload, which lead to inadequate systemic perfusion and symptoms such as dyspnea and fatigue³. Moreover, owing to population aging, the widespread prevalence of cardiovascular risk factors, and improved evidence-based treatments that prolong the survival of HF patients, HF incidence and readmission rates continue to rise. Improving the quality of life of elderly HF patients, finding effective diagnostic and therapeutic measures to reduce readmissions, and alleviating the associated socioeconomic burden have therefore become priorities for global public health reform.

As understanding of the pathophysiological mechanisms of HF has deepened, treatment has advanced considerably, with continuous progress in both pharmacological and device-based therapies. Despite these advances, there are still approximately 26 million cases of HF worldwide, with an estimated annual cost of $108 billion, accounting for 2% of the global healthcare budget, and this expenditure is projected to keep rising⁴. Repeated hospital readmissions for HF also pose a major challenge for cardiovascular physicians and patients worldwide, imposing heavy psychological, economic, and physical burdens. Patients often feel discouraged, anxious, and helpless, and may develop depression, leading to a vicious cycle of repeated hospitalizations⁵. Approximately 18.2% of patients are readmitted within 30 days and 31.2% within 90 days⁶, and more than 50% experience at least one readmission within 1 year⁷. Evidence suggests that about a quarter of HF readmissions are preventable, so reducing readmission rates could relieve part of the economic burden on HF patients⁸. Consequently, researchers and policymakers have focused on reducing readmissions and avoiding excessive healthcare expenditure. For instance, the United States introduced the Hospital Readmissions Reduction Program (HRRP), which initially targeted HF among other conditions. The program measures hospital quality by readmission rates and aims to lower them by financially penalizing hospitals with higher-than-average readmission rates⁹.

However, the incidence, prevalence, and survival rates of HF vary significantly across countries and regions because of differences in genetics, cultural background, healthcare systems, and other factors. There is also considerable variation in the quality of HF diagnosis and treatment among hospitals with different capabilities and resources¹⁰. For example, ischemic heart disease is the predominant etiology in North America and Europe, whereas valvular heart disease is more prevalent in the Asia–Pacific and East Asia regions¹¹. HF prediction tools developed in other regions may therefore have limited applicability to Chinese patients. It is thus crucial to develop accurate risk prediction models for 1-year all-cause readmission tailored to Chinese HF patients and to explore the factors that influence readmission. Such models are essential for designing targeted clinical interventions and personalized treatment plans that effectively reduce readmissions among elderly HF patients in China.

Assessing a patient's readmission risk is challenging because it involves processing a large amount of clinical data and feature information. Traditionally, model features have been selected by experts, as in the widely used LACE model for predicting the 30-day readmission or mortality risk of HF patients¹². In recent years, with advances in computer science, ML and artificial intelligence have shown particular advantages in the prevention, diagnosis, and prognosis prediction of cardiovascular diseases owing to their ability to capture non-linear relationships^13,14,15. For instance, Sharma compared ML models with the LACE score for predicting 30-day unplanned readmission in HF patients and found that XGBoost uncovered more predictive factors and showed clear advantages over non-ML methods such as the LACE score¹⁶. Okabe et al.¹⁷ constructed a Cox proportional hazards model showing that worsening renal function (WRF) combined with elevated BNP at discharge is associated with increased 1-year cardiovascular mortality and HF readmission. In a prospective cohort of HF patients aged 65 and above, Chinese researchers used Cox regression to develop an easy-to-use nomogram for predicting 30-day and 1-year readmission or mortality in elderly HF patients in China¹⁸. Tong et al.¹⁹ compared a traditional multivariable Cox regression model with three ML models at different time points using AUC and Brier score, and found that random survival forest and gradient boosting models discriminated better than the linear Cox model. Additionally, Rahman et al.²⁰ used electronic health record (EHR) data to train a stacking ML model that combined the outputs of three top-performing models; it achieved an AUC of 0.881 for predicting emergency readmissions of HF patients.

Although ML is now widely applied in medical research, many ML risk prediction models still lack evidence-based validation and interpretability for clinical prognosis^15,21. Relying solely on ML algorithms for feature selection is computationally expensive and may overlook the clinical expertise and specialized knowledge of cardiologists. Conversely, relying solely on human experts for feature selection may discard valuable information. Furthermore, traditional statistical frameworks cannot effectively identify high-dimensional non-linear relationships between clinical data and outcomes²². To address these challenges, this study compares models built on feature sets selected by human experts and by ML algorithms, and combines the strengths of both approaches into a human–machine collaboration feature set for predicting 1-year readmission in elderly HF patients. This design clarifies the respective advantages and limitations of ML-selected and expert-selected features in readmission prediction models for elderly HF patients. Building on previous work, the study also applies SHAP values to the human–machine collaboration model to provide intuitive explanations of its predictions, helping clinicians assess prognostic factors more accurately and informing efforts to reduce readmission rates.

Methods

Research design

We first identified elderly patients with HF admitted to the Affiliated Hospital of North Sichuan Medical College from January 1, 2018, to December 31, 2021, and collected their demographic details, clinical complications, laboratory results, and in-hospital medication records. The data were then pre-processed using direct deletion, multicollinearity testing, K-nearest neighbors (KNN) imputation, and Min–Max normalization. Next, stratified random sampling was used to partition the patient data into a training set (70%) and a validation set (30%). Variable importance was assessed with three ML algorithms, and the features common to the top 50 of all three rankings formed the machine group, while the features chosen by two cardiology experts formed the expert group. Receiver operating characteristic (ROC) curves were then generated for 10 ML algorithms on the validation set, and the AUC, F1-score, and Brier score were computed to assess model performance. The features selected by the expert and machine groups were combined into a human–machine collaboration feature set, and the Fast and Lightweight AutoML Library (FLAML) was used to optimize the hyperparameters of prediction models with an AUC above 0.8. Performance metrics including AUC, accuracy, sensitivity (recall), specificity, F1-score, and Brier score were calculated for each model. The entire process is depicted in Fig. 1.

Data extraction

This retrospective cohort study used data from multiple systems of the Affiliated Hospital of North Sichuan Medical College, including the information system, the laboratory information management system, the picture archiving and communication system, and the electronic medical record system. Patients aged 65 years and older who were hospitalized with a primary diagnosis of HF from January 1, 2018, to December 31, 2021, were initially screened. The study variables included demographic information, clinical complications, physical examination findings, laboratory data, echocardiography results, and in-hospital medication use. Unplanned readmissions within 1 year were ascertained through database searches, telephone follow-up, and outpatient visits.

Data pre-processing

Patients who died or were lost to follow-up during the observation period were excluded. To limit the impact of missing data, features with a missing rate of 30% or more were removed, and those with a missing rate below 30% were imputed using the KNN algorithm (Fig. S1). Correlations between variables were assessed with Pearson and Spearman tests, and variables with a correlation coefficient of 0.8 or higher were removed to avoid multicollinearity. Because the variables are measured on different scales, Min–Max normalization was applied to rescale the data and make the indicators comparable. The dataset was then randomly divided into a training set and a validation set in a 7:3 ratio; the training set was used to develop the models and the validation set to evaluate their performance.
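The pre-processing steps described above can be sketched as follows. This is a minimal, standard-library Python illustration under stated assumptions, not the authors' code: features are stored as a dict of column lists with `None` marking missing values, the KNN distance is a root-mean-square difference over co-observed features (a simplification of the nan-Euclidean metric used by scikit-learn's `KNNImputer`), and the correlation filter uses Pearson's r only and keeps the first feature of each correlated pair, whereas the study also used Spearman tests and clinical judgment. All function names are hypothetical.

```python
import math
import random


def drop_sparse_columns(columns, max_missing=0.30):
    """Keep only features whose missing rate is below max_missing (30% in the text)."""
    return {name: vals for name, vals in columns.items()
            if sum(v is None for v in vals) / len(vals) < max_missing}


def knn_impute(columns, k=5):
    """Replace each None with the mean of that feature over the k nearest donor patients.

    Distance: root-mean-square difference over features observed in both patients
    (a simplification of the nan-Euclidean metric used by scikit-learn's KNNImputer).
    """
    names = list(columns)
    rows = list(zip(*(columns[n] for n in names)))  # one tuple per patient
    filled = {n: list(columns[n]) for n in names}
    for i, row in enumerate(rows):
        for j, name in enumerate(names):
            if row[j] is not None:
                continue
            donors = []
            for m, other in enumerate(rows):
                if m == i or other[j] is None:
                    continue
                diffs = [(a - b) ** 2 for a, b in zip(row, other)
                         if a is not None and b is not None]
                if diffs:
                    donors.append((math.sqrt(sum(diffs) / len(diffs)), other[j]))
            donors.sort(key=lambda d: d[0])
            nearest = [v for _, v in donors[:k]]
            if not nearest:  # no comparable donor: fall back to the observed mean
                nearest = [v for v in columns[name] if v is not None]
            filled[name][i] = sum(nearest) / len(nearest)
    return filled


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def drop_correlated(columns, threshold=0.8):
    """Keep a feature only if |r| < threshold against every feature already kept."""
    kept = []
    for name in columns:
        if all(abs(pearson(columns[name], columns[o])) < threshold for o in kept):
            kept.append(name)
    return {name: columns[name] for name in kept}


def min_max(columns):
    """Rescale each feature to [0, 1]: x' = (x - min) / (max - min)."""
    scaled = {}
    for name, vals in columns.items():
        lo, hi = min(vals), max(vals)
        scaled[name] = [(v - lo) / (hi - lo) if hi > lo else 0.0 for v in vals]
    return scaled


def stratified_split(labels, train_frac=0.7, seed=42):
    """Return (train_idx, val_idx) preserving the readmission rate in both parts."""
    rng = random.Random(seed)
    train, val = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        cut = round(train_frac * len(idx))
        train += idx[:cut]
        val += idx[cut:]
    return sorted(train), sorted(val)
```

Fitting the imputer and scaler on the training split only, and then applying them to the validation split, would avoid leaking validation information into model development; the sketch simply follows the order given in the text.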

Feature selection and model construction

1. Machine group: To avoid the overfitting and prediction bias of traditional statistical methods and to obtain stable feature selections, XGBoost, random forest (RF), and LASSO regression were employed. The variables common to the top 50 of all three rankings were selected as risk factors for all-cause readmission in elderly HF patients.
2. Expert group: Based on their clinical expertise and a review of relevant published prediction models, two cardiologists evaluated the clinical importance of each of the 73 feature variables retained after collinearity analysis for the risk of readmission in elderly HF patients. Importantly, they did so without seeing the machine feature set.
3. Human–machine collaboration group: The union of the variables selected by the three ML algorithms and by the experts formed the final feature set of the human–machine collaboration group. Ten ML models were used for modeling: KNN, logistic regression (LR), support vector machine (SVM), RF, Gaussian naive Bayes (GaussianNB), decision tree (DT), gradient boosting decision tree (GBDT), light gradient boosting machine (LightGBM), extreme gradient boosting (XGBoost), and categorical boosting (CatBoost). Metrics such as the AUC, F1-score, and Brier score were calculated for each model. For human–machine collaboration models with an AUC greater than 0.8 in the training dataset, the FLAML framework was then used to search for the optimal hyperparameter combination by random search with 10 iterations and fivefold cross-validation on the training data and labels, improving the models' predictive performance.
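The hyperparameter step in item 3 (10 random-search iterations scored by fivefold cross-validation) can be illustrated with the generic sketch below. It is not FLAML's own search strategy; the search space, the `evaluate` callback (which in practice would fit, for example, a CatBoost model on the training folds and return the held-out-fold AUC), and all function names are hypothetical.

```python
import random


def k_fold_indices(n, k=5, seed=0):
    """Shuffle patient indices 0..n-1 and deal them into k near-equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]


def random_search_cv(space, evaluate, n, n_iter=10, k=5, seed=0):
    """Try n_iter random configurations and keep the one with the best mean CV score.

    space    -- dict mapping each hyperparameter to a list of candidate values
    evaluate -- callable(params, train_idx, val_idx) -> score on the held-out fold
    """
    rng = random.Random(seed)
    folds = k_fold_indices(n, k, seed)
    history = []
    for _ in range(n_iter):
        params = {name: rng.choice(values) for name, values in space.items()}
        scores = []
        for val_idx in folds:
            held_out = set(val_idx)
            train_idx = [i for i in range(n) if i not in held_out]
            scores.append(evaluate(params, train_idx, val_idx))
        history.append((sum(scores) / len(scores), params))
    best_score, best_params = max(history, key=lambda h: h[0])
    return best_params, best_score, history


# Hypothetical search space and a toy scorer that peaks at depth=6, learning_rate=0.1.
space = {"depth": [4, 6, 8, 10], "learning_rate": [0.01, 0.03, 0.1, 0.3]}


def toy_evaluate(params, train_idx, val_idx):
    return -abs(params["depth"] - 6) - abs(params["learning_rate"] - 0.1)


best, score, history = random_search_cv(space, toy_evaluate, n=100)
```

The folds here are plain random folds; stratifying them by readmission status, as was done for the train/validation split, would keep the event rate stable across folds.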

These steps yield 30 prediction models (three feature sets × ten algorithms); the modeling workflow is shown in Fig. S2.

Model evaluation and result interpretation

Conventional evaluation metrics, namely the AUC, accuracy, precision, recall (sensitivity), and specificity, together with the Brier score and F1-score, were used to provide a comprehensive assessment of model performance. The F1-score, the harmonic mean of precision and recall, balances the two, with higher values indicating better performance. The Brier score, the mean squared difference between the predicted probability and the observed outcome, ranges from 0 to 1 and reflects calibration as well as discrimination, with lower values indicating more accurate probabilistic predictions. To compare models constructed by different methods, the DeLong test was used to assess the significance of differences in AUC between models²³. Interpreting the results of ML prediction models and explaining feature importance remain ongoing challenges. To address this, we used SHAP, a post-hoc explanation technique that assigns each input feature a SHAP value quantifying its contribution to a prediction, and combined the top-performing prediction model with the SHAP framework to aid interpretation.
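The metrics defined above can be computed directly from validation labels and predicted probabilities. The minimal standard-library sketch below is illustrative only: it computes the AUC through its Mann–Whitney interpretation (the probability that a randomly chosen readmitted patient receives a higher score than a randomly chosen non-readmitted one, with ties counted as one half), derives accuracy, precision, recall (sensitivity), specificity, and F1-score at an assumed 0.5 threshold, and computes the Brier score as a mean squared error. The toy labels and probabilities are invented.

```python
def auc_mann_whitney(y_true, scores):
    """AUC as P(score of a random positive > score of a random negative); ties count 1/2."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def threshold_metrics(y_true, probs, threshold=0.5):
    """Confusion-matrix metrics after classifying probs >= threshold as readmitted."""
    y_pred = [1 if p >= threshold else 0 for p in probs]
    pairs = list(zip(y_true, y_pred))
    tp = pairs.count((1, 1))
    tn = pairs.count((0, 0))
    fp = pairs.count((0, 1))
    fn = pairs.count((1, 0))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0  # identical to sensitivity
    specificity = tn / (tn + fp) if tn + fp else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(pairs), "precision": precision,
            "recall": recall, "specificity": specificity, "f1": f1}


def brier_score(y_true, probs):
    """Mean squared difference between predicted probability and observed outcome (0/1)."""
    return sum((p - y) ** 2 for p, y in zip(probs, y_true)) / len(y_true)


y = [0, 0, 1, 1]
p = [0.1, 0.4, 0.35, 0.8]
print(auc_mann_whitney(y, p))         # 0.75
print(threshold_metrics(y, p)["f1"])  # 0.6666666666666666
print(round(brier_score(y, p), 6))    # 0.158125
```

Because the AUC depends only on the ranking of scores while the Brier score depends on the probability values themselves, the two can move independently, which is why both are reported.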

Ethical approval

This study was conducted in accordance with the Declaration of Helsinki. The study protocol was approved by the Ethics Committee of the Affiliated Hospital of North Sichuan Medical College (2023ER331-1). Because the study involved only the analysis of patients' historical clinical data, the requirement for informed consent was waived by the same Ethics Committee.

Statistical analysis

All statistical analyses and calculations were performed using Python (version 3.11) with the following packages: “pandas”, “os”, “numpy”, “warnings”, “sklearn”, “CatBoost”, “lightGBM”, “matplotlib”, “XGBoost”, “FLAML”, and “SHAP”. R (version 4.2.3) with the “CompareGroups”, “tableone”, and “glmnet” packages was also used for specific analyses. Non-normally distributed continuous variables are presented as median (P25, P75) and were compared between groups using the Mann–Whitney U test. Categorical variables are presented as frequency and percentage (n, %) and were compared using the χ² test. All tests were two-sided, and P values < 0.05 were considered statistically significant.
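As a sketch of the descriptive and comparative statistics named above, the standard-library Python below summarizes a skewed variable as median (P25, P75) and computes the Pearson χ² statistic, with its 1-degree-of-freedom p-value, for a 2 × 2 table such as readmission status by sex. It is an illustration, not the authors' R or Python code: the "inclusive" quantile method is an assumption (packages interpolate quartiles slightly differently), no continuity correction is applied, and the counts in the usage lines are made up.

```python
import math
import statistics


def median_p25_p75(values):
    """Summarise a non-normal variable as M (P25, P75)."""
    p25, median, p75 = statistics.quantiles(values, n=4, method="inclusive")
    return median, p25, p75


def chi_square_2x2(table):
    """Pearson chi-square statistic (no continuity correction) for [[a, b], [c, d]]."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


def chi_square_p_1df(stat):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    # With 1 df, chi-square = Z**2, so P(X > stat) = P(|Z| > sqrt(stat)) = erfc(sqrt(stat / 2)).
    return math.erfc(math.sqrt(stat / 2))


print(median_p25_p75([1, 2, 3, 4, 5, 6, 7, 8, 9]))
stat = chi_square_2x2([[10, 20], [30, 40]])  # made-up counts
print(round(stat, 4), round(chi_square_p_1df(stat), 3))
```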

Results

Patient characteristics

After excluding 821 deceased patients and 253 patients lost to follow-up, 8396 elderly patients with HF were included in this study. Of these, 3978 (47.3%) had an unplanned readmission within 1 year. The average age of the patients was 74.00 years; 4653 (55.4%) were male and 3743 (44.6%) female. The median HGB level in readmitted patients was 106.000 g/L (P25–P75: 87.000–124.000), lower than the 121.000 g/L (P25–P75: 108.000–133.000) of the non-readmission group (P < 0.001). The median N-terminal pro-B-type natriuretic peptide (NT-proBNP) level in readmitted patients was 2169.853 pg/mL (P25–P75: 1525.750–3364.000), significantly higher than the 1355.100 pg/mL (P25–P75: 857.775–2738.789) of the non-readmission group (P < 0.001). There were no significant differences between the groups in gender or history of hypertension. Table 1 presents a detailed comparison of clinical characteristics between the readmission and non-readmission groups.

Table 1 Baseline characteristics of older patients with heart failure.


Feature variable selection

This study initially included 87 variables based on database information. Subsequently, 8 feature variables with a missing rate greater than 30% were removed. After conducting Pearson and Spearman correlation tests, and based on clinical experience, 6 feature variables with correlation coefficients ≥ 0.8 were also removed. Following these steps, a total of 73 features were included in the study, covering basic personal information (n = 8), vital signs (n = 6), medical history (n = 5), laboratory indicators (n = 40), echocardiography data (n = 11), and in-hospital medication use (n = 3).

1. Machine group: To ensure the reliability and robustness of the included features, the top 50 variables by importance score were taken from each of XGBoost, RF, and LASSO regression, and the variables common to all three lists formed the machine group's prognostic features. Figure 2 shows the top 30 features of each algorithm. In the RF ranking (Fig. 2A), NT-proBNP and HGB were far more important than the other indicators. In the XGBoost ranking (Fig. 2B), smoking history ranked first, followed by NT-proBNP, diuretic use, HGB, and NYHA classification. LASSO regression, which selects variables by adding a penalty term that shrinks coefficients toward zero, ranked HGB as the most important variable, whereas NT-proBNP performed poorly, ranking fifteenth (Fig. 2C). In total, 32 common variables were selected.
2. Expert group: Based on their clinical experience and knowledge, the experts selected 25 variables, covering age, gender, length of stay, medical history, blood indicators, renal function, blood lipids, in-hospital medication history, echocardiography data, and HF biomarkers. Notably, these features focused mainly on demographic information and laboratory data and excluded patients' physical examination findings (Table S1).
3. Human–machine collaboration group: The feature variables of the machine group and the expert group were combined to form the collaboration group, which comprised 46 features: 11 common to both groups, 21 unique to the machine group, and 14 unique to the expert group. In terms of variable categories, the expert group had slightly fewer variables than the machine group, but the patient information it covered was more balanced (Table S1).
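The set logic behind the three feature groups can be sketched as below: the machine group is the intersection of the three algorithms' top-50 lists, and the collaboration group is the union of the machine and expert groups, so its size is 32 + 25 − 11 = 46. The feature names are placeholders rather than the study's variables, and the function names are hypothetical.

```python
def machine_group(rankings, top_k=50):
    """Features ranked in the top_k by every algorithm, in the first ranking's order."""
    common = set.intersection(*(set(r[:top_k]) for r in rankings))
    return [f for f in rankings[0][:top_k] if f in common]


def collaboration_group(machine, expert):
    """Union of the two feature sets, split into shared and group-specific parts."""
    machine_set, expert_set = set(machine), set(expert)
    common = [f for f in machine if f in expert_set]
    machine_only = [f for f in machine if f not in expert_set]
    expert_only = [f for f in expert if f not in machine_set]
    return common + machine_only + expert_only, common, machine_only, expert_only


# Toy importance rankings from three algorithms (placeholder names).
rf = ["f1", "f2", "f3", "f4", "f5", "f6"]
xgb = ["f3", "f2", "f7", "f1", "f4", "f6"]
lasso = ["f1", "f5", "f6", "f4", "f3", "f8"]
print(machine_group([rf, xgb, lasso], top_k=5))  # ['f1', 'f3', 'f4']

# Group sizes reported in the text: 32 machine, 25 expert, 11 shared.
machine = [f"m{i}" for i in range(32)]
expert = [f"m{i}" for i in range(11)] + [f"e{i}" for i in range(14)]
union, common, m_only, e_only = collaboration_group(machine, expert)
print(len(union), len(common), len(m_only), len(e_only))  # 46 11 21 14
```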

Performance of expert group, machine group, and human–machine collaboration group models

Subsequently, the performance of the models was assessed using ten ML algorithms on the validation set, based on the three groups of research variables mentioned earlier. The corresponding ROC curves for each model are presented in Fig. 3. Moreover, various metrics were calculated to comprehensively evaluate the models' performance (Table S2). Notably, among the human–machine collaboration models, five ML algorithms exhibited an AUC value exceeding 0.8, namely RF (0.81575), XGBoost (0.81529), LightGBM (0.82921), GBDT (0.83213), and CatBoost (0.83447), as depicted in Fig. 3C. Upon selecting these five models and utilizing the “FLAML” framework to optimize the hyperparameters, their performance improved, resulting in higher AUC values of RF (0.82436), XGBoost (0.83272), LightGBM (0.83341), GBDT (0.83300), and CatBoost (0.83617), as shown in Table 2 and Fig. 3D. Comparatively, CatBoost demonstrated the highest AUC value, the lowest Brier score, and exceptional performance in terms of sensitivity, accuracy, specificity, and F1-score. Therefore, CatBoost emerged as the best ML algorithm in this study.

Table 2 Five optimized ML algorithms in a human–machine collaboration group.


Performance comparison of three group models

Some models developed from the expert group showed higher accuracy and sensitivity than those of the machine group, but overall the machine group's models performed better (Table S2A,B). Specifically, the AUC of the CatBoost model in the machine group was significantly higher than that in the expert group (Z = 3.00058, P < 0.05). Similarly, the AUC of the CatBoost model in the human–machine collaboration group was significantly higher than that in the expert group (Z = 4.10028, P < 0.05). In addition, the human–machine collaboration CatBoost model had a slightly higher AUC than the machine group's model, along with notably better accuracy, sensitivity, specificity, and F1-score and a lower Brier score (Table 3). The CatBoost model developed from the human–machine collaboration features therefore performed best among the three groups.
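For reference, the DeLong test used for these paired comparisons can be written compactly. The standard-library sketch below follows DeLong's structural-component formulation for two models scored on the same patients; it is quadratic in the number of patients, uses a two-sided normal p-value, and the toy labels and scores in the usage lines are invented, so it does not reproduce the Z values reported above.

```python
import math


def _psi(x, y):
    """Kernel of the Mann-Whitney statistic: 1 if x > y, 1/2 on ties, else 0."""
    return 1.0 if x > y else 0.5 if x == y else 0.0


def _components(y_true, scores):
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    v10 = [sum(_psi(x, y) for y in neg) / len(neg) for x in pos]  # one per positive
    v01 = [sum(_psi(x, y) for x in pos) / len(pos) for y in neg]  # one per negative
    return v10, v01


def _cov(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


def delong_test(y_true, scores_a, scores_b):
    """Return (AUC_a, AUC_b, z, two-sided p) for two models scored on the same patients."""
    v10_a, v01_a = _components(y_true, scores_a)
    v10_b, v01_b = _components(y_true, scores_b)
    auc_a = sum(v10_a) / len(v10_a)
    auc_b = sum(v10_b) / len(v10_b)
    var = ((_cov(v10_a, v10_a) + _cov(v10_b, v10_b) - 2 * _cov(v10_a, v10_b)) / len(v10_a)
           + (_cov(v01_a, v01_a) + _cov(v01_b, v01_b) - 2 * _cov(v01_a, v01_b)) / len(v01_a))
    if var <= 0:
        raise ValueError("AUC difference has zero estimated variance")
    z = (auc_a - auc_b) / math.sqrt(var)
    return auc_a, auc_b, z, math.erfc(abs(z) / math.sqrt(2))


y = [0, 0, 0, 0, 1, 1, 1, 1]
model_a = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
model_b = [0.1, 0.6, 0.3, 0.5, 0.4, 0.2, 0.7, 0.8]
print(delong_test(y, model_a, model_b))
```

Because both models are evaluated on the same validation patients, the covariance terms matter: ignoring them and treating the two AUCs as independent would generally misstate the variance of their difference.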

Table 3 Performance comparison of CatBoost prediction models constructed by three groups.


Model result interpretation

The SHAP framework was used to compute SHAP values, which quantify each predictive feature's contribution to the predicted readmission risk and provide an intuitive explanation of the model's predictions. Figure 4 presents the top 20 variables of the human–machine collaboration CatBoost model, ranked by SHAP value in descending order of importance; the top 5 variables were selected by both the expert and machine groups. To illustrate the model's interpretability, two typical patients are shown in Fig. 5: features indicated by red arrows increase the risk of readmission within one year for elderly HF patients, whereas those indicated by blue arrows decrease it. With a base value of 0.144, the high-risk patient has a model output value of 2.44, corresponding to a predicted probability of unplanned 1-year readmission of 91.96%; the low-risk patient has an output value of −1.78, corresponding to a predicted probability of 14.42%. Fig. S3 combines the force plots of the first 200 samples side by side, ordered by clustering similarity, with its y-axis corresponding to the x-axis of a single force plot; the trend of the line indicates that the first 50 patients have a lower readmission risk than the remaining 150. Fig. 6 illustrates how each of the top 5 variables affects the CatBoost predictions, colored by the variable with which it interacts most strongly. For example, Fig. 6A shows that HGB interacts most strongly with HCT and has a negative relationship with the predicted outcome: as HGB increases, its SHAP values decrease markedly, indicating a lower risk of readmission. The color gradient further shows that, among individuals with HGB below 100 g/L, those with high HCT generally have larger SHAP values, and thus a higher readmission risk, than those with low HCT; the opposite holds for HGB above 100 g/L. Figure 6D shows that patients in NYHA class 1 and 2 have relatively low NT-proBNP levels and low SHAP values, whereas patients in NYHA class 3 and 4 more often have high NT-proBNP levels (more red dots), reinforcing their increased risk of readmission. Finally, Fig. S4 shows the SHAP values under the interaction of HGB and NT-proBNP: at lower HGB levels, individuals with higher NT-proBNP levels (purple to red dots) tend to have higher SHAP values, whereas at higher HGB levels the dots are mostly blue, indicating lower NT-proBNP levels, lower SHAP values, and a lower risk of readmission.
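The reported probabilities follow from SHAP's additivity property if, as we assume here, the force-plot values are on the log-odds (raw margin) scale that SHAP's tree explainer reports by default for boosted binary classifiers: the output equals the base value plus the sum of the feature SHAP values, and the logistic function maps it to a probability. The sketch below checks this against the two patients described above; the per-feature SHAP values in the usage lines are hypothetical numbers chosen only to sum correctly, and the function names are illustrative.

```python
import math


def sigmoid(log_odds):
    """Map a raw (log-odds) model output to a probability."""
    return 1.0 / (1.0 + math.exp(-log_odds))


def explain_output(base_value, shap_values):
    """SHAP additivity: output = base value + sum of per-feature SHAP values."""
    output = base_value + sum(shap_values)
    return output, sigmoid(output)


print(round(sigmoid(2.44), 4))   # 0.9198, close to the reported 91.96%
print(round(sigmoid(-1.78), 4))  # 0.1443, close to the reported 14.42%

# Hypothetical per-feature SHAP values for the high-risk patient (base value 0.144).
output, prob = explain_output(0.144, [1.2, 0.9, -0.3, 0.496])
```

The small gaps from the published percentages are consistent with the output values having been rounded to two decimals.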

Discussion

Heart failure threatens global public health, and its epidemiology and risk factors vary across countries and regions. The characteristics of HF in developing countries differ from those in the Western world, leading to differences in incidence and readmission rates²⁴. In China, both the prevalence and incidence of HF increase with age, the average per capita hospitalization cost is $4406.8, and 40.5% of hospitalized patients have been admitted more than three times, reflecting persistently high mortality and readmission rates among HF patients²⁵. Against this background, we developed an interpretable human–machine collaboration prediction model that shows promise for predicting the risk of 1-year readmission in elderly HF patients.

Machine learning (ML), a branch of artificial intelligence, is a multidisciplinary field drawing on statistics, probability, and computer science. By learning from large amounts of data to make predictions and judgments, it contributes to data preprocessing, statistical analysis, and prediction²⁶. Common ML algorithms fall into two main categories, supervised and unsupervised learning, and include methods such as KNN, artificial neural networks (ANN), SVM, DT, and RF²⁷. ML has been widely applied in medicine, and multiple studies indicate that it can help healthcare professionals with disease diagnosis, prognosis analysis, and risk stratification from complex medical data, thereby identifying high-risk patients and informing preventive measures^28,29,30. However, the purpose of ML is to assist physicians in the prognostic assessment and efficacy analysis of diseases, not to replace medical professionals as an independent predictive tool³¹. Its results still require evaluation and refinement by qualified physicians rather than blind reliance on algorithmic outputs.

Nevertheless, current research primarily focuses on exploring predictive factors for various diseases through the combination of ML algorithms and clinical data, with few studies comparing or combining features selected by human experts with those chosen by ML^32,33. To address this gap, our study aims to explore the performance of ML methods in predicting readmission of HF patients in collaboration with human experts, constructing a human–machine collaboration model to predict the likelihood of readmission within 1-year for elderly HF patients.

In this study, we conducted a thorough technical review of the data, using KNN imputation to handle missing data and Pearson and Spearman tests to identify highly collinear variables. The processed data comprised 73 features covering patients' demographic information, vital signs, laboratory tests, echocardiographic data, and comorbidities. Because these features are measured on different scales, analyzing the raw data directly might overweight variables with large values and overlook those with small values, significantly degrading subsequent model performance. We therefore used Min–Max normalization to rescale the raw data³⁴. For human–machine collaboration models with an AUC above 0.8, we further improved performance using the “FLAML” library. Based on automated machine learning, FLAML automatically selects and tunes machine learning algorithm hyperparameters on a given dataset to identify the best model, improving the efficiency and accuracy of machine learning. Its search for the optimal model considers not only model performance but also the time cost and computational complexity of training, thereby making efficient use of computing resources^35,36. In the comparison of ten ML algorithms, CatBoost performed exceptionally well. CatBoost is an ensemble learning algorithm that improves on GBDT, using symmetric (oblivious) decision trees and ordered boosting to mitigate overfitting and handle categorical features effectively, thereby improving prediction accuracy and generalization³⁷. Our findings are consistent with these advantages, indicating that CatBoost can improve model performance and support risk assessment for readmission of HF patients compared with several other ML algorithms.

This study used three ML algorithms to select the variables of the machine group (n = 32). These covered a wider range than the variables chosen by the human expert group (n = 25), and the machine model performed better on the validation set than the expert model. However, the range of risk factor categories covered remained relatively narrow. The human–machine collaboration model combined the precision of human expert selection with the depth of machine selection, broadening the range and improving the accuracy of prognostic indicators for HF from multiple perspectives. In addition to the AUC, the primary measure of discrimination, we calculated other indicators of predictive ability, including accuracy, sensitivity, specificity, F1-score, and Brier score. On these measures, the human–machine collaboration model performed better than 1-year HF readmission risk models constructed in previous studies³⁸,³⁹. These results suggest that clinical practitioners should familiarize themselves with the basic concepts and metrics of ML. They should also consider building predictive models that combine human expert and machine feature selection.
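The metrics listed above can all be computed from a model's predicted probabilities. The sketch below is a minimal version under stated assumptions. The 0.5 classification threshold is our assumption, since the paper does not state its threshold here. The function name and the six toy patients are hypothetical. AUC is computed in its Mann–Whitney form: the probability that a randomly chosen readmitted patient receives a higher predicted risk than a randomly chosen non-readmitted one.

```python
from typing import Dict, Sequence


def readmission_metrics(
    y_true: Sequence[int], y_prob: Sequence[float], threshold: float = 0.5
) -> Dict[str, float]:
    """Common binary-classification metrics from predicted probabilities.

    y_true: 1 = readmitted, 0 = not readmitted (hypothetical labels).
    threshold: cut-off for turning probabilities into class labels (an assumption).
    """
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    sensitivity = tp / (tp + fn)  # recall / true-positive rate
    specificity = tn / (tn + fp)  # true-negative rate
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = (2 * precision * sensitivity / (precision + sensitivity)
          if precision + sensitivity else 0.0)

    # Brier score: mean squared error of the probabilities (lower is better).
    brier = sum((p - t) ** 2 for t, p in zip(y_true, y_prob)) / len(y_true)

    # AUC (Mann-Whitney form): share of positive/negative pairs ranked
    # correctly, with ties counted as one half.
    pos = [p for t, p in zip(y_true, y_prob) if t == 1]
    neg = [p for t, p in zip(y_true, y_prob) if t == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    auc = wins / (len(pos) * len(neg))

    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
        "brier": brier,
        "auc": auc,
    }


# Hypothetical predictions for six patients (not study data).
metrics = readmission_metrics([1, 1, 1, 0, 0, 0], [0.9, 0.7, 0.4, 0.6, 0.2, 0.1])
```

AUC and the Brier score do not depend on the threshold, whereas accuracy, sensitivity, specificity, and F1-score all change if the cut-off is moved. This is one reason to report both kinds of metric.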

In previous studies, the limited interpretability of ML models has been a significant obstacle. This study addressed the issue by using SHAP for model interpretation. SHAP, introduced by Lundberg et al.⁴⁰, is a method for explaining “black box” models. For each prediction, it computes a Shapley value for every feature by averaging that feature’s marginal contribution over all possible combinations of the other features. The resulting attributions are consistent and locally accurate: for each patient, they sum to the difference between the model’s output and its expected (baseline) output. The SHAP force plot explains individual instances in the dataset, showing how each feature pushes the predicted value up or down. The SHAP summary plot aggregates the SHAP values of all samples to give a global view of feature importance. Outputting the SHAP values for every feature of every sample therefore allows the importance of each variable to the model to be visualized. In this research, applying the SHAP framework to the human–machine collaboration model (built with CatBoost, a complex ensemble technique) revealed that HGB levels and NT-proBNP concentrations were the most important biomarkers for predicting readmission of elderly HF patients within 1 year. Smoking history, LVEF, and NYHA classification also emerged as key factors whose values affected patients’ readmission risk. Several other measurements, including PAD, LVDD, DBiL, FT4, and FBG, have rarely been reported in previous prediction model studies, yet they contributed substantially to the predictions in this research. From a clinical prognostic perspective, these variables may be potential risk factors for readmission in HF patients, and their relevance warrants validation in further clinical trials.
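The Shapley value described above has an exact definition. For a feature set F, the value of feature i is φ_i = Σ over S ⊆ F∖{i} of |S|!(|F| − |S| − 1)!/|F|! · [v(S ∪ {i}) − v(S)]. For a handful of features, this can be computed directly by enumerating coalitions. The sketch below is illustrative only. The value function v(S) sets features outside S to a single baseline vector, which is our simplifying assumption. The shap library typically estimates v(S) as an expectation over background data and uses faster algorithms, such as TreeSHAP for tree ensembles. The toy risk score is hypothetical and is not the study's model.

```python
from itertools import combinations
from math import factorial
from typing import Callable, FrozenSet, List, Sequence

Model = Callable[[Sequence[float]], float]


def exact_shapley(model: Model, x: Sequence[float], baseline: Sequence[float]) -> List[float]:
    """Exact Shapley values for one instance by enumerating every coalition.

    Features outside a coalition are set to a single baseline vector (an
    assumption made for illustration). Cost grows as 2**n, so this is only
    practical for a few features.
    """
    n = len(x)

    def value(coalition: FrozenSet[int]) -> float:
        # Coalition members take the patient's values; the rest stay at baseline.
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        contribution = 0.0
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                contribution += weight * (value(s | {i}) - value(s))
        phi.append(contribution)
    return phi


def toy_risk(z: Sequence[float]) -> float:
    """Hypothetical risk score with an interaction between features 0 and 2."""
    return 0.5 * z[0] + 0.3 * z[1] + 0.2 * z[0] * z[2]


x = [1.0, 1.0, 1.0]
baseline = [0.0, 0.0, 0.0]
phi = exact_shapley(toy_risk, x, baseline)
```

The result is approximately [0.6, 0.3, 0.1]. Each linear term is credited to its own feature, and the 0.2 interaction is split equally between features 0 and 2. The values sum to toy_risk(x) − toy_risk(baseline) = 1.0, which illustrates the local accuracy property that makes SHAP force plots additive.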

Limitations

Although this study offers significant contributions, it has certain limitations that merit attention. First, it was a single-center retrospective study conducted in one region of China, which underscores the need for external validation in other countries and regions to establish the model’s generalizability. Second, different subtypes of HF vary in pathogenesis, clinical presentation, and treatment. This study did not analyze the prognosis of the different HF types separately, which is an area for refinement in future studies. Accounting for the distinct phenotypes of HF patients in subsequent model development could substantially improve predictive accuracy. Finally, deep learning models with deeper architectures have been reported to yield better predictive results when constructing medical risk models, representing a promising avenue for future research.

Conclusion

In this study, we compared models constructed from variables selected by human experts and by machine learning algorithms to predict readmission among elderly HF patients. Combining the variables selected by both methods into a human–machine collaboration model captured a broader range of features and yielded superior predictive performance. Furthermore, using SHAP for model interpretation and visualization enabled a detailed exploration of risk factors for readmission among elderly HF patients. This approach not only increases the model’s credibility in identifying high-risk elderly HF patients but may also support the development of personalized treatment plans to improve HF prognosis management.

Data availability

The data that support the findings of this study are available from the corresponding author upon reasonable request.

References

Boorsma, E. M. et al. Congestion in heart failure: A contemporary look at physiology, diagnosis and treatment. Nat. Rev. Cardiol. 17(10), 641–655 (2020).
Conrad, N. et al. Temporal trends and patterns in heart failure incidence: a population-based study of 4 million individuals. Lancet 391(10120), 572–580 (2018).
Wei, J. Y. Age and the cardiovascular system. N. Engl. J. Med. 327(24), 1735–1739 (1992).
Liao, L., Allen, L. A. & Whellan, D. J. Economic burden of heart failure in the elderly. Pharmacoeconomics 26(6), 447–462 (2008).
Patel, N. et al. Association between depression and readmission of heart failure: A national representative database study. Prog. Cardiovasc. Dis. 63(5), 585–590 (2020).
Khan, M. S. et al. Trends in 30- and 90-day readmission rates for heart failure. Circ. Heart Fail. 14(4), e008335 (2021).
Foroutan, F. et al. Global comparison of readmission rates for patients with heart failure. J. Am. Coll. Cardiol. 82(5), 430–444 (2023).
Walraven, C. V., Jennings, A. & Forster, A. J. A meta-analysis of hospital 30-day avoidable readmission rates. J. Eval. Clin. Pract. 18(6), 1211–1218 (2012).
Fonarow, G. C. & Ziaeian, B. Hospital readmission reduction program for heart failure: The spread of intended and unintended consequences. J. Am. Coll. Cardiol. 73(9), 1013–1015 (2019).
Emmons-Bell, S., Johnson, C. & Roth, G. Prevalence, incidence and survival of heart failure: A systematic review. Heart 108(17), 1351–1360 (2022).
Virani, S. S. et al. Heart Disease and Stroke Statistics-2021 update: A report from the American Heart Association. Circulation 143(8), e254–e743 (2021).
Yazdan-Ashoori, P. et al. Utility of the LACE index at the bedside in predicting 30-day readmission or death in patients hospitalized with heart failure. Am. Heart J. 179, 51–58 (2016).
Angraal, S. et al. Machine learning prediction of mortality and hospitalization in heart failure with preserved ejection fraction. JACC Heart Fail. 8(1), 12–21 (2020).
Chen, S. Y. et al. Predicting six-month re-admission risk in heart failure patients using multiple machine learning methods: A study based on the Chinese Heart Failure Population Database. J. Clin. Med. 12(3), 870 (2023).
Zihni, E. et al. Opening the black box of artificial intelligence for clinical decision support: A study predicting stroke outcome. PLoS ONE 15(4), e0231166 (2020).
Sharma, V. et al. Predicting 30-day readmissions in patients with heart failure using administrative data: A machine learning approach. J. Card. Fail. 28(5), 710–722 (2022).
Okabe, T. et al. The impact of worsening renal function with elevated B-type natriuretic peptide at discharge on 1-year prognosis in heart failure patients. Sci. Rep. 10(1), 4451 (2020).
Yang, M. et al. A novel nomogram to predict all-cause readmission or death risk in Chinese elderly patients with heart failure. ESC Heart Fail. 7(3), 1015–1024 (2020).
Tong, R., Zhu, Z. & Ling, J. Comparison of linear and non-linear machine learning models for time-dependent readmission or mortality prediction among hospitalized heart failure patients. Heliyon 9(5), e16068 (2023).
Rahman, M. S. et al. Heart failure emergency readmission prediction using stacking machine learning model. Diagnostics 13(11), 1948 (2023).
Wang, K. et al. Interpretable prediction of 3-year all-cause mortality in patients with heart failure caused by coronary heart disease based on machine learning and SHAP. Comput. Biol. Med. 137, 104813 (2021).
Ashfaq, A. et al. Readmission prediction using deep learning on electronic health records. J. Biomed. Inform. 97, 103256 (2019).
DeLong, E. R., DeLong, D. M. & Clarke-Pearson, D. L. Comparing the areas under two or more correlated receiver operating characteristic curves: A nonparametric approach. Biometrics 44(3), 837–845 (1988).
Savarese, G. et al. Global burden of heart failure: A comprehensive and updated review of epidemiology. Cardiovasc. Res. 118(17), 3272–3287 (2023).
Wang, H. et al. Prevalence and incidence of heart failure among urban patients in China: A national population-based analysis. Circ. Heart Fail. 14(10), e008406 (2021).
Deo, R. C. Machine learning in medicine. Circulation 132(20), 1920–1930 (2015).
Handelman, G. S. et al. eDoctor: Machine learning and the future of medicine. J. Intern. Med. 284(6), 603–619 (2018).
Lee, Y. W., Choi, J. W. & Shin, E. H. Machine learning model for predicting malaria using clinical information. Comput. Biol. Med. 129, 104151 (2021).
El-Hasnony, I. M. et al. Multi-label active learning-based machine learning model for heart disease prediction. Sensors 22(3), 1184 (2022).
Martinez, O. et al. Machine learning for surgical time prediction. Comput. Methods Programs Biomed. 208, 106220 (2021).
Jarrahi, M. H. Artificial intelligence and the future of work: Human–AI symbiosis in organizational decision making. Bus. Horizons 61(4), 577–586 (2018).
Okoye, C. et al. Predicting mortality and re-hospitalization for heart failure: A machine-learning and cluster analysis on frailty and comorbidity. Aging Clin. Exp. Res. 35(12), 2919–2928 (2023).
Sabouri, M. et al. Machine learning based readmission and mortality prediction in heart failure patients. Sci. Rep. 13(1), 18671 (2023).
Wang, Z. et al. Worst-case discriminative feature learning via max–min ratio analysis. IEEE Trans. Pattern Anal. Mach. Intell. 46(1), 641–658 (2024).
Adams, M. D. et al. Spatial modelling of particulate matter air pollution sensor measurements collected by community scientists while cycling, land use regression with spatial cross-validation, and applications of machine learning for data correction. Atmos. Environ. 230, 117479 (2020).
Zheng, Z. H. et al. Automated machine learning to evaluate the information content of tropospheric trace gas columns for fine particle estimates over India: A modeling testbed. J. Adv. Model Earth Syst. 15(3), e2022MS003099 (2023).
Prokhorenkova, L., Gusev, G., Vorobev, A. et al. CatBoost: Unbiased boosting with categorical features. In Neural Information Processing Systems (ACM, 2017).
Chu, J. B., Dong, W. & Huang, Z. X. Endpoint prediction of heart failure using electronic health records. J. Biomed. Inform. 109, 103518 (2020).
Hu, Y. et al. Development and validation of a nomogram model for predicting the risk of readmission in patients with heart failure with reduced ejection fraction within 1 year. Cardiovasc. Ther. 2022, 4143173 (2022).
Lundberg, S. M. & Lee, S.-I. A unified approach to interpreting model predictions. In Proceedings of the 31st International Conference on Neural Information Processing Systems 4768–4777 (Curran Associates Inc, 2017).

Funding

This study was supported by the Nanchong science and technology plan project (23JCYJPT0059); Research and development program of North Sichuan Medical College (CBY23-TD01); Sichuan Medical Science and Technology Innovation Research Society Program (YCH-KY-YCZD2024-015); Central Government Guides Local Scientific and Technological Development Projects (2022ZYD0056) and Doctor start-up foundation of North Sichuan Medical College (CBY23-QDA11).

Author information

These authors contributed equally: Hao Luo, Congyu Xiang and Lang Zeng.

Authors and Affiliations

Department of Cardiology, Affiliated Hospital of North Sichuan Medical College, No. 63, Wenhua Road, Nanchong, 637000, Sichuan Province, People’s Republic of China
Hao Luo, Lang Zeng, Shikang Li, Yanxu Liu, Cong Wen, Yangyang Cui, Linqin Du, Yang Zhou, Kun Wang, Lan Li, Zonglian Liu, Jun Pu & Rongchuan Yue
Department of Cardiology, People’s Hospital of Guang’an District, Guang’an, 638550, Sichuan Province, People’s Republic of China
Lijuan Xiong & Rongchuan Yue
School of Pharmacy, Institute of Material Medica, North Sichuan Medical College, Nanchong, 637000, Sichuan, People’s Republic of China
Xue Mei
Department of Nephrology, Affiliated Hospital of North Sichuan Medical College, Nanchong, 637000, Sichuan, People’s Republic of China
Qi Wu
Hubei Polytechnic University, Huangshi, 435003, Hubei, People’s Republic of China
Congyu Xiang

Contributions

L.H., X.C., Z.L.: Conceptualization, Methodology, Data Curation, Writing—Original Draft. L.S., M.X.: Data curation, Software. X.L., L.Y.: Visualization, Investigation. W.C., C.Y.: Investigation, Resources. D.L., Z.Y.: Formal analysis, Validation. W.K., L.L.: Methodology, Validation. L.Z., W.Q.: Data curation, Validation. P.J., Y.R.: Writing- Reviewing and Editing, Supervision.

Corresponding authors

Correspondence to Jun Pu or Rongchuan Yue.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Supplementary Information.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Luo, H., Xiang, C., Zeng, L. et al. SHAP based predictive modeling for 1 year all-cause readmission risk in elderly heart failure patients: feature selection and model interpretation. Sci Rep 14, 17728 (2024). https://doi.org/10.1038/s41598-024-67844-7

Received: 17 April 2024
Accepted: 16 July 2024
Published: 31 July 2024
DOI: https://doi.org/10.1038/s41598-024-67844-7
