Introduction

Patients on hemodialysis (HD) have a significantly higher mortality rate than the general population1,2,3,4. The life expectancy of patients on chronic hemodialysis (CHD) can be affected by underlying conditions such as aging, anemia, elevated C-reactive protein, hypoalbuminemia, phosphorus levels, previous cardiovascular events, and dialysis adequacy2,5,6,7,8,9,10. A temporary vascular catheter may also be an independent risk factor for mortality in patients with CHD6,11. Previous studies have reported several clinical factors associated with mortality risk in patients with CHD; however, patients with end-stage kidney disease present considerable heterogeneity in disease patterns and broad comorbidities12,13. Thus, building a survival outcome prediction model from a limited set of clinical indicators remains challenging.

Various approaches have been attempted to refine mortality prediction models for patients with CHD. A systematic review by Panupong Hansrivijit et al. assessed the precision of factors for predicting mortality in patients with chronic kidney disease, including those undergoing HD and peritoneal dialysis14. Moreover, Chava L. Ramspek et al. conducted a systematic review and meta-analysis of independent external validation studies to determine which model offered the best predictive performance13. Mikko Haapio et al. developed two prognostic models for predicting mortality in patients newly starting chronic dialysis via logistic regression (LGR) with stepwise variable selection, identifying variables that yielded a practical, well-performing model; however, such models may overestimate mortality risk because of the considerably lower mortality rate observed in newer cohorts10.

Recently, artificial intelligence (AI) has become increasingly popular in the field of patient survival/mortality analysis. Patients with CHD generate abundant serum laboratory data during HD treatments, which physicians routinely use to monitor their condition. Machine learning (ML), a branch of AI that imitates human intelligence by incorporating and analyzing available data, is widely utilized in this context15,16,17. It has unique potential for predicting survival outcomes and identifying mortality risk factors in patients with CHD. Powerful prediction models for patients with CHD have been developed using ML methods such as LGR, random forest (RF), and eXtreme gradient boosting (XGB). The research field encompasses diverse areas, including dialysis adequacy prediction18, survival prognosis prediction19,20,21, and time-dependent adverse event prediction22.

Identifying mortality risk factors in patients with CHD may facilitate early intervention and improve outcomes. Given that various ML techniques are still undergoing development and competition23, no single approach consistently outperforms the others under all conditions. Therefore, the performance and accuracy of these techniques should be evaluated comprehensively. Hence, this study aimed to investigate the importance of risk factors identified using multiple ML methods. We also sought to establish a two-stage ML algorithm-based prediction scheme by comparing the accuracy and consistency of different ML methods to determine the most suitable model and to identify common risk factors among patients with CHD who experienced mortality events in different years.

Methods

Study design and population

This retrospective observational cohort included 805 patients who received HD at Shin-Kong Wu Ho-Su Memorial Hospital between December 2006 and December 2012. The primary objective of creating this cohort was to assess the impact of reducing intra-dialysis phosphorus on mortality in patients with CHD24. The exclusion criteria were as follows: (1) a history of hospitalization for acute events, including cardiovascular, cerebrovascular, and infectious diseases; (2) newly active diseases within 3 months before data collection; and (3) missing information. After the essential data preprocessing steps required before applying machine learning (ML) methods in the present study, a total of 800 patients with complete and relevant data were retained, ensuring a comprehensive dataset for subsequent ML modeling and analysis.

This study conformed to the principles of the Declaration of Helsinki and was approved by the Ethics Committee of Shin-Kong Wu Ho-Su Memorial Hospital (protocol No. 20220112R). Given that our study was based on a review of medical records and data, the requirement for informed consent was waived by the Ethics Committee of Shin-Kong Wu Ho-Su Memorial Hospital. Furthermore, patient information was anonymized and de-identified before the analysis.

Data collection

This study included 44 variables, covering demographic characteristics, biochemical laboratory data, and underlying comorbidities and medications (e.g., diabetes mellitus, hypertension, cardiovascular disease (CVD), chronic obstructive pulmonary disease, renin–angiotensin–aldosterone system blockers, antiplatelet drugs, statins, and beta-blockers). We incorporated all these factors into our analysis because of their critical relevance and strong association with clinical outcomes in CHD patients. Additionally, these parameters are easily obtainable in clinical practice.

In this study, CVD was characterized as a composite of various conditions significantly affecting the mortality of CHD patients. These encompass coronary artery diseases, heart failure, hypertensive heart disease, arrhythmias, valvular heart disease, peripheral artery disease, and thromboembolic disease. Within our clinical practice, routine follow-ups for CHD patients encompassed serum biochemical laboratory assessments, including dialysis quality, electrolyte levels, hemogram, nutritional status, iron profile, lipid profile, and parathyroid function.

The biochemical laboratory data in our study included urea kinetics (Kt/V), urea reduction ratio (URR), blood urea nitrogen (BUN, mg/dL) from pre- and post-HD, creatinine (Cr, mg/dL) from pre- and post-HD, sodium (Na, mEq/L) from pre- and post-HD, potassium (K, mEq/L) from pre- and post-HD, ionized calcium (iCa, mEq/L), phosphate from pre- and post-HD (P, mg/dL), intact parathyroid hormone (iPTH, pg/mL), alkaline phosphatase (IU/L), aspartate aminotransferase (AST, U/L), alanine transaminase (ALT, U/L), total bilirubin (mg/dL), aluminum (ng/mL), uric acid (mg/dL), albumin (g/dL), triglyceride (mg/dL), total cholesterol (mg/dL), high-density lipoprotein (mg/dL), low-density lipoprotein (LDL, mg/dL), hemoglobin (g/dL), hematocrit (%), mean corpuscular volume (MCV, fL), ferritin (μg/L), iron (μg/dL), total iron binding capacity (TIBC) (μg/dL), transferrin saturation (TSAT) (%), ante cibum (AC) blood glucose (mg/dL), post cibum (PC) blood glucose (mg/dL), total protein (g/dL), and cardiothoracic ratio (CTR) (%). Following an 8-h fast for routine biochemical testing, blood samples were collected from patients both before and after their dialysis session. Without using a tourniquet, samples were collected from tunneled catheters, arteriovenous fistulas, or grafts; the initial sample was discarded from heparin-primed catheters.

Statistical analyses

Continuous data were reported as mean ± standard deviation and categorical data were expressed as numbers (%) of patients. To compare the means of continuous variables, analysis of variance (ANOVA) was employed. Additionally, the chi-square test (χ2 test) was utilized to compare categorical variables between different groups.
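As a toy illustration of these group comparisons (the numbers below are made up, not the study's data), scipy implements both tests directly:

```python
from scipy import stats

# Continuous variable (e.g., age) in survival vs. mortality groups (toy values)
survival_ages = [62, 65, 58, 70, 61]
mortality_ages = [74, 72, 69, 78, 75]
f_stat, p_anova = stats.f_oneway(survival_ages, mortality_ages)

# Categorical variable (e.g., diabetes yes/no) as a 2x2 contingency table
table = [[120, 80], [40, 60]]
chi2, p_chi2, dof, expected = stats.chi2_contingency(table)
print(round(p_anova, 4), round(p_chi2, 4))
```

With two groups, one-way ANOVA is equivalent to a two-sample t-test, so `f_oneway` covers both the two-group and multi-group cases.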

Figure 1 presents the algorithm of the data-driven ML methods in patients with CHD. We divided the prediction subgroups into 1- and 3-year mortality. For each subgroup, ML algorithms were used to construct prediction models and evaluate the important factors. To establish the fundamentals of model building and validation, we followed a multistage process. First, we divided the patients into validation and testing datasets: the validation dataset comprised 80% of patients with CHD, while the remaining 20% constituted the testing dataset. Next, we employed tenfold cross-validation on the validation dataset, dividing it into 10 equal parts (folds) with data randomization. Once the folds were established, we built five different ML models: LGR, decision tree (DT), RF, gradient boosting (GB), and XGB. Through this approach, the performance of each model could be thoroughly validated and evaluated across the different folds of the validation dataset. All five ML models were evaluated using their respective performance indicators, including accuracy, sensitivity, specificity, and area under the curve (AUC).
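The multistage process above can be sketched as follows, using synthetic data in place of the cohort; the XGB model (a separate package) is omitted, and all names and parameters here are illustrative assumptions rather than the study's exact configuration:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split, StratifiedKFold, cross_validate
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier

# Synthetic stand-in for the 800-patient, 44-variable dataset
X, y = make_classification(n_samples=800, n_features=44,
                           weights=[0.9, 0.1], random_state=0)

# 80% validation / 20% testing split, stratified on the mortality label
X_val, X_test, y_val, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

models = {
    "LGR": LogisticRegression(max_iter=1000),
    "DT": DecisionTreeClassifier(random_state=0),
    "RF": RandomForestClassifier(random_state=0),
    "GB": GradientBoostingClassifier(random_state=0),
    # XGB would be added here via the xgboost package
}

# Tenfold cross-validation on the validation dataset for each model
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
for name, model in models.items():
    scores = cross_validate(model, X_val, y_val, cv=cv,
                            scoring=["accuracy", "roc_auc"])
    print(name,
          round(scores["test_accuracy"].mean(), 3),
          round(scores["test_roc_auc"].mean(), 3))
```

Stratified folds preserve the survival/mortality ratio in each fold, which matters here because the mortality class is a small minority.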

Figure 1
figure 1

The algorithm of data-driven machine-learning methods in chronic hemodialysis patients.

After evaluating these ML models, we compared the results of the LGR model with those of the best modified model from the other four. To achieve this, we selected the most significant risk variables from the LGR model alone, while the four remaining models (DT, RF, GB, and XGB) ranked the variables by averaging their individual rankings. We used the top variables from LGR to build the rebuilt logistic regression (rebuilt LGR) model and incorporated the variables with the highest average rank into the best of the four remaining models. Finally, we compared the best remodified model and the rebuilt LGR model through validation and against the previous training dataset. A two-tailed p value of < 0.05 was considered statistically significant. All statistical analyses were performed using R for macOS version 4.2.1.
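The second stage of the scheme can be sketched as follows: rank the variables by their average importance rank across the tree-based models, then refit the chosen model on only the top-ranked variables. The data, the number of retained variables, and the use of three models (XGB omitted) are illustrative assumptions:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier

X, y = make_classification(n_samples=800, n_features=44, random_state=0)

models = [DecisionTreeClassifier(random_state=0),
          RandomForestClassifier(random_state=0),
          GradientBoostingClassifier(random_state=0)]  # XGB omitted here

# Rank features per model (rank 1 = most important), then average the ranks
ranks = []
for m in models:
    m.fit(X, y)
    order = np.argsort(-m.feature_importances_)  # feature indices, best first
    r = np.empty_like(order)
    r[order] = np.arange(1, len(order) + 1)      # convert order to ranks
    ranks.append(r)
avg_rank = np.mean(ranks, axis=0)

top_k = 14  # e.g., the 14 variables retained in the 1-year model
top_features = np.argsort(avg_rank)[:top_k]

# Rebuild the best-performing model using only the top-ranked variables
rebuilt = RandomForestClassifier(random_state=0).fit(X[:, top_features], y)
```

Averaging ranks rather than raw importances avoids scale differences between the models' importance measures.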

Results

Study population characteristics

We included 800 patients receiving HD, of whom 718 survived at the 1-year mark and 519 at the 3-year mark. Table 1 summarizes the participants’ basic characteristics, comorbidities, and laboratory data. The overall mean age of the study population was 63.30 ± 13.26 years. In the subgroup with a 1-year observation, the mean age of the survival group was 62.73 ± 13.20 years, whereas the mortality group had a mean age of 73.55 ± 11.94 years. For the subgroup with a 3-year observation, the survival group had a mean age of 60.92 ± 13.04 years, while the mortality group had a mean age of 71.24 ± 11.43 years.

Table 1 The baseline characteristics of study population.

Comparison of survival prediction performance among different ML methods

Table 2 shows the comparison of survival prediction performance among different ML models, including LGR, DT, RF, GB, and XGB, in terms of accuracy, sensitivity, specificity, and AUC. For the 1-year mortality prediction model, both RF and GB showed high accuracy and AUC in both validation and testing datasets. Specifically, RF had 0.948 accuracy in validation datasets and 0.941 in testing datasets, while GB had 0.949 and 0.941, respectively. Regarding AUC, RF obtained 0.734 in validation datasets and 0.806 in testing datasets, while GB had 0.737 and 0.793, respectively. For the 3-year mortality prediction model, RF had the highest accuracy in both validation and testing datasets (0.794 and 0.804, respectively), and its AUC values were 0.751 and 0.763, respectively, indicating solid performance. Overall, RF outperformed other models in terms of accuracy and AUC for all four study cutoff periods in both validation and testing datasets.

Table 2 Comparison of survival prediction performance among different machine learning models.

Ranking of CHD mortality risk variables using ML methods

Figure 2 presents the average ranking of variables from four ML models (DT, RF, GB, and XGB) in the 1- and 3-year mortality predictions. Given that RF had the highest accuracy and AUC in both the validation and testing datasets, we selected it as the optimal modified model for our study and subsequently applied the top average-ranking variables to it. Figure 3 illustrates the RF accuracy trends by the accumulated number of variables for the 1- and 3-year mortality prediction models. These trends indicated that only the top 14 and 12 important variables, respectively, were required to achieve the maximum accuracy.

Figure 2
figure 2

The average ranking of variables from four machine-learning models (decision tree, random forest, gradient boosting, and eXtreme gradient boosting) in (A) 1-year and (B) 3-year mortality prediction. COPD chronic obstructive pulmonary disease, RAAS renin–angiotensin–aldosterone system, Kt/V urea kinetics, URR urea reduction ratio, BUN blood urea nitrogen, AC ante cibum (before meals), PC post cibum (after meals), Na sodium, K potassium, iCa ionized calcium, iPTH intact parathyroid hormone, Alk-p alkaline phosphatase, AST aspartate aminotransferase, ALT alanine transaminase, Bil-T total bilirubin, HDL high-density lipoprotein, LDL low-density lipoprotein, Hb hemoglobin, Ht hematocrit, MCV mean corpuscular volume, TIBC total iron binding capacity, TSAT transferrin saturation, Al aluminum, CTR cardiothoracic ratio.

Figure 3
figure 3

The random forest accuracy trend by the accumulated number of variables for (A) 1-year mortality and (B) 3-year mortality prediction models.
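The accuracy-trend analysis behind Figure 3 can be sketched as follows: refit the random forest on the top-1, top-2, ..., top-k ranked variables and track the cross-validated accuracy until it peaks. The synthetic data, the ranking source, and the hyperparameters are illustrative assumptions:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=800, n_features=44, random_state=0)

# Stand-in for the precomputed average ranking: feature indices, best first
ranking = np.argsort(
    -RandomForestClassifier(n_estimators=50, random_state=0)
     .fit(X, y).feature_importances_)

trend = []
for k in range(1, 16):  # accumulate the top-1 through top-15 variables
    cols = ranking[:k]
    acc = cross_val_score(
        RandomForestClassifier(n_estimators=50, random_state=0),
        X[:, cols], y, cv=5, scoring="accuracy").mean()
    trend.append(acc)

best_k = int(np.argmax(trend)) + 1  # smallest k reaching the peak accuracy
print(best_k, round(max(trend), 3))
```

Plotting `trend` against `k` reproduces the shape of an accuracy-trend curve: accuracy typically rises steeply for the first few variables and then plateaus, which is how a minimal variable set is chosen.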

Table 3 summarizes the rankings of variables for CHD mortality risk via all of the five ML methods. The top half of Table 3 displays the average ranking of variables from DT, RF, GB, and XGB before reaching the maximum accuracy for the 1- and 3-year models. For the 1-year model, the top 14 variables included age, post-HD creatinine, AST, total bilirubin, post-HD BUN, pre-HD creatinine, Kt/V, CTR, LDL, albumin, iPTH, alkaline phosphatase, aluminum, and TIBC. For the 3-year model, the top 12 variables included age, AST, CTR, pre-HD creatinine, alkaline phosphatase, AC blood glucose, iPTH, iCa, post-HD creatinine, ferritin, ALT, and hematocrit. At the bottom half of Table 3, we list the significant variables identified by the LGR model before being used in the modified model. The significant variables in the 1-year model were age, CTR, ferritin, and ALT, whereas those in the 3-year model were age, CTR, iPTH, sex, alkaline phosphatase, pre-HD creatinine, post-HD creatinine, and phosphorus.

Table 3 The rankings of variables for CHD mortality risk via different machine learning models.

Survival prediction performance between the RF with stepwise remodeling and the rebuilt LGR

We used the highest average rank variables from DT, RF, GB, and XGB for the RF with stepwise remodeling method. The results were then compared with those of the rebuilt LGR model. Table 4 compares the survival prediction performance between RF stepwise modeling and rebuilt LGR in terms of accuracy, recall, specificity, and AUC. In the 1-year prediction model, the accuracy of RF stepwise modeling was 0.940, whereas that of the rebuilt LGR was 0.937. The specificity was 1.000 in the RF and 0.996 in the LGR, with AUCs of 0.727 and 0.576, respectively. In the 3-year prediction model, the accuracy of RF stepwise modeling was 0.801, whereas that of the rebuilt LGR was 0.767. Regarding specificity, RF and LGR obtained 0.989 and 0.940, with AUCs of 0.805 and 0.806, respectively. Overall, the stepwise RF model demonstrated superior predictive performance compared to the traditional LGR method in predicting the mortality of CHD patients.

Table 4 The comparison of survival prediction performance between RF stepwise modeling and rebuilt LGR.

Advantage of DT algorithm

DT provided a useful and understandable algorithm. Figure 4A shows the DT algorithm in the 1-year prediction model. The first cutoff criterion was age below 60 years, which accounted for 41% of the validation dataset. The DT model classified “age below 60 years” as survival (denoted as “0”), and the actual survival rate in this group was 92%. The remaining 59% then moved to the second cutoff criterion, a pre-HD creatinine level greater than or equal to 5.5 mg/dL; only 3% of the patients failed to meet this criterion. The DT model classified “age greater than or equal to 60 years” with “pre-HD creatinine level less than 5.5 mg/dL” as mortality (denoted as “1”), and the actual survival rate in this group was only 20% in the validation dataset. Further cutoff criteria included AST, TIBC, ALT, iron, MCV, pre-HD BUN, TSAT, triglyceride, AC glucose, and total cholesterol levels, applied sequentially. Finally, 13 groups were categorized and moved into the terminal nodes (leaves) of the DT. Figure 4B shows the DT algorithm in the 3-year prediction model. The first three cutoff criteria were the same (age below 60 years, pre-HD creatinine level greater than or equal to 5.5 mg/dL, and AST below 45 U/L). Further cutoff criteria were TSAT, TIBC, ALT, age below 78 years, MCV, aluminum, alkaline phosphatase, and hemoglobin. Likewise, 13 groups were categorized and moved into the terminal nodes (leaves) of the DT.

Figure 4
figure 4

The algorithm of decision tree (DT) in the (A) 1-year prediction model (the 7th fold) and (B) 3-year prediction model (the 1st fold). BUN blood urea nitrogen, AC ante cibum (before meals), AST aspartate aminotransferase, ALT alanine transaminase, MCV mean corpuscular volume, TIBC total iron binding capacity, TSAT transferrin saturation.
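A minimal sketch of how such a readable tree is fitted and printed is shown below, mirroring the age and pre-HD creatinine style of cutoffs described above. The data, labels, and thresholds here are synthetic illustrations, not the study's fold-specific trees:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)
age = rng.uniform(30, 90, 400)
pre_hd_cr = rng.uniform(3.0, 12.0, 400)
# Toy label: mark older patients with lower pre-HD creatinine as mortality (1)
y = ((age >= 60) & (pre_hd_cr < 5.5)).astype(int)
X = np.column_stack([age, pre_hd_cr])

# A shallow tree keeps the cutoff rules human-readable
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
print(export_text(tree, feature_names=["age", "pre_HD_creatinine"]))
```

`export_text` prints the fitted tree as nested if/else rules, which is the same kind of clinician-readable structure that Figure 4 depicts graphically.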

Discussion

Our study adopted five ML methods to identify the best prediction model for different years of mortality in patients with CHD and successfully developed a two-stage ML algorithm-based prediction scheme culminating in the stepwise RF model, which incorporates the most important factors of CHD mortality risk based on the average rank from DT, RF, GB, and XGB. The stepwise RF model demonstrated superior predictive performance compared with the traditional LGR method for mortality in CHD patients in both the 1-year and 3-year models. Finally, our proposed ML scheme can be integrated into the electronic reporting system to enhance patient care (Fig. 5).

Figure 5
figure 5

The application of the data-driven machine-learning methods in clinical practice from our study.

All five ML models individually provided solid and consistent performance in predicting the mortality of patients with CHD across the validation and testing datasets. In both datasets of the 1- and 3-year mortality prediction models, RF had better accuracy and AUC results than the other four ML methods. Moreover, RF required only the top 14 important variables in the 1-year accuracy trend and only the top 12 variables in the 3-year accuracy trend to reach maximum accuracy. Interestingly, both the RF stepwise modeling and rebuilt LGR methods needed considerably fewer variables than the full set of 44 to provide similar performance in the 1-year and 3-year models, demonstrating high efficiency in analyzing mortality risk prediction in patients with CHD. Notably, although the DT method did not exhibit the highest accuracy and AUC, it showed good potential because it provided a highly comprehensible algorithm and offered selectable and adjustable variables according to the current clinical trend or the physician’s clinical preference.

The top important variables of CHD identified by the five different ML methods yielded consistent results. The average rank across all ML methods indicated that age, creatinine (pre- and post-HD), and CTR were the most frequent top variables in all four cutoff periods. Albumin, aluminum, alkaline phosphatase, AST, and AC blood glucose were also top indicators for mortality prediction. Some of these important variables identified by ML methods agree with previous studies. For example, multiple prospective cohort studies reported that CTR25,26,27, elevated serum alkaline phosphatase28,29,30,31, and lower serum albumin levels32,33,34 are associated with higher mortality risk in patients with CHD. Higher serum aluminum levels also represent a mortality risk factor35,36,37 and even show some potential association with CTR, although the underlying cause or mechanism remains unclear38.

Serum creatinine levels before and after HD were highlighted as top-ranked variables by the ML models and are also frequently measured in clinical practice. Walther et al. likewise concluded that pre-dialysis serum creatinine and its interdialytic change are highly related to mortality in patients undergoing HD39. However, lean muscle mass40,41,42,43, infection status44,45, severe illness46,47, and poor nutritional status39,45, which affect metabolism and catabolism, may influence the serum creatinine level. Some studies investigated the modified creatinine index (mCI) for better mortality prediction40,43,44,48. However, the baseline serum creatinine level in patients with CHD can vary considerably across study periods, making it difficult to define in clinical practice. Moreover, the mCI incorporates age, sex, and Kt/V (urea kinetics) into its formula, resulting in multifactorial interference and potentially introducing prediction bias.

URR, one of the most common indicators of HD dose delivery, is generally associated with decreased mortality49,50,51,52. Unexpectedly, the URR in our study was not among the top variables, probably because the URR in the survival group (1-year subgroup: 0.73 ± 0.06; 3-year subgroup: 0.73 ± 0.06) and mortality group (0.76 ± 0.07 and 0.74 ± 0.06, respectively) were both within the optimized dose range (≥ 0.65) without a statistically significant difference. Moreover, URR is higher in patients with sarcopenia or malnutrition, conditions that carry a higher mortality risk in those with CHD. McClellan et al. concluded that a URR of 0.70–0.74 shows an increasing trend of mortality risk compared with 0.65–0.6953. Considering that URR is also affected by changes in urea distribution volume and urea generation during HD, the 2015 National Kidney Foundation’s Kidney Disease Outcomes Quality Initiative (KDOQI) recommends targeting the single-pool Kt/V (spKt/V) for dialysis adequacy instead of URR52.

Table 5 presents a range of data-driven ML methods employed for identifying risk factors and refining mortality prediction in patients with CHD. Victoria Garcia-Montemayor et al. concluded that RF is adequate for mortality prediction in patients with CHD and superior to LGR20. According to Kaixiang Sheng et al., XGB can effectively identify high-risk patients within 1 year after HD initiation19. In the study of Covadonga Díez-Sanmartín et al., combining the XGBoost method with the corresponding Kaplan–Meier curves to evaluate the risk profile of patients undergoing dialysis yielded very high accuracy, specificity, and AUC results16. Oguz Akbilgic et al. also used RF to identify risk factors of mortality within 1 year after dialysis initiation and found that RF had strong prediction performance compared with other ML methods, such as artificial neural networks, support vector machines, and the k-nearest neighbors algorithm21. Furthermore, Cheng-Hong Yang et al. revealed that the whale optimization algorithm with fully adjusted Cox proportional hazards (WOA-CoxPH) could evaluate risks better than RF and typical Cox proportional hazards (CoxPH) in patients with CHD54. However, these previous studies had relatively short prediction periods, mostly within 2 years. Our study emphasizes the characteristics of patients with CHD and achieved robust prediction performance for up to 3 years using the stepwise RF model derived from the two-stage ML algorithm-based prediction scheme. Figure 5 demonstrates our approach: by incorporating the top numerical risk variables selected by various ML methods, our model can effectively assist physicians, caregivers, and patients in predicting short-term post-dialysis mortality outcomes. This model allows for enhanced patient-centered decision-making and increased awareness of laboratory data that could signal risk, especially for patients with a high short-term mortality risk. It also enables individuals to achieve a better quality of life earlier and helps avoid unnecessary healthcare expenditures.

Table 5 Literature review of mortality prediction models in chronic hemodialysis patients.

This study has some limitations. First, the top variables identified by ML only indicate a relationship with mortality in patients with CHD but do not reveal whether the associations are positive or negative. Moreover, not all of the top variables agree with previous study results, especially serum creatinine level, which can be affected by various clinical conditions that may mislead data-driven ML results. Before ML-identified variables are applied to an unfamiliar disease, they should be further investigated clinically. Second, we initially extended the analysis cutoff period to up to 7 years but ultimately restricted it to within 3 years to avoid the risk of data-censoring bias from the high mortality rate after dialysis initiation. A long-term prediction model may require larger datasets and an even longer study period for better qualification and quantification. Third, our dataset contained only a composite parameter for CVD. However, prognoses may vary among different CVD categories in CHD patients, and additional research may be required to examine the effects of CVD subclassifications on mortality. Fourth, the individuals in our study were drawn from a pool that excluded recently hospitalized patients with cardiovascular or infectious concerns. This subset was expected to exhibit a greater likelihood of survival than recently hospitalized patients, a factor that could potentially skew our results; therefore, our model may be applicable only to relatively stable CHD patients. Finally, the prediction models developed by ML methods in our study are limited to a single medical center. Hence, the model derived from our study population may not be applicable to other similar at-risk research groups. Future ML approaches, such as federated learning incorporating multiple medical centers and research groups, could help improve predictive performance and strengthen clinical decisions.

Conclusion

The adoption of the stepwise RF model, derived from the two-stage ML algorithm-based prediction scheme, enhances patient-centered decision-making and improves outcomes, particularly for patients with a high short-term mortality risk in both the 1-year and 3-year periods. The findings of this study offer valuable information to nephrologists, increasing awareness of laboratory data that signal risk. However, for longer prediction periods, future studies should consider incorporating larger study populations and more diverse groups to further enhance predictive performance.