Introduction

Liver transplantation has become the standard treatment for irreversible acute liver failure and end-stage liver diseases. Advances in surgical technique and post-operative care have markedly decreased the early mortality rate after liver transplantation1, 2. Nevertheless, given the relatively high risk of surgery and the limited availability of organs, predicting the short-term outcome of liver transplant recipients using various scores and models remains important3, 4.

The Model for End-stage Liver Disease (MELD) score, developed in the late 1990s5, has been incorporated into the organ allocation system since 20026. The correlation between preoperative MELD scores and early mortality has been studied with mixed results3, 4, 7. Acute Physiology and Chronic Health Evaluation (APACHE) scores and Simplified Acute Physiology Score (SAPS) models are widely used for severity of illness assessment and outcome predictions in critically ill patients8, 9. Studies comparing MELD scores with APACHE II in liver transplant patients10 and APACHE II with SAPS 3 scores in solid organ transplant patients11 have shown inconclusive results.

Liver transplantation was not incorporated into scores or models until the recently updated APACHE IV and SAPS 3 models12, 13. However, there has been no comparison of APACHE IV and SAPS 3 with other outcome prediction models in patients undergoing liver transplantation. Therefore, we compared the performance of prognostic models in predicting early mortality in liver transplant patients: APACHE IV-LT specific predicted mortality, SAPS 3, APACHE II, MELD-Na, MELD, and Child-Turcotte-Pugh (CTP) scores. Factors associated with in-hospital mortality after liver transplantation were also evaluated.

Results

Characteristics of the study population

Between October 2010 and September 2014, 633 patients who had undergone living donor or deceased donor liver transplantation were admitted to the surgical ICU. After excluding 42 pediatric patients and one re-transplantation patient, 590 patients were included for analysis. MELD scores were < 15 in 309 (52.4%) patients, 15 to 24 in 161 (27.3%) patients, and ≥ 25 in 120 (20.3%) patients. Seventeen of the 590 patients (2.9%) died in the hospital after liver transplantation. Causes of in-hospital mortality included sepsis (8 patients), postoperative massive bleeding (3 patients), primary allograft nonfunction (2 patients), acute respiratory distress syndrome due to pneumonia (2 patients), massive pulmonary thromboembolism (1 patient), and brain herniation (1 patient). Including the 8 patients who were discharged but died within 1 year after liver transplantation, the overall 1 year mortality was 4.2% (25/590).

Comparison among models in predicting in-hospital mortality

APACHE IV-LT specific predicted mortality showed excellent discrimination in predicting in-hospital mortality with an AUC of 0.91 (95% CI [0.86–0.96]) (Table 1). After adjusting for multiple comparisons using the Holm method, APACHE IV-LT specific predicted mortality showed a larger AUC than SAPS 3, MELD-Na, and CTP (Table 1). Discrimination was very good or good for all other models in predicting in-hospital mortality except for the CTP score (Fig. 1A). All 6 prognostic models showed good calibration and adequately described the in-hospital mortality pattern (Table 1, Supplementary Fig. 1). The APACHE IV score also showed very good discrimination in predicting in-hospital mortality with an AUC of 0.83 (95% CI [0.72–0.94]).

Table 1 Performance of APACHE IV, SAPS 3, APACHE II, MELD-Na, MELD, and CTP models on prediction of in-hospital mortality.
Figure 1

Comparison of the ROC curves of APACHE IV-liver transplantation specific predicted mortality, APACHE IV, SAPS 3, APACHE II, MELD-Na, MELD, and CTP scores in predicting in-hospital (A) and 1 year mortality (B). (A) The AUCs are 0.91, 0.83, 0.78, 0.81, 0.74, 0.76, and 0.68 in APACHE IV-liver transplantation specific predicted mortality, APACHE IV, SAPS 3, APACHE II, MELD-Na, MELD, and CTP models, respectively. (B) The AUCs are 0.83, 0.78, 0.71, 0.73, 0.67, 0.69, and 0.64 in APACHE IV-liver transplantation specific predicted mortality, APACHE IV, SAPS 3, APACHE II, MELD-Na, MELD, and CTP models, respectively. ROC, receiver operating characteristic; APACHE, acute physiology and chronic health evaluation; SAPS, Simplified Acute Physiology Score; CTP, Child-Turcotte-Pugh score.

APACHE IV-LT specific predicted mortality showed excellent or very good discrimination in all MELD score groups with better performance compared to SAPS 3 or APACHE II in patients with MELD scores between 15 and 24, and CTP in patients with MELD scores less than 15 (Table 2). In deceased donor liver transplantation, APACHE IV-LT specific predicted mortality showed very good discrimination and better performance compared to SAPS 3 or MELD-Na scores (Table 2).

Table 2 Comparison of APACHE IV, SAPS 3, APACHE II, MELD-Na, and CTP scores according to MELD score in predicting in-hospital mortality.

Factors associated with in-hospital mortality

Due to collinearity, donor status was chosen over operation type (Pearson’s correlation coefficient 0.950) and MELD scores were chosen over MELD-Na scores (Pearson’s correlation coefficient 0.797) for the univariable and multivariable analyses.

Compared to in-hospital survivors, non-survivors had higher MELD, MELD-Na, and CTP scores before transplantation and higher APACHE IV, SAPS 3, and APACHE II scores (Table 3). Non-survivors were more likely to require vasopressor support at ICU admission, require more postoperative transfusion, develop AKI, and require preoperative and postoperative renal replacement therapy (Table 3).

Table 3 Patient characteristics of in-hospital survivors and non-survivors.

After adjusting for relevant factors with p < 0.2 in univariable analyses, APACHE IV-LT specific predicted mortality, preoperative corrected sodium level, preoperative RRT, postoperative RRT, and ICU readmission were identified as independent factors associated with in-hospital mortality (Table 4). After adjusting for variables with p < 0.1 in the univariable analyses, APACHE IV-LT specific predicted mortality (OR 1.06, 95% CI [1.03–1.10], p < 0.001), preoperative corrected sodium level (OR 1.12, 95% CI [1.00–1.26], p = 0.05), postoperative RRT (OR 16.75, 95% CI [4.37–64.16], p < 0.001), and ICU readmission (OR 8.33, 95% CI [1.83–38.05], p = 0.01) were identified as independent factors associated with in-hospital mortality (Supplementary Table 1). In living donor liver transplantation, ICU readmission (OR 54.83, 95% CI [2.79–1076.08], p = 0.008) and inotropic support on admission to ICU (OR 28.76, 95% CI [1.14–725.46], p = 0.041) were independent risk factors of in-hospital mortality (Supplementary Table 2), whereas preoperative corrected sodium levels (OR 1.17, 95% CI [1.01–1.35], p = 0.036) and preoperative RRT (OR 17.72, 95% CI [1.51–208.36], p = 0.022) were independent risk factors in deceased donor liver transplantation (Supplementary Table 3).

Table 4 Factors associated with in-hospital mortality after liver transplantation.

Comparison among models in predicting 3-month mortality

APACHE IV-LT specific predicted mortality showed very good discrimination in predicting 3-month mortality with an AUC of 0.87 (95% CI [0.79–0.95]) (Supplementary Table 4). After adjusting for multiple comparisons using the Holm method, APACHE IV-LT specific predicted mortality showed a larger AUC than CTP (p = 0.02, Supplementary Table 4). Discrimination was very good or good for all other models in predicting 3-month mortality except for the CTP score (Supplementary Table 4). All 6 prognostic models showed good calibration and adequately described the 3-month mortality pattern (Supplementary Table 4).

Comparison among models in predicting 1 year mortality

In predicting 1 year mortality, APACHE IV-LT specific predicted mortality showed an AUC of 0.83 (95% CI [0.76–0.90]), indicating very good discrimination (Fig. 1B), and all 6 models showed good calibration (Table 5, Supplementary Fig. 2). After adjusting for multiple comparisons using the Holm method, the AUC of APACHE IV-LT specific predicted mortality was larger than that of MELD-Na (p = 0.035) and CTP (p = 0.030) (Table 5).

Table 5 Performance of APACHE IV, SAPS 3, APACHE II, MELD-Na, MELD, and CTP models on prediction of 1 year mortality.

APACHE IV-LT specific predicted mortality showed very good or good discrimination in all MELD score groups but did not show any significant difference compared to other models (Supplementary Table 5). In deceased donor liver transplantation, APACHE IV-LT specific predicted mortality showed good discrimination and better performance compared to SAPS 3 (p < 0.001), APACHE II scores (p < 0.001), and MELD-Na scores (p = 0.002) (Supplementary Table 5).

Compared to 1 year survivors, non-survivors showed higher MELD, MELD-Na, and CTP scores before transplantation and higher APACHE IV, SAPS 3, and APACHE II scores (Supplementary Table 6). Non-survivors were more likely to require vasopressors at ICU admission, receive more intraoperative and postoperative transfusion, develop AKI, require longer duration of mechanical ventilation, and require preoperative and postoperative renal replacement therapy (Supplementary Table 6).

Comparison of mortality by subgroups

Between groups of patients with MELD scores < 15 and MELD scores ≥ 25, there was a 5.2% to 8.6% difference in survival rate for up to 18 months after transplantation (Supplementary Table 7, Supplementary Fig. 3). Living donor liver transplant patients had lower mortality than deceased donor liver transplant patients (1.5% vs 6.2%, p = 0.005). Lower APACHE IV scores correlated with higher survival rates (Supplementary Table 8, Supplementary Fig. 4).

Discussion

The main findings of this study are that the APACHE IV-LT specific predicted mortality 1) showed very good to excellent discrimination and calibration in predicting in-hospital and 1 year mortality after liver transplantation, 2) showed better discrimination in in-hospital and 1 year mortality compared to other scores, and 3) was the only model that showed good to excellent discrimination in in-hospital and 1 year mortality in all MELD groups and in both living and deceased donor liver transplantation.

The APACHE II score8, introduced in 1985, is an older version of the APACHE system but is still widely used because of its simplicity and its capability of classifying severity of illness and predicting hospital mortality14. The APACHE II score did not have liver transplantation in the diagnostic category and was shown to overestimate in-hospital mortality in postoperative liver transplantation patients unless an orthotopic liver transplantation specific diagnostic weight was applied15. The original APACHE II score with liver transplant-specific coefficients was reported to be a good predictor of hospital and 1 year mortality after liver transplantation16; in our study, the APACHE II score with the liver transplant-specific coefficient performed similarly to the original APACHE II score. The APACHE IV score was developed in 2006 and has been widely implemented in general ICUs and specific patient groups12, 17, 18. A major advantage of the APACHE IV model is its accommodation of 116 detailed admitting diagnostic options, including postoperative liver transplantation, which promotes outcome analysis in specific subgroups12. A recent study of 195 orthotopic liver transplant patients showed that the APACHE IV score (AUC 0.94) demonstrated better performance compared to the MELD score (AUC 0.69) in predicting in-hospital mortality after deceased donor liver transplantation19. Despite the difference between our study population (70% living donor liver transplantation) and the cohort from which the liver transplant-specific diagnostic weighted equation of APACHE IV for mortality prediction was derived (158 orthotopic liver transplantations only)12, our results were similar and showed that APACHE IV outperformed other scores.

Since the development of MELD scores in 2000 to predict 3-month mortality after transjugular intrahepatic portosystemic shunt (TIPS)5, MELD scores have been used to prioritize liver allocation and predict mortality of liver cirrhosis patients awaiting liver transplantation20. However, MELD scores that incorporated sodium (MELD-Na) were shown to better predict mortality among candidates for liver transplantation compared with the MELD score21. Consequently, serum sodium was recently added to the MELD score by the OPTN. The original MELD score and the MELD-Na were included in our study for comparison with other scoring systems. Similar to SAPS 3, the discrimination and calibration of MELD and MELD-Na scores were good in predicting in-hospital and 3-month mortality. However, the APACHE IV-LT specific predicted mortality showed better discrimination in predicting 1 year mortality compared to MELD-Na scores. A previous study has shown similar results20, which may be attributed to the original purpose of the scores and to the fact that only values obtained prior to liver transplantation are incorporated.

The SAPS 3 model was developed in 200522 and has shown good discrimination in ICU patients18, 23. The SAPS 3 model also has subgroups of admission categories including the anatomical site of surgery. The transplantation-specific diagnostic weighted equation was derived from 172 transplant patients, 90 of whom underwent liver transplantation13. In 152 orthotopic liver transplant patients, SAPS 3 was similar to APACHE II in predicting in-hospital mortality after liver transplantation with moderate discrimination11. Similarly, the performance of SAPS 3 in our study in predicting in-hospital and 1 year mortality was comparable to other models, except for APACHE IV-LT specific predicted mortality.

The CTP score has been used as a classic tool to grade the severity of liver disease24. Previous studies comparing CTP with MELD and APACHE II scores suggest that the CTP score is less accurate in predicting early and late post-transplant mortality10, 25. The lack of extrahepatic parameters and physiologic variables and the basis on which the CTP score was developed may account for its poor discriminative performance in predicting in-hospital mortality and 1 year mortality after liver transplantation, as shown again in our study.

When comparing different scoring systems, differences in incorporated variables, study population or patient mix, time between development of the model and patient enrollment, mortality rates, and sample size between the study population and the original cohort used in the development of the scoring system should be considered17, 26. More specifically, APACHE and SAPS scores are calculated after ICU admission and incorporate comorbidities, postoperative vital signs, and laboratory values with an aim to predict in-hospital mortality, whereas MELD and CTP scores only account for select preoperative values, mostly related to hepatic function, with an aim to assess the severity of liver dysfunction. Consequently, the performance of APACHE IV, SAPS 3, and APACHE II tends to be better than that of MELD-Na, MELD, and CTP scores in predicting in-hospital mortality and 1 year mortality. In addition, unique perioperative aspects of hepatic dysfunction and liver transplantation such as hypotension, lactic acidosis, and coagulopathy followed by subsequent rapid recovery after transplantation may be reflected in APACHE scores and SAPS27, 28. Liver transplant patients are unique in that a wide variety of abnormalities quickly recover after transplantation, which may explain the inaccuracy of APACHE II when the diagnostic category weight of ‘postoperative gastrointestinal surgery’ is used15.

APACHE IV-LT specific predicted mortality showed excellent or very good discrimination in all MELD score groups and outperformed other models in predicting in-hospital mortality. APACHE IV was also the only scoring system that showed good or better discrimination in living donor and deceased donor liver transplantation. The APACHE IV post-liver transplant specific weighted equation, which contains detailed postoperative vital signs and laboratory values, may explain the superior performance in all aspects compared to other scores.

Consistent with our findings, ICU readmission is known to be highly correlated with in-hospital mortality not only in the general ICU population but also in liver transplant patients29, 30. In our study, 41.2% of non-survivors were readmitted to the ICU after initial ICU discharge within the same hospital stay whereas only 4.7% of survivors were readmitted. Frequent causes of ICU readmission include postoperative bleeding, respiratory complications, and sepsis. Renal dysfunction is common in patients awaiting liver transplantation and after liver transplantation and has a significant impact on perioperative and long-term morbidity and mortality31. In patients awaiting liver transplantation, the predicted 3-month mortality rate in patients on dialysis is up to 10 times higher compared to patients who do not require dialysis5, 21.

There are a few limitations to our study. First, our study was conducted in a single center with a high proportion of living donor liver transplantation and hepatitis B patients. Similar to our study results, deceased donor liver transplantation has been associated with worse outcomes compared to living donor liver transplantation32. The superior performance of the APACHE IV score in our study is most prominent in deceased donor liver transplant patients, who have more severe preoperative conditions. Therefore, our results should be interpreted and applied taking into account that the majority of our study population were living donor liver transplant patients with less severe preoperative conditions. Second, the in-hospital and 1 year mortality rates were less than 5%. The small proportion of non-survivors limits the assessment of predictive model performance. However, considering that most patients are monitored in the ICU after liver transplantation, validation of the APACHE IV score with 590 patients helps confirm the utility of the APACHE IV score in liver transplant patients. Third, identified risk factors of in-hospital and 1 year mortality such as preoperative and postoperative renal replacement therapy and ICU readmission showed relatively wide confidence intervals, which may be due to the small number of non-survivors. Therefore, application of these risk factors to different circumstances and patient mixes should be done with caution.

In conclusion, the APACHE IV score showed good discrimination and calibration in predicting in-hospital and 1 year mortality after liver transplantation and in all MELD groups and in both living and deceased donor liver transplantation.

Methods

This study was approved by the institutional review board of the Seoul National University Hospital (1506–096–681). Informed consent was waived by the IRB due to the retrospective design of the study.

Patient population

Patients who had undergone living or deceased donor liver transplantation from October 2010 to September 2014 at Seoul National University Hospital were included in this study. Pediatric patients (<18 years of age) and re-transplantation patients were excluded.

Data collection

Data were obtained from the electronic medical record database to calculate APACHE IV-LT specific predicted mortality, SAPS 3, APACHE II, MELD-Na, MELD, and CTP scores. Coexisting diseases, body mass index, preoperative Na, recipient operation time, donor status, operation type, numbers of intra- and postoperative RBC transfusion units, postoperative acute kidney injury and renal replacement therapy, reoperation, biliary complications, surgical site infections, ICU and hospital length of stay, in-hospital and 1 year mortality were recorded.

Score calculation

APACHE IV-LT specific predicted mortality and APACHE II scores were calculated using the worst lab values obtained within 24 hours of ICU admission, and SAPS 3 was calculated using the worst lab values within 1 hour of ICU admission. MELD and MELD-Na scores were calculated using the most recent pre-transplantation labs obtained in the 48 hours prior to liver transplantation. The MELD-Na score was calculated as incorporated by the Organ Procurement and Transplantation Network (OPTN) as of January 2016 (https://optn.transplant.hrsa.gov/news/meld-serum-sodium-policy-changes). The CTP score was calculated using the most recent laboratory values and physical findings before transplantation24.
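As a rough illustration of the two pre-transplant scores, the following Python sketch computes MELD and MELD-Na from laboratory values. The coefficients, the lower and upper bounds applied to the inputs, the cap at 40, and the function names are assumptions taken from the commonly published UNOS/OPTN form of the formulas; the text above does not state the exact variant or rounding rule used in this study.

```python
import math


def meld(bilirubin_mg_dl, inr, creatinine_mg_dl, on_dialysis=False):
    """MELD score, commonly published UNOS form (assumed variant, unrounded)."""
    # Assumed convention: values below 1.0 are raised to 1.0.
    bili = max(bilirubin_mg_dl, 1.0)
    inr_bounded = max(inr, 1.0)
    # Assumed convention: creatinine is capped at 4.0 mg/dL and set to 4.0
    # for patients on dialysis.
    crea = 4.0 if on_dialysis else min(max(creatinine_mg_dl, 1.0), 4.0)
    score = (3.78 * math.log(bili)
             + 11.2 * math.log(inr_bounded)
             + 9.57 * math.log(crea)
             + 6.43)
    return min(score, 40.0)


def meld_na(meld_score, sodium_mmol_l):
    """MELD-Na, assumed OPTN 2016 form: sodium adjustment only if MELD > 11."""
    # Assumed convention: serum sodium is bounded to 125-137 mmol/L.
    na = min(max(sodium_mmol_l, 125.0), 137.0)
    if meld_score <= 11:
        return meld_score
    adjusted = (meld_score
                + 1.32 * (137.0 - na)
                - 0.033 * meld_score * (137.0 - na))
    return min(adjusted, 40.0)
```

Under these assumptions, a patient with a MELD score of 20 and a serum sodium of 130 mmol/L would receive a MELD-Na of about 24.6, whereas a patient with a MELD score of 11 or less keeps the unadjusted value.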

Discrimination and calibration of prognostic models

Discrimination refers to the ability to rank patients correctly according to their risk of death and was assessed using the area under the receiver operating characteristic (ROC) curve (AUC)33. It was classified as excellent, very good, good, moderate, or poor when AUCs were 0.9 to 0.99, 0.8 to 0.89, 0.7 to 0.79, 0.6 to 0.69, or <0.6, respectively33. If the AUC was statistically significant, the Youden index (max [sensitivity + specificity − 1]) was used to determine the optimal cut-off point for each score34. To further assess discrimination of each prognostic model, patients were stratified into 3 groups according to their MELD score (<15, 15 to 24, and ≥25, which largely correlate with former United Network for Organ Sharing (UNOS) statuses 3, 2B, and sick 2B and 2A, respectively7) and by the type of donor (living vs deceased).
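The discrimination measures defined above can be sketched in a few lines of Python. This is an illustrative sketch only: the function names are hypothetical, the AUC is computed with the rank-based (Mann-Whitney) estimator rather than the software used in the study, and an AUC of exactly 1.0, which falls outside the bands listed above, is grouped with "excellent" as an assumption.

```python
def auc(scores, died):
    """Rank-based AUC: probability that a non-survivor outscores a survivor."""
    pos = [s for s, d in zip(scores, died) if d == 1]
    neg = [s for s, d in zip(scores, died) if d == 0]
    # Ties between a non-survivor and a survivor count as half a win.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def classify_auc(value):
    """Map an AUC to the discrimination bands used in the text."""
    if value >= 0.9:
        return "excellent"  # assumption: 1.0 is grouped with 0.9 to 0.99
    if value >= 0.8:
        return "very good"
    if value >= 0.7:
        return "good"
    if value >= 0.6:
        return "moderate"
    return "poor"


def youden_cutoff(scores, died):
    """Return (cut-off, J) maximizing J = sensitivity + specificity - 1."""
    n_pos = sum(died)
    n_neg = len(died) - n_pos
    best = None
    for cut in sorted(set(scores)):
        # A patient is predicted to die when the score is >= the cut-off.
        tp = sum(1 for s, d in zip(scores, died) if d == 1 and s >= cut)
        tn = sum(1 for s, d in zip(scores, died) if d == 0 and s < cut)
        j = tp / n_pos + tn / n_neg - 1
        if best is None or j > best[1]:
            best = (cut, j)
    return best
```

For example, with hypothetical scores 10, 20, 30, 40, 50, 60 and deaths at scores 40 and 60, the AUC is 0.875 ("very good") and the Youden index selects a cut-off of 40.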

Calibration was defined as the ability of a model to describe the mortality pattern in the data. The Hosmer–Lemeshow goodness-of-fit test was used to evaluate the agreement between observed and expected number of survivors and non-survivors across all strata with equal number of patients (C statistics) or with 10 groups divided by expected mortality intervals (H statistics), with a non-significant p-value (>0.05) indicating good calibration35.
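A minimal sketch of the Hosmer–Lemeshow C statistic is given below, assuming patients are sorted by predicted mortality and split into groups of (nearly) equal size. The function names are hypothetical, the p-value uses a closed-form chi-square tail that is valid only for even degrees of freedom (8 for the usual 10 groups), and the sketch assumes every group is non-empty with an expected death count strictly between 0 and the group size.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; closed form valid for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form requires a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    # sf(x) = exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def hosmer_lemeshow_c(predicted, died, groups=10):
    """Return (statistic, p-value) using equal-sized risk groups."""
    pairs = sorted(zip(predicted, died), key=lambda pair: pair[0])
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        size = len(chunk)
        expected = sum(p for p, _ in chunk)   # expected deaths in the group
        observed = sum(d for _, d in chunk)   # observed deaths in the group
        stat += (observed - expected) ** 2 / (expected * (1.0 - expected / size))
    # Degrees of freedom are conventionally groups - 2.
    return stat, chi2_sf_even_df(stat, groups - 2)
```

A p-value above 0.05 from this function would correspond to "good calibration" in the sense used above; the H statistic differs only in how the groups are formed (fixed intervals of expected mortality instead of equal-sized groups).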

Clinical outcomes

In-hospital, 3-month, and 1 year mortality were recorded. Postoperative acute kidney injury (AKI) was classified into risk, injury, and failure according to the risk, injury, failure, loss of kidney function, and end stage kidney disease (RIFLE) criteria. Preoperative renal replacement therapy (RRT) was defined as RRT initiated before liver transplantation and continued thereafter. Postoperative RRT was defined as RRT that was applied only after liver transplantation.

Statistical analysis

Continuous variables were reported as mean [standard deviation] and qualitative variables as percentages. All variables were tested for normal distribution with the Shapiro-Wilk test. Student’s t-test was used for normally distributed continuous variables. Variables with non-normal distribution and sample size less than 30 were analysed by the Mann-Whitney U test. Chi-square test or Fisher’s exact test (if cell size ≤ 5) was used for categorical variables. P values < 0.05 were considered statistically significant.

The DeLong method36 was used to measure and compare AUCs to assess discrimination for in-hospital and 1 year mortality. The Holm–Bonferroni correction for multiple comparisons was applied to control the family-wise error rate and minimize type I and type II errors37. Calibration was assessed using the Hosmer–Lemeshow goodness-of-fit C and H statistics, with a P value greater than 0.05 indicating good calibration35. The standardized mortality ratio was calculated by dividing the observed mortality rate by the predicted mortality rate.
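Two of the steps in this paragraph are simple enough to sketch directly. The Python below shows a step-down Holm adjustment of a set of p-values and the standardized mortality ratio; the function names are hypothetical, and taking the mean of the individual predicted probabilities as the predicted mortality rate is an assumption, since the text does not spell out how that rate was aggregated.

```python
def holm_adjust(p_values):
    """Holm step-down adjusted p-values, returned in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, and so on.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity so adjusted values never decrease with rank.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


def standardized_mortality_ratio(died, predicted):
    """Observed mortality rate divided by the mean predicted mortality."""
    observed_rate = sum(died) / len(died)
    predicted_rate = sum(predicted) / len(predicted)  # assumed aggregation
    return observed_rate / predicted_rate
```

With hypothetical raw p-values of 0.01, 0.04, and 0.03, the adjusted values are 0.03, 0.06, and 0.06, so only the first comparison would remain significant at the 0.05 level. A standardized mortality ratio below 1 would indicate fewer observed deaths than the model predicted.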

To identify risk factors of in-hospital mortality after liver transplantation, univariable logistic regression was performed after determining differences between survivors and non-survivors using the t-test and chi-square test (two-tailed). Risk factors with p values < 0.2 or, in a separate model, p values < 0.1 in the univariable analysis were entered into multivariable logistic regression with forward selection. Collinearity between variables was tested before modeling, and if present (Pearson’s correlation coefficient > 0.7), only one variable was entered into the statistical analysis. Patient survival was also analyzed according to the MELD score groups (<15, 15–24, and ≥25) and donor type (living donor/deceased donor). Statistical analysis was performed with SAS (SAS system for Windows, version 9.3; SAS institute, Cary, NC) and R (version 3.2.1) statistical software.
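The collinearity screen described above can be illustrated with a short Python sketch that computes Pearson's correlation coefficient for every pair of candidate variables and flags pairs above the 0.7 threshold. The function names and the example variable names are hypothetical, and comparing the absolute value of the coefficient (so that strong negative correlations are also flagged) is an assumption beyond what the text states.

```python
import math


def pearson_r(x, y):
    """Pearson's correlation coefficient for two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def collinear_pairs(variables, threshold=0.7):
    """Return (name_a, name_b, r) for every pair with |r| above the threshold."""
    names = list(variables)
    flagged = []
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            r = pearson_r(variables[first], variables[second])
            # Assumption: the absolute value is compared with the threshold.
            if abs(r) > threshold:
                flagged.append((first, second, r))
    return flagged
```

For each flagged pair, only one of the two variables would then be carried forward into the regression, as was done here for donor status versus operation type and for MELD versus MELD-Na.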