Introduction

Predicting the fate of someone with acute myeloid leukaemia (AML) at diagnosis is challenging [1, 2]. We recently reviewed several of the complexities in achieving accurate and precise estimates of outcomes in LEUKAEMIA [3]. Initial prediction efforts focused on clinical and laboratory co-variates such as white blood cell (WBC) count, percentage or numbers of myeloblasts and histology [4]. Cytogenetics data were soon added [5]. Most recently, data from studies of mutation topography, typically detected by targeted or next-generation sequencing (NGS), were added, often displacing prior predictive co-variates. For example, the 2017 European LeukemiaNet (ELN) model includes only data on cytogenetics and mutation topography. Predictive models using the expression pattern of genes related to leukaemia cell stemness are also reported [6]. Also new is the use of data from measurable residual disease (MRD)-testing, but these data are not applicable to predicting outcomes at diagnosis [7]. The most recent predictive models divide persons with AML into more than 15 cohorts with statistically different prognoses [8,9,10]. Is this a clinically manageable number of predictive cohorts, and are there convincing data these classifications are improving outcomes of persons with AML? Data so far show only a modest impact, if any [11]. For example, data from the US Surveillance, Epidemiology, and End Results (SEER) dataset indicate only a 10% improvement in 5-year survival since 1999 (https://seer.cancer.gov/statfacts/html/amyl.html).

Most prediction models have concordance statistics (C-statistics) of 0.65–0.80, indicating only fair accuracy [3]. Can we do better? Are we too focused on leukaemia cell biology whilst ignoring other potentially important mechanisms influencing the complex interaction between the leukaemia and the host, such as the bone marrow micro-environment and host immune response? Are there also important aspects of leukaemia cell biology we are ignoring, such as metabolism? Put otherwise, are there latent co-variates that might improve prediction accuracy?
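To make the C-statistic concrete, the following is a minimal sketch of Harrell's concordance statistic for right-censored survival data. It is an illustration only: the function name `harrell_c` and the toy data are hypothetical and not taken from any cited study, and pairs with tied follow-up times are simply skipped, a simplification of the usual tie handling. A value of 0.5 corresponds to no discrimination and 1.0 to perfect ranking.

```python
from itertools import combinations


def harrell_c(times, events, risks):
    """Harrell's concordance statistic (simplified sketch).

    times  : follow-up time for each subject
    events : 1 if the event (e.g. death) was observed, 0 if censored
    risks  : predicted risk score; higher means a worse expected outcome
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that subject a has the shorter follow-up.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # Simplification: skip tied times. A pair is comparable only when
        # the shorter follow-up ended in an observed event.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risks[a] > risks[b]:
            concordant += 1.0  # higher predicted risk failed first
        elif risks[a] == risks[b]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy cohort: months of follow-up, event indicator, risk score.
times = [5, 8, 12, 20, 30]
events = [1, 1, 0, 1, 0]
risks = [0.9, 0.4, 0.7, 0.6, 0.1]
print(harrell_c(times, events, risks))  # 0.75 (6 of 8 comparable pairs concordant)
```

In this toy cohort the score ranks 6 of 8 comparable pairs correctly, which falls within the 0.65–0.80 range quoted above for most current models.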

Recent studies suggest data regarding the bone marrow micro-environment, immune system and leukaemia cell metabolism might improve prediction accuracy. These data are reported to independently predict outcomes such as complete remission rate, cumulative incidence of relapse, event-, relapse- and leukaemia-free survivals (EFS, RFS and LFS) and/or survival in multi-variable analyses. Moreover, these co-variates are reported to improve the accuracy of more widely-used models such as the 2017 ELN model. We discuss these models below.

Micro-environment and immune-risk models

Considerable data indicate cells in the bone marrow micro-environment, including immune, endothelial and stromal cells, together with the composition of the extracellular matrix and soluble factors such as cytokines, hepatocyte growth factor, vascular endothelial growth factor and angiopoietins, are important in leukaemia development and progression [12]. The impact of the immune system in AML is increasingly studied. For example, there are several reports of correlations between blood and bone marrow natural killer (NK)-cells and survival [13, 14]. Specific T-cell phenotypes are also reportedly associated with leukaemia prognosis. For example, some data indicate pre- or post-therapy blood concentrations of PD-1+CD8+ T-cells and pre-therapy blood concentrations of CD28-CD57+CD8+ T-cells correlate with EFS and survival [13]. Another study reported a high proportion of blood eomesodermin (Eomes)+ T-betlow CD8+ T-cells correlates with fewer complete remissions (CR) and worse survival [15].

Zhang et al. reported frequencies of CD4+CD25+CD127lo regulatory T-cells (Tregs) in blood and bone marrow were associated with poor prognosis [16]. Han et al. reported increased frequency of inducible T-cell co-stimulator ligand-positive Tregs in bone marrow was an unfavourable prognostic marker [17]. Kong et al. reported T-cell immunoglobulin and immune receptor tyrosine-based inhibitory motif domain (TIGIT) expression on blood CD8+ T-cells is increased in persons with AML and correlates with induction chemotherapy failure and with post-transplant relapse [18]. Several studies report high expression of PD-1, PD-L1 or PD-L2 was associated with poor survival [19, 20]. Increased co-expression of PD-1/CTLA-4 or PD-L2/CTLA-4 correlated with poor survival. Co-expression of PD-1/PD-L1, PD-1/PD-L1/PD-L2, or PD-1/LAG-3 correlated with poor survival in subjects with FLT3, RUNX1 and/or TET2 mutations [20]. Stamm et al. reported high expression of PVR and PVRL2, novel immune checkpoints, correlated with poor outcomes [21].

There are several predictive models of AML using data of immune cells identified by multi-parameter flow cytometry and/or NGS with bio-informatics. We reported an immune risk score derived from public datasets from Gene Expression Omnibus where we estimated proportions of immune cells in bone marrow samples using CIBERSORTx [22]. Data of six types of immune cells were used to develop prediction models for EFS and survival in persons receiving intensive induction chemotherapy. The predictive value of the model was validated in several datasets. Concentrations of activated NK-cells had the strongest predictive weight. The C-statistic of the immune risk score alone was 0.68 [0.63, 0.73]. However, a model adding the immune risk score to the ELN risk category (C-statistic 0.78 [0.73, 0.82]) and age (C-statistic 0.66 [0.62, 0.70]) had a C-statistic of 0.83 [0.79, 0.87]. The upper boundary of the 95% confidence interval, 0.87, is a marked improvement. Figure 1 displays the progressive improvement in prediction accuracy by combining different predictive co-variates and scores. Similar data are reported by others. For example, Bruck et al. identified several immune cell types with phenotypes in bone marrow correlated with prognosis including M1-polarised macrophages, FOXP3+ helper T-cells, Tregs and CTLA4−LAG3− helper T-cells [23]. Dong et al. constructed a survival prediction model for persons with cytogenetically normal AML based on expression of 9 immune-related genes with a C-statistic of 0.79 [24]. Zhu et al. reported a prediction model composed of 6 immune-related genes with a C-statistic of 0.72 [25]. Cytokine profiles and interactions have also been used to predict outcomes of AML therapy. For example, one study reported correlations between serum concentrations of FLT3-ligand and interleukin-6 with LFS and survival [26]. Also, concentrations of tumour necrosis factor-α, serum soluble interleukin-2 receptor-α (sIL2RA) and IL-10 are reported to independently predict survival in AML [27,28,29]. In conclusion, immune-based prognostic and prediction models often complement and/or improve current AML prediction models.

Fig. 1: Comparison of C-statistics of the merged risk score and other single risk categories.

(A) C-statistics were compared for prognostic co-variates and risk scores alone and combined. A higher C-statistic indicates better prediction accuracy. (B) Areas under the curve (AUC) of a receiver-operator characteristic (ROC) curve were compared for prognostic co-variates and risk scores alone or combined. The dashed line indicates no prediction accuracy. An increasing AUC indicates increasing prediction accuracy.
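As an illustration of the AUC referred to in the legend, the sketch below computes it as the probability that a randomly chosen subject with the event has a higher risk score than a randomly chosen subject without it, with ties counted as one half. It assumes a binary outcome, for example death by a fixed time point; how the outcome was defined for the figure is not stated here, and the function name `roc_auc` and the data are hypothetical. An AUC of 0.5 corresponds to no prediction accuracy.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels : 1 for subjects with the event, 0 for subjects without
    scores : predicted risk score; higher means higher predicted risk
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one subject in each outcome group")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0  # event subject ranked above non-event subject
            elif p == n:
                wins += 0.5  # tied scores count as half
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: three subjects with the event, three without.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.3, 0.7, 0.2, 0.1]
auc = roc_auc(labels, scores)  # 8 of 9 pairs ranked correctly
```

For a binary outcome this pairwise definition coincides with the C-statistic, which is why both measures share 0.5 as the value for no prediction accuracy.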

Metabolic-risk models

Another approach to improving prediction accuracy in AML involves leukaemia cell metabolism. Mutations in genes with metabolically active gene products such as isocitrate dehydrogenase isoform-1 (IDH1) and IDH2 are associated with changes in cell metabolism and possibly leukaemia initiation [30, 31]. Chen et al. used gas chromatography time-of-flight mass spectrometry-based metabolomics to identify 47 metabolites significantly altered in serum samples from 400 subjects with AML compared with controls [32]. They identified six serum glucose metabolites (lactate, 2-oxoglutarate, pyruvate, 2-hydroxyglutarate, glycerol-3-phosphate and citrate) whose concentrations correlated with EFS and survival in subjects receiving induction chemotherapy; these findings were validated in another cohort. Zhou et al. reported increased plasma concentrations of lysine and taurine predict outcomes of persons with AML-M2 [33].

Serum metabolomic profiling has also been used to identify metabolites associated with outcomes of children with AML receiving chemotherapy [34]. Higher levels of pantothenic acid were associated with response to cytarabine and with worse RFS. We and others reported prognostic models based on metabolism-related gene expression data from public datasets [35, 36]. Both models were validated with C-statistics of 0.88 and 0.78. Wang et al. [36] combined their metabolic model with cytogenetics and age co-variates improving the C-statistics for survival prediction from 0.69 and 0.65 to 0.78, a significant improvement.

Other predictive co-variates

Two recent studies in older persons with AML using predominantly conventional subject-related co-variates reported C-statistics of 0.72–0.74, similar to the C-statistic of the 2017 ELN risk classification [37, 38]. The predictive value of epigenetic regulatory genes such as DNMT3A and of global methylation state has also been evaluated [39,40,41]. For example, a genome-wide methylation score is reported to predict outcomes of persons with AML, with a higher methylation score associated with a lower rate of complete remission [42]. Some alternative splicing events are also reported to predict AML outcomes with C-statistics of 0.96 but without external validation [43]. Adding a splicing signature to the 2017 ELN risk classification improved prediction accuracy, with a C-statistic of about 0.75, and adding a leukaemia stemness score combined with the splicing signature improved prediction accuracy, with a C-statistic of about 0.72 [44]. To the extent infection correlates with risk of death during intensive induction chemotherapy, studies of the gastro-intestinal microbiome can also improve prediction of EFS and survival [45].

Therapy

Accurate prediction can improve therapy decisions in persons with AML [3]. Increasingly, physicians are aware of the importance of prediction accuracy in choosing between competing therapies such as intensive induction therapy with cytarabine and daunorubicin versus less intensive therapy with azacitidine and venetoclax. Some recent studies report benefits of metabolic interventions such as enasidenib and ivosidenib, but these are unconfirmed in randomised controlled trials [46].

Discussion

Accurate prediction is fundamental to optimising AML therapy (Fig. 2). The data we discuss indicate micro-environment, immune and metabolism-related co-variates, and others such as epigenetic and splicing gene profiles, are independent outcome predictors in AML in multi-variable regression analyses. Adding these data to current prediction models, including newer models incorporating mutation topography, increases accuracy (Table 1).

Fig. 2: Immune, metabolic, cytogenetic and molecular co-variates correlated with therapy response and survival in persons with acute myeloid leukaemia receiving intensive induction chemotherapy.

The figure shows potential interactions.

Table 1 Predictive co-variates.

None of the current or newer prediction models we cite included data of MRD-testing at the end of therapy as an outcome predictor in model building. Whether current or new pre-therapy predictive models are better than results of post-therapy MRD-testing in predicting post-remission therapy outcomes is uncertain. Elsewhere we discuss the advantages and limitations of post-therapy MRD-testing as a predictor of subsequent outcomes in persons with AML [47]. Presently, MRD-testing data correlate strongly with outcomes but have high false-positive and false-negative rates. Whether this limitation can be overcome is uncertain. Moreover, their utility is limited to the post-therapy setting rather than extending to diagnosis.

Another issue is why and when we want to determine prognosis or predict outcomes. Is it to identify the best initial therapy, say intensive therapy with cytarabine and daunorubicin, less intensive chemotherapy, say with azacitidine and venetoclax, or a targeted therapy, say with enasidenib or ivosidenib? This decision is driven not only by the co-variates we identify but also by co-variates less often considered such as co-morbidities, access to medical care, expertise of the treating team, sophistication of supportive care, patient preference and economics. Or is our goal to predict what intervention should be given next? Obviously, this will be largely driven by the outcome of the initial therapy and the co-variates we cite above. For example, if the goal of an older person with substantial co-morbidities is to achieve the longest interval of high quality-of-life, achieving a complete remission may not be the appropriate therapy objective. In contrast, a young, otherwise healthy person may be willing to accept substantial adverse events for a chance, however small, of cure. In this instance a co-variate such as post-therapy MRD-testing may be the best predictive biomarker. What is important is that physicians and patients acknowledge our inaccuracy and imprecision in predicting outcomes of an intervention. What level of accuracy and precision is acceptable to drive a therapy decision is obviously subjective, with no correct answer.

The new prediction models we discuss need optimisation and external validation in large datasets of uniformly-treated persons. To know if they are prognostic rather than predictive, they need to be tested in studies of diverse therapies. If validated they could be introduced into clinical practice and help with therapy decision-making.

In summary, we show adding micro-environment-, immune- and metabolism-related and other co-variates improves prediction accuracy in newly-diagnosed persons with AML, predominantly young people receiving intensive induction chemotherapy. Whether these co-variates are similarly useful in other therapy settings, such as less intensive, targeted or immune therapy, is unknown. Because initial intensive therapy of AML is relatively uniform, these co-variates are presently best regarded as predictive rather than prognostic. In appropriate therapy settings, using prediction models which include micro-environment-, immune- and metabolism-related co-variates may be clinically useful. However, we need validation and optimisation in large prospective datasets of persons with AML receiving diverse therapies such as cytarabine and daunorubicin versus azacitidine and venetoclax or enasidenib. When these are accomplished, these new models may be clinically useful to predict outcomes and choose therapy(ies).