For many years, studies of new drugs for acute myeloid leukemia (AML) and other cancers have used a three-step approach moving from phase-1 to −2 to −3. Phase-1 trials typically estimate the new drug’s maximum tolerated dose (MTD) or a dose maximally impacting its presumed target (optimal biologic dose, OBD). Phase-2 trials evaluate efficacy, often defined as response. A conclusion that the response rate justifies further study frequently leads to a phase-3 trial, in which subjects are randomly assigned to receive the new drug or conventional therapy to determine which is better. The focus in phase-3 is generally on one primary outcome such as event-free survival (EFS) or survival. In each type of trial, an endpoint not considered primary is termed secondary (such as response in phase-1 or toxicity in phase-2) and is often not formally evaluated, especially if the trial fails to meet the primary endpoint.

Much of the current treatment of AML has evolved from this approach, resulting in regulatory approval of seven new drugs for AML in 2017–2018 [1,2,3], although the improvements afforded are modest, particularly in absolute rather than relative terms, and their applicability to all adults with the disease is uncertain [4]. In any event, the standard approach to trials ignores the complexity of AML and, very likely, what researchers and research subjects want to know. Here we discuss: (1) the focus on one primary endpoint; (2) disregard in phase-1 and −2 studies for the heterogeneity of AML; (3) use of generic false-positive and -negative rates; and (4) use of study-designs insufficiently adaptive in phase-3.

Physicians are often interested in multiple outcomes, for example, not only safety but also response, not only response but also survival, not only survival but also quality-of-life and so forth. We doubt many research subjects envision that the sole purpose of a phase-1 trial is identifying the MTD or OBD for future studies. Rather, most participate for a tangible personal benefit such as improved survival. But, because phase-1 trials often evaluate efficacy only as a secondary objective, discordance arises, with the investigator (seemingly) primarily interested in safety and the subject interested in safety but also efficacy. As typically only 6–20 subjects are treated at the MTD in phase-1, relatively little is known about toxicity after one phase-1 trial; nonetheless, phase-2 trials usually monitor toxicity only informally. Likewise, in phase-3 trials, arguments can be made for the primacy of either survival or EFS as the criterion for regulatory approval of a new therapy [5]. However, only one of these endpoints is usually considered primary. Not only can the distinction between primary and secondary endpoints be arbitrary, but much less attention is paid to the latter. In AML, for example, survival rather than complete remission is typically the primary endpoint of phase-3 trials. Although there are discordances between complete remission and survival [6, 7], many clinicians would argue that there is value in achieving a complete remission, for example, the possibility of fewer transfusions, less time in hospital, or increased psychological well-being, even if survival is not improved. Because complete remission rate is typically a secondary endpoint, current study-designs provide little encouragement to explore these possibilities. In all this, there is a loss of information inconsistent with subjects’ expectations after giving informed consent.
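How little 6–20 subjects reveal about toxicity can be illustrated with simple arithmetic. The sketch below is not taken from any cited trial. It assumes a hypothetical true toxicity rate of 20% and independent subjects, and the function name `prob_no_toxicity_seen` is ours.

```python
def prob_no_toxicity_seen(true_rate, n_subjects):
    """Chance that none of n independent subjects has the toxicity,
    given that each subject's risk equals true_rate (an assumption)."""
    return (1 - true_rate) ** n_subjects

# Hypothetical 20% true toxicity rate at the MTD, cohorts of 6 and 20.
for n in (6, 20):
    print(n, round(prob_no_toxicity_seen(0.20, n), 3))
```

Under these assumptions a toxicity affecting one in five subjects would go entirely unobserved in roughly a quarter of 6-subject cohorts, and in about 1% of 20-subject cohorts.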

It is obvious that different subjects are a priori at different risks of toxicity, often motivating exclusion of persons with ECOG performance scores of 3–4, for example, from phase-1 trials. However, Rogatko et al. [8] reported that subject-specific variables interact with dose in determining toxicity even among subjects who are routinely eligible for phase-1 trials. Nonetheless, if two of the first three subjects at a dose-level have an adverse event in a phase-1 trial conducted using the conventional 3 + 3 design, that dose-level is declared unsafe and never re-visited, regardless of whether the subjects were 30 or 70 years old, had an ECOG performance score of 0 or 2, or had a bilirubin of 0.6 or 1.4 mg/dL, all values typically compatible with trial-entry. Single-arm phase-2 trials are inherently comparative: the worse the estimated efficacy compared with a perceived standard treatment, the less the motivation to start a phase-3 study. Many heterogeneous biologic covariates are associated with efficacy outcomes in AML [9]. Nonetheless, phase-2 trials typically assume the only effect being measured is that of the drug being tested rather than the effects of subject- and disease-related variables [10], measurement error and chance.
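The covariate-blind nature of the 3 + 3 rule can be made concrete with a minimal sketch. The code is an illustrative simplification of the commonly described rule (0 of 3 toxicities: escalate; 1 of 3: expand to 6; 2 or more: dose too toxic), not the protocol of any cited trial. The `Subject` record, the function name and the return labels are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Subject:
    age: int
    ecog: int
    bilirubin_mg_dl: float
    had_dlt: bool  # dose-limiting toxicity observed?

def three_plus_three_decision(cohort):
    """Conventional 3 + 3 decision for one dose-level (simplified).

    cohort: list of 3 or 6 Subject records treated at the same dose.
    """
    n = len(cohort)
    # Only the toxicity count is used; age, ECOG score and bilirubin are ignored.
    dlts = sum(s.had_dlt for s in cohort)
    if n == 3:
        if dlts == 0:
            return "escalate"
        if dlts == 1:
            return "expand to 6"
        return "too toxic"
    if n == 6:
        return "escalate" if dlts <= 1 else "too toxic"
    raise ValueError("3 + 3 cohorts contain 3 or 6 subjects")

# Two hypothetical cohorts with identical toxicity counts but different risk profiles.
young_fit = [Subject(30, 0, 0.6, True), Subject(30, 0, 0.6, True),
             Subject(30, 0, 0.6, False)]
older_frail = [Subject(70, 2, 1.4, True), Subject(70, 2, 1.4, True),
               Subject(70, 2, 1.4, False)]
print(three_plus_three_decision(young_fit))
print(three_plus_three_decision(older_frail))
```

Both cohorts yield the same decision ("too toxic"), although a clinician might interpret two toxicities in fit 30-year-olds quite differently from two in 70-year-olds with borderline liver function.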

Phase-3 trials in AML routinely stipulate false-positive and -negative rates of, respectively, 5% and 10–20%, a convention common to trials in many diseases. However, these error rates seem more acceptable in diseases with effective therapies, for example, hypertension or diabetes, than in a disease without effective therapy such as poor-prognosis AML. In the latter, the consequences of a false-negative result are more substantial, and those of a false-positive result less substantial. Phase-3 trials in AML might therefore allow false-positive rates similar to the 20% often stipulated in randomized phase-2 trials. Certainly, the time needed to (eventually) discover a false-positive is time that might otherwise be spent studying other new therapies. Nonetheless, we believe use of generic false-positive and -negative rates is difficult to defend in AML.
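The practical effect of the chosen error rates can be sketched with a textbook normal-approximation sample-size formula for comparing two response rates. This is an illustration only. The response rates (40% versus 55%), the two-sided test and the function name `subjects_per_arm` are our assumptions, not values from any cited trial.

```python
import math
from statistics import NormalDist

def subjects_per_arm(p_control, p_new, false_positive, false_negative):
    """Approximate subjects per arm (1:1 randomization) for a two-sided
    comparison of two proportions, using the normal approximation."""
    z = NormalDist().inv_cdf
    z_alpha = z(1 - false_positive / 2)   # two-sided false-positive rate
    z_beta = z(1 - false_negative)        # power = 1 - false-negative rate
    variance = p_control * (1 - p_control) + p_new * (1 - p_new)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p_new - p_control) ** 2)

# Hypothetical response rates: 40% with conventional therapy, 55% with the new drug.
n_conventional = subjects_per_arm(0.40, 0.55, false_positive=0.05, false_negative=0.20)
n_relaxed = subjects_per_arm(0.40, 0.55, false_positive=0.20, false_negative=0.20)
print(n_conventional, n_relaxed)
```

Under these assumptions, relaxing the false-positive rate from 5% to 20% reduces the required sample size substantially. The subjects and time saved are what is traded against the cost of later having to uncover a false-positive result.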

Given our limited ability to accurately predict what the outcomes of subjects receiving a new therapy would have been had they received an older one [11, 12], there is little doubt of the need for randomized trials. Many new therapies turn out no better or even worse than older ones. However, intuition suggests many people properly informed (i.e., beyond the brief, standardized alternative therapy section of the usual informed consent document) of the likely unsatisfactory outcome with conventional therapies would reason “how much worse than the conventional therapy can the new therapy be?” and decline randomization. Furthermore, very few AML trials use accumulating outcomes data to influence randomization probabilities. This practice challenges subjects’ expectation that their physicians are constantly learning to improve their care.

Modern study-designs can address some of these problems. Examples are designs that (a) simultaneously monitor multiple outcomes and make adaptive decisions based on more than one outcome [13, 14], (b) account for covariates in phase-1 and −2 trials [15, 16], or (c) allow repeated outcome-adaptive randomization [17]. Although available for years and using either frequentist or Bayesian frameworks [18, 19], these study-designs are rarely used; only 2% of 1235 phase-1 trials conducted between 1991 and 2006 used an innovative statistical design [20]. We doubt the situation is substantively different today.
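To illustrate option (c), the following is a minimal sketch of one simple form of outcome-adaptive randomization. It is not the design of reference [17]. It assumes a binary response, uniform Beta(1, 1) priors, and an allocation probability equal to the estimated posterior probability that the new arm has the higher response rate, clipped to between 10% and 90%. All names and response rates are hypothetical.

```python
import random

def prob_new_is_better(succ_new, fail_new, succ_std, fail_std, rng, draws=2000):
    """Monte Carlo estimate of P(response rate new > standard | data)
    under independent Beta(1, 1) priors for each arm."""
    wins = 0
    for _ in range(draws):
        p_new = rng.betavariate(1 + succ_new, 1 + fail_new)
        p_std = rng.betavariate(1 + succ_std, 1 + fail_std)
        wins += p_new > p_std
    return wins / draws

def simulate_trial(true_new, true_std, n_subjects, seed=0):
    """Simulate one trial; returns {arm: [responses, non-responses]}."""
    rng = random.Random(seed)
    counts = {"new": [0, 0], "std": [0, 0]}
    for _ in range(n_subjects):
        # Allocation probability is updated after every observed outcome.
        p_assign_new = prob_new_is_better(*counts["new"], *counts["std"], rng)
        p_assign_new = min(0.9, max(0.1, p_assign_new))  # never starve an arm
        arm = "new" if rng.random() < p_assign_new else "std"
        true_rate = true_new if arm == "new" else true_std
        responded = rng.random() < true_rate
        counts[arm][0 if responded else 1] += 1
    return counts

# Hypothetical true response rates: 60% (new) versus 30% (standard).
result = simulate_trial(0.60, 0.30, n_subjects=60, seed=1)
print(result)
```

When the new therapy is truly better, allocation tends to drift toward it as data accumulate, which corresponds to the expectation that physicians learn from earlier subjects. The sketch ignores delayed outcomes, covariates and formal stopping rules, all of which a real design would need to address.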

New study-designs often require more subjects, time, and resources than current phase-1, −2, and −3 designs. Outcome-adaptive randomization typically requires ~15% more subjects, comparable to the increase in sample size with 2:1 rather than 1:1 randomization [21]. The resultant longer trials might delay approval of an effective therapy. Hence, a balance between subjects’ preferences and public health benefit is necessary. In contrast, larger sample sizes in phase-1 and −2 trials might result in fewer expensive, time-consuming, negative phase-3 trials [22].
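The comparison with 2:1 randomization can be checked with a standard approximation. If the outcome variance is the same in both arms, the total sample size needed to keep the same power under k:1 allocation, relative to 1:1, is (k + 1)² / (4k). The function name below is ours, and the equal-variance assumption is a simplification.

```python
def relative_total_sample_size(k):
    """Total sample size under k:1 allocation relative to 1:1 allocation,
    at fixed power, assuming equal outcome variance in both arms."""
    return (k + 1) ** 2 / (4 * k)

print(relative_total_sample_size(2))  # 2:1 allocation -> 1.125
```

Under this approximation 2:1 randomization requires 12.5% more subjects than 1:1, the same order of magnitude as the ~15% quoted for outcome-adaptive randomization.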

There are many reasons why phase-3 trials in AML often fail to confirm the promise of earlier trials [4, 23]. We discuss several reasons here. We argue the disruption and perceived inconvenience of using more-modern adaptive trial designs may be justified. Our hypothesis can only be tested if these newer study-designs find wider use.