Main

Glancing at the current therapeutic approach to metastatic renal cell carcinoma (RCC), it is immediately apparent that we are dealing with one of the most dynamic and revolutionary developments of the past decade. The introduction within an 8-year span of seven new and effective agents, promptly approved by the American and European regulatory authorities on account of their undeniable efficacy, has transformed the therapeutic scenario for patients with advanced or metastatic disease, allowing a shift from the situation of an orphan disease to a new one paradoxically characterised by an almost embarrassing crowd of opportunities (Vogelzang, 2006). Indeed, unlike previous therapies based on immune response modulation with cytokines, the most recent advances in molecular biology have allowed identification of a panel of new agents targeting vascular endothelial growth factor (VEGF), such as bevacizumab plus interferon (IFN), sorafenib, sunitinib, pazopanib, and axitinib, or the mammalian target of rapamycin (mTOR), such as temsirolimus and everolimus. By acting on these pathways, the agents are able to inhibit both angiogenesis and cell proliferation, resulting in more consistent disease control rates and prolongation of progression-free survival (PFS) (Escudier et al, 2013). Interestingly, some of the most promising new treatment options under investigation in metastatic RCC (mRCC), namely vaccines and anti-programmed cell death 1 agents (checkpoint inhibitors), are immunotherapeutic approaches, depicting a ‘back to the past’ treatment scenario (IMA901; Brahmer et al, 2012).

Owing to registration and commercial obligations, the initial pivotal studies with bevacizumab, sorafenib, sunitinib and pazopanib were carried out using either placebo or IFN as a comparative reference arm (Motzer et al, 2007; Escudier et al, 2007b; Sternberg et al, 2010). Once the effectiveness of these agents had been established and their use had extended into everyday clinical practice, direct comparison with an active control arm other than placebo or IFN (the earlier ‘active’ arm) was strongly recommended by both physicians and regulatory authorities for better characterisation of the succeeding agents (Kane et al, 2008). In this regard, some investigators identified sorafenib as a suitable active and already approved agent to be used as a ‘comparator’ (Eisen, 2012). Because preclinical data showed a low affinity of this molecule for VEGF receptors, and because of the available clinical data on PFS in a randomised phase 2 trial vs IFN, it was assumed that this agent could be considered the ‘standard minimal reference arm,’ somehow ideal for comparison with the experimental arms of new agents (Eisen, 2012). Nevertheless, nonhomogeneous and conflicting results, not in line with the original hypotheses, called for deeper analysis and reappraisal of the comparative studies, focusing particularly on the role of control arms in relation to baseline patient characteristics (Eisen, 2012).

The purpose of the present investigation is to re-examine and discuss the data and clinical results of the comparative studies carried out in the treatment of RCC that included the two most commonly used control arms, sorafenib and IFN. We perform this analysis to better understand the unexpected results sometimes observed, to evaluate the role of changes in the baseline patient population over the years and to consider other possible baseline conditions that could positively or negatively influence the performance of the evaluated agents.

Materials and methods

The present qualitative review of studies included data collected up to 31 March 2013 from PubMed and the main databases of oncology and urology congresses in Europe (European Society for Medical Oncology, European Cancer Organization, European Association of Urology) and the United States (American Society of Clinical Oncology [ASCO], American Urological Association). The selection took into account only randomised phase 2 and 3 studies conducted in RCC with the following targeted agents: bevacizumab plus IFN, sorafenib, sunitinib, temsirolimus, pazopanib, everolimus, axitinib and tivozanib. Only studies having IFN or sorafenib as a control arm vs an experimental arm were included. An exception was made for studies directly comparing IFN and sorafenib (evaluated alone or in combination); these were included, but the analysis was limited to the non-combination arms. To follow the performance of the comparator arms over time, historical data for each comparator arm were identified: these were defined as the first prospective phase 3 study or the largest meta-analysis published on PubMed able to provide PFS data for IFN or sorafenib.

Studies were divided into two groups according to first-line or second-line treatment. To describe the time trend of the comparator arms, the following baseline parameters were analysed where available: accrual year range, number of enrolled patients, percentage of patients with nephrectomy, risk score according to Memorial Sloan-Kettering Cancer Center (MSKCC) criteria and sites of metastases of potential prognostic relevance (lung, liver and bone). The three parameters used to evaluate disease control were PFS, overall survival (OS) and overall response rate (ORR).

Results

After the first search of PubMed and the main European and American congress databases, 24 randomised studies were identified. Of these, 14 fulfilled the selection criteria set out in the ‘Materials and methods’ section; nine were phase 3 and five were phase 2 studies. A further nonrandomised study was included as a control for IFN. All the studies considered are reported in Table 1.

Table 1 Legend for study identification

Interferon as the control arm

Five studies with IFN as a control arm were selected: sunitinib phase 3 (Motzer et al, 2007), temsirolimus phase 3 (Hudes et al, 2007), AVOREN (Escudier et al, 2010; Escudier et al, 2013), CALGB-90206 (Rini et al, 2009) and the phase 2 sorafenib trial (Escudier et al, 2009). The IFN historical control was drawn from a large meta-analysis that included IFN studies gathered between 1993 and 2001 (Motzer et al, 2002). Table 2 shows the results observed in the patients treated with IFN in these six studies.

Table 2 Data pertaining to IFN as control arm in six comparative studies included in the analysis

Nephrectomy

The percentage of patients undergoing nephrectomy increased over the years, from 55% in the period 1993–2001 reported by Motzer et al (2002) to 80–90% in subsequent years. The analysis did not include the AVOREN and temsirolimus phase 3 trials because, in the former, nephrectomy was an inclusion criterion and, in the latter, only poor-risk patients who were probably not appropriate candidates for surgery were enrolled.

Motzer score

The proportion of poor-prognosis patients according to MSKCC criteria enrolled in the selected studies decreased from 20% in the historical control to 0% in the phase 2 sorafenib trial (Escudier et al, 2009), whereas the proportion of intermediate-prognosis patients remained stable (59% in the sunitinib phase 3 trial, 64% in CALGB-90206 and 56% in AVOREN) or decreased slightly (Escudier et al, 2009). In contrast, the proportion of good-prognosis patients increased from 18% in the historical control to 51% in the sorafenib phase 2 trial by Escudier et al (2009). The analysis excluded the temsirolimus phase 3 trial because ‘poor-risk’ status was a mandatory inclusion criterion, although the definition of poor risk was modified during the conduct of the study and differed slightly from the standard MSKCC definition (Hudes et al, 2007).

Metastatic sites

A slight upward trend was observed for patients with lung metastases (from 67 to 81%) as well as for those with bone metastases (from 23% in the historical control to 37% in the phase 2 sorafenib study (Escudier et al, 2009)). In contrast, the percentage of patients with liver metastases at baseline appeared stable over the years.

Disease control

Over the years, disease control remained stable in terms of ORR, increased slightly in terms of PFS (from 4.7 to 5.6 months), as shown in Figure 1, and increased markedly in terms of OS (from 13 to 33.5 months). With the exception of the phase 2 sorafenib study (Escudier et al, 2009), in all the studies the PFS of the experimental arm was statistically significantly superior to that of IFN. The temsirolimus phase 3 study, which included primarily poor-risk patients, was excluded from this analysis because of the consequent reduction in benefit for all the parameters analysed.

Figure 1 Trend of PFS vs year of accrual observed in the five studies including IFN as first-line treatment option.

Sorafenib as the control arm

Analysis of the databases led to the selection of seven studies designed with sorafenib as the control arm (Table 3). The group of first-line studies included five trials: sorafenib plus IFN vs sorafenib (Jonasch et al, 2010), ROSORC (Procopio et al, 2011), AMG 386 (Rini et al, 2012), TIVO-1 (Motzer et al, 2012) and AGILE 1051 (Hutson et al, 2013b). Although TIVO-1 was undertaken in both the first- and second-line settings, it was included in the group of first-line studies because 70% of patients were enrolled in this setting. The historical control for the first-line group was the study by Escudier et al (2009).

Table 3 Data pertaining to sorafenib as control arm in nine comparative studies included in the analysis

The analysis of the second-line setting included two studies: AXIS (Rini et al, 2011) and INTORSECT (Hutson et al, 2013a) (Table 3). The historical control for the second line was the TARGET study (Escudier et al, 2007a). The three studies included populations who had relapsed after cytokines (TARGET), after sunitinib (INTORSECT) and after approved VEGF inhibitors, mTOR inhibitors or cytokines (AXIS).

Nephrectomy

In these studies, the percentage of patients undergoing nephrectomy appeared stable in both the first- and second-line groups. The lowest percentages of patients with nephrectomy were observed in the ROSORC (74%) (Procopio et al, 2011) and INTORSECT (87%) (Hutson et al, 2013a) trials.

Motzer score

In the first-line group, a high percentage (34–54%) of good-prognosis patients and a low percentage (0–6%) of poor-prognosis patients were reported. The highest percentage of patients with intermediate prognosis was observed in TIVO-1 (62%) (Motzer et al, 2012). In the group of second-line studies, a chronological trend towards fewer good-prognosis patients and more intermediate- and poor-prognosis patients was documented.

Metastatic sites

A comparative analysis was possible only for the first-line studies. Escudier et al (2009) reported the highest percentage of patients with lung metastases (87%), whereas the lowest was seen in ROSORC (15%) (Procopio et al, 2011). Except for the study by Jonasch et al (2010) and ROSORC (13 and 5% of cases, respectively), the percentage of patients with liver metastases did not change. The highest percentage of patients with bone metastases was observed in the phase 2 sorafenib study by Escudier et al (2009) (32%), whereas the lowest were observed in the Jonasch et al (2010) and ROSORC studies (8 and 5%, respectively). However, it should be noted that in the ROSORC trial 52% of patients were registered as cases with ‘multiple metastatic sites’.

Disease control

PFS

A generally positive trend in PFS over time was observed across the first-line studies (Figure 2a): median PFS rose from 5.7 months in the phase 2 study (Escudier et al, 2009) to 9.1 months in TIVO-1 (Motzer et al, 2012). The AGILE trial represents an exception to this trend, as median PFS fell to 6.5 months (Hutson et al, 2013b). The phase 2 study (Escudier et al, 2009) was associated with the lowest PFS and TIVO-1 (Motzer et al, 2012) with the highest.

Figure 2 Trend of PFS vs year of accrual observed in the studies including sorafenib as first (A) or second (B) line treatment option.

Figure 2b shows the clinical benefit in terms of PFS observed in the second-line studies that included a sorafenib arm. A negative trend in PFS over time is apparent: median PFS declined from 5.5 months in the TARGET study (Escudier et al, 2007a) to 4.7 months in the AXIS trial (Rini et al, 2011) and finally to 3.9 months in the most recent trial, INTORSECT (Hutson et al, 2013a).

Overall survival

Similar to what was observed for the IFN group, OS for sorafenib initially showed a rising trend, although this was not sustained. In the TARGET study (Escudier et al, 2007a) OS was 17.8 months and in AXIS (Rini et al, 2011) it rose to 19.3 months, whereas in INTORSECT (Hutson et al, 2013a) it fell to 16.6 months.
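As a purely illustrative aid, the second-line sorafenib figures quoted above can be laid side by side to make the direction of each trend explicit. The short Python sketch below uses only the median PFS and OS values reported in the text; the variable and function names are hypothetical, and the differences are simple descriptive subtractions between separate trials, not a statistical comparison.

```python
# Median PFS and OS (months) for the sorafenib arm of the three second-line
# studies, in chronological order, exactly as quoted in the text.
# All names below are illustrative assumptions, not part of the original analysis.
second_line = [
    ("TARGET", 5.5, 17.8),
    ("AXIS", 4.7, 19.3),
    ("INTORSECT", 3.9, 16.6),
]


def successive_changes(values):
    """Return the difference between each value and the one before it."""
    return [round(later - earlier, 1) for earlier, later in zip(values, values[1:])]


def describe(changes):
    """Label a series of changes as rising, declining or fluctuating."""
    if all(change > 0 for change in changes):
        return "rising"
    if all(change < 0 for change in changes):
        return "declining"
    return "fluctuating"


pfs_changes = successive_changes([pfs for _, pfs, _ in second_line])
os_changes = successive_changes([os_months for _, _, os_months in second_line])

print("PFS changes (months):", pfs_changes, "->", describe(pfs_changes))
print("OS changes (months):", os_changes, "->", describe(os_changes))
```

Read this way, PFS falls at each step, whereas OS first rises and then falls. Because the three populations differ in prior treatment and risk profile, these differences are descriptive only.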

Overall response rate

In the group of first-line studies, lower ORRs were observed in Escudier et al (2009) and AGILE 1051, whereas ORR was constant in the remaining studies. In the second-line studies, the limited ORR observed in the historical control was similar to that seen in INTORSECT (Hutson et al, 2013a) and AXIS (Rini et al, 2011).

Discussion

The analysis of data from the selected studies reported in Tables 2 and 3 provides information on how the features of renal cancer have evolved over the years and why some unexpected results have been observed in clinical trials.

How have the disease features evolved?

The positive trend in PFS in both the IFN and sorafenib control arms observed for first-line treatments suggests that the baseline features of RCC patients have consistently improved over time. This improvement probably results from an increased indication for palliative nephrectomy, advances in surgical techniques and the current possibility of making the initial diagnosis much earlier than in the past (Kane et al, 2008). According to Patil (2012), all of these factors resulted in an improvement of the patient risk score; however, we believe that a very important role could also be played by improved radiological scanning methods and greater public awareness of RCC. In a population treated with IFN between 1996 and 2001, in which 55% of patients had undergone nephrectomy and 18% had a good prognosis, time to tumour progression was 4.7 months and survival was 13 months (Motzer et al, 2002). Conversely, in the last study with IFN as the control arm (Escudier et al, 2009), 83% of patients had undergone nephrectomy, 51% were good-prognosis patients and a PFS of 5.6 months was observed, consistent with what Mickisch and Flanigan suggested in their prospective studies about the role of palliative nephrectomy in terms of OS (Flanigan et al, 2001; Mickisch et al, 2001; Eisen, 2012). In addition, the possibility of an early diagnosis plays an important role, allowing patients to undergo surgery in good general condition and, consequently, improving their prognosis.

Why do unexpected results appear?

In developmental studies of molecularly targeted agents, unexpected outcomes of control arms in terms of PFS and OS, disproving the statistical hypotheses, have sometimes been observed. A possible explanation could lie in the indirect comparison of studies with different populations. Taking this into account, careful analysis of the baseline characteristics of patients in the control arm could suggest hypotheses regarding the causes of such outcomes.

Phase 2 study of first-line sorafenib vs IFN

As reported, the phase 2 study comparing sorafenib with IFN showed the lowest PFS associated with the use of sorafenib in the first-line setting (5.7 months) (Escudier et al, 2009). We wondered how this negative result could be explained. Analysing the study, it appears that, besides an imbalance between the two treatment groups (sorafenib and IFN) at enrolment, in both treatment arms baseline patient characteristics were less favourable than in all other studies with regard to metastatic sites and number of metastases (Escudier et al, 2012). Indeed, 72% of patients in the sorafenib arm and 64% in the IFN arm had 5 metastatic sites (Szczylik and Staehler, 2007). In this situation, the performance of the IFN arm was definitely better than that of the IFN arms of other studies (Table 2), whereas the PFS of the sorafenib arm was the shortest of those observed in first-line studies (Procopio et al, 2012). This finding supports the speculative hypothesis of Escudier et al (2009, 2012) that, in patients with a high number of metastases and highly compromising metastatic sites, the efficacy of both targeted therapies and IFN is likely to be lower because of the poor biology and more aggressive features of the disease.

TIVO-1

The TIVO-1 study reported the highest PFS for the sorafenib control arm, 9.1 months. How could we explain this result?

With the aim of evaluating the differences between the tyrosine kinase inhibitor (TKI) tivozanib and the comparator sorafenib, the TIVO-1 study was conducted in a population in which 70% of patients received first-line treatment and 30% received second-line treatment after relapsing on cytokines (Motzer et al, 2012). Results among treatment-naive patients showed a PFS of 12.7 months for patients treated with tivozanib, the greatest benefit so far observed with targeted therapies in RCC. A recent document by the Oncologic Drugs Advisory Committee of the US Food and Drug Administration analysing the methodology of the study (Food and Drug Oncologic Drug Advisory Committee, 2013) questioned whether this considerable benefit is entirely the result of the drug. Looking at Table 3, it appears that the baseline characteristics of patients in TIVO-1 were similar to those of patients enrolled in AMG 386 (Rini et al, 2012): both studies show similar Motzer score profiles, with a prevalence of intermediate-prognosis patients (61% vs 62%) and about 35% good-prognosis patients (37% vs 34%). The percentages of patients with brain, bone and liver metastases at baseline were also similar. A more favourable patient selection, however, resulted from nephrectomy, normally present in 80–90% of treated patients, being a mandatory inclusion criterion in TIVO-1, as in the AVOREN study. In fact, nephrectomy was an upfront criterion in TIVO-1 because the study was designed as an extension of the prior phase 2 study (Nosov et al, 2012). These peculiarities could support the thesis that part of the benefit observed in terms of PFS is related to the more favourable baseline conditions of the enrolled population. This is supported by the observation that the control arms of both TIVO-1 and AMG 386 achieved the best PFS values for sorafenib alone, at 9 months or more (Procopio et al, 2012).

Second-line studies with sorafenib

The negative trend in PFS reported for sorafenib used as the control arm in the second-line setting appears to run counter to the first-line studies, in which improvements in PFS were observed over time. A possible explanation could relate to two factors. The first is the relevant differences in first-line treatments (cytokines and chemotherapy in TARGET, sunitinib in the other two studies), with a consequent potential negative impact on the magnitude of second-line PFS, as shown in the AXIS trial. The second is the decreasing percentage of patients with good-prognosis baseline MSKCC scores accrued in these studies (52, 28 and 13%, respectively), with a consequent increase in poor-prognosis patients (0, 33, and 13%, respectively).

Another situation that is difficult to explain is the fluctuating trend in OS of the sorafenib arm observed in these studies.

The OS trend observed in second-line therapy with sorafenib seems unusual: because of the availability of new agents and the possibility of further treatment lines, OS should theoretically increase over time. Probably the best explanation, apart from the changes in study populations already discussed, lies in the percentage of patients receiving other therapies after second-line treatment. Notwithstanding the limits of historical comparisons, we can make the following observations. In the first developmental phase 3 study, sorafenib vs placebo (TARGET), few patients who relapsed after sorafenib received subsequent therapies. When AXIS (second-line head-to-head axitinib vs sorafenib) was undertaken, agents for treatment following second-line therapy were widely available, and this could explain the considerable OS values observed. In INTORSECT (second-line head-to-head temsirolimus vs sorafenib), only 6.3% of patients underwent treatment after sorafenib, so the OS value of 16.6 months should be taken as an important sign of the effectiveness of sorafenib. In particular, the OS data observed in the INTORSECT trial were completely unexpected: the difference between sorafenib and the mTOR inhibitor exceeded 4 months. As reported by Hutson in his presentation at ESMO 2012, even though OS was a secondary end point, the magnitude of this difference should not be underestimated.

AGILE 1051

As reported, the median PFS values observed in this study (6.5 vs 10.6 months), with 6.5 months for sorafenib, clearly run counter to previous investigations: 9.0 months in AMG 386 (Rini et al, 2012) and 9.1 months in TIVO-1 (Motzer et al, 2012). How could we explain this trend? The phase 3 first-line study AGILE evaluated the new targeted agent axitinib vs sorafenib as the control arm (Hutson et al, 2013b). In agreement with the discussant at the ASCO 2013 Genitourinary Cancers Symposium, it is likely that the weak performance of both sorafenib and axitinib may be partially explained by the particular geographical distribution of the high-accruing centres involved in the study. Most of the centres were located in Eastern Europe, Asia, South America and Africa; they were probably not yet skilled in or accustomed to the use of TKIs and, mainly for economic reasons, may have been forced to enter patients into clinical studies in the absence of other efficacious treatment options (Srinivas, 2013).

Conclusions

We believe that the present retrospective analysis of control (comparator) arm data from the studies carried out so far in mRCC provides useful information for a more precise and rewarding use of these agents in the future.

What message has been learned from comparing comparators? Over the past decade, IFN achieved the most impressive improvement in OS, from 13 to 33.5 months, whereas sorafenib improved its median PFS from 5.5 to a maximum of 9.1 months. Were these improvements the result of an increase in efficacy? Unquestionably not; rather, they arise from the general improvement of patient conditions at study entry, more reliable diagnostic and treatment procedures and the growing experience of investigators in the management of adverse events. Through these new conditions, substantial improvements in the possibilities of disease control have been achieved. Considering the previous points, we believe that patients with kidney tumours can achieve additional survival benefits through sequential therapy with the two TKIs, sorafenib followed by sunitinib or vice versa, as demonstrated in the recent phase 3 trial SWITCH (Michel, 2014). The final message to be conveyed is that, when analysing the results of a study, investigators should avoid a simplistic approach that looks only at ‘absolute results’ in terms of PFS or OS. Instead, they must critically evaluate results in relation to the clinical conditions of patients, their prognostic risk class composition and even their geographical distribution.