Introduction

Bladder cancer is the sixth most common cancer worldwide.1 Non-muscle invasive bladder cancer (NMIBC; stage Ta or T1), a clinical subtype of bladder cancer, accounts for ~85% of all cases. Patients with NMIBC are typically treated by transurethral resection followed by intravesical instillation of Bacillus Calmette-Guérin or mitomycin C to prevent recurrence or progression to muscle invasive bladder cancer (MIBC; stage T2, T3 or T4).2 Despite these treatment options, many patients fail to respond to therapy and experience disease recurrence or progression,3 outcomes that strongly influence patient survival.

Recent genomic and epigenomic studies of bladder cancer based on next-generation sequencing technologies have progressively and precisely characterized the genomic landscape of the disease. In addition to the gene expression features described by earlier genome-wide approaches, integrative genomic approaches have revealed various novel molecular characteristics of bladder cancer.4, 5, 6 Indeed, several investigations have proposed putative treatment targets for bladder cancer, including highly amplified novel molecules such as CCNE1, CEBPA, and MUC1. However, their clinical relevance as prognostic or predictive biomarkers has not been sufficiently established. Furthermore, no studies have examined how the expression of these genes is regulated, or whether genes associated with them can provide additional insight into the mechanisms of disease progression or tumor invasion.

Here, we examined the association between these previously reported novel molecules and cancer outcome in multiple patient cohorts. We analyzed molecules with high copy number alteration (CNA) and their coexpressed genes to assess whether their molecular signature could identify patients with an increased likelihood of disease progression. Through gene-to-gene network and upstream regulator analyses, we also identified a putative signaling pathway that may mediate NMIBC aggressiveness.

Materials and methods

Patients and gene expression data

In this study, a total of four cohorts of patients with bladder cancer were used to identify and validate a prognostic signature. To explore the relationship between DNA copy number and mRNA expression changes, data sets containing CNA, gene expression and DNA methylation data for bladder cancer were obtained from the public database of The Cancer Genome Atlas (TCGA) consortium (n=404).5 Prognostic gene sets, including genes with significantly correlated copy number and expression levels, were explored in a gene expression data set (GSE13507, n=165) from Chungbuk National University Hospital.7 Of the 165 patient samples in this data set, 63 were from patients with MIBC and the remaining 102 were from patients with NMIBC. To assess NMIBC prognosis, we used only the 102 primary NMIBC samples as the exploration data set (the Korean cohort, n=102). Another gene expression data set (GSE19915, n=144), comprising bladder cancer samples from 144 patients undergoing transurethral resection at the University Hospital of Lund, was used for validation.8 From this set, 97 NMIBCs were chosen as the first validation data set (the Swedish cohort, n=97). A further gene expression data set from a European consortium bladder cancer microarray study (GSE5479, n=404) was used for additional independent validation of the signature.9 Of the 404 patients, 302 with NMIBC were selected as the second validation data set (the European cohort, n=302).

To reduce confounding, patients diagnosed with concomitant carcinoma in situ (CIS) or with CIS alone were excluded. Disease progression was defined as an increase in stage from Ta or T1 to T2 or higher after disease relapse. All copy number and gene expression data are available from the TCGA consortium or the Gene Expression Omnibus public database (accession numbers GSE13507, GSE19915, and GSE5479).

Statistical analysis

Pearson and Spearman correlation coefficients were calculated to evaluate the association between DNA copy number and gene expression. To further assess the relationship between continuous mRNA expression and discrete copy number status, we divided patients into five equal-sized subgroups according to their copy number values (that is, each quintile contained 20% of the patients) and performed polyserial correlation tests between the expression data and the copy number groups. To test whether gene expression differed between patient subgroups defined by DNA methylation values, we performed a two-sample t-test for each gene. The association between DNA copy number and gene expression after accounting for DNA methylation was assessed using multivariate linear regression models.
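As an illustration of the quintile grouping and polyserial step, the minimal Python sketch below (not the code used in this study) ranks patients by copy number into five equal-sized groups and computes a simple two-step (ad hoc) polyserial estimate: the Pearson correlation with integer-coded groups is rescaled by the normal thresholds implied by the group proportions, under an assumed bivariate normal model. Function names are hypothetical, and the study may instead have used a maximum-likelihood polyserial estimator.

```python
import numpy as np
from scipy import stats


def quintile_groups(copy_number):
    """Assign each patient to one of five equal-sized copy-number groups (0-4)."""
    cn = np.asarray(copy_number, dtype=float)
    # Rank patients (ties broken by order) so each group holds ~20% of them.
    ranks = np.argsort(np.argsort(cn, kind="stable"), kind="stable")
    return (ranks * 5) // len(cn)


def polyserial_adhoc(x, groups):
    """Two-step (ad hoc) polyserial correlation between a continuous variable
    and an ordinal one, assuming an underlying bivariate normal model."""
    x = np.asarray(x, dtype=float)
    # Recode the ordinal categories as consecutive integer scores 0..K-1.
    _, y = np.unique(np.asarray(groups), return_inverse=True)
    # Normal thresholds implied by the cumulative category proportions.
    cum = np.cumsum(np.bincount(y)) / len(y)
    tau = stats.norm.ppf(cum[:-1])
    r = np.corrcoef(x, y)[0, 1]
    return r * np.std(y) / np.sum(stats.norm.pdf(tau))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    cn = rng.normal(size=400)                     # synthetic copy-number values
    expr = 0.6 * cn + 0.8 * rng.normal(size=400)  # synthetic expression values
    print(stats.pearsonr(cn, expr)[0], stats.spearmanr(cn, expr)[0])
    print(polyserial_adhoc(expr, quintile_groups(cn)))
```

Ranking before binning guarantees near-equal group sizes even when copy number values are tied, which matches the "each quintile contained 20% of the patients" design.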

To generate gene sets highly associated with each gene feature, we used a previously established correlation-based method, iterative in trans correlation analysis.10 Briefly, a Pearson correlation test for each gene feature was applied to the exploration data set to select genes with significant correlation coefficients (|r|>0.5 and P<0.001). Using a gene expression matrix consisting of the gene feature and its correlated genes, we performed hierarchical clustering analysis with the centered correlation coefficient as the similarity measure and complete linkage as the clustering method. Based on the resulting patient clusters, patients were divided into two subgroups, and the time to progression was estimated for each subgroup. The Kaplan–Meier method was used to calculate progression-free survival, and differences in survival between subgroups were assessed using the log-rank test. BRB-ArrayTools (National Cancer Institute, Bethesda, MD, USA) was used for the class-prediction validation procedure. To test the prognostic value of the signature, we applied several prediction models: the compound covariate predictor, linear discriminant analysis, the 1-nearest neighbor and 3-nearest neighbor classifiers, the nearest centroid classifier, and the Bayesian compound covariate predictor. The prognostic association between the signature and risk factors was assessed using multivariate Cox regression models. A backward–forward stepwise procedure (function step, R package stats) was applied to optimize the multivariate model with the most informative variables.11
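The gene selection and clustering steps described above can be sketched as follows. This is a hedged Python illustration, not the study's implementation: it selects genes by the stated Pearson thresholds and then splits samples with SciPy's complete-linkage clustering, where the `"correlation"` metric is 1 minus the centered Pearson correlation between samples. Mean-centering each gene beforehand is an assumed preprocessing step, and the function names are hypothetical.

```python
import numpy as np
from scipy import stats
from scipy.cluster.hierarchy import fcluster, linkage


def correlated_genes(expr, feature_idx, r_min=0.5, p_max=0.001):
    """Indices of genes (rows of a genes x samples matrix) whose expression is
    correlated with the feature gene: |Pearson r| > r_min and P < p_max."""
    target = expr[feature_idx]
    hits = []
    for g in range(expr.shape[0]):
        if g == feature_idx:
            continue
        r, p = stats.pearsonr(expr[g], target)
        if abs(r) > r_min and p < p_max:
            hits.append(g)
    return hits


def two_cluster_split(expr, gene_idx):
    """Split samples into two clusters using complete linkage and a
    centered-correlation distance (1 - Pearson r between samples)."""
    sub = expr[gene_idx]
    # Mean-center each gene across samples (an assumed preprocessing step).
    sub = sub - sub.mean(axis=1, keepdims=True)
    tree = linkage(sub.T, method="complete", metric="correlation")
    return fcluster(tree, t=2, criterion="maxclust")  # labels 1 and 2
```

For a feature gene at row `k`, the clustering matrix would be built from `[k] + correlated_genes(expr, k)`; which of the two returned labels corresponds to the high or low cluster would then be decided from the mean expression of the feature gene in each cluster.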

Gene set enrichment analysis was performed to identify the gene sets most significantly associated with diseases, molecular functions, and physiological system development. The significance of over-represented gene sets was estimated using Fisher's exact test. To explore candidate regulators contributing to disease progression, we performed an upstream regulator analysis, which searches for known targets of each regulator in the data set and compares their direction of change with that expected from previously published literature. Gene set enrichment and upstream regulator analyses were performed using Ingenuity Pathway Analysis (IPA, Ingenuity Systems, www.ingenuity.com).
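IPA performs the enrichment test internally, but the underlying over-representation calculation can be illustrated with SciPy's `fisher_exact`. The sketch below assumes a right-tailed test on a 2×2 table built from a signature gene list, a pathway gene list and a background gene universe; the function name and gene identifiers are placeholders, not IPA's interface.

```python
from scipy.stats import fisher_exact


def enrichment_p(signature, pathway, universe):
    """Right-tailed Fisher's exact test for over-representation of pathway
    genes among signature genes, relative to a background universe."""
    universe = set(universe)
    signature = set(signature) & universe
    pathway = set(pathway) & universe
    a = len(signature & pathway)        # in signature and in pathway
    b = len(signature - pathway)        # in signature only
    c = len(pathway - signature)        # in pathway only
    d = len(universe) - a - b - c       # in neither
    _, p = fisher_exact([[a, b], [c, d]], alternative="greater")
    return p
```

With a 300-gene signature, each pathway in the knowledge base would be tested this way, and pathways with small P values would be reported as enriched.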

Results

Baseline characteristics of NMIBC patient cohorts

Clinical and histopathological variables for the bladder cancer patient cohorts used in this study are listed in Supplementary Table 1. The median follow-up times for the Korean, Swedish and European cohorts were 55.3 months (range, 3.2–137), 49.5 months (range, 1.2–79.8) and 52.0 months (range, 1–185), respectively. During follow-up, 11, 15 and 44 patients experienced disease progression in the Korean, Swedish and European cohorts, respectively.

Prognostic value of CCNE1 CNAs

Among recent studies describing novel molecular characteristics of bladder cancer through integrative genomic approaches,4, 5, 6 one identified significant novel CNAs as candidate drivers of bladder tumorigenesis.4 Although highly amplified novel candidates, including CCNE1, CEBPA, and MUC1, were reported in that study, their associations with gene expression or disease prognosis have not been elucidated to date, which prompted us to explore the relationships among CNAs, transcriptional changes and disease outcome. We first evaluated the correlation between DNA copy number and gene expression using data obtained from the TCGA data portal. When the Pearson and Spearman correlation coefficients between the copy number and expression values of the 14 candidate genes were estimated, 13 (92.9%) exhibited a statistically significant positive correlation, with median correlation coefficients of 0.5 (range, 0.1–0.83) and 0.42 (range, 0.12–0.77), respectively (Table 1). We also stratified patients into five equal-sized subgroups based on their copy number values and examined the copy number group-specific expression patterns of the 14 candidate genes. Polyserial correlation analysis showed that mRNA expression levels of the candidate genes were significantly associated with copy number group (Supplementary Figure 1).

Table 1 Genes with significant copy number changes and the number of their correlated genes

Epigenetic factors such as DNA methylation play an important role in regulating gene expression. Aberrant DNA methylation can affect gene expression and may introduce noise into the evaluation of the association between DNA copy number and mRNA expression. To determine whether DNA methylation influences this association in the TCGA cohort, patients were stratified into two groups according to DNA methylation values: a high-methylation group (upper 50th percentile) and a low-methylation group (lower 50th percentile). The 14 candidate genes were more highly expressed in the low-methylation group than in the high-methylation group (each P<0.01 by two-sample t-test; Supplementary Figure 2). However, when the Pearson and Spearman correlation coefficients between copy number and expression were calculated separately in the high- and low-methylation groups, the relationship between copy number and gene expression did not differ significantly between the groups (Supplementary Figure 2). We also applied a multivariate linear regression analysis to each gene to evaluate the association between DNA copy number and mRNA expression after accounting for DNA methylation. Supplementary Table 2 summarizes the multivariate test results, showing that copy number variations of the candidate genes were significantly and positively correlated with their mRNA expression levels even after considering DNA methylation status. Taken together, these results indicate that the association between DNA copy number and mRNA expression is independent of DNA methylation.
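The three methylation checks described above (a median split with a two-sample t-test, copy number–expression correlation within each methylation group, and a regression adjusting for methylation) can be sketched for a single gene as below. This is an illustrative Python sketch on per-patient vectors, not the study's code; the function name is hypothetical, and an ordinary least squares model with an intercept is assumed for the multivariate regression.

```python
import numpy as np
from scipy import stats


def methylation_checks(expr, cn, meth):
    """Methylation checks for one gene, given per-patient vectors of
    expression, copy number and DNA methylation."""
    expr, cn, meth = (np.asarray(v, dtype=float) for v in (expr, cn, meth))
    high = meth >= np.median(meth)  # upper 50th percentile = high methylation

    # 1) Expression difference between low- and high-methylation groups.
    ttest = stats.ttest_ind(expr[~high], expr[high])

    # 2) Copy number-expression correlation within each methylation group.
    r_high = stats.pearsonr(cn[high], expr[high])[0]
    r_low = stats.pearsonr(cn[~high], expr[~high])[0]

    # 3) OLS model expr ~ 1 + cn + meth; t-test for the copy number term.
    X = np.column_stack([np.ones_like(cn), cn, meth])
    beta, *_ = np.linalg.lstsq(X, expr, rcond=None)
    resid = expr - X @ beta
    dof = len(expr) - X.shape[1]
    cov = (resid @ resid / dof) * np.linalg.inv(X.T @ X)
    t_cn = beta[1] / np.sqrt(cov[1, 1])
    p_cn = 2 * stats.t.sf(abs(t_cn), dof)

    return {"ttest": ttest, "r_high": r_high, "r_low": r_low,
            "beta_cn": beta[1], "p_cn": p_cn}
```

A positive, significant `beta_cn` with methylation in the model corresponds to the "even after considering DNA methylation status" result, while similar `r_high` and `r_low` values correspond to the stratified comparison.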

We next generated in trans gene sets correlated with each of the 14 genes with significant CNA to identify expression signatures altered by CNA during disease progression. Using the exploration data set, we performed unsupervised hierarchical clustering of the in trans genes correlated with each gene feature, divided the NMIBC samples into two groups based on the patient clusters and estimated the prognostic value of each gene set for NMIBC progression. The gene set derived from CCNE1 was most strongly associated with NMIBC progression (Table 1). Hierarchical cluster analysis using a total of 300 genes correlated with CCNE1 expression divided patients with NMIBC into two groups: a high CCNE1 cluster (HCE) and a low CCNE1 cluster (LCE; Figure 1a). The progression rate was significantly higher in HCE patients than in LCE patients (P=0.03 by log-rank test; Figure 1b).
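The Kaplan–Meier estimate and the two-group log-rank test used to compare the HCE and LCE clusters can be written directly from their standard definitions. The minimal Python sketch below is for illustration only (established tools such as R's survival package would normally be used); the function names are hypothetical, and `group` is assumed to be a boolean array marking one cluster, for example HCE.

```python
import numpy as np
from scipy import stats


def kaplan_meier(time, event):
    """Distinct event times and the Kaplan-Meier survival estimate at each."""
    time, event = np.asarray(time, dtype=float), np.asarray(event, dtype=bool)
    times = np.unique(time[event])
    surv, s = [], 1.0
    for t in times:
        at_risk = np.sum(time >= t)           # censored at t still at risk
        deaths = np.sum((time == t) & event)
        s *= 1.0 - deaths / at_risk
        surv.append(s)
    return times, np.array(surv)


def logrank_test(time, event, group):
    """Two-group log-rank test; returns (chi-square statistic, P value)."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    group = np.asarray(group, dtype=bool)
    o_minus_e, var = 0.0, 0.0
    for t in np.unique(time[event]):
        at_risk = time >= t
        n, n1 = at_risk.sum(), (at_risk & group).sum()
        d = ((time == t) & event).sum()
        d1 = ((time == t) & event & group).sum()
        o_minus_e += d1 - d * n1 / n          # observed minus expected in group
        if n > 1:                             # hypergeometric variance term
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    chi2 = o_minus_e ** 2 / var
    return chi2, stats.chi2.sf(chi2, df=1)
```

Because the statistic is symmetric in the two groups, it does not matter which cluster is coded as `True`.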

Figure 1

Gene expression pattern of the CCNE1 signature and progression-free survival of the two clusters in the exploration data set (n=102). (a) Gene expression patterns of CCNE1 and its associated genes. A total of 300 genes whose expression was highly correlated with CCNE1 expression were selected for cluster analysis (Pearson correlation test, |r|>0.5 and P<0.001). Patients were divided into two groups: a low CCNE1 cluster (LCE) and a high CCNE1 cluster (HCE). (b) Kaplan–Meier plot of time to progression. The progression rate was significantly higher in HCE patients than in LCE patients (P=0.03 by log-rank test).

Validation of the signature consisting of CCNE1 and its associated genes in the independent cohorts

To validate our findings, we used gene expression data from an independent cohort of Swedish patients with bladder tumors. Following the same procedure used for the exploration data set, patients in this cohort were divided into two groups (HCE and LCE) by hierarchical cluster analysis using the 61 genes that overlapped with the 300 genes derived from the exploration data set, and progression-free survival was estimated for each group. Kaplan–Meier analysis revealed that the CCNE1 signature was a significant predictor of progression-free survival in NMIBC in the Swedish cohort (P=0.01 by log-rank test; Figure 2a).

Figure 2

Validation of the CCNE1 signature in independent cohorts. Progression-free survival of the two patient groups classified by the CCNE1 signature (a) in the Swedish cohort (n=97) and (b) in the European cohort (n=302).

For further validation, we used gene expression data from another independent cohort of European patients with bladder tumors and analyzed this data set using the same clustering algorithm and Kaplan–Meier analysis. Consistent with our previous results, Kaplan–Meier analysis indicated that the CCNE1 signature was a significant predictor of progression-free survival in NMIBC in the European cohort (P=0.03 by log-rank test; Figure 2b). Next, we applied multiple class prediction methods based on a conventional training-test procedure to evaluate the effectiveness of the signature for predicting disease progression (Supplementary Figure 3a). The genes most differentially expressed between the HCE and LCE subgroups in the exploration data set (the training set) were selected and combined to form a series of classifiers estimating whether a particular bladder tumor belongs to the HCE or LCE subgroup. The number of genes in the classifiers was optimized to minimize misclassification in leave-one-out cross-validation of the tumors in the training set; for each leave-one-out training set, the entire model-building process, including gene selection, was repeated. When applied to the European cohort (the test set), all prediction models produced consistent and statistically significant predictions. Kaplan–Meier plots in the test set showed significant differences in the risk of progression between patients in the HCE and LCE subgroups (Supplementary Figure 3b).
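The compound covariate predictor, one of the classifiers listed in the Methods, and a leave-one-out loop that repeats gene selection for every held-out tumor can be sketched as follows. This is a hedged Python illustration, not BRB-ArrayTools: genes are selected by the top-k absolute t-statistics (an assumption standing in for a significance cutoff), the class threshold is the midpoint of the two class means of the compound score, and the function names are hypothetical.

```python
import numpy as np
from scipy import stats


def fit_ccp(X, y, n_genes):
    """Compound covariate predictor on a samples x genes matrix X with labels
    y in {0, 1} (for example LCE = 0, HCE = 1)."""
    t = stats.ttest_ind(X[y == 1], X[y == 0], axis=0).statistic
    genes = np.argsort(-np.abs(t))[:n_genes]   # top genes by |t| (assumed rule)
    weights = t[genes]                         # t-statistics act as weights
    score = X[:, genes] @ weights
    cut = 0.5 * (score[y == 1].mean() + score[y == 0].mean())
    return genes, weights, cut


def predict_ccp(model, X):
    """Class 1 if the compound score exceeds the midpoint threshold."""
    genes, weights, cut = model
    return (X[:, genes] @ weights > cut).astype(int)


def loocv_error(X, y, n_genes):
    """Leave-one-out error rate; gene selection is redone inside every fold."""
    errors = 0
    for i in range(len(y)):
        train = np.arange(len(y)) != i
        model = fit_ccp(X[train], y[train], n_genes)
        errors += int(predict_ccp(model, X[i:i + 1])[0] != y[i])
    return errors / len(y)


def tune_and_fit(X, y, candidates=(5, 10, 20, 50)):
    """Pick the gene count with the lowest LOOCV error, then refit on all data."""
    best = min(candidates, key=lambda k: loocv_error(X, y, k))
    return fit_ccp(X, y, best), best
```

Refitting gene selection inside each fold matters: selecting genes once on the full training set and only then cross-validating would leak information from the held-out tumor and make the error estimate optimistic.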

To compare the discriminative power of CCNE1 single-gene expression with that of the CCNE1 signature, we analyzed 199 NMIBC samples in a pooled data set combining the Korean and Swedish cohorts, both of which include CCNE1 expression data. When the samples were divided into two groups according to CCNE1 expression, the frequency of progression was higher in the group with CCNE1 expression in the upper 50th percentile than in the group with expression in the lower 50th percentile, but the difference was not statistically significant (P=0.078 by log-rank test; Supplementary Figure 4), suggesting that the CCNE1 signature performs better than CCNE1 single-gene expression.

We also applied multivariate Cox regression analyses including the signature, CCNE1 single-gene expression and known clinicopathologic risk factors of NMIBC to verify the independent utility of the CCNE1-based signature. In the pooled cohort of Korean and Swedish patients, the CCNE1 signature was an independent risk factor for disease progression (hazard ratio=2.898, 95% confidence interval=1.027–8.187, P=0.044; Supplementary Table 3), even after applying a variable selection procedure. In a separate multivariate analysis of the European cohort, the CCNE1 signature also retained its statistical significance for NMIBC progression (hazard ratio=6.849, 95% confidence interval=1.613–29.092, P=0.009; Supplementary Table 4).
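To make the reported hazard ratios and confidence intervals concrete, the sketch below fits a Cox proportional hazards model by Newton–Raphson on the partial likelihood (Breslow handling of tied times) and converts each coefficient β and its standard error into HR = exp(β), a Wald 95% confidence interval exp(β ± 1.96·SE) and a two-sided Wald P value. It is an illustrative Python implementation under these assumptions, not the software used in the study, and it omits the stepwise variable selection; the function name is hypothetical.

```python
import numpy as np
from scipy import stats


def cox_fit(time, event, X, n_iter=50, tol=1e-8):
    """Cox model via Newton-Raphson on the Breslow partial likelihood.

    Returns hazard ratios, Wald 95% confidence intervals and P values."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    X = np.asarray(X, dtype=float).reshape(len(time), -1)
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = X @ beta
        w = np.exp(eta - eta.max())  # constant rescaling cancels in ratios
        grad = np.zeros_like(beta)
        hess = np.zeros((beta.size, beta.size))
        for i in np.flatnonzero(event):
            at_risk = time >= time[i]
            wr, Xr = w[at_risk], X[at_risk]
            xbar = wr @ Xr / wr.sum()  # risk-set weighted mean covariate
            grad += X[i] - xbar
            hess -= (Xr.T * wr) @ Xr / wr.sum() - np.outer(xbar, xbar)
        step = np.linalg.solve(hess, grad)
        beta = beta - step  # Newton update
        if np.max(np.abs(step)) < tol:
            break
    # Standard errors from the information matrix of the last iteration.
    se = np.sqrt(np.diag(np.linalg.inv(-hess)))
    hr = np.exp(beta)
    ci = np.exp(np.column_stack([beta - 1.96 * se, beta + 1.96 * se]))
    p = 2 * stats.norm.sf(np.abs(beta / se))
    return hr, ci, p
```

Read in reverse, a hazard ratio of 2.898 with a 95% confidence interval of 1.027–8.187 implies β = ln 2.898 ≈ 1.064 and SE ≈ (ln 8.187 − ln 1.027)/3.92 ≈ 0.53, giving z ≈ 2.01 and a two-sided P ≈ 0.044, consistent with the value reported for the pooled cohort.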

Biological insights into the prognostic CCNE1 signature

To explore the biological characteristics active during NMIBC progression, a gene set enrichment test of the 300 genes in the progression signature was performed using IPA. Genes involved in cancer, cell cycle, and connective tissue development and function were significantly enriched. In the category of molecular and cellular functions, genes involved in DNA repair and in cell death and survival were also significantly represented, as were cell cycle genes, suggesting that biological processes associated with the cell cycle, the DNA-repair system, and cell death/survival may contribute to the aggressive clinical behavior of NMIBC (Supplementary Figure 5).

To identify predominant regulators and signaling pathways reflecting the expression changes observed during progression, upstream regulator and gene-to-gene network analyses of the 300 genes were also performed using IPA. When searching for candidate regulators upstream of CCNE1, we observed functional connectivity between CCND1 and CCNE1, in which CCNE1 is a downstream effector of CCND1 (Figure 3). CCND1, like CCNE1 a member of the cyclin family, participates in cyclin and cell cycle regulation pathways and thus controls cell cycle progression. CCND1 is activated in various cancers, including bladder cancer, where it increases the invasiveness of cancer cells.12 CCND1 is also correlated with early cancer onset and tumor progression,13 suggesting that CCND1 may be an important mediator of NMIBC progression. Among the candidate regulators, we also identified a functional relationship between CCNE1 and SP1. The transcription factor SP1, which mediates cell death and growth, is a direct binding partner of CCNE1 and is involved in the DNA-repair system (Figure 3). SP1 is reportedly overexpressed in many cancers and associated with poor prognosis.14 Thus, SP1 activation might account for the poorer prognosis of the HCE subgroup of NMIBC. Interestingly, among the other satellite genes in the network, RBL1 was a common target of CCND1, CCNE1, and SP1 (Figure 3). RBL1, a member of the retinoblastoma (RB) protein family that, like CCND1 and CCNE1, participates in cell cycle control, is highly expressed in numerous cancers,15, 16, 17 including bladder cancer cells.16 In the exploration data set, RBL1 was more highly expressed in the HCE subgroup than in the LCE subgroup (P<0.001 by two-sample t-test; Supplementary Figure 6).

Figure 3

Gene networks enriched with genes associated with disease progression. Genes upregulated in the high CCNE1 cluster (HCE) group are shown in red, with color intensity indicating the degree of overexpression. Orange elements indicate genes predicted to be activated. Lines represent functional and physical interactions between genes, and arrows indicate the direction of regulation reported in the literature. The networks were generated using Ingenuity Pathway Analysis (Ingenuity Systems, www.ingenuity.com).

Discussion

Multiple genetic aberrations arising from various biological factors, such as CNA, DNA mutation or epigenetic modification, accumulate during the long process of tumorigenesis and are responsible for the initiation and progression of cancer.18 CNA is one of the important drivers of altered gene expression in cancer.19 In the current study, using correlation analyses, we showed that many novel genes with CNAs were significantly correlated with altered expression in bladder cancer. We then performed an in trans correlation analysis to identify gene sets whose expression was associated with genes with CNAs, as potential molecular signatures contributing to NMIBC progression. As a result, a molecular signature consisting of 300 genes correlated with the expression of copy number-altered CCNE1 was identified and was capable of predicting progression in NMIBC. The validity of this signature as a prognostic indicator was confirmed in two additional independent NMIBC patient cohorts. The CCNE1 signature was also an independent prognostic factor after adjustment for other known clinical and pathological risk factors. Finally, using upstream regulator analysis, we identified a putative molecular mechanism defined by CCND1-CCNE1-SP1 signaling that may be responsible for disease progression.

Although gene products involved in cell-cycle regulation, such as p53, pRB, p16INK4a, p14ARF, cyclin D1, p27, CDK4, and cyclin E, are involved in bladder tumorigenesis,20, 21, 22, 23, 24, 25 these genes reportedly have limited value for predicting the clinical course of bladder cancer. Although some investigators have observed an association between these genes and tumor invasion, a significant prognostic value of cell-cycle proteins for NMIBC progression has not been demonstrated to date.26 Using the CNA genes reported in the previous study,4 we found that many genes with significant CNA in bladder cancer, such as CCNE1, CCND1, TP53, and RB1, are involved in cell-cycle regulation (Table 1). By estimating prognosis with gene sets coexpressed with each of these genes, we found that an expression signature of CCNE1 and its associated genes was a novel predictor of NMIBC progression. CCNE1 is amplified and overexpressed in many tumors, resulting in chromosomal instability, which suggests that its deregulation may contribute to tumorigenesis.27, 28, 29, 30, 31 Previous investigations showing that cell-cycle alteration by CCNE1 overexpression adversely affects the clinical behavior of bladder cancer have suggested a crucial role for CCNE1 in bladder cancer progression.32, 33 Taken together, these data underscore the involvement of CCNE1 in NMIBC progression.

To identify the cascade of upstream transcriptional regulators that can explain the observed gene expression changes, we performed upstream regulator analysis using IPA and identified a putative pathway defined by CCND1-CCNE1-SP1 signaling during progression. CCND1 is a well-established human oncogene. Substantial evidence supports the involvement of CCND1 amplification and overexpression in various cancer types, including breast cancer, lung cancer, melanoma, and oral squamous cell carcinoma.34, 35 Furthermore, recent studies have shown that CCND1 is a potential therapeutic target in several cancers.36, 37, 38 Because CCNE1 is a downstream target of CCND1, our results support a potential role for CCND1 in NMIBC progression. SP1, a direct binding partner of CCNE1, mediates cell death and growth and is overexpressed in clinical samples of many cancers,14 implying that SP1 may cooperate with CCNE1 and contribute to the poorer prognosis of NMIBC. RBL1, a common downstream effector of CCND1, CCNE1, and SP1, belongs to the RB protein family, which is best known for its central role in cell-cycle regulation,39 and is highly expressed in numerous cancers,15, 16, 17 indicating that changes in RBL1 might reflect aggressive clinical behavior in NMIBC.

In conclusion, we identified a high-risk prognostic subgroup of NMIBC defined by CNA and expression of CCNE1. As a prognostic indicator, this newly identified signature may not only contribute to an understanding of the biology of aggressive changes in bladder cancer but also enable prediction of the clinical behavior of heterogeneous NMIBC at diagnosis. However, before the signature can be used in clinical practice, further rigorous investigation is needed, both to identify a limited number of markers that retain the robustness of the full signature and to select targets detectable by liquid biopsy of blood or voided urine.