Hidden patterns of gene expression provide prognostic insight for colorectal cancer

Abstract

Cancer tissue samples contain both cancer cells and non-cancer cells, and each biopsied site contains distinct proportions of these populations. Consequently, assigning useful tumor subtypes based on gene expression measurements from clinical samples is challenging. We applied a blind source separation approach to extract cancer cell-intrinsic gene expression patterns from clinical tumor samples of colorectal cancer. After separation, we found that a cancer cell-intrinsic gene expression program unique to each patient exists in the “residual” expression profile that remains after the gene expression data are separated. We performed a consensus clustering analysis of the extracted gene expression profiles to identify novel and robust cancer cell-intrinsic subtypes, and we validated the identified subtypes using an independent clinical gene expression dataset. The cancer cell-intrinsic subtypes are independent of biopsy site and provide prognostic information in addition to currently available clinical and molecular variables. After validating this approach in colorectal cancer, we further identified novel tumor subtypes with unique clinical information across multiple types of cancer. These cancer cell-intrinsic molecular subtypes provide novel prognostic value for clinical assessment of cancer.
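
A minimal sketch of the separation step, assuming the blind source separation is a non-negative matrix factorization (NMF), as the caption of Fig. 1 suggests. It is not the authors' pipeline: the multiplicative-update rule (Lee and Seung), the rank, the iteration count, the function names, and the toy matrix V_TOY are all illustrative assumptions. The expression matrix V (genes by samples) is approximated as W·H, and the residual V − W·H is the part that the abstract reports as carrying the patient-specific, cancer cell-intrinsic signal.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def transpose(m):
    """Return the transpose of a list-of-rows matrix."""
    return [list(col) for col in zip(*m)]


def frobenius_error(v, w, h):
    """Squared Frobenius norm of V - W.H (the reconstruction error)."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))


def nmf(v, rank, n_iter=200, seed=0, eps=1e-9):
    """Factor a non-negative matrix V (genes x samples) into W (genes x rank)
    and H (rank x samples) with multiplicative updates.
    rank, n_iter, and seed are hypothetical settings, not values from the paper."""
    rng = random.Random(seed)
    n_genes, n_samples = len(v), len(v[0])
    # Strictly positive random start so the multiplicative updates can move.
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n_genes)]
    h = [[rng.random() + 0.1 for _ in range(n_samples)] for _ in range(rank)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H)
        wt = transpose(w)
        num = matmul(wt, v)
        den = matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(n_samples)]
             for k in range(rank)]
        # W <- W * (V H^T) / (W H H^T)
        ht = transpose(h)
        num = matmul(v, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n_genes)]
    return w, h


def residual(v, w, h):
    """Residual expression profile: what is left of V after removing W.H."""
    wh = matmul(w, h)
    return [[v[i][j] - wh[i][j] for j in range(len(v[0]))] for i in range(len(v))]


# Toy non-negative expression matrix (6 genes x 4 samples); made-up numbers.
V_TOY = [
    [5.0, 4.0, 1.0, 0.5],
    [4.5, 4.2, 0.8, 0.7],
    [0.9, 1.1, 6.0, 5.5],
    [1.2, 0.8, 5.8, 6.1],
    [2.0, 3.5, 2.2, 3.9],
    [3.1, 1.9, 3.3, 1.7],
]

if __name__ == "__main__":
    W, H = nmf(V_TOY, rank=2)
    R = residual(V_TOY, W, H)
    print("reconstruction error:", frobenius_error(V_TOY, W, H))
    print("residual for first gene:", R[0])
```

In a workflow like the one the abstract outlines, the residual matrix would then be passed to a consensus clustering step; that step is omitted from this sketch.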

Fig. 1: An illustration of NMF application to tumor tissue samples.
Fig. 2: Comparison of the clustering performances of multi-regional samples between the original gene expression profiles and the residual datasets.
Fig. 3: Identifying an optimal residual dataset representing cancer cell-intrinsic information from the TCGA dataset.
Fig. 4: Identification of an RS classifier and comparison of the distribution between the RSs and the previous CRC subtypes.
Fig. 5: Molecular characteristics, anticancer drug responses, and mutational profiles of the CRC RSs.
Fig. 6: Prognostic association between the RSs.
Fig. 7: Identification of unique prognostically informative subtypes of PAAD, HNSC, and LGG by RS classification.

Data availability

All the datasets analyzed in this study are publicly available as described in the Materials and Methods in the manuscript.

Acknowledgements

This work was supported by the National Research Foundation of Korea (NRF) funded by the Korea Government, the Ministry of Science and ICT (2020R1A2B5B03094920), the Electronics and Telecommunications Research Institute (ETRI) grant [22ZS1100, Core Technology Research for Self-Improving Integrated Artificial Intelligence System], the Bio & Medical Technology Development Program of the National Research Foundation (NRF) funded by the Ministry of Science & ICT (2021M3A9I4024447) and the KAIST Grand Challenge 30 Project. The authors thank Nancy R. Gough and Corbin S. Hopper for thoughtful discussion and editorial assistance.

Author information

Contributions

DK and K-HC conceived and conducted the research, and co-wrote the manuscript. K-HC designed the project and supervised the research.

Corresponding author

Correspondence to Kwang-Hyun Cho.

Ethics declarations

Conflict of interest

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Kim, D., Cho, KH. Hidden patterns of gene expression provide prognostic insight for colorectal cancer. Cancer Gene Ther 30, 11–21 (2023). https://doi.org/10.1038/s41417-022-00520-y

  • DOI: https://doi.org/10.1038/s41417-022-00520-y
