Side effect prediction based on drug-induced gene expression profiles and random forest with iterative feature selection

Abstract

Nine in every ten drug candidates fail in clinical trials, mainly due to efficacy- and safety-related issues, despite in-depth preclinical testing. Even some approved drugs, such as chemotherapeutics, are notorious for side effects that are burdensome for patients. To pave the way for new therapeutics with more tolerable side effects, the mechanisms underlying side effects need to be fully elucidated. In this work, we addressed three common side effects of chemotherapeutics, namely alopecia, diarrhea and edema. A strategy based on the Random Forest algorithm unveiled an expression signature involving 40 genes that predicted these side effects with an accuracy of 89%. We further characterized the resulting signature and its association with the side effects using functional enrichment analysis and protein-protein interaction networks. This work contributes to ongoing efforts in drug development toward early identification of side effects, so that resources can be used more effectively.
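
The general idea of Random Forest with iterative feature selection can be illustrated with a minimal sketch. This is not the authors' code, and their exact procedure may differ. It assumes scikit-learn and NumPy, uses a synthetic matrix in place of real expression profiles, ranks features by the forest's impurity-based importance, and scores each round by out-of-bag accuracy. The function name `iterative_rf_selection` and the parameters `min_features` and `drop_fraction` are hypothetical choices made for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier


def iterative_rf_selection(X, y, min_features=5, drop_fraction=0.2, seed=0):
    """Repeatedly fit a random forest and drop the least important features.

    Returns a list of (feature_indices, oob_accuracy) pairs, one per round.
    All defaults are illustrative assumptions, not values from the article.
    """
    kept = np.arange(X.shape[1])
    history = []
    while True:
        forest = RandomForestClassifier(
            n_estimators=200, oob_score=True, random_state=seed
        )
        forest.fit(X[:, kept], y)
        # Out-of-bag accuracy estimates performance without a held-out set.
        history.append((kept.copy(), forest.oob_score_))
        if len(kept) <= min_features:
            break
        # Drop the lowest-ranked fraction, but never go below min_features.
        n_drop = max(1, int(len(kept) * drop_fraction))
        n_keep = max(min_features, len(kept) - n_drop)
        order = np.argsort(forest.feature_importances_)[::-1]
        kept = kept[np.sort(order[:n_keep])]
    return history


# Synthetic stand-in for an expression matrix: 150 samples x 60 "genes",
# three classes (a placeholder for three side effect labels).
X, y = make_classification(
    n_samples=150, n_features=60, n_informative=8, n_redundant=0,
    n_classes=3, n_clusters_per_class=1, random_state=0,
)
history = iterative_rf_selection(X, y)
best_features, best_score = max(history, key=lambda item: item[1])
print(f"rounds={len(history)} best_size={len(best_features)} oob_acc={best_score:.2f}")
```

Recording the score at every round makes it possible to plot performance against the size of the feature space and to pick the smallest feature set that still performs well, which is the kind of trade-off a gene signature of fixed size reflects.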

Fig. 1: Model development with iterative feature selection.
Fig. 2: Performance metrics varying with dimension of the feature space.
Fig. 3: Heatmap displaying standardized expression level of the signature genes and association with the studied side effects.
Fig. 4: Average local variable importance for the signature genes computed for each side effect sample set.
Fig. 5: Functional enrichment analysis of the signature genes.

Code availability

The code will be provided upon request.

Acknowledgements

This study has been supported by TÜBİTAK (2209-A)-1919B011902354.

Author information

Corresponding author

Correspondence to Ozlem Ulucan.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Cakir, A., Tuncer, M., Taymaz-Nikerel, H. et al. Side effect prediction based on drug-induced gene expression profiles and random forest with iterative feature selection. Pharmacogenomics J 21, 673–681 (2021). https://doi.org/10.1038/s41397-021-00246-4

  • DOI: https://doi.org/10.1038/s41397-021-00246-4
