A guide to artificial intelligence for cancer researchers

Perez-Lopez, Raquel; Ghaffari Laleh, Narmin; Mahmood, Faisal; Kather, Jakob Nikolas

doi:10.1038/s41568-024-00694-7

Review Article
Published: 16 May 2024

Nature Reviews Cancer volume 24, pages 427–441 (2024)

Abstract

Artificial intelligence (AI) has been commoditized. It has evolved from a specialty resource to a readily accessible tool for cancer researchers. AI-based tools can boost research productivity in daily workflows, but can also extract hidden information from existing data, thereby enabling new scientific discoveries. Building a basic literacy in these tools is useful for every cancer researcher. Researchers with a traditional biological science focus can use AI-based tools through off-the-shelf software, whereas those who are more computationally inclined can develop their own AI-based software pipelines. In this article, we provide a practical guide for non-computational cancer researchers to understand how AI-based tools can benefit them. We convey general principles of AI for applications in image analysis, natural language processing and drug discovery. In addition, we give examples of how non-computational researchers can get started on the journey to productively use AI in their own work.

**Fig. 1: AI workflows in cancer research.**

**Fig. 2: Development from simple, specialized, shallow models to deep, multimodal, generalist models for computer vision.**

**Fig. 3: Text-based hypothetical AI workflows in cancer research.**

References

Jiang, T., Gradus, J. L. & Rosellini, A. J. Supervised machine learning: a brief primer. Behav. Ther. 51, 675–687 (2020).
Article PubMed PubMed Central Google Scholar
Alloghani, M., Al-Jumeily, D., Mustafina, J., Hussain, A. & Aljaaf, A. J. in Supervised and Unsupervised Learning for Data Science (eds Berry, M. W. et al.) 3–21 (Springer International, 2020).
Yala, A. et al. Optimizing risk-based breast cancer screening policies with reinforcement learning. Nat. Med. 28, 136–143 (2022).
Article CAS PubMed Google Scholar
Kaufmann, E. et al. Champion-level drone racing using deep reinforcement learning. Nature 620, 982–987 (2023).
Article CAS PubMed PubMed Central Google Scholar
Nasteski, V. An overview of the supervised machine learning methods. Horizons 4, 51–62 (2017).
Article Google Scholar
Dike, H. U., Zhou, Y., Deveerasetty, K. K. & Wu, Q. Unsupervised learning based on artificial neural network: a review. In 2018 IEEE International Conference on Cyborg and Bionic Systems (CBS) 322–327 (2018).
Shurrab, S. & Duwairi, R. Self-supervised learning methods and applications in medical imaging analysis: a survey. PeerJ Comput. Sci. 8, e1045 (2022).
Article PubMed PubMed Central Google Scholar
Wang, X. et al. Transformer-based unsupervised contrastive learning for histopathological image classification. Med. Image Anal. 81, 102559 (2022).
Article PubMed Google Scholar
Wang, X. et al. RetCCL: clustering-guided contrastive learning for whole-slide image retrieval. Med. Image Anal. 83, 102645 (2023).
Article PubMed Google Scholar
Vinyals, O. et al. Grandmaster level in StarCraft II using multi-agent reinforcement learning. Nature 575, 350–354 (2019).
Article CAS PubMed Google Scholar
Zhao, Y., Kosorok, M. R. & Zeng, D. Reinforcement learning design for cancer clinical trials. Stat. Med. 28, 3294–3315 (2009).
Article PubMed PubMed Central Google Scholar
Sapsford, R. & Jupp, V. Data Collection and Analysis (SAGE, 2006).
Yamashita, R., Nishio, M., Do, R. K. G. & Togashi, K. Convolutional neural networks: an overview and application in radiology. Insights Imaging 9, 611–629 (2018).
Article PubMed PubMed Central Google Scholar
Chowdhary, K. R. in Fundamentals of Artificial Intelligence (ed. Chowdhary, K. R.) 603–649 (Springer India, 2020).
Hochreiter, S. & Schmidhuber, J. Long short-term memory. Neural Comput. 9, 1735–1780 (1997).
Article CAS PubMed Google Scholar
Vaswani, A. et al. Attention is all you need. Preprint at https://doi.org/10.48550/arXiv.1706.03762 (2017).
Shmatko, A., Ghaffari Laleh, N., Gerstung, M. & Kather, J. N. Artificial intelligence in histopathology: enhancing cancer research and clinical oncology. Nat. Cancer 3, 1026–1038 (2022).
Article PubMed Google Scholar
Wagner, S. J. et al. Transformer-based biomarker prediction from colorectal cancer histology: a large-scale multicentric study. Cancer Cell 41, 1650–1661.e4 (2023).
Article CAS PubMed PubMed Central Google Scholar
Khan, A. et al. A survey of the vision transformers and their CNN-transformer based variants. Artif. Intell. Rev. 56, 2917–2970 (2023).
Article Google Scholar
Hamm, C. A. et al. Deep learning for liver tumor diagnosis part I: development of a convolutional neural network classifier for multi-phasic MRI. Eur. Radiol. 29, 3338–3347 (2019).
Article PubMed PubMed Central Google Scholar
Ren, J., Eriksen, J. G., Nijkamp, J. & Korreman, S. S. Comparing different CT, PET and MRI multi-modality image combinations for deep learning-based head and neck tumor segmentation. Acta Oncol. 60, 1399–1406 (2021).
Article CAS PubMed Google Scholar
Unger, M. & Kather, J. N. A systematic analysis of deep learning in genomics and histopathology for precision oncology. BMC Med. Genomics 17, 48 (2024).
Article PubMed PubMed Central Google Scholar
Gawehn, E., Hiss, J. A. & Schneider, G. Deep learning in drug discovery. Mol. Inform. 35, 3–14 (2016).
Article CAS PubMed Google Scholar
Bayramoglu, N., Kannala, J. & Heikkilä, J. Deep learning for magnification independent breast cancer histopathology image classification. In 2016 23rd International Conference on Pattern Recognition (ICPR) 2440–2445 (IEEE, 2016).
Galon, J. et al. Type, density, and location of immune cells within human colorectal tumors predict clinical outcome. Science 313, 1960–1964 (2006).
Article CAS PubMed Google Scholar
Schmidt, U., Weigert, M., Broaddus, C. & Myers, G. Cell detection with star-convex polygons. In Medical Image Computing and Computer Assisted Intervention — MICCAI 2018. Lecture Notes in Computer Science Vol. 11071 (eds Frangi, A. et al.) https://doi.org/10.1007/978-3-030-00934-2_30 (Springer, 2018).
Edlund, C. et al. LIVECell—a large-scale dataset for label-free live cell segmentation. Nat. Methods 18, 1038–1045 (2021).
Article CAS PubMed PubMed Central Google Scholar
Bankhead, P. et al. QuPath: open source software for digital pathology image analysis. Sci. Rep. 7, 16878 (2017).
Article PubMed PubMed Central Google Scholar
Schneider, C. A., Rasband, W. S. & Eliceiri, K. W. NIH image to imageJ: 25 years of image analysis. Nat. Methods 9, 671–675 (2012).
Article CAS PubMed PubMed Central Google Scholar
Rueden, C. T. et al. ImageJ2: ImageJ for the next generation of scientific image data. BMC Bioinformatics 18, 529 (2017).
Article PubMed PubMed Central Google Scholar
Schindelin, J. et al. Fiji: an open-source platform for biological-image analysis. Nat. Methods 9, 676–682 (2012).
Article CAS PubMed Google Scholar
Linkert, M. et al. Metadata matters: access to image data in the real world. J. Cell Biol. 189, 777–782 (2010).
Article CAS PubMed PubMed Central Google Scholar
Gómez-de-Mariscal, E. et al. DeepImageJ: a user-friendly environment to run deep learning models in ImageJ. Nat. Methods 18, 1192–1195 (2021).
Article PubMed Google Scholar
Betge, J. et al. The drug-induced phenotypic landscape of colorectal cancer organoids. Nat. Commun. 13, 3135 (2022).
Article CAS PubMed PubMed Central Google Scholar
Park, T. et al. Development of a deep learning based image processing tool for enhanced organoid analysis. Sci. Rep. 13, 19841 (2023).
Article CAS PubMed PubMed Central Google Scholar
Belthangady, C. & Royer, L. A. Applications, promises, and pitfalls of deep learning for fluorescence image reconstruction. Nat. Methods 16, 1215–1225 (2019).
Article CAS PubMed Google Scholar
Echle, A. et al. Deep learning in cancer pathology: a new generation of clinical biomarkers. Br. J. Cancer 124, 686–696 (2021).
Article PubMed Google Scholar
Cifci, D., Foersch, S. & Kather, J. N. Artificial intelligence to identify genetic alterations in conventional histopathology. J. Pathol. 257, 430–444 (2022).
Article PubMed Google Scholar
Greenson, J. K. et al. Pathologic predictors of microsatellite instability in colorectal cancer. Am. J. Surg. Pathol. 33, 126–133 (2009).
Article PubMed PubMed Central Google Scholar
Kather, J. N. et al. Deep learning can predict microsatellite instability directly from histology in gastrointestinal cancer. Nat. Med. 25, 1054–1056 (2019).
Article CAS PubMed PubMed Central Google Scholar
Echle, A. et al. Clinical-grade detection of microsatellite instability in colorectal tumors by deep learning. Gastroenterology 159, 1406–1416.e11 (2020).
Article CAS PubMed Google Scholar
Kather, J. N. et al. Pan-cancer image-based detection of clinically actionable genetic alterations. Nat. Cancer 1, 789–799 (2020).
Article CAS PubMed PubMed Central Google Scholar
Fu, Y. et al. Pan-cancer computational histopathology reveals mutations, tumor composition and prognosis. Nat. Cancer 1, 800–810 (2020).
Article CAS PubMed Google Scholar
Coudray, N. et al. Classification and mutation prediction from non-small cell lung cancer histopathology images using deep learning. Nat. Med. 24, 1559–1567 (2018).
Article CAS PubMed PubMed Central Google Scholar
Schmauch, B. et al. A deep learning model to predict RNA-seq expression of tumours from whole slide images. Nat. Commun. 11, 3877 (2020).
Article CAS PubMed PubMed Central Google Scholar
Binder, A. et al. Morphological and molecular breast cancer profiling through explainable machine learning. Nat. Mach. Intell. 3, 355–366 (2021).
Article Google Scholar
Loeffler, C. M. L. et al. Predicting mutational status of driver and suppressor genes directly from histopathology with deep learning: a systematic study across 23 solid tumor types. Front. Genet. 12, 806386 (2022).
Article PubMed PubMed Central Google Scholar
Chen, R. J. et al. Pan-cancer integrative histology-genomic analysis via multimodal deep learning. Cancer Cell 40, 865–878.e6 (2022).
Article CAS PubMed PubMed Central Google Scholar
Bilal, M. et al. Development and validation of a weakly supervised deep learning framework to predict the status of molecular pathways and key mutations in colorectal cancer from routine histology images: a retrospective study. Lancet Digit. Health 3, e763–e772 (2021).
Article CAS PubMed PubMed Central Google Scholar
Yamashita, R. et al. Deep learning model for the prediction of microsatellite instability in colorectal cancer: a diagnostic study. Lancet Oncol. 22, 132–141 (2021).
Article PubMed Google Scholar
Echle, A. et al. Artificial intelligence for detection of microsatellite instability in colorectal cancer—a multicentric analysis of a pre-screening tool for clinical application. ESMO Open 7, 100400 (2022).
Article CAS PubMed PubMed Central Google Scholar
Schirris, Y., Gavves, E., Nederlof, I., Horlings, H. M. & Teuwen, J. DeepSMILE: contrastive self-supervised pre-training benefits MSI and HRD classification directly from H&E whole-slide images in colorectal and breast cancer. Med. Image Anal. 79, 102464 (2022).
Article PubMed Google Scholar
Jain, M. S. & Massoud, T. F. Predicting tumour mutational burden from histopathological images using multiscale deep learning. Nat. Mach. Intell. 2, 356–362 (2020).
Article Google Scholar
Xu, H. et al. Spatial heterogeneity and organization of tumor mutation burden with immune infiltrates within tumors based on whole slide images correlated with patient survival in bladder cancer. J. Pathol. Inform. 13, 100105 (2022).
Article PubMed PubMed Central Google Scholar
Chen, S. et al. Deep learning-based approach to reveal tumor mutational burden status from whole slide images across multiple cancer types. Preprint at https://doi.org/10.48550/arXiv.2204.03257 (2023).
Shamai, G. et al. Artificial intelligence algorithms to assess hormonal status from tissue microarrays in patients with breast cancer. JAMA Netw. Open 2, e197700 (2019).
Article PubMed PubMed Central Google Scholar
Beck, A. H. et al. Systematic analysis of breast cancer morphology uncovers stromal features associated with survival. Sci. Transl. Med. 3, 108ra113 (2011).
Article PubMed Google Scholar
Arslan, S. et al. A systematic pan-cancer study on deep learning-based prediction of multi-omic biomarkers from routine pathology images. Commun. Med. 4, 48 (2024).
Article CAS PubMed PubMed Central Google Scholar
Campanella, G. et al. Clinical-grade computational pathology using weakly supervised deep learning on whole slide images. Nat. Med. 25, 1301–1309 (2019).
Article CAS PubMed PubMed Central Google Scholar
Lu, M. Y. et al. AI-based pathology predicts origins for cancers of unknown primary. Nature 594, 106–110 (2021).
Article CAS PubMed Google Scholar
Kleppe, A. et al. A clinical decision support system optimising adjuvant chemotherapy for colorectal cancers by integrating deep learning and pathological staging markers: a development and validation study. Lancet Oncol. 23, 1221–1232 (2022).
Article CAS PubMed Google Scholar
Jiang, X. et al. End-to-end prognostication in colorectal cancer by deep learning: a retrospective, multicentre study. Lancet Digit. Health 6, e33–e43 (2024).
Article CAS PubMed Google Scholar
Zeng, Q. et al. Artificial intelligence-based pathology as a biomarker of sensitivity to atezolizumab–bevacizumab in patients with hepatocellular carcinoma: a multicentre retrospective study. Lancet Oncol. 24, 1411–1422 (2023).
Article CAS PubMed Google Scholar
Ghaffari Laleh, N., Ligero, M., Perez-Lopez, R. & Kather, J. N. Facts and hopes on the use of artificial intelligence for predictive immunotherapy biomarkers in cancer. Clin. Cancer Res. 29, 316–323 (2022).
Article Google Scholar
Pedersen, A. et al. FastPathology: an open-source platform for deep learning-based research and decision support in digital pathology. IEEE Access 9, 58216–58229 (2021).
Article Google Scholar
Pocock, J. et al. TIAToolbox as an end-to-end library for advanced tissue image analytics. Commun. Med. 2, 120 (2022).
Article PubMed PubMed Central Google Scholar
Lu, M. Y. et al. Data-efficient and weakly supervised computational pathology on whole-slide images. Nat. Biomed. Eng. 5, 555–570 (2021).
Article PubMed PubMed Central Google Scholar
El Nahhas, O. S. M. et al. From whole-slide image to biomarker prediction: a protocol for end-to-end deep learning in computational pathology. Preprint at https://doi.org/10.48550/arXiv.2312.10944 (2023).
Paszke, A. et al. PyTorch: an imperative style, high-performance deep learning library. Preprint at https://doi.org/10.48550/arXiv.1912.01703 (2019).
Jorge Cardoso, M. et al. MONAI: an open-source framework for deep learning in healthcare. Preprint at https://doi.org/10.48550/arXiv.2211.02701 (2022).
Goode, A., Gilbert, B., Harkes, J., Jukic, D. & Satyanarayanan, M. OpenSlide: a vendor-neutral software foundation for digital pathology. J. Pathol. Inform. 4, 27 (2013).
Article PubMed PubMed Central Google Scholar
Martinez, K. & Cupitt, J. VIPS—a highly tuned image processing software architecture. In IEEE Int.Conf. Image Processing 2005; https://doi.org/10.1109/icip.2005.1530120 (2005).
Dolezal, J. M. et al. Deep learning generates synthetic cancer histology for explainability and education. NPJ Precis. Oncol. 7, 49 (2023).
Article PubMed PubMed Central Google Scholar
Plass, M. et al. Explainability and causability in digital pathology. Hip Int. 9, 251–260 (2023).
Google Scholar
Reis-Filho, J. S. & Kather, J. N. Overcoming the challenges to implementation of artificial intelligence in pathology. J. Natl Cancer Inst. 115, 608–612 (2023).
Article PubMed PubMed Central Google Scholar
Aggarwal, R. et al. Diagnostic accuracy of deep learning in medical imaging: a systematic review and meta-analysis. NPJ Digit. Med. 4, 65 (2021).
Article PubMed PubMed Central Google Scholar
Rajput, D., Wang, W.-J. & Chen, C.-C. Evaluation of a decided sample size in machine learning applications. BMC Bioinformatics 24, 48 (2023).
Article PubMed PubMed Central Google Scholar
Ligero, M. et al. Minimizing acquisition-related radiomics variability by image resampling and batch effect correction to allow for large-scale data analysis. Eur. Radiol. 31, 1460–1470 (2021).
Article PubMed Google Scholar
Zwanenburg, A. et al. The image biomarker standardization initiative: standardized quantitative radiomics for high-throughput image-based phenotyping. Radiology 295, 328–338 (2020).
Article PubMed Google Scholar
van Griethuysen, J. J. M. et al. Computational radiomics system to decode the radiographic phenotype. Cancer Res. 77, e104–e107 (2017).
Article PubMed PubMed Central Google Scholar
Fedorov, A. et al. 3D Slicer as an image computing platform for the quantitative imaging network. Magn. Reson. Imaging 30, 1323–1341 (2012).
Article PubMed PubMed Central Google Scholar
Yushkevich, P. A. et al. User-guided 3D active contour segmentation of anatomical structures: significantly improved efficiency and reliability. Neuroimage 31, 1116–1128 (2006).
Article PubMed Google Scholar
Khader, F. et al. Multimodal deep learning for integrating chest radiographs and clinical parameters: a case for transformers. Radiology 309, e230806 (2023).
Article PubMed Google Scholar
Yu, A. C., Mohajer, B. & Eng, J. External validation of deep learning algorithms for radiologic diagnosis: a systematic review. Radiol. Artif. Intell. 4, e210064 (2022).
Article PubMed PubMed Central Google Scholar
US FDA. Artificial intelligence and machine learning (AI/ML)-enabled medical devices; https://www.fda.gov/medical-devices/software-medical-device-samd/artificial-intelligence-and-machine-learning-aiml-enabled-medical-devices (2023).
Bruker Corporation. Artificial intelligence in NMR; https://www.bruker.com/en/landingpages/bbio/artificial-intelligence-in-nmr.html (2024).
Wasserthal, J. TotalSegmentator: tool for robust segmentation of 104 important anatomical structures in CT images. GitHub https://doi.org/10.5281/zenodo.6802613 (2023).
Garcia-Ruiz, A. et al. An accessible deep learning tool for voxel-wise classification of brain malignancies from perfusion MRI. Cell Rep. Med. 5, 101464 (2024).
Article PubMed PubMed Central Google Scholar
Lång, K. et al. Artificial intelligence-supported screen reading versus standard double reading in the Mammography Screening with Artificial Intelligence trial (MASAI): a clinical safety analysis of a randomised, controlled, non-inferiority, single-blinded, screening accuracy study. Lancet Oncol. 24, 936–944 (2023).
Article PubMed Google Scholar
Bera, K., Braman, N., Gupta, A., Velcheti, V. & Madabhushi, A. Predicting cancer outcomes with radiomics and artificial intelligence in radiology. Nat. Rev. Clin. Oncol. 19, 132–146 (2022).
Article CAS PubMed Google Scholar
Núñez, L. M. et al. Unraveling response to temozolomide in preclinical GL261 glioblastoma with MRI/MRSI using radiomics and signal source extraction. Sci. Rep. 10, 19699 (2020).
Article PubMed PubMed Central Google Scholar
Müller, J. et al. Radiomics-based tumor phenotype determination based on medical imaging and tumor microenvironment in a preclinical setting. Radiother. Oncol. 169, 96–104 (2022).
Article PubMed Google Scholar
Amirrashedi, M. et al. Leveraging deep neural networks to improve numerical and perceptual image quality in low-dose preclinical PET imaging. Comput. Med. Imaging Graph. 94, 102010 (2021).
Article PubMed Google Scholar
Zinn, P. O. et al. A coclinical radiogenomic validation study: conserved magnetic resonance radiomic appearance of periostin-expressing glioblastoma in patients and xenograft models. Clin. Cancer Res. 24, 6288–6299 (2018).
Article CAS PubMed PubMed Central Google Scholar
Lin, Y.-C. et al. Diffusion radiomics analysis of intratumoral heterogeneity in a murine prostate cancer model following radiotherapy: pixelwise correlation with histology. J. Magn. Reson. Imaging 46, 483–489 (2017).
Article PubMed Google Scholar
Moor, M. et al. Foundation models for generalist medical artificial intelligence. Nature 616, 259–265 (2023).
Article CAS PubMed Google Scholar
Chen, R. J. et al. Towards a general-purpose foundation model for computational pathology. Nat. Med. 30, 850–862 (2024).
Article CAS PubMed Google Scholar
Unger, M. & Kather, J. N. Deep learning in cancer genomics and histopathology. Genome Med. 16, 44 (2024).
Article PubMed PubMed Central Google Scholar
Zhou, Y. et al. A foundation model for generalizable disease detection from retinal images. Nature 622, 156–163 (2023).
Article CAS PubMed PubMed Central Google Scholar
Filiot, A. et al. Scaling self-supervised learning for histopathology with masked image modeling. Preprint at bioRxiv https://doi.org/10.1101/2023.07.21.23292757 (2023).
Campanella, G. et al. Computational pathology at health system scale—self-supervised foundation models from three billion images. Preprint at https://doi.org/10.48550/arXiv.2310.07033 (2023).
Vorontsov, E. et al. Virchow: a million-slide digital pathology foundation model. Preprint at https://doi.org/10.48550/arXiv.2309.07778 (2023).
Clusmann, J. et al. The future landscape of large language models in medicine. Commun. Med. 3, 141 (2023).
Article PubMed PubMed Central Google Scholar
Bubeck, S. et al. Sparks of artificial general intelligence: early experiments with GPT-4. Preprint at https://doi.org/10.48550/arXiv.2303.12712 (2023).
Truhn, D., Reis-Filho, J. S. & Kather, J. N. Large language models should be used as scientific reasoning engines, not knowledge databases. Nat. Med. 29, 2983–2984 (2023).
Article CAS PubMed Google Scholar
Adams, L. C. et al. Leveraging GPT-4 for post hoc transformation of free-text radiology reports into structured reporting: a multilingual feasibility study. Radiology 307, e230725 (2023).
Article PubMed Google Scholar
Truhn, D. et al. Extracting structured information from unstructured histopathology reports using generative pre-trained transformer 4 (GPT-4). J. Pathol. 262, 310–319 (2023).
Article PubMed Google Scholar
Wiest, I. C. et al. From text to tables: a local privacy preserving large language model for structured information retrieval from medical documents. Preprint at bioRxiv https://doi.org/10.1101/2023.12.07.23299648 (2023).
Singhal, K. et al. Large language models encode clinical knowledge. Nature 620, 172–180 (2023).
Article CAS PubMed PubMed Central Google Scholar
Truhn, D. et al. A pilot study on the efficacy of GPT-4 in providing orthopedic treatment recommendations from MRI reports. Sci. Rep. 13, 20159 (2023).
Article CAS PubMed PubMed Central Google Scholar
Wang, H. et al. Scientific discovery in the age of artificial intelligence. Nature 620, 47–60 (2023).
Article CAS PubMed Google Scholar
Derraz, B. et al. New regulatory thinking is needed for AI-based personalised drug and cell therapies in precision oncology. NPJ Precis. Oncol. https://doi.org/10.1038/s41698-024-00517-w (2024).
Extance, A. ChatGPT has entered the classroom: how LLMs could transform education. Nature 623, 474–477 (2023).
Article CAS PubMed Google Scholar
Thirunavukarasu, A. J. et al. Large language models in medicine. Nat. Med. 29, 1930–1940 (2023).
Article CAS PubMed Google Scholar
Webster, P. Six ways large language models are changing healthcare. Nat. Med. 29, 2969–2971 (2023).
Article CAS PubMed Google Scholar
Krishnan, R., Rajpurkar, P. & Topol, E. J. Self-supervised learning in medicine and healthcare. Nat. Biomed. Eng. 6, 1346–1352 (2022).
Article PubMed Google Scholar
Meskó, B. Prompt engineering as an important emerging skill for medical professionals: tutorial. J. Med. Internet Res. 25, e50638 (2023).
Article PubMed PubMed Central Google Scholar
Sushil, M. et al. CORAL: expert-curated oncology reports to advance language model inference. NEJM AI 1, 4 (2024).
Article Google Scholar
Brown, T. B. et al. Language models are few-shot learners. Preprint at https://doi.org/10.48550/arXiv.2005.01416 (2020).
Ferber, D. & Kather, J. N. Large language models in uro-oncology. Eur. Urol. Oncol. 7, 157–159 (2023).
Article PubMed Google Scholar
Jiang, L. Y. et al. Health system-scale language models are all-purpose prediction engines. Nature 619, 357–362 (2023).
Article CAS PubMed PubMed Central Google Scholar
Nori, H. et al. Can generalist foundation models outcompete special-purpose tuning? Case study in medicine. Preprint at https://doi.org/10.48550/arXiv.2311.16452 (2023).
Balaguer, A. et al. RAG vs fine-tuning: pipelines, tradeoffs, and a case study on agriculture. Preprint at https://doi.org/10.48550/arXiv.2401.08406 (2024).
Gemini Team et al. Gemini: a family of highly capable multimodal models. Preprint at https://doi.org/10.48550/arXiv.2312.11805 (2023).
Tisman, G. & Seetharam, R. OpenAI’s ChatGPT-4, BARD and YOU.Com (AI) and the cancer patient, for now, caveat emptor, but stay tuned. Digit. Med. Healthc. Technol. https://doi.org/10.5772/dmht.19 (2023).
Touvron, H. et al. LLaMA: open and efficient foundation language models. Preprint at https://doi.org/10.48550/arXiv.2302.13971 (2023).
Lipkova, J. et al. Artificial intelligence for multimodal data integration in oncology. Cancer Cell 40, 1095–1110 (2022).
Article CAS PubMed PubMed Central Google Scholar
Niehues, J. M. et al. Generalizable biomarker prediction from cancer pathology slides with self-supervised deep learning: a retrospective multi-centric study. Cell Rep. Med. 4, 100980 (2023).
Article CAS PubMed PubMed Central Google Scholar
Foersch, S. et al. Multistain deep learning for prediction of prognosis and therapy response in colorectal cancer. Nat. Med. 29, 430–439 (2023).
Article CAS PubMed Google Scholar
Boehm, K. M. et al. Multimodal data integration using machine learning improves risk stratification of high-grade serous ovarian cancer. Nat. Cancer 3, 723–733 (2022).
Article CAS PubMed PubMed Central Google Scholar
Vanguri, R. et al. Multimodal integration of radiology, pathology and genomics for prediction of response to PD-(L)1 blockade in patients with non-small cell lung cancer. Nat. Cancer 3, 1151–1164 (2022).
Article CAS PubMed PubMed Central Google Scholar
Shifai, N., van Doorn, R., Malvehy, J. & Sangers, T. E. Can ChatGPT vision diagnose melanoma? An exploratory diagnostic accuracy study. J. Am. Acad. Dermatol. 90, 1057–1059 (2024).
Article PubMed Google Scholar
Liu, H., Li, C., Wu, Q. & Lee, Y. J. Visual instruction tuning. Preprint at https://doi.org/10.48550/arXiv.2304.08485 (2023).
Li, C. et al. LLaVA-med: training a large language-and-vision assistant for biomedicine in one day. Preprint at https://doi.org/10.48550/arXiv.2306.00890 (2023).
Lu, M. Y. et al. A foundational multimodal vision language AI assistant for human pathology. Preprint at https://doi.org/10.48550/arXiv.2312.07814 (2023).
Adalsteinsson, V. A. et al. Scalable whole-exome sequencing of cell-free DNA reveals high concordance with metastatic tumors. Nat. Commun. 8, 1324 (2017).
Article PubMed PubMed Central Google Scholar
Zhang, Z. et al. Uniform genomic data analysis in the NCI Genomic Data Commons. Nat. Commun. 12, 1226 (2021).
Article CAS PubMed PubMed Central Google Scholar
Vega, D. M. et al. Aligning tumor mutational burden (TMB) quantification across diagnostic platforms: phase II of the Friends of Cancer Research TMB Harmonization Project. Ann. Oncol. 32, 1626–1636 (2021).
Article CAS PubMed Google Scholar
Anaya, J., Sidhom, J.-W., Mahmood, F. & Baras, A. S. Multiple-instance learning of somatic mutations for the classification of tumour type and the prediction of microsatellite status. Nat. Biomed. Eng. 8, 57–67 (2023).
Article PubMed PubMed Central Google Scholar
Chen, B. et al. Predicting HLA class II antigen presentation through integrated deep learning. Nat. Biotechnol. 37, 1332–1343 (2019).
Article CAS PubMed PubMed Central Google Scholar
Jumper, J. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021).
Article CAS PubMed PubMed Central Google Scholar
Callaway, E. What’s next for AlphaFold and the AI protein-folding revolution. Nature 604, 234–238 (2022).
Article CAS PubMed Google Scholar
Cheng, J. et al. Accurate proteome-wide missense variant effect prediction with AlphaMissense. Science 381, eadg7492 (2023).
Article CAS PubMed Google Scholar
Barrio-Hernandez, I. et al. Clustering predicted structures at the scale of the known protein universe. Nature 622, 637–645 (2023).
Article CAS PubMed PubMed Central Google Scholar
Yang, X., Wang, Y., Byrne, R., Schneider, G. & Yang, S. Concepts of artificial intelligence for computer-assisted drug discovery. Chem. Rev. 119, 10520–10594 (2019).
Article CAS PubMed Google Scholar
Mullowney, M. W. et al. Artificial intelligence for natural product drug discovery. Nat. Rev. Drug Discov. 22, 895–916 (2023).
Article CAS PubMed Google Scholar
Jayatunga, M. K. P., Xie, W., Ruder, L., Schulze, U. & Meier, C. AI in small-molecule drug discovery: a coming wave? Nat. Rev. Drug Discov. 21, 175–176 (2022).
Article CAS PubMed Google Scholar
Vert, J.-P. How will generative AI disrupt data science in drug discovery? Nat. Biotechnol. 41, 750–751 (2023).
Article CAS PubMed Google Scholar
Wong, F. et al. Discovery of a structural class of antibiotics with explainable deep learning. Nature 626, 177–185 (2023).
Article PubMed Google Scholar
Swanson, K. et al. Generative AI for designing and validating easily synthesizable and structurally novel antibiotics. Nat. Mach. Intell. 6, 338–353 (2024).
Article Google Scholar
Janizek, J. D. et al. Uncovering expression signatures of synergistic drug responses via ensembles of explainable machine-learning models. Nat. Biomed. Eng. 7, 811–829 (2023).
Article CAS PubMed Google Scholar
Savage, N. Drug discovery companies are customizing ChatGPT: here’s how. Nat. Biotechnol. 41, 585–586 (2023).
Article CAS PubMed Google Scholar
Boiko, D. A., MacKnight, R., Kline, B. & Gomes, G. Autonomous chemical research with large language models. Nature 624, 570–578 (2023).
Article CAS PubMed PubMed Central Google Scholar
Arnold, C. AlphaFold touted as next big thing for drug discovery—but is it? Nature 622, 15–17 (2023).
Article CAS PubMed Google Scholar
Mock, M., Edavettal, S., Langmead, C. & Russell, A. AI can help to speed up drug discovery—but only if we give it the right data. Nature 621, 467–470 (2023).
Article CAS PubMed Google Scholar
AI’s potential to accelerate drug discovery needs a reality check. Nature 622, 217 (2023).
Upswing in AI drug-discovery deals. Nat. Biotechnol. 41, 1361 (2023).
Hutson, M. AI for drug discovery is booming, but who owns the patents? Nat. Biotechnol. 41, 1494–1496 (2023).
Article CAS PubMed Google Scholar
Wong, C. H., Siah, K. W. & Lo, A. W. Estimation of clinical trial success rates and related parameters. Biostatistics 20, 273–286 (2019).
Article PubMed Google Scholar
Subbiah, V. The next generation of evidence-based medicine. Nat. Med. 29, 49–58 (2023).
Article CAS PubMed Google Scholar
Yuan, C. et al. Criteria2Query: a natural language interface to clinical databases for cohort definition. J. Am. Med. Inform. Assoc. 26, 294–305 (2019).
Article PubMed PubMed Central Google Scholar
Lu, L., Dercle, L., Zhao, B. & Schwartz, L. H. Deep learning for the prediction of early on-treatment response in metastatic colorectal cancer from serial medical imaging. Nat. Commun. 12, 6654 (2021).
Trebeschi, S. et al. Prognostic value of deep learning-mediated treatment monitoring in lung cancer patients receiving immunotherapy. Front. Oncol. 11, 609054 (2021).
Castelo-Branco, L. et al. ESMO guidance for reporting oncology real-world evidence (GROW). Ann. Oncol. 34, 1097–1112 (2023).
Morin, O. et al. An artificial intelligence framework integrating longitudinal electronic health records with real-world data enables continuous pan-cancer prognostication. Nat. Cancer 2, 709–722 (2021).
Yang, X. et al. A large language model for electronic health records. NPJ Digit. Med. 5, 194 (2022).
Huang, X., Rymbekova, A., Dolgova, O., Lao, O. & Kuhlwilm, M. Harnessing deep learning for population genetic inference. Nat. Rev. Genet. 25, 61–78 (2024).
Pawlicki, T. F., Lee, D.-S., Hull, J. J. & Srihari, S. N. Neural network models and their application to handwritten digit recognition. In IEEE 1988 Int. Conf. Neural Networks (eds Pawlicki, T. F. et al.) 63–70 (1988).
Chui, M. et al. The economic potential of generative AI: the next productivity frontier. McKinsey https://www.mckinsey.com/capabilities/mckinsey-digital/our-insights/the-economic-potential-of-generative-ai-the-next-productivity-frontier (2023).
Dell’Acqua, F. et al. Navigating the jagged technological frontier: field experimental evidence of the effects of AI on knowledge worker productivity and quality. Harvard Business School https://www.hbs.edu/ris/Publication%20Files/24-013_d9b45b68-9e74-42d6-a1c6-c72fb70c7282.pdf (2023).
Boehm, K. M., Khosravi, P., Vanguri, R., Gao, J. & Shah, S. P. Harnessing multimodal data integration to advance precision oncology. Nat. Rev. Cancer 22, 114–126 (2022).
Gilbert, S., Harvey, H., Melvin, T., Vollebregt, E. & Wicks, P. Large language model AI chatbots require approval as medical devices. Nat. Med. 29, 2396–2398 (2023).
Mobadersany, P. et al. Predicting cancer outcomes from histology and genomics using convolutional networks. Proc. Natl Acad. Sci. USA 115, E2970–E2979 (2018).
Chang, Y. et al. A survey on evaluation of large language models. ACM Trans. Intell. Syst. Technol. 15, 1–45 (2024).
Lin, T., Wang, Y., Liu, X. & Qiu, X. A survey of transformers. AI Open 3, 111–132 (2022).

Acknowledgements

R.P.-L. is supported by LaCaixa Foundation, a CRIS Foundation Talent Award (TALENT19-05), the FERO Foundation, the Instituto de Salud Carlos III-Investigacion en Salud (PI18/01395 and PI21/01019), the Prostate Cancer Foundation (18YOUN19) and the Asociación Española Contra el Cancer (AECC) (PRYCO211023SERR). J.N.K. is supported by the German Cancer Aid (DECADE, 70115166), the German Federal Ministry of Education and Research (PEARL, 01KD2104C; CAMINO, 01EO2101; SWAG, 01KD2215A; TRANSFORM LIVER, 031L0312A; and TANGERINE, 01KT2302 through ERA-NET Transcan), the German Academic Exchange Service (SECAI, 57616814), the German Federal Joint Committee (TransplantKI, 01VSF21048), the European Union’s Horizon Europe and innovation programme (ODELIA, 101057091; and GENIAL, 101096312), the European Research Council (ERC; NADIR, 101114631) and the National Institute for Health and Care Research (NIHR; NIHR203331) Leeds Biomedical Research Centre. The views expressed are those of the authors and not necessarily those of the NHS, the NIHR or the Department of Health and Social Care. This work was funded by the European Union. Views and opinions expressed are, however, those of the authors only and do not necessarily reflect those of the European Union. Neither the European Union nor the granting authority can be held responsible for them.

Author information

Authors and Affiliations

Radiomics Group, Vall d’Hebron Institute of Oncology, Vall d’Hebron Barcelona Hospital Campus, Barcelona, Spain
Raquel Perez-Lopez
Else Kroener Fresenius Center for Digital Health, Technical University Dresden, Dresden, Germany
Narmin Ghaffari Laleh & Jakob Nikolas Kather
Department of Pathology, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA, USA
Faisal Mahmood
Department of Pathology, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
Faisal Mahmood
Cancer Program, Broad Institute of Harvard and MIT, Cambridge, MA, USA
Faisal Mahmood
Cancer Data Science Program, Dana-Farber Cancer Institute, Boston, MA, USA
Faisal Mahmood
Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
Faisal Mahmood
Harvard Data Science Initiative, Harvard University, Cambridge, MA, USA
Faisal Mahmood
Department of Medicine I, University Hospital Dresden, Dresden, Germany
Jakob Nikolas Kather
Medical Oncology, National Center for Tumour Diseases (NCT), University Hospital Heidelberg, Heidelberg, Germany
Jakob Nikolas Kather

Authors

Raquel Perez-Lopez
Narmin Ghaffari Laleh
Faisal Mahmood
Jakob Nikolas Kather

Contributions

All authors contributed substantially to discussion of the content and reviewed and/or edited the manuscript before submission. R.P.-L., N.G.L. and J.N.K. researched data for the article and wrote the article.

Corresponding author

Correspondence to Jakob Nikolas Kather.

Ethics declarations

Competing interests

J.N.K. declares consulting services for Owkin, DoMore Diagnostics, Panakeia, Scailyte, Mindpeak and MultiplexDx; holds shares in StratifAI GmbH; has received a research grant from GSK; and has received honoraria from AstraZeneca, Bayer, Eisai, Janssen, MSD, BMS, Roche, Pfizer and Fresenius. R.P.-L. declares research funding by AstraZeneca and Roche, and participates in the steering committee of a clinical trial sponsored by Roche, not related to this work. All other authors declare no competing interests.

Peer review

Peer review information

Nature Reviews Cancer thanks the anonymous reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Glossary

Application programming interface (API): A set of tools and protocols for building software and applications, enabling software to communicate with AI models.
Artificial neural networks (ANNs): Computational models loosely inspired by the structure and function of the human brain, consisting of interconnected layers of nodes, called neurons, that process input data and learn to recognize patterns and make decisions.
Computational pathology: The use of algorithms, machine learning and image analysis techniques to extract information from digital pathology images.
Computer vision: A field of AI that focuses on enabling computers to analyse and interpret visual data, such as images and videos.
Convolutional neural networks (CNNs): A type of deep neural network that is especially effective for analysing visual imagery and is widely used in image analysis.
Deep learning: A subfield of machine learning that uses artificial neural networks with multiple layers, called deep neural networks, to learn and extract highly complex features and patterns from raw input data.
Digital images: Visual representations captured and stored in a digital format, consisting of a grid of pixels, with each pixel representing a colour intensity value.
Digital pathology: The practice of converting glass slides into digital slides that can be viewed, managed and analysed on a computer.
Explainability methods: Techniques in AI that provide insight into how an AI model arrived at its conclusions, thus making the decision-making process of the AI more transparent.
Generative AI: AI systems that can generate new content (text, images or music) that is similar to the content on which they were trained, often creating novel and coherent outputs.
Gigapixel images: Extremely high-resolution digital images consisting of 1 billion pixels or more, obtained by scanning tissue slides with a slide scanner.
Graphics processing units (GPUs): Specialized hardware that rapidly processes large blocks of data in parallel, used in computer gaming and AI.
Large language models (LLMs): Advanced AI models trained on vast amounts of text data, capable of analysing, generating and manipulating human language, often at the human level¹⁷⁴.
Long short-term memory (LSTM) networks: A type of neural network particularly good at processing sequences of data (such as time series or language), with the ability to remember information for a certain time.
Machine learning: A subset of AI focusing on the development of algorithms and models that enable computers to learn and improve their performance on a specific task without being explicitly instructed how to achieve this.
Natural language processing (NLP): A branch of AI that helps computers to analyse, interpret and respond to human language in a useful way.
Prompt engineering: Crafting inputs or questions in a way that guides AI models, particularly LLMs, to provide the most effective and accurate responses.
Transformers: A type of neural network model that excels at processing sequences of data, such as sentences in text, by focusing on different parts of the sequence to make predictions¹⁷⁵.
Voxel: The three-dimensional equivalent of a pixel in images, representing a value on a regular grid in three-dimensional space, commonly used in medical imaging such as MRI and CT scans.
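
As a minimal sketch, the glossary entries for digital images, voxels and gigapixel images can be illustrated with a few lines of Python. The function names and the example slide dimensions (100,000 × 80,000 pixels) are assumptions chosen purely for illustration and do not come from the article; real scanners and file formats differ and usually store images compressed.

```python
# Hypothetical illustration of three glossary entries.
# All names and sizes below are assumed examples, not values from the article.

def make_image(height, width, value=0):
    """A greyscale digital image: a 2D grid of pixel intensity values."""
    return [[value for _ in range(width)] for _ in range(height)]

def make_volume(depth, height, width, value=0):
    """A 3D scan (e.g. CT or MRI): a stack of 2D slices whose elements are voxels."""
    return [make_image(height, width, value) for _ in range(depth)]

def uncompressed_size_gb(height, width, channels=3, bytes_per_value=1):
    """Raw (uncompressed) storage for an image, in gigabytes (10**9 bytes)."""
    return height * width * channels * bytes_per_value / 1e9

image = make_image(4, 6)
image[1][2] = 255  # set one pixel (row 1, column 2) to maximum intensity

volume = make_volume(3, 4, 6)
volume[0][1][2] = 100  # set one voxel (slice 0, row 1, column 2)

# An assumed whole-slide image of 100,000 x 80,000 pixels.
slide_pixels = 100_000 * 80_000
slide_gb = uncompressed_size_gb(100_000, 80_000)
print(slide_pixels / 1e9, "gigapixels;", slide_gb, "GB as uncompressed RGB")
```

Under these assumed dimensions, a single slide would hold several gigapixels and tens of gigabytes of raw colour data, which may help explain why such images are typically processed in smaller tiles rather than loaded whole.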

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Perez-Lopez, R., Ghaffari Laleh, N., Mahmood, F. et al. A guide to artificial intelligence for cancer researchers. Nat Rev Cancer 24, 427–441 (2024). https://doi.org/10.1038/s41568-024-00694-7

Accepted: 09 April 2024
Published: 16 May 2024
Issue Date: June 2024
DOI: https://doi.org/10.1038/s41568-024-00694-7
