Abstract
The lack of annotated publicly available medical images is a major barrier to computational research and education innovations. At the same time, many de-identified images and much knowledge are shared by clinicians on public forums such as medical Twitter. Here we harness these crowd platforms to curate OpenPath, a large dataset of 208,414 pathology images paired with natural language descriptions. We demonstrate the value of this resource by developing pathology language–image pretraining (PLIP), a multimodal artificial intelligence with both image and text understanding, trained on OpenPath. PLIP achieves state-of-the-art performance in classifying new pathology images across four external datasets: for zero-shot classification, PLIP achieves F1 scores of 0.565–0.832, compared with 0.030–0.481 for the previous contrastive language–image pretraining (CLIP) model. Training a simple supervised classifier on top of PLIP embeddings also yields a 2.5% improvement in F1 scores compared with using embeddings from other supervised models. Moreover, PLIP enables users to retrieve similar cases by either image or natural language search, greatly facilitating knowledge sharing. Our approach demonstrates that publicly shared medical information is a tremendous resource that can be harnessed to develop medical artificial intelligence for enhancing diagnosis, knowledge sharing and education.
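The zero-shot classification described above follows the standard CLIP recipe: embed the query image and one text prompt per candidate class, then predict the class whose prompt embedding is most similar to the image embedding. A minimal sketch of that decision rule, using toy stand-in vectors (the labels and embeddings below are illustrative, not PLIP outputs):

```python
import math

def zero_shot_classify(image_emb, text_embs, labels):
    """Return the label whose prompt embedding has the highest cosine
    similarity with the image embedding (the CLIP-style zero-shot rule)."""
    def cosine(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))
    scores = [cosine(image_emb, t) for t in text_embs]
    return labels[scores.index(max(scores))]

# Toy 2-D embeddings standing in for PLIP image/text encoder outputs
labels = ["adenocarcinoma", "normal mucosa"]
text_embs = [[1.0, 0.1], [0.1, 1.0]]
image_emb = [0.9, 0.2]
print(zero_shot_classify(image_emb, text_embs, labels))  # prints "adenocarcinoma"
```

In practice the prompts would be natural-language descriptions (for example, "an H&E image of colorectal adenocarcinoma") embedded by the trained text encoder.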
Data availability
All data in OpenPath are publicly available from Twitter and LAION-5B (https://laion.ai/blog/laion-5b/). The Twitter IDs used for training and validation can be accessed at https://tinyurl.com/openpathdata. The validation datasets are publicly available and can be accessed from the following: Kather colon dataset (https://zenodo.org/record/1214456); PanNuke (https://warwick.ac.uk/fac/cross_fac/tia/data/pannuke); DigestPath (https://digestpath2019.grand-challenge.org/); WSSS4LUAD (https://wsss4luad.grand-challenge.org/); PathPedia (https://www.pathpedia.com/Education/eAtlas/Default.aspx); PubMed and Books pathology collection (https://warwick.ac.uk/fac/cross_fac/tia/data/arch); KIMIA Path24C (https://kimialab.uwaterloo.ca/kimia/index.php/pathology-images-kimia-path24/). The ImageNet dataset (https://www.image-net.org/) was adopted for the pretrained ViT-B/32 model. The trained model, source codes and interactive results can also be accessed at https://tinyurl.com/webplip.
Code availability
The trained model and source codes can be accessed at https://tinyurl.com/webplip.
Acknowledgements
F.B. is supported by the Hoffman-Yee Research Grant Program and the Stanford Institute for Human-Centered Artificial Intelligence. J.Z. is supported by the Chan Zuckerberg Biohub.
Author information
Contributions
Z.H., F.B. and J.Z. designed the study. Z.H. and F.B. carried out the data collection, data analysis, model construction, model validation and manuscript writing. M.Y. carried out the data analysis, model construction, model validation and manuscript writing. T.J.M. provided knowledge support, interpreted the findings and helped with manuscript writing. J.Z. provided knowledge support, interpreted the findings, helped with manuscript writing and supervised the study. All authors contributed to writing the manuscript and reviewed and approved the final version.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Medicine thanks Geert Litjens, Lee Cooper and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Primary Handling Editor: Lorenzo Righetto, in collaboration with the Nature Medicine team.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 Cohort inclusion and exclusion criteria and flowcharts.
a, Twitter training dataset from 2006-03-21 to 2022-11-15. b, Twitter validation dataset for image retrieval from 2022-11-16 to 2023-01-15.
Extended Data Fig. 2 Confusion matrix from zero-shot learning in the Kather colon dataset.
a, Confusion matrix of the PLIP model. b, Confusion matrix of the CLIP model. c, Confusion matrix of the results from predicting the majority class (Majority for short).
Extended Data Fig. 3 Comparison of image embeddings derived from models in the Kather colon dataset.
a, Image embeddings derived from PLIP. b, Image embeddings derived from baseline CLIP. c, Image embeddings derived from MuDiPath. ADI, adipose tissue; BACK, background; DEB, debris; LYM, lymphocytes; MUC, mucus; MUS, smooth muscle; NORM, normal colon mucosa; STR, cancer-associated stroma; TUM, colorectal adenocarcinoma epithelium.
Extended Data Fig. 4 Comparison of image embeddings between PLIP, CLIP, and MuDiPath models for the PanNuke dataset.
a, Image embeddings generated by the PLIP model, colored by benign and malignant. b, Image embeddings generated by the PLIP model, colored by 19 pathology subspecialties. Scatters with black edges indicate malignant images. c, Image embeddings generated by the CLIP model, colored by benign and malignant. d, Image embeddings generated by the CLIP model, colored by 19 pathology subspecialties. Scatters with black edges indicate malignant images. e, Image embeddings generated by the MuDiPath model, colored by benign and malignant. f, Image embeddings generated by the MuDiPath model, colored by 19 organs. Scatters with black edges indicate malignant images.
Extended Data Fig. 5 Comparison of image embeddings derived from models in the DigestPath dataset.
a, Image embeddings derived from PLIP. b, Image embeddings derived from baseline CLIP. c, Image embeddings derived from MuDiPath.
Extended Data Fig. 6 Comparison of image embeddings derived from models in the WSSS4LUAD dataset.
a, Image embeddings derived from PLIP. b, Image embeddings derived from baseline CLIP. c, Image embeddings derived from MuDiPath.
Extended Data Fig. 7 Cluster visualization of images in DigestPath dataset.
a, Image patches in low-dimensional space colored by different downsampling rates. b, Visualization of image patches on different clusters.
Extended Data Fig. 8 Comparison to supervised deep learning models.
The fine-tuning was conducted on a, Kather colon dataset training split, b, PanNuke dataset, c, DigestPath dataset, and d, WSSS4LUAD dataset, by comparing the PLIP image encoder to ViT-B/32 (pre-trained on ImageNet). In the line plots, mean values and 95% confidence intervals are presented by using 10 different random seeds for subsetting the data and running the models. The improvements for PLIP are particularly large for smaller datasets. For instance, when comparing the weighted F1 scores across the four datasets using only 1% of the training data: (i) for Kather training split, the PLIP image encoder achieved F1 = 0.952, while ViT-B/32 achieved F1 = 0.921; (ii) for PanNuke dataset, the PLIP image encoder achieved F1 = 0.715, while ViT-B/32 achieved F1 = 0.637; (iii) for DigestPath dataset, the PLIP image encoder achieved F1 = 0.933, while ViT-B/32 achieved F1 = 0.872; (iv) for WSSS4LUAD dataset, the PLIP image encoder achieved F1 = 0.816, while ViT-B/32 achieved F1 = 0.645. When comparing the weighted F1 scores across the four datasets using all of the training data: (i) for Kather training split, the PLIP image encoder achieved F1 = 0.994, while ViT-B/32 achieved F1 = 0.991; (ii) for PanNuke dataset, the PLIP image encoder achieved F1 = 0.962, while ViT-B/32 achieved F1 = 0.938; (iii) for DigestPath dataset, the PLIP image encoder achieved F1 = 0.977, while ViT-B/32 achieved F1 = 0.968; (iv) for WSSS4LUAD dataset, the PLIP image encoder achieved F1 = 0.958, while ViT-B/32 achieved F1 = 0.941.
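The weighted F1 scores quoted in this legend average per-class F1 with each class weighted by its share of the true labels (the same convention as scikit-learn's `average='weighted'`). A minimal pure-Python version of the metric, for reference:

```python
from collections import Counter

def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights equal to each class's
    share of the true labels (support-weighted F1)."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for cls, n in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        score += (n / total) * f1
    return score

# Toy labels (illustrative only, not the benchmark data)
print(round(weighted_f1(["tum", "tum", "norm", "norm"],
                        ["tum", "norm", "norm", "norm"]), 3))  # prints 0.733
```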
Extended Data Fig. 9 Text-to-image retrieval performances for Recall@50.
a, Image retrieval performances for Recall@50 within each of the pathology subspecialty-specific hashtags. b, Two-sided Spearman correlations between the number of candidates and fold changes for Recall@50 when comparing the PLIP model with random and CLIP, respectively. In regression plots, the regression estimates are displayed with 95% confidence intervals in grey or purple colors.
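Recall@50 in panel a counts a text query as successful when its target image appears among the top 50 retrieved candidates, averaged over queries. A short sketch of that metric (the identifiers below are hypothetical):

```python
def recall_at_k(ranked_ids, target_id, k=50):
    """1.0 if the query's target image is among the top-k candidates, else 0.0."""
    return 1.0 if target_id in ranked_ids[:k] else 0.0

def mean_recall_at_k(queries, k=50):
    """Average Recall@k over (ranked candidate list, target id) pairs."""
    return sum(recall_at_k(ranked, target, k) for ranked, target in queries) / len(queries)

# Two toy queries: the first finds its target within the top 2, the second does not
queries = [(["img3", "img1", "img7"], "img1"),
           (["img5", "img6", "img2"], "img9")]
print(mean_recall_at_k(queries, k=2))  # prints 0.5
```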
Supplementary information
Supplementary Information
Supplementary Discussion, Tables 1–6 and Figs. 1 and 2.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Huang, Z., Bianchi, F., Yuksekgonul, M. et al. A visual–language foundation model for pathology image analysis using medical Twitter. Nat Med 29, 2307–2316 (2023). https://doi.org/10.1038/s41591-023-02504-3