Abstract
The lack of annotated publicly available medical images is a major barrier to computational research and education innovations. At the same time, many de-identified images and much knowledge are shared by clinicians on public forums such as medical Twitter. Here we harness these crowd platforms to curate OpenPath, a large dataset of 208,414 pathology images paired with natural language descriptions. We demonstrate the value of this resource by developing pathology language–image pretraining (PLIP), a multimodal artificial intelligence with both image and text understanding, trained on OpenPath. PLIP achieves state-of-the-art performance in classifying new pathology images across four external datasets: for zero-shot classification, PLIP achieves F1 scores of 0.565–0.832, compared with 0.030–0.481 for the previous contrastive language–image pretraining (CLIP) model. Training a simple supervised classifier on top of PLIP embeddings also yields a 2.5% improvement in F1 scores compared with using embeddings from other supervised models. Moreover, PLIP enables users to retrieve similar cases by either image or natural language search, greatly facilitating knowledge sharing. Our approach demonstrates that publicly shared medical information is a tremendous resource that can be harnessed to develop medical artificial intelligence for enhancing diagnosis, knowledge sharing and education.
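The zero-shot classification described above follows the standard CLIP recipe: embed the query image and one text prompt per candidate class, then predict the class whose prompt embedding is most similar to the image embedding. A minimal sketch of that decision rule, using toy stand-in vectors (the labels and embeddings below are illustrative, not PLIP outputs):

```python
import math

def zero_shot_classify(image_emb, text_embs, labels):
    """Return the label whose prompt embedding has the highest cosine
    similarity with the image embedding (the CLIP-style zero-shot rule)."""
    def cosine(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))
    scores = [cosine(image_emb, t) for t in text_embs]
    return labels[scores.index(max(scores))]

# Toy 2-D embeddings standing in for PLIP image/text encoder outputs
labels = ["adenocarcinoma", "normal mucosa"]
text_embs = [[1.0, 0.1], [0.1, 1.0]]
image_emb = [0.9, 0.2]
print(zero_shot_classify(image_emb, text_embs, labels))  # prints "adenocarcinoma"
```

In practice the prompts would be natural-language descriptions (for example, "an H&E image of colorectal adenocarcinoma") embedded by the trained text encoder.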
Data availability
All data in OpenPath are publicly available from Twitter and LAION-5B (https://laion.ai/blog/laion-5b/). The Twitter IDs used for training and validation can be accessed at https://tinyurl.com/openpathdata. The validation datasets are publicly available and can be accessed from the following: Kather colon dataset (https://zenodo.org/record/1214456); PanNuke (https://warwick.ac.uk/fac/cross_fac/tia/data/pannuke); DigestPath (https://digestpath2019.grand-challenge.org/); WSSS4LUAD (https://wsss4luad.grand-challenge.org/); PathPedia (https://www.pathpedia.com/Education/eAtlas/Default.aspx); PubMed and Books pathology collection (https://warwick.ac.uk/fac/cross_fac/tia/data/arch); KIMIA Path24C (https://kimialab.uwaterloo.ca/kimia/index.php/pathology-images-kimia-path24/). The ImageNet dataset (https://www.image-net.org/) was adopted for the pretrained ViT-B/32 model. The trained model, source codes and interactive results can also be accessed at https://tinyurl.com/webplip.
Code availability
The trained model and source codes can be accessed at https://tinyurl.com/webplip.
Acknowledgements
F.B. is supported by the Hoffman-Yee Research Grant Program and the Stanford Institute for Human-Centered Artificial Intelligence. J.Z. is supported by the Chan Zuckerberg Biohub.
Author information
Contributions
Z.H., F.B. and J.Z. designed the study. Z.H. and F.B. carried out the data collection, data analysis, model construction, model validation and manuscript writing. M.Y. carried out the data analysis, model construction, model validation and manuscript writing. T.J.M. provided knowledge support, interpreted the findings and helped with manuscript writing. J.Z. provided knowledge support, interpreted the findings, helped with manuscript writing and supervised the study. All authors contributed to writing the manuscript and reviewed and approved the final version.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Medicine thanks Geert Litjens, Lee Cooper and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Primary Handling Editor: Lorenzo Righetto, in collaboration with the Nature Medicine team.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 Cohort inclusion and exclusion criteria and flowcharts.
a, Twitter training dataset from 2006-03-21 to 2022-11-15. b, Twitter validation dataset for image retrieval from 2022-11-16 to 2023-01-15.
Extended Data Fig. 2 Confusion matrix from zero-shot learning in the Kather colon dataset.
a, Confusion matrix of the PLIP model. b, Confusion matrix of the CLIP model. c, Confusion matrix of the results from predicting the majority class (Majority for short).
Extended Data Fig. 3 Comparison of image embeddings derived from models in the Kather colon dataset.
a, Image embeddings derived from PLIP. b, Image embeddings derived from baseline CLIP. c, Image embeddings derived from MuDiPath. ADI, adipose tissue; BACK, background; DEB, debris; LYM, lymphocytes; MUC, mucus; MUS, smooth muscle; NORM, normal colon mucosa; STR, cancer-associated stroma; TUM, colorectal adenocarcinoma epithelium.
Extended Data Fig. 4 Comparison of image embeddings between PLIP, CLIP, and MuDiPath models for the PanNuke dataset.
a, Image embeddings generated by the PLIP model, colored by benign and malignant. b, Image embeddings generated by the PLIP model, colored by 19 pathology subspecialties. Scatters with black edges indicate malignant images. c, Image embeddings generated by the CLIP model, colored by benign and malignant. d, Image embeddings generated by the CLIP model, colored by 19 pathology subspecialties. Scatters with black edges indicate malignant images. e, Image embeddings generated by the MuDiPath model, colored by benign and malignant. f, Image embeddings generated by the MuDiPath model, colored by 19 organs. Scatters with black edges indicate malignant images.
Extended Data Fig. 5 Comparison of image embeddings derived from models in the DigestPath dataset.
a, Image embeddings derived from PLIP. b, Image embeddings derived from baseline CLIP. c, Image embeddings derived from MuDiPath.
Extended Data Fig. 6 Comparison of image embeddings derived from models in the WSSS4LUAD dataset.
a, Image embeddings derived from PLIP. b, Image embeddings derived from baseline CLIP. c, Image embeddings derived from MuDiPath.
Extended Data Fig. 7 Cluster visualization of images in DigestPath dataset.
a, Image patches in low-dimensional space colored by different downsampling rates. b, Visualization of image patches on different clusters.
Extended Data Fig. 8 Comparison to supervised deep learning models.
The fine-tuning was conducted on a, Kather colon dataset training split, b, PanNuke dataset, c, DigestPath dataset, and d, WSSS4LUAD dataset, by comparing the PLIP image encoder to ViT-B/32 (pre-trained on ImageNet). In the line plots, mean values and 95% confidence intervals are presented by using 10 different random seeds for subsetting the data and running the models. The improvements for PLIP are particularly large for smaller datasets. For instance, when comparing the weighted F1 scores across the four datasets using only 1% of the training data: (i) for Kather training split, the PLIP image encoder achieved F1 = 0.952, while ViT-B/32 achieved F1 = 0.921; (ii) for PanNuke dataset, the PLIP image encoder achieved F1 = 0.715, while ViT-B/32 achieved F1 = 0.637; (iii) for DigestPath dataset, the PLIP image encoder achieved F1 = 0.933, while ViT-B/32 achieved F1 = 0.872; (iv) for WSSS4LUAD dataset, the PLIP image encoder achieved F1 = 0.816, while ViT-B/32 achieved F1 = 0.645. When comparing the weighted F1 scores across the four datasets using all of the training data: (i) for Kather training split, the PLIP image encoder achieved F1 = 0.994, while ViT-B/32 achieved F1 = 0.991; (ii) for PanNuke dataset, the PLIP image encoder achieved F1 = 0.962, while ViT-B/32 achieved F1 = 0.938; (iii) for DigestPath dataset, the PLIP image encoder achieved F1 = 0.977, while ViT-B/32 achieved F1 = 0.968; (iv) for WSSS4LUAD dataset, the PLIP image encoder achieved F1 = 0.958, while ViT-B/32 achieved F1 = 0.941.
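The weighted F1 scores quoted in this legend average per-class F1 with each class weighted by its share of the true labels (the same convention as scikit-learn's `average='weighted'`). A minimal pure-Python version of the metric, for reference:

```python
from collections import Counter

def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights equal to each class's
    share of the true labels (support-weighted F1)."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for cls, n in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        score += (n / total) * f1
    return score

# Toy labels (illustrative only, not the benchmark data)
print(round(weighted_f1(["tum", "tum", "norm", "norm"],
                        ["tum", "norm", "norm", "norm"]), 3))  # prints 0.733
```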
Extended Data Fig. 9 Text-to-image retrieval performances for Recall@50.
a, Image retrieval performances for Recall@50 within each of the pathology subspecialty-specific hashtags. b, Two-sided Spearman correlations between the number of candidates and fold changes for Recall@50 when comparing the PLIP model with random and CLIP, respectively. In regression plots, the regression estimates are displayed with 95% confidence intervals in grey or purple colors.
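Recall@50 in panel a counts a text query as successful when its target image appears among the top 50 retrieved candidates, averaged over queries. A short sketch of that metric (the identifiers below are hypothetical):

```python
def recall_at_k(ranked_ids, target_id, k=50):
    """1.0 if the query's target image is among the top-k candidates, else 0.0."""
    return 1.0 if target_id in ranked_ids[:k] else 0.0

def mean_recall_at_k(queries, k=50):
    """Average Recall@k over (ranked candidate list, target id) pairs."""
    return sum(recall_at_k(ranked, target, k) for ranked, target in queries) / len(queries)

# Two toy queries: the first finds its target within the top 2, the second does not
queries = [(["img3", "img1", "img7"], "img1"),
           (["img5", "img6", "img2"], "img9")]
print(mean_recall_at_k(queries, k=2))  # prints 0.5
```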
Supplementary information
Supplementary Information
Supplementary Discussion, Tables 1–6 and Figs. 1 and 2.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Huang, Z., Bianchi, F., Yuksekgonul, M. et al. A visual–language foundation model for pathology image analysis using medical Twitter. Nat Med 29, 2307–2316 (2023). https://doi.org/10.1038/s41591-023-02504-3