Abstract
High-throughput single-cell sequencing technologies hold tremendous potential for defining cell types in an unbiased fashion using gene expression and epigenomic state. A key challenge in realizing this potential is integrating single-cell datasets from multiple protocols, biological contexts, and data modalities into a joint definition of cellular identity. We previously developed an approach, called linked inference of genomic experimental relationships (LIGER), that uses integrative nonnegative matrix factorization to address this challenge. Here, we provide a step-by-step protocol for using LIGER to jointly define cell types from multiple single-cell datasets. The main stages of the protocol are data preprocessing and normalization, joint factorization, quantile normalization and joint clustering, and visualization. We describe how to jointly define cell types from single-cell RNA-seq (scRNA-seq) and single-nucleus ATAC-seq (snATAC-seq) data, but similar steps apply across a wide range of other settings and data types, including cross-species analysis, single-nucleus DNA methylation, and spatial transcriptomics. Our protocol contains examples of expected results, describes common pitfalls, and relies only on our freely available, open-source R implementation of LIGER. We also provide R Markdown tutorials showing the outputs from each individual code segment. The analysis process can be performed in 1–4 h, depending on dataset size, and assumes no specialized bioinformatics training.
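The joint factorization stage can be summarized by the integrative nonnegative matrix factorization objective given in the original LIGER publication (Welch et al., 2019, cited in the reference list). The notation below is chosen here for illustration and is not defined in this abstract: E_i is the normalized cells-by-genes matrix for dataset i, H_i holds that dataset's cell factor loadings, W holds the gene factors shared across datasets, V_i holds the dataset-specific gene factors, and λ is a tuning parameter.

```latex
\min_{W,\; V_i,\; H_i \,\ge\, 0} \;
\sum_{i=1}^{d} \left\lVert E_i - H_i \left( W + V_i \right) \right\rVert_F^2
\;+\; \lambda \sum_{i=1}^{d} \left\lVert H_i V_i \right\rVert_F^2
```

In this formulation, a larger λ penalizes the dataset-specific term more strongly, which pushes the factorization toward the shared factors W. The later quantile normalization and joint clustering stages operate on the cell factor loadings H_i.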
Data availability
The datasets used in this paper are all previously published and publicly available:
• scRNA-seq and snATAC-seq data from human BMMCs, from Granja et al. (ref. 24), GEO accession code GSE139369.
• scRNA-seq data composed of two datasets of interneurons and oligodendrocytes from the mouse frontal cortex, from Saunders et al. (ref. 1). Data available at http://dropviz.org/.
• scRNA-seq data from control and interferon-stimulated PBMCs, from Kang et al. (ref. 18), GEO accession code GSE96583.
Code availability
The code is freely available at https://github.com/MacoskoLab/liger. The code is also available through an assigned DOI at https://doi.org/10.5281/zenodo.3765403.
References
1. Saunders, A. et al. Molecular diversity and specializations among the cells of the adult mouse brain. Cell 174, 1015–1030.e16 (2018).
2. Welch, J. D. et al. Single-cell multi-omic integration compares and contrasts features of brain cell identity. Cell 177, 1873–1887.e17 (2019).
3. Yang, Z. & Michailidis, G. A non-negative matrix factorization method for detecting modules in heterogeneous omics multi-modal data. Bioinformatics 32, 1–8 (2016).
4. Zhang, A. W. et al. Probabilistic cell-type assignment of single-cell RNA-seq for tumor microenvironment profiling. Nat. Methods 16, 1007–1015 (2019).
5. Cao, J. et al. Joint profiling of chromatin accessibility and gene expression in thousands of single cells. Science 361, 1380–1385 (2018).
6. Chen, S., Lake, B. B. & Zhang, K. High-throughput sequencing of the transcriptome and chromatin accessibility in the same cell. Nat. Biotechnol. 37, 1452–1457 (2019).
7. Clark, S. J. et al. scNMT-seq enables joint profiling of chromatin accessibility DNA methylation and transcription in single cells. Nat. Commun. 9, 781 (2018).
8. Wang, X. et al. Three-dimensional intact-tissue sequencing of single-cell transcriptional states. Science 361, aat5691 (2018).
9. Yao, Z. et al. An integrated transcriptomic and epigenomic atlas of mouse primary motor cortex cell types. Preprint at bioRxiv https://doi.org/10.1101/2020.02.29.970558 (2020).
10. Tran, N. M. et al. Single-cell profiles of retinal ganglion cells differing in resilience to injury reveal neuroprotective genes. Neuron 104, 1039–1055.e12 (2019).
11. Krienen, F. M. et al. Innovations in primate interneuron repertoire. Preprint at bioRxiv https://doi.org/10.1101/709501 (2019).
12. Hie, B., Bryson, B. & Berger, B. Efficient integration of heterogeneous single-cell transcriptomes using Scanorama. Nat. Biotechnol. 37, 685–691 (2019).
13. Barkas, N. et al. Joint analysis of heterogeneous single-cell RNA-seq dataset collections. Nat. Methods 16, 695–698 (2019).
14. Korsunsky, I. et al. Fast, sensitive and accurate integration of single-cell data with Harmony. Nat. Methods 16, 1289–1296 (2019).
15. Stuart, T. et al. Comprehensive integration of single-cell data. Cell 177, 1888–1902.e21 (2019).
16. Tran, H. T. N. et al. A benchmark of batch-effect correction methods for single-cell RNA sequencing data. Genome Biol. 21, 12 (2020).
17. Büttner, M., Miao, Z., Wolf, F. A., Teichmann, S. A. & Theis, F. J. A test metric for assessing single-cell RNA-seq batch correction. Nat. Methods 16, 43–49 (2019).
18. Kang, H. M. et al. Multiplexed droplet single-cell RNA-sequencing using natural genetic variation. Nat. Biotechnol. 36, 89–94 (2018).
19. Svensson, V., da Veiga Beltrame, E. & Pachter, L. Quantifying the tradeoff between sequencing depth and cell number in single-cell RNA-seq. Preprint at bioRxiv https://doi.org/10.1101/762773 (2019).
20. Montoro, D. T. et al. A revised airway epithelial hierarchy includes CFTR-expressing ionocytes. Nature 560, 319–324 (2018).
21. Plasschaert, L. W. et al. A single-cell atlas of the airway epithelium reveals the CFTR-rich pulmonary ionocyte. Nature 560, 377–381 (2018).
22. Welch, J. D., Hu, Y. & Prins, J. F. Robust detection of alternative splicing in a population of single cells. Nucleic Acids Res. 44, e73 (2016).
23. Gao, C. et al. Iterative refinement of cellular identity from single-cell data using online learning. Preprint at bioRxiv https://doi.org/10.1101/2020.01.16.909861 (2020).
24. Granja, J. M. et al. Single-cell multiomic analysis identifies regulatory programs in mixed-phenotype acute leukemia. Nat. Biotechnol. 37, 1458–1465 (2019).
Acknowledgements
This work was supported by NIH grants R01 AI149669 and R01 HG010883 (J.D.W.) and U19 1U19MH114821 (E.Z.M.).
Author information
Contributions
J.L., C.G., J.S., and J.D.W. performed the data analysis. J.L., C.G., J.S., and J.D.W. wrote the paper, with input from E.Z.M. and V.K. All authors read and approved the final manuscript.
Ethics declarations
Competing interests
A patent application on LIGER has been submitted by The Broad Institute, Inc., and The General Hospital Corporation with E.Z.M., J.D.W. and V.K. as inventors.
Additional information
Peer review information Nature Protocols thanks Andrew Adey, Jinmiao Chen and Sarah Teichmann for their contribution to the peer review of this work.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Related links
Key references using this protocol
Welch, J. D. et al. Cell 177, 1873–1887.e17 (2019): https://doi.org/10.1016/j.cell.2019.05.006
Tran, N. M. et al. Neuron 104, 1039–1055.e12 (2019): https://doi.org/10.1016/j.neuron.2019.11.006
Yao, Z. et al. Preprint at bioRxiv (2020): https://doi.org/10.1101/2020.02.29.970558
Krienen, F. M. et al. Preprint at bioRxiv (2019): https://doi.org/10.1101/709501
Cite this article
Liu, J., Gao, C., Sodicoff, J. et al. Jointly defining cell types from multiple single-cell datasets using LIGER. Nat Protoc 15, 3632–3662 (2020). https://doi.org/10.1038/s41596-020-0391-8
DOI: https://doi.org/10.1038/s41596-020-0391-8