Rapid genotype imputation from sequence without reference panels

Davies, Robert W; Flint, Jonathan; Myers, Simon; Mott, Richard

doi:10.1038/ng.3594

Technical Report
Published: 04 July 2016

Robert W Davies¹, Jonathan Flint², Simon Myers¹,³ & Richard Mott¹,⁴ (ORCID: orcid.org/0000-0002-1022-9330)

Nature Genetics volume 48, pages 965–969 (2016)

Abstract

Inexpensive genotyping methods are essential for genetic studies requiring large sample sizes. In human studies, genotyping microarrays and high-density haplotype reference panels allow efficient genotype imputation for this purpose. However, these resources are typically unavailable in non-human settings. Here we describe a method (STITCH) for imputation based only on sequencing read data, without requiring additional reference panels or array data. We demonstrate its applicability even in settings of extremely low sequencing coverage, by accurately imputing 5.7 million SNPs at a mean r² value of 0.98 in 2,073 outbred laboratory mice (0.15× sequencing coverage). In a sample of 11,670 Han Chinese (1.7× coverage), we achieve accuracy similar to that of alternative approaches that require a reference panel, demonstrating that our approach can work for genetically diverse populations. Our method enables straightforward progression from low-coverage sequence to imputed genotypes, overcoming barriers that at present restrict the application of genome-wide association study technology outside humans.
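
The abstract reports imputation accuracy as a mean r² across SNPs. As a rough illustration of how such a figure could be computed, the sketch below takes, for each SNP, the squared Pearson correlation between imputed dosages and validation genotypes and then averages over SNPs. This is an assumption about the metric, not the authors' published code; the function names `squared_correlation` and `mean_r2` and the toy data are hypothetical.

```python
def squared_correlation(dosages, truth):
    """Squared Pearson correlation between imputed dosages and
    validation genotypes for one SNP (hypothetical helper).

    Returns None when either vector has zero variance, because the
    correlation is undefined for a monomorphic site.
    """
    n = len(dosages)
    if n != len(truth) or n < 2:
        raise ValueError("need two equal-length vectors with at least 2 samples")
    mean_d = sum(dosages) / n
    mean_t = sum(truth) / n
    cov = sum((d - mean_d) * (t - mean_t) for d, t in zip(dosages, truth))
    var_d = sum((d - mean_d) ** 2 for d in dosages)
    var_t = sum((t - mean_t) ** 2 for t in truth)
    if var_d == 0 or var_t == 0:
        return None
    return (cov * cov) / (var_d * var_t)


def mean_r2(per_snp_pairs):
    """Average r² over SNPs, skipping SNPs where r² is undefined."""
    values = []
    for dosages, truth in per_snp_pairs:
        r2 = squared_correlation(dosages, truth)
        if r2 is not None:
            values.append(r2)
    return sum(values) / len(values) if values else None


# Toy data (invented for illustration): each pair is
# (imputed dosages in [0, 2], validation genotypes in {0, 1, 2}).
toy_snps = [
    ([0.1, 0.9, 1.9, 1.0], [0, 1, 2, 1]),
    ([0.0, 0.2, 1.8, 2.0], [0, 0, 2, 2]),
]
print(mean_r2(toy_snps))
```

Skipping monomorphic sites is one possible convention; a real evaluation would need to state how such sites, and any allele-frequency filtering, are handled.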

**Figure 2: Performance of STITCH on CFW mice in comparison to external validation.**

**Figure 3: Performance of STITCH on CONVERGE humans in comparison to external validation.**

**Figure 4: Effects of reduced sequence coverage.**

Acknowledgements

R.W.D. is supported by a grant from the Wellcome Trust (097308/Z/11/Z). S.M. is supported by Investigator Award 098387/Z/12/Z. This work was funded by the Wellcome Trust (WT090532/Z/09/Z, WT083573/Z/07/Z, WT089269/Z/09/Z and WT098387/Z/12/Z).

Author information

Simon Myers and Richard Mott: These authors contributed equally to this work.

Authors and Affiliations

Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK
Robert W Davies, Simon Myers & Richard Mott
Center for Neurobehavioral Genetics, Semel Institute for Neuroscience and Human Behavior, University of California, Los Angeles, Los Angeles, California, USA
Jonathan Flint
Department of Statistics, University of Oxford, Oxford, UK
Simon Myers
UCL Genetics Institute, University College London, London, UK
Richard Mott

Contributions

R.W.D., S.M., and R.M. developed the method. R.W.D. wrote the algorithm and performed analyses. J.F. and R.M. conceived and managed the CFW and CONVERGE projects. All authors contributed to study design, drafted the paper, and reviewed and contributed to the final manuscript.

Corresponding author

Correspondence to Robert W Davies.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Supplementary information

Supplementary Text and Figures

Supplementary Tables 1–8 and Supplementary Note. (PDF 1385 kb)

About this article

Cite this article

Davies, R., Flint, J., Myers, S. et al. Rapid genotype imputation from sequence without reference panels. Nat Genet 48, 965–969 (2016). https://doi.org/10.1038/ng.3594

Received: 08 December 2015
Accepted: 24 May 2016
Published: 04 July 2016
Issue Date: August 2016
DOI: https://doi.org/10.1038/ng.3594
