  • Resource
Near telomere-to-telomere genome of the model plant Physcomitrium patens

Abstract

The model plant Physcomitrium patens has played a pivotal role in advancing our understanding of plant evolution and development. However, the current genome assembly contains numerous unfinished and erroneous regions. To address these issues, we generated an assembly using Oxford Nanopore reads and Hi-C mapping. The assembly incorporates telomeric and centromeric regions, making it a near telomere-to-telomere genome; the exception is a region in chromosome 1 that is not fully assembled owing to its highly repetitive nature. This near telomere-to-telomere genome resolves the chromosome number at 26 and provides a gap-free genome assembly as well as updated gene models to aid future studies using this model organism.

Fig. 1: T2T assembly strategy for the P. patens V6 genome.
Fig. 2: Chromosomal features of the V6 genome.
Fig. 3: Structural conflicts between two distinct versions of particular chromosomes.
Fig. 4: Characterization of centromeres.
Fig. 5: 3D genome attributes of the protonema and gametophore.

Data availability

The genome assembly and de novo annotations have been deposited in Figshare at https://doi.org/10.6084/m9.figshare.22975925.v2. The Illumina reads (genomic sequencing and Hi-C) and Nanopore reads generated in this study were deposited into the NCBI SRA with the BioProject ID PRJNA742485. The V5 genome assembly was submitted to NCBI with the WGS accession ABEU00000000 under BioProject ID PRJNA13064. The updated gene model lookup table and merged annotation are available for download from https://peatmoss.plantcode.cup.uni-freiburg.de/ppatens_db/downloads.php. The ChIP-seq data generated in this study are available through NGDC (https://ngdc.cncb.ac.cn) with accession PRJCA016808. The V6 genome and annotation data have been submitted to Phytozome (https://phytozome-next.jgi.doe.gov/) and will be made available in their upcoming release.

References

  1. Cove, D. The moss Physcomitrella patens. Annu. Rev. Genet. 39, 339–358 (2005).

  2. Engel, P. The induction of biochemical and morphological mutants in the moss Physcomitrella patens. Am. J. Bot. 55, 438–446 (1968).

  3. Frank, W., Ratnadewi, D. & Reski, R. Physcomitrella patens is highly tolerant against drought, salt and osmotic stress. Planta 220, 384–394 (2005).

  4. Schaefer, D. A new moss genetics: targeted mutagenesis in Physcomitrella patens. Annu. Rev. Plant Biol. 53, 477–501 (2002).

  5. Xu, B. et al. Contribution of NAC transcription factors to plant adaptation to land. Science 343, 1505–1508 (2014).

  6. Rensing, S. A., Goffinet, B., Meyberg, R., Wu, S. & Bezanilla, M. The moss Physcomitrium (Physcomitrella) patens: a model organism for non-seed plants. Plant Cell 32, 1361–1376 (2020).

  7. Vidali, L. & Bezanilla, M. Physcomitrella patens: a model for tip cell growth and differentiation. Curr. Opin. Plant Biol. 15, 625–631 (2012).

  8. Ishikawa, M. et al. Physcomitrella STEMIN transcription factor induces stem cell formation with epigenetic reprogramming. Nat. Plants 5, 681–690 (2019).

  9. Reski, R., Bae, H. & Simonsen, H. T. Physcomitrella patens, a versatile synthetic biology chassis. Plant Cell Rep. 37, 1409–1417 (2018).

  10. Rensing, S. et al. The Physcomitrella genome reveals evolutionary insights into the conquest of land by plants. Science 319, 64–69 (2008).

  11. The Arabidopsis Genome Initiative. Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature 408, 796–815 (2000).

  12. Yu, J. et al. A draft sequence of the rice genome (Oryza sativa L. ssp. indica). Science 296, 79–92 (2002).

  13. Merchant, S. et al. The Chlamydomonas genome reveals the evolution of key animal and plant functions. Science 318, 245–250 (2007).

  14. Lang, D. et al. The Physcomitrella patens chromosome-scale assembly reveals moss genome structure and evolution. Plant J. 93, 515–533 (2018).

  15. Zimmer, A. D. et al. Reannotation and extended community resources for the genome of the non-seed plant Physcomitrella patens provide insights into the evolution of plant gene structures and functions. BMC Genomics 14, 498 (2013).

  16. Nurk, S. et al. The complete sequence of a human genome. Science 376, 44–53 (2022).

  17. Song, J. et al. Two gap-free reference genomes and a global view of the centromere architecture in rice. Mol. Plant 14, 1757–1767 (2021).

  18. Li, K. et al. Gapless indica rice genome reveals synergistic contributions of active transposable elements and segmental duplications to rice genome evolution. Mol. Plant 14, 1745–1756 (2021).

  19. Han, X. et al. Two haplotype-resolved, gap-free genome assemblies of Actinidia latifolia and Actinidia chinensis shed light on regulation mechanisms of vitamin C and sucrose metabolism in kiwifruit. Mol. Plant 16, 452–470 (2022).

  20. Yue, J. et al. Telomere-to-telomere and gap-free reference genome assembly of the kiwifruit Actinidia chinensis. Hortic. Res. 10, uhac264 (2023).

  21. Deng, Y. et al. A telomere-to-telomere gap-free reference genome of watermelon and its mutation library provide important resources for gene discovery and breeding. Mol. Plant 15, 1268–1284 (2022).

  22. Payne, Z. L. et al. A gap-free genome assembly of Chlamydomonas reinhardtii and detection of translocations induced by CRISPR-mediated mutagenesis. Plant Commun. 4, 100493 (2022).

  23. Hu, J. et al. An efficient error correction and accurate assembly tool for noisy long reads. Preprint at bioRxiv https://doi.org/10.1101/2023.03.09.531669 (2023).

  24. Kolmogorov, M., Yuan, J., Lin, Y. & Pevzner, P. Assembly of long, error-prone reads using repeat graphs. Nat. Biotechnol. 37, 540–546 (2019).

  25. Podlevsky, J. D. et al. The telomerase database. Nucleic Acids Res. 36, D339–D343 (2007).

  26. Ou, S., Chen, J. & Jiang, N. Assessing genome assembly quality using the LTR assembly index (LAI). Nucleic Acids Res. 46, e126 (2018).

  27. Kurtz, S. et al. Versatile and open software for comparing large genomes. Genome Biol. 5, R12 (2004).

  28. Alonge, M. et al. RaGOO: fast and accurate reference-guided scaffolding of draft genomes. Genome Biol. 20, 224 (2019).

  29. Goel, M., Sun, H., Jiao, W. & Schneeberger, K. SyRI: finding genomic rearrangements and local sequence differences from whole-genome assemblies. Genome Biol. 20, 277 (2019).

  30. Garrison, E. & Marth, G. Haplotype-based variant detection from short-read sequencing. Preprint at https://doi.org/10.48550/arXiv.1207.3907 (2012).

  31. Haas, F. B. et al. Single nucleotide polymorphism charting of P. patens reveals accumulation of somatic mutations during in vitro culture on the scale of natural variation by selfing. Front. Plant Sci. 11, 813 (2020).

  32. Zhou, Y. & Song, B.-L. An urgent call on revisions to current genome annotation strategies. Sci. China Life Sci. 66, 1942–1943 (2023).

  33. Parry, G. The plant nuclear envelope and regulation of gene expression. J. Exp. Bot. 66, 1673–1685 (2015).

  34. Imaizumi, T. et al. Cryptochrome light signals control development to suppress auxin sensitivity in the moss Physcomitrella patens. Plant Cell 14, 373–386 (2002).

  35. Prigge, M. J. et al. Physcomitrella patens auxin-resistant mutants affect conserved elements of an auxin-signaling pathway. Curr. Biol. 20, 1907–1912 (2010).

  36. Bryan, V. S. Cytotaxonomic studies in the Ephemeraceae and Funariaceae. Bryologist 60, 103–126 (1957).

  37. Reski, R., Faust, M. & Wang, X. Genome analysis of the moss Physcomitrella patens (Hedw.) B.S.G. Mol. Gen. Genet. 244, 352–359 (1994).

  38. Neumann, P. et al. Plant centromeric retrotransposons: a structural and cytogenetic perspective. Mob. DNA 2, 4 (2011).

  39. Zhang, R.-G. et al. TEsorter: an accurate and fast method to classify LTR-retrotransposons in plant genomes. Hortic. Res. 9, uhac017 (2022).

  40. Flynn, J. M. et al. RepeatModeler2 for automated genomic discovery of transposable element families. Proc. Natl Acad. Sci. USA 117, 9451–9457 (2020).

  41. Carey, S. B. et al. Gene-rich UV sex chromosomes harbor conserved regulators of sexual development. Sci. Adv. 7, eabh2488 (2021).

  42. McClintock, B. The stability of broken ends of chromosomes in Zea mays. Genetics 26, 234–282 (1941).

  43. Bryant, P. & Slijepcevic, P. E. Chromosome healing, telomere capture and mechanisms of radiation-induced chromosome breakage. Int. J. Radiat. Biol. 73, 1 (1998).

  44. Kurzhals, R. L. et al. Chromosome healing is promoted by the telomere cap component Hiphop in Drosophila. Genetics 207, 949–959 (2017).

  45. Lieberman-Aiden, E. et al. Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science 326, 289–293 (2009).

  46. Fortin, J.-P. & Hansen, K. D. Reconstructing A/B compartments as revealed by Hi-C using long-range correlations in epigenetic data. Genome Biol. 16, 180 (2015).

  47. Nothjunge, S. et al. DNA methylation signatures follow preformed chromatin compartments in cardiac myocytes. Nat. Commun. 8, 1667 (2017).

  48. Bannister, A. J. & Kouzarides, T. Regulation of chromatin by histone modifications. Cell Res. 21, 381–395 (2011).

  49. Bian, Q. et al. Histone H3K9 methylation promotes formation of genome compartments in Caenorhabditis elegans via chromosome compaction and perinuclear anchoring. Proc. Natl Acad. Sci. USA 117, 11459–11470 (2020).

  50. Yung, W.-S. et al. Histone modifications and chromatin remodelling in plants in response to salt stress. Physiol. Plant. 173, 1495–1513 (2021).

  51. Widiez, T. et al. The chromatin landscape of the moss Physcomitrella patens and its dynamics during development and drought stress. Plant J. 79, 67–81 (2014).

  52. Ashton, N. W. & Cove, D. J. The isolation and preliminary characterisation of auxotrophic and analogue resistant mutants of the moss, Physcomitrella patens. Mol. Gen. Genet. 154, 87–95 (1977).

  53. Schlink, K. & Reski, R. Preparing high-quality DNA from moss (Physcomitrella patens). Plant Mol. Biol. Report. 20, 423–423 (2002).

  54. Chen, S., Zhou, Y., Chen, Y. & Gu, J. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics 34, 884–890 (2018).

  55. Vurture, G. W. et al. GenomeScope: fast reference-free genome profiling from short reads. Bioinformatics 33, 2202–2204 (2017).

  56. Marçais, G. & Kingsford, C. A fast, lock-free approach for efficient parallel counting of occurrences of k-mers. Bioinformatics 27, 764–770 (2011).

  57. Danecek, P. et al. Twelve years of SAMtools and BCFtools. Gigascience 10, giab008 (2021).

  58. Ensembl/treebest. Ensembl. https://github.com/Ensembl/treebest (2016).

  59. Chen, Y. et al. Efficient assembly of nanopore reads via highly accurate and intact error correction. Nat. Commun. 12, 60 (2021).

  60. Li, H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 34, 3094–3100 (2018).

  61. Langmead, B. & Salzberg, S. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012).

  62. Servant, N. et al. HiC-Pro: an optimized and flexible pipeline for Hi-C data processing. Genome Biol. 16, 259 (2015).

  63. Durand, N. C. et al. Juicer provides a one-click system for analyzing loop-resolution Hi-C experiments. Cell Syst. 3, 95–98 (2016).

  64. Dudchenko, O. et al. De novo assembly of the Aedes aegypti genome using Hi-C yields chromosome-length scaffolds. Science 356, 92–95 (2017).

  65. Boratyn, G. M. et al. BLAST: a more efficient report with usability improvements. Nucleic Acids Res. 41, 29–33 (2013).

  66. Wick, R., Schultz, M., Zobel, J. & Holt, K. Bandage: interactive visualization of de novo genome assemblies. Bioinformatics 31, 3350–3352 (2015).

  67. Li, H., Feng, X. & Chu, C. The design and construction of reference pangenome graphs with minigraph. Genome Biol. 21, 265 (2020).

  68. Vaser, R. et al. Fast and accurate de novo genome assembly from long uncorrected reads. Genome Res. 27, 737–746 (2017).

  69. Aury, J.-M. & Istace, B. Hapo-G, haplotype-aware polishing of genome assemblies with accurate reads. NAR Genom. Bioinform. 3, lqab034 (2021).

  70. Kim, D., Paggi, J. M., Park, C., Bennett, C. & Salzberg, S. Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype. Nat. Biotechnol. 37, 907–915 (2019).

  71. Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 25, 1754–1760 (2009).

  72. Li, H. et al. The sequence alignment/map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).

  73. Bankevich, A. et al. SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing. J. Comput. Biol. 19, 455–477 (2012).

  74. Pruitt, K., Tatusova, T. & Maglott, D. NCBI reference sequences (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res. 33, 501–504 (2007).

  75. Beier, S., Tappu, R. & Huson, D. H. in Functional Metagenomics: Tools and Applications (eds Charles, T. C. et al.) 65–74 (Springer Cham, 2017).

  76. Rhie, A. et al. Merqury: reference-free quality, completeness, and phasing assessment for genome assemblies. Genome Biol. 21, 245 (2020).

  77. Zimin, A. V. & Salzberg, S. L. The genome polishing tool POLCA makes fast and accurate corrections in genome assemblies. PLoS Comput. Biol. 16, e1007981 (2020).

  78. Zimin, A. et al. The MaSuRCA genome assembler. Bioinformatics 29, 2669–2677 (2013).

  79. Davey, J., Davis, S., Mottram, J. & Ashton, P. Tapestry: validate and edit small eukaryotic genome assemblies with long reads. Preprint at bioRxiv https://doi.org/10.1101/2020.04.24.059402 (2020).

  80. Simão, F. A., Waterhouse, R., Ioannidis, P., Kriventseva, E. & Zdobnov, E. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics 31, 3210–3212 (2015).

  81. Tarailo-Graovac, M. & Chen, N. Using RepeatMasker to identify repetitive elements in genomic sequences. Curr. Protoc. Bioinform. 5, 4–10 (2004).

  82. Jurka, J. et al. Repbase Update, a database of eukaryotic repetitive elements. Cytogenet. Genome Res. 110, 462–467 (2005).

  83. Benson, G. Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res. 27, 573–580 (1999).

  84. Edgar, R. & Myers, E. PILER: identification and classification of genomic repeats. Bioinformatics 21, 152–158 (2005).

  85. Price, A., Jones, N. C. & Pevzner, P. De novo identification of repeat families in large genomes. Bioinformatics 21, 351–358 (2005).

  86. Xu, Z. & Wang, H. LTR_FINDER: an efficient tool for the prediction of full-length LTR retrotransposons. Nucleic Acids Res. 35, 265–268 (2007).

  87. Ellinghaus, D., Kurtz, S. & Willhoeft, U. LTRharvest, an efficient and flexible software for de novo detection of LTR retrotransposons. BMC Bioinform. 9, 18 (2007).

  88. Yang, Z. PAML 4: phylogenetic analysis by maximum likelihood. Mol. Biol. Evol. 24, 1586–1591 (2007).

  89. Rensing, S. et al. An ancient genome duplication contributed to the abundance of metabolic genes in the moss Physcomitrella patens. BMC Evol. Biol. 7, 130 (2007).

  90. Sun, P. et al. WGDI: a user-friendly toolkit for evolutionary analyses of whole-genome duplications and ancestral karyotypes. Mol. Plant 15, 1841–1851 (2022).

  91. Filippova, D., Patro, R., Duggal, G. & Kingsford, C. Identification of alternative topological domains in chromatin. Algorithms Mol. Biol. 9, 14 (2014).

  92. Lopez-Delisle, L. et al. pyGenomeTracks: reproducible plots for multivariate genomic data sets. Bioinformatics 37, 422–423 (2021).

  93. Ramírez, F. et al. High-resolution TADs reveal DNA sequences underlying genome organization in flies. Nat. Commun. 9, 189 (2018).

  94. Paulsen, J., Ali, T. M. & Collas, P. Computational 3D genome modeling using Chrom3D. Nat. Protoc. 13, 1137–1152 (2018).

  95. Pettersen, E. et al. UCSF Chimera—a visualization system for exploratory research and analysis. J. Comput. Chem. 25, 1605–1612 (2004).

  96. Haas, B. et al. Automated eukaryotic gene structure annotation using EVidenceModeler and the Program to Assemble Spliced Alignments. Genome Biol. 9, 7 (2007).

  97. Pertea, M. et al. StringTie enables improved reconstruction of a transcriptome from RNA-seq reads. Nat. Biotechnol. 33, 290–295 (2015).

  98. Stanke, M. et al. AUGUSTUS: ab initio prediction of alternative transcripts. Nucleic Acids Res. 34, 435–439 (2006).

  99. Korf, I. Gene finding in novel genomes. BMC Bioinform. 5, 59 (2004).

  100. Lomsadze, A., Ter-Hovhannisyan, V., Chernoff, Y. & Borodovsky, M. Gene identification in novel eukaryotic genomes by self-training algorithm. Nucleic Acids Res. 33, 6494–6506 (2005).

  101. Keilwagen, J., Hartung, F. & Grau, J. in Gene Prediction: Methods and Protocols (ed. Kollmar, M.) 161–177 (Humana, 2019).

  102. Boeckmann, B. et al. The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003. Nucleic Acids Res. 31, 365–370 (2003).

  103. Kanehisa, M., Furumichi, M., Tanabe, M., Sato, Y. & Morishima, K. KEGG: new perspectives on genomes, pathways, diseases and drugs. Nucleic Acids Res. 45, 353–361 (2017).

  104. Aramaki, T. et al. KofamKOALA: KEGG ortholog assignment based on profile HMM and adaptive score threshold. Bioinformatics 36, 2251–2252 (2020).

  105. Huerta-Cepas, J. et al. eggNOG 5.0: a hierarchical, functionally and phylogenetically annotated orthology resource based on 5090 organisms and 2502 viruses. Nucleic Acids Res. 47, 309–314 (2019).

  106. Mitchell, A. et al. InterPro in 2019: improving coverage, classification and access to protein sequence annotations. Nucleic Acids Res. 47, 351–360 (2019).

  107. El-Gebali, S. et al. The Pfam protein families database in 2019. Nucleic Acids Res. 47, 427–432 (2019).

  108. Lu, S. et al. CDD/SPARCLE: the conserved domain database in 2020. Nucleic Acids Res. 48, 265–268 (2020).

  109. Törönen, P., Medlar, A. & Holm, L. PANNZER2: a rapid functional annotation web server. Nucleic Acids Res. 46, 84–88 (2018).

  110. Conesa, A. et al. Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research. Bioinformatics 21, 3674–3676 (2005).

  111. Chan, P. & Lowe, T. tRNAscan-SE: searching for tRNA genes in genomic sequences. In Gene Prediction: Methods and Protocols Vol. 1962 (ed. Kollmar, M.) 1–14 (Humana, 2019).

  112. Nawrocki, E. P. & Eddy, S. Infernal 1.1: 100-fold faster RNA homology searches. Bioinformatics 29, 2933–2935 (2013).

  113. Shumate, A. & Salzberg, S. L. Liftoff: accurate mapping of gene annotations. Bioinformatics 37, 1639–1643 (2021).

  114. Wu, T. D. et al. in Statistical Genomics: Methods and Protocols (eds Mathé, E. & Davis, S.) 283–334 (Humana, 2016).

  115. Gremme, G., Steinbiss, S. & Kurtz, S. GenomeTools: a comprehensive software library for efficient processing of structured genome annotations. IEEE/ACM Trans. Comput. Biol. Bioinform. 10, 645–656 (2013).

  116. Pertea, G. & Pertea, M. GFF utilities: GffRead and GffCompare. F1000Res. 9, 304 (2020).

  117. Quinlan, A. & Hall, I. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842 (2010).

  118. Li, G. et al. ChIA-PET tool for comprehensive chromatin interaction analysis with paired-end tag sequencing. Genome Biol. 11, R22 (2010).

  119. Tamura, K. et al. MEGA5: molecular evolutionary genetics analysis using maximum likelihood, evolutionary distance, and maximum parsimony methods. Mol. Biol. Evol. 28, 2731–2739 (2011).

  120. Vollger, M. R. et al. StainedGlass: interactive visualization of massive tandem repeat structures with identity heatmaps. Bioinformatics 38, 2049–2051 (2022).

  121. Liu, Y. & Vidali, L. Efficient polyethylene glycol (PEG) mediated transformation of the moss Physcomitrella patens. J. Vis. Exp. 50, e2560 (2011).

  122. Gendrel, A.-V. et al. Profiling histone modification patterns in plants using genomic tiling microarrays. Nat. Methods 2, 213–218 (2005).

  123. Bolger, A. M., Lohse, M. & Usadel, B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30, 2114–2120 (2014).

  124. Feng, J. et al. Identifying ChIP-seq enrichment using MACS. Nat. Protoc. 7, 1728–1740 (2012).

  125. Gel, B. & Serra, E. karyoploteR: an R/Bioconductor package to plot customizable genomes displaying arbitrary data. Bioinformatics 33, 3088–3090 (2017).

Acknowledgements

We thank C. Chen from Huazhong Agricultural University and C. Yu from the Agricultural Genomics Institute at Shenzhen for their advice on chromosome analysis. We also thank H. Chen at Tsinghua University for providing assistance with P. patens genetics. This work was supported by grants from the National Key Research and Development Program of China (no. 2019YFA0906200 to J. Yan), the National Natural Science Foundation of China (nos. 31725002, 32150025 and 32030004 to J.D.), the Bureau of International Cooperation, Chinese Academy of Sciences (no. 172644KYSB20180022 to J.D.), the Shenzhen Science and Technology Program (no. KQTD20180413181837372 to J.D.), the Science Technology and Innovation Commission of Shenzhen Municipality of China (no. ZDSYS20200811142605017 to J. Yan) and the Shenzhen Outstanding Talents Training Fund to J.D. J. Yan acknowledges funding from the Innovation Program of the Chinese Academy of Agricultural Sciences and the Elite Young Scientists Program of CAAS. The gene annotation was carried out in the framework of MAdLand (http://madland.science, DFG priority program 2237). S.A.R. is grateful for funding from the DFG (RE 1697/15–1, 20–1).

Author information

Contributions

J. Yan, Y.M. and J.D. conceived the study. J. Yan and J.D. managed the major scientific objectives. G.B. designed the T2T genome assembly, evaluation and data analysis. J. Yao generated the plant materials. H.W. helped with the assembly and annotation. S.Z., M.Z., Y.S. and X.H. collected the sequenced samples. J. Yao, Y.J., Y.M. and J.D. designed and performed ChIP. F.B.H., D.V., M.P. and S.A.R. combined and quality-checked the gene annotations. J. Yan and G.B. led the article preparation, together with J. Yao, H.W. and J.D. All authors read and approved the final article.

Corresponding authors

Correspondence to Jianbin Yan or Junbiao Dai.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Plants thanks the anonymous reviewers for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 The process of graph-based gap-filling for the remaining 16 gaps.

a, Karyotype maps of the 15 chromosomes that carry the remaining 16 gaps; each gap is numbered. The centromere lies at the chromosome constriction. b, Sequences upstream and downstream of each breakpoint (gap), aligned to the corresponding graphs. The Hi-C heatmaps locate the gaps (intersection of green lines) at a resolution of 1 kb. The upstream (blue band) and downstream (green band) sequences of each gap are aligned to the graphs, and the two bands mark the precise locations on the graph path of the 14 intervals in which gaps occur. The gray region between the two bands is the sequence that needs to be filled. Overlap between the two bands indicates sequence redundancy, which requires trimming and merging. Gap 11 arises from tandem repeats, so the copy number must be estimated before the gap is repaired. c, Two gaps lie in a complex region. The upstream and downstream intervals of the two gaps are labeled in four distinct colors. The region consists of short repetitive sequences; to determine the precise paths for gap-filling, nanopore reads are mapped onto the region, and long reads that span the repetitive structure are extracted to build the path.

Extended Data Fig. 2 Genome assembly validation achieved by analyzing sequencing coverage and depth in relation to the 26 chromosomes in P. patens.

The coverage (0–100%) and depth of Illumina and ONT sequencing reads on the 26 chromosomes are shown in the left and right panels, respectively. Statistics were computed in non-overlapping 50 kbp windows. ONT reads covered all chromosomal regions except the repetitive region adjacent to the centromere of Chr01, which was deliberately excluded from the secondary mapping results. The sequencing depth of the multicopy rRNA region was markedly higher than the typical chromosomal depth of 66×, which is consistent with the presence of several rRNA copies.
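The windowed statistics described in this caption can be illustrated with a minimal sketch. It assumes a per-base depth list is already available for one chromosome; the function name `window_stats` and the toy data are hypothetical and are not taken from the authors' pipeline.

```python
from typing import List, Tuple


def window_stats(depth: List[int], window: int = 50_000) -> List[Tuple[int, float, float]]:
    """Summarise per-base depth in non-overlapping windows.

    Returns one (window_start, percent_covered, mean_depth) tuple per window.
    A base counts as covered when its depth is greater than zero.
    """
    stats = []
    for start in range(0, len(depth), window):
        chunk = depth[start:start + window]  # the last window may be shorter
        covered = sum(1 for d in chunk if d > 0)
        stats.append((start, 100.0 * covered / len(chunk), sum(chunk) / len(chunk)))
    return stats


# Toy example with a window of 5 bases instead of 50 kbp (illustrative only).
toy_depth = [0, 2, 2, 4, 2, 66, 66, 66, 66, 66]
result = window_stats(toy_depth, window=5)
```

With real data, the depth list would typically come from a read-alignment depth report, and windows whose mean depth departs strongly from the chromosome-wide value (for example a multicopy rRNA region) would stand out.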

Extended Data Fig. 3 Taxonomy distribution obtained by analyzing the assembled unmapped short reads against the NR database using MEGAN6.

The tree represents the taxonomic classification of the matched sequences at the class level, with node size indicating the number of matched sequences. The word cloud displays the sequence matching results at the phylum level, with larger words indicating a greater number of matched sequences.

Extended Data Fig. 4 A fundamental overview of the main content presented in this work.

a, Distribution of k-mer frequencies for k = 17–23 in the P. patens genome. b, Radar chart comparing the quality of the V6 genome with that of its predecessor, V3, on six indicators. c, Schematic of the V6 assembly process. d, SyRI analysis showing genome sequence collinearity and structural variants. To capture genuine differences between the two genome versions, the V3 sequence was split into contigs at its N bases, and RaGOO was then used to build 26 pseudochromosomes aligned to V6.
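The contig-splitting step mentioned for panel d can be sketched as follows. This is only an illustration of breaking a scaffold at runs of N; the function name `split_at_gaps` and the toy sequence are hypothetical, and the authors' actual tooling for this step is not stated in the caption.

```python
import re
from typing import List, Tuple


def split_at_gaps(seq: str) -> List[Tuple[int, str]]:
    """Split a scaffold into contigs wherever a run of N (or n) occurs.

    Returns (start_offset, contig_sequence) pairs so that each contig
    can still be traced back to its position in the original scaffold.
    """
    return [(m.start(), m.group()) for m in re.finditer(r"[^Nn]+", seq)]


# Toy scaffold with two gaps (illustrative only).
toy_scaffold = "ACGTNNNNGGCCNTTA"
contigs = split_at_gaps(toy_scaffold)
```

Keeping the start offsets makes it possible to map any structural difference found on a contig back to scaffold coordinates after re-scaffolding.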

Extended Data Fig. 5 Neighbor-joining cladogram of five P. patens accessions built from SNPs derived from Haas et al. (ref. 31).

The sequenced material used in this study is marked on the tree by an arrow. Bootstrap values from 100 replicates are shown at the nodes.

Extended Data Fig. 6 Assembly accuracy validation for Chr25 in V6 by read mapping.

Comparison of the collinearity of Chr25 between the V3 and V6 versions. Top, the merger of two V3 pseudochromosomes, with their boundary marked by a solid black line; the dashed black line marks the position of the breakpoint in V6. Middle, mapping results of nanopore reads (longer than 10 kbp). Bottom, enlarged view of the 5 kbp interval spanning the breakpoint.

Extended Data Fig. 7 Whole genome-wide Hi-C heatmap.

Hi-C interactions among 26 chromosomes at a 500 kbp resolution.
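As a rough illustration of what a fixed resolution means for a Hi-C heatmap, the sketch below bins read-pair positions into 500 kbp bins and counts contacts per bin pair. It is simplified to a single coordinate axis; the function name `bin_contacts` and the toy pairs are hypothetical and do not reproduce the Hi-C tools cited in the reference list.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def bin_contacts(pairs: Iterable[Tuple[int, int]],
                 resolution: int = 500_000) -> Dict[Tuple[int, int], int]:
    """Count Hi-C contacts per pair of fixed-size bins.

    Each read pair is given as two genomic positions. Bin indices are
    sorted so that (i, j) and (j, i) land in the same matrix cell.
    """
    counts: Counter = Counter()
    for pos_a, pos_b in pairs:
        i, j = sorted((pos_a // resolution, pos_b // resolution))
        counts[(i, j)] += 1
    return dict(counts)


# Toy read pairs (illustrative only).
toy_pairs = [(100_000, 600_000), (650_000, 120_000), (10, 20), (1_200_000, 1_300_000)]
matrix = bin_contacts(toy_pairs)
```

A genome-wide map would additionally offset each chromosome's bins so that all 26 chromosomes share one matrix.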

Supplementary information

Supplementary Information

Supplementary Figs. 1–30 and Note.

Reporting Summary

Supplementary Data 1

Supplementary Tables 1–23.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Bi, G., Zhao, S., Yao, J. et al. Near telomere-to-telomere genome of the model plant Physcomitrium patens. Nat. Plants 10, 327–343 (2024). https://doi.org/10.1038/s41477-023-01614-7

  • DOI: https://doi.org/10.1038/s41477-023-01614-7
