Endosymbiotic origin and differential loss of eukaryotic genes

Abstract

Chloroplasts arose from cyanobacteria, mitochondria arose from proteobacteria. Both organelles have conserved their prokaryotic biochemistry, but their genomes are reduced, and most organelle proteins are encoded in the nucleus. Endosymbiotic theory posits that bacterial genes in eukaryotic genomes entered the eukaryotic lineage via organelle ancestors. It predicts episodic influx of prokaryotic genes into the eukaryotic lineage, with acquisition corresponding to endosymbiotic events. Eukaryotic genome sequences, however, increasingly implicate lateral gene transfer, both from prokaryotes to eukaryotes and among eukaryotes, as a source of gene content variation in eukaryotic genomes, which predicts continuous, lineage-specific acquisition of prokaryotic genes in divergent eukaryotic groups. Here we discriminate between these two alternatives by clustering and phylogenetic analysis of eukaryotic gene families having prokaryotic homologues. Our results indicate (1) that gene transfer from bacteria to eukaryotes is episodic, as revealed by gene distributions, and coincides with major evolutionary transitions at the origin of chloroplasts and mitochondria; (2) that gene inheritance in eukaryotes is vertical, as revealed by extensive topological comparison, sparse gene distributions stemming from differential loss; and (3) that continuous, lineage-specific lateral gene transfer, although it sometimes occurs, does not contribute to long-term gene content evolution in eukaryotic genomes.

Figure 1: Distribution of taxa in EPCs.
Figure 2: Occurrence in the sister group versus proteome size.
Figure 3: Comparison of sets of trees for single-copy genes in eukaryotic groups.
Figure 4: Eukaryote–prokaryote sequence identities for genes with a tip distribution in eukaryotes versus those whose distributions trace their presence to a more ancient ancestor.

References

  1. Koonin, E. V., Makarova, K. S. & Aravind, L. Horizontal gene transfer in prokaryotes: quantification and classification. Annu. Rev. Microbiol. 55, 709–742 (2001)

  2. Doolittle, W. F. Phylogenetic classification and the universal tree. Science 284, 2124–2128 (1999)

  3. Ochman, H., Lawrence, J. G. & Groisman, E. A. Lateral gene transfer and the nature of bacterial innovation. Nature 405, 299–304 (2000)

  4. Lang, A. S., Zhaxybayeva, O. & Beatty, J. T. Gene transfer agents: phage-like elements of genetic exchange. Nature Rev. Microbiol. 10, 472–482 (2012)

  5. Rasko, D. A. et al. The pangenome structure of Escherichia coli: comparative genomic analysis of E. coli commensal and pathogenic isolates. J. Bacteriol. 190, 6881–6893 (2008)

  6. Lobkovsky, A. E., Wolf, Y. I. & Koonin, E. V. Gene frequency distributions reject a neutral model of genome evolution. Genome Biol. Evol. 5, 233–242 (2013)

  7. Szathmáry, E. & Maynard Smith, J. The major evolutionary transitions. Nature 374, 227–232 (1995)

  8. Nei, M. Mutation-Driven Evolution (Oxford Univ. Press, 2013)

  9. Timmis, J. N., Ayliffe, M. A., Huang, C. Y. & Martin, W. Endosymbiotic gene transfer: organelle genomes forge eukaryotic chromosomes. Nature Rev. Genet. 5, 123–135 (2004)

  10. Lane, C. E. & Archibald, J. M. The eukaryotic tree of life: endosymbiosis takes its TOL. Trends Ecol. Evol. 23, 268–275 (2008)

  11. Archibald, J. M. One plus One Equals One: Symbiosis and the Evolution of Complex Life (Oxford Univ. Press, 2014)

  12. Andersson, J. O. Lateral gene transfer in eukaryotes. Cell. Mol. Life Sci. 62, 1182–1197 (2005)

  13. Keeling, P. J. & Palmer, J. D. Horizontal gene transfer in eukaryotic evolution. Nature Rev. Genet. 9, 605–618 (2008)

  14. Price, D. C. et al. Cyanophora paradoxa genome elucidates origin of photosynthesis in algae and plants. Science 335, 843–847 (2012)

  15. Boto, L. Horizontal gene transfer in the acquisition of novel traits by metazoans. Proc. R. Soc. B 281, 20132450 (2014)

  16. Huang, J. L. Horizontal gene transfer in eukaryotes: the weak-link model. Bioessays 35, 868–875 (2013)

  17. Crisp, A., Boschetti, C., Perry, M., Tunnacliffe, A. & Micklem, G. Expression of multiple horizontally acquired genes is a hallmark of both vertebrate and invertebrate genomes. Genome Biol. 16, 50 (2015)

  18. Gould, S. B., Waller, R. R. & McFadden, G. I. Plastid evolution. Annu. Rev. Plant Biol. 59, 491–517 (2008)

  19. Curtis, B. A. et al. Algal genomes reveal evolutionary mosaicism and the fate of nucleomorphs. Nature 492, 59–65 (2012)

  20. Alsmark, C. et al. Patterns of prokaryotic lateral gene transfers affecting parasitic microbial eukaryotes. Genome Biol. 14, R19 (2013)

  21. Keeling, P. J. & Inagaki, Y. A class of eukaryotic GTPase with a punctate distribution suggesting multiple functional replacements of translation elongation factor 1α. Proc. Natl Acad. Sci. USA 101, 15380–15385 (2004)

  22. Steel, M., Penny, D. & Lockhart, P. J. Confidence in evolutionary trees from biological sequence data. Nature 364, 440–442 (1993)

  23. Lockhart, P. J. et al. A covariotide model explains apparent phylogenetic structure of oxygenic photosynthetic lineages. Mol. Biol. Evol. 15, 1183–1188 (1998)

  24. Guo, Z. H. & Stiller, J. W. Comparative genomics and evolution of proteins associated with RNA polymerase II C-terminal domain. Mol. Biol. Evol. 22, 2166–2178 (2005)

  25. Semple, C. & Steel, M. Phylogenetics (Oxford Univ. Press, 2003)

  26. Hughes, A. L. & Friedman, R. Loss of ancestral genes in the genomic evolution of Ciona intestinalis. Evol. Dev. 7, 196–200 (2005)

  27. Müller, M. et al. Biochemistry and evolution of anaerobic energy metabolism in eukaryotes. Microbiol. Mol. Biol. Rev. 76, 444–495 (2012)

  28. Kondo, N., Nikoh, N., Ijichi, N., Shimada, M. & Fukatsu, T. Genome fragment of Wolbachia endosymbiont transferred to X chromosome of host insect. Proc. Natl Acad. Sci. USA 99, 14280–14285 (2002)

  29. Husnik, F. et al. Horizontal gene transfer from diverse bacteria to an insect genome enables a tripartite nested mealybug symbiosis. Cell 153, 1567–1578 (2013)

  30. Mi, S. et al. Syncytin is a captive retroviral envelope protein involved in human placental morphogenesis. Nature 403, 785–789 (2000)

  31. Derelle, R. et al. Bacterial proteins pinpoint a single eukaryotic root. Proc. Natl Acad. Sci. USA 112, E693–E699 (2015)

  32. Rivera, M. C., Jain, R., Moore, J. E. & Lake, J. A. Genomic evidence for two functionally distinct gene classes. Proc. Natl Acad. Sci. USA 95, 6239–6244 (1998)

  33. Lane, N. & Martin, W. The energetics of genome complexity. Nature 467, 929–934 (2010)

  34. Williams, T. A., Foster, P. G., Cox, C. J. & Embley, T. M. An archaeal origin of eukaryotes supports only two primary domains of life. Nature 504, 231–236 (2013)

  35. Guy, L., Saw, J. H. & Ettema, T. J. G. The archaeal legacy of eukaryotes: a phylogenomic perspective. Cold Spring Harb. Perspect. Biol. 6, a016022 (2014)

  36. Koonin, E. V. & Yutin, N. The dispersed archaeal eukaryome and the complex archaeal ancestor of eukaryotes. Cold Spring Harb. Perspect. Biol. 6, a016188 (2014)

  37. Cotton, J. A. & McInerney, J. O. Eukaryotic genes of archaebacterial origin are more important than the more numerous eubacterial genes, irrespective of function. Proc. Natl Acad. Sci. USA 107, 17252–17255 (2010)

  38. Moran, N. A., McCutcheon, J. P. & Nakabachi, A. Genomics and evolution of heritable bacterial symbionts. Annu. Rev. Genet. 42, 165–190 (2008)

  39. John, P. & Whatley, F. R. Paracoccus denitrificans and the evolutionary origin of the mitochondrion. Nature 254, 495–498 (1975)

  40. Koonin, E. V. & Wolf, Y. I. Genomics of bacteria and archaea: the emerging dynamic view of the prokaryotic world. Nucleic Acids Res. 36, 6688–6719 (2008)

  41. Parfrey, L. W., Lahr, D. J. G., Knoll, A. H. & Katz, L. A. Estimating the timing of early eukaryotic diversification with multigene molecular clocks. Proc. Natl Acad. Sci. USA 108, 13624–13629 (2011)

  42. Margulis, L., Dolan, M. F. & Guerrero, R. The chimeric eukaryote: origin of the nucleus from the karyomastigont in amitochondriate protists. Proc. Natl Acad. Sci. USA 97, 6954–6959 (2000)

  43. Fuerst, J. A. & Sagulenko, E. Keys to eukaryality: Planctomycetes and ancestral evolution of cellular complexity. Front. Microbiol. 3, 167 (2012)

  44. Domman, D., Horn, M., Embley, T. M. & Williams, T. A. Plastid establishment did not require a chlamydial partner. Nature Commun. 6, 6421 (2015)

  45. Hug, L. A., Stechmann, A. & Roger, A. J. Phylogenetic distributions and histories of proteins involved in anaerobic pyruvate metabolism in eukaryotes. Mol. Biol. Evol. 27, 311–324 (2010)

  46. Kleine, T., Maier, U. G. & Leister, D. DNA transfer from organelles to the nucleus: the idiosyncratic genetics of endosymbiosis. Annu. Rev. Plant Biol. 60, 115–138 (2009)

  47. Yue, J. P., Hu, X. Y., Sun, H., Yang, Y. P. & Huang, J. L. Widespread impact of horizontal gene transfer on plant colonization of land. Nature Commun. 3, 1152 (2012)

  48. Wolf, Y. I. & Koonin, E. V. Genome reduction as the dominant mode of evolution. Bioessays 35, 829–837 (2013)

  49. Hao, W. L. & Golding, G. B. The fate of laterally transferred genes: life in the fast lane to adaptation or death. Genome Res. 16, 636–643 (2006)

  50. Treangen, T. J. & Rocha, E. P. C. Horizontal transfer, not duplication, drives the expansion of protein families in prokaryotes. PLoS Genet. 7, e1001284 (2011)

  51. Nelson-Sathi, S. et al. Origins of major archaeal clades correspond to gene acquisitions from bacteria. Nature 517, 77–80 (2015)

  52. Altschul, S. F. et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25, 3389–3402 (1997)

  53. Tatusov, R. L., Koonin, E. V. & Lipman, D. J. A genomic perspective on protein families. Science 278, 631–637 (1997)

  54. Rice, P., Longden, I. & Bleasby, A. EMBOSS: the European Molecular Biology Open Software Suite. Trends Genet. 16, 276–277 (2000)

  55. Enright, A. J., Van Dongen, S. & Ouzounis, C. A. An efficient algorithm for large-scale detection of protein families. Nucleic Acids Res. 30, 1575–1584 (2002)

  56. Apic, G., Gough, J. & Teichmann, S. A. Domain combinations in archaeal, eubacterial and eukaryotic proteomes. J. Mol. Biol. 310, 311–325 (2001)

  57. Powell, S. et al. eggNOG v4.0: nested orthology inference across 3686 organisms. Nucleic Acids Res. 42, D231–D239 (2014)

  58. Tatusov, R. et al. The COG database: an updated version includes eukaryotes. BMC Bioinform. 4, 41 (2003)

  59. Yoon, H. S., Muller, K. M., Sheath, R. G., Ott, F. D. & Bhattacharya, D. Defining the major lineages of red algae (Rhodophyta). J. Phycol. 42, 482–492 (2006)

  60. James, T. Y. et al. Reconstructing the early evolution of fungi using a six-gene phylogeny. Nature 443, 818–822 (2006)

  61. Okamoto, N., Chantangsi, C., Horak, A., Leander, B. S. & Keeling, P. J. Molecular phylogeny and description of the novel katablepharid Roombia truncata gen. et sp. nov., and establishment of the Hacrobia taxon nov. PLoS ONE 4, e7080 (2009)

  62. Hampl, V. et al. Phylogenomic analyses support the monophyly of Excavata and resolve relationships among eukaryotic “supergroups”. Proc. Natl Acad. Sci. USA 106, 3859–3864 (2009)

  63. Janouškovec, J., Horák, A., Oborník, M., Lukeš, J. & Keeling, P. J. A common red algal origin of the apicomplexan, dinoflagellate, and heterokont plastids. Proc. Natl Acad. Sci. USA 107, 10949–10954 (2010)

  64. Lahr, D. J. G., Grant, J., Nguyen, T., Lin, J. H. & Katz, L. A. Comprehensive phylogenetic reconstruction of Amoebozoa based on concatenated analyses of SSU-rDNA and actin genes. PLoS ONE 6, e22780 (2011)

  65. Adl, S. M. et al. The revised classification of eukaryotes. J. Eukaryot. Microbiol. 59, 429–493 (2012)

  66. Leliaert, F. et al. Phylogeny and molecular evolution of the green algae. Crit. Rev. Plant Sci. 31, 1–46 (2012)

  67. Keeling, P. J. The number, speed, and impact of plastid endosymbioses in eukaryotic evolution. Annu. Rev. Plant Biol. 64, 583–607 (2013)

  68. Jackson, C. J. & Reyes-Prieto, A. The mitochondrial genomes of the glaucophytes Gloeochaete wittrockiana and Cyanoptyche gloeocystis: multilocus phylogenetics suggests a monophyletic Archaeplastida. Genome Biol. Evol. 6, 2774–2785 (2014)

  69. Katoh, K. & Standley, D. M. MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol. Biol. Evol. 30, 772–780 (2013)

  70. Stamatakis, A. RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models. Bioinformatics 22, 2688–2690 (2006)

  71. Yutin, N. & Galperin, M. Y. A genomic update on clostridial phylogeny: Gram-negative spore formers and other misplaced clostridia. Environ. Microbiol. 15, 2631–2641 (2013)

  72. Landan, G. & Graur, D. Heads or tails: a simple reliability check for multiple sequence alignments. Mol. Biol. Evol. 24, 1380–1383 (2007)

  73. Landan, G. & Graur, D. Local reliability measures from sets of co-optimal multiple sequence alignments. Pacif. Symp. Biocomput. 13, 15–24 (2008)

  74. Shimodaira, H. & Hasegawa, M. CONSEL: for assessing the confidence of phylogenetic tree selection. Bioinformatics 17, 1246–1247 (2001)

  75. Robinson, D. F. & Foulds, L. R. Comparison of phylogenetic trees. Math. Biosci. 53, 131–147 (1981)

  76. Felsenstein, J. Inferring phylogenies from protein sequences by parsimony, distance, and likelihood methods. Methods Enzymol. 266, 418–427 (1996)

  77. Guindon, S. et al. New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0. Syst. Biol. 59, 307–321 (2010)

  78. Whelan, S. & Goldman, N. A general empirical model of protein evolution derived from multiple protein families using a maximum-likelihood approach. Mol. Biol. Evol. 18, 691–699 (2001)

  79. Ku, C. et al. Endosymbiotic gene transfer from prokaryotic pangenomes: inherited chimerism in eukaryotes. Proc. Natl Acad. Sci. USA (2015)

  80. Benjamini, Y. & Hochberg, Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. Roy. Stat. Soc. B 57, 289–300 (1995)

  81. Benjamini, Y. & Yekutieli, D. The control of the false discovery rate in multiple testing under dependency. Ann. Stat. 29, 1165–1188 (2001)

  82. Zar, J. H. Biostatistical Analysis Ch. 22 (Pearson, 2014)

  83. Dagan, T. & Martin, W. Ancestral genome sizes specify the minimum rate of lateral gene transfer during prokaryote evolution. Proc. Natl Acad. Sci. USA 104, 870–875 (2007)

  84. Petitjean, C., Deschamps, P., Lopez-Garcia, P. & Moreira, D. Rooting the domain Archaea by phylogenomic analysis supports the foundation of the new kingdom Proteoarchaeota. Genome Biol. Evol. 7, 191–204 (2015)

  85. Camacho, C. et al. BLAST+: architecture and applications. BMC Bioinform. 10, 421 (2009)

  86. Hazkani-Covo, E. & Graur, D. A comparative analysis of numt evolution in human and chimpanzee. Mol. Biol. Evol. 24, 13–18 (2007)

  87. Martin, W. & Schnarrenberger, C. The evolution of the Calvin cycle from prokaryotic to eukaryotic chromosomes: a case study of functional redundancy in ancient pathways through endosymbiosis. Curr. Genet. 32, 1–18 (1997)

  88. Maier, U. G. et al. Massively convergent evolution for ribosomal protein gene content in plastid and mitochondrial genomes. Genome Biol. Evol. 5, 2318–2329 (2013)

  89. de Vries, J. & Wackernagel, W. Integration of foreign DNA during natural transformation of Acinetobacter sp. by homology-facilitated illegitimate recombination. Proc. Natl Acad. Sci. USA 99, 2094–2099 (2002)

  90. Putnam, N. H. et al. Sea anemone genome reveals ancestral eumetazoan gene repertoire and genomic organization. Science 317, 86–94 (2007)

  91. Artamonova, I. I. & Mushegian, A. R. Genome sequence analysis indicates that the model eukaryote Nematostella vectensis harbors bacterial consorts. Appl. Environ. Microbiol. 79, 6868–6873 (2013)

  92. Srivastava, M. et al. The Amphimedon queenslandica genome and the evolution of animal complexity. Nature 466, 720–726 (2010)

  93. Hentschel, U., Piel, J., Degnan, S. M. & Taylor, M. W. Genomic insights into the marine sponge microbiome. Nature Rev. Microbiol. 10, 641–654 (2012)

  94. McCutcheon, J. P. & Moran, N. A. Extreme genome reduction in symbiotic bacteria. Nature Rev. Microbiol. 10, 13–26 (2012)

  95. Wenger, Y. & Galliot, B. RNAseq versus genome-predicted transcriptomes: a large population of novel transcripts identified in an Illumina-454 Hydra transcriptome. BMC Genom. 14, 204 (2013)

  96. Langdon, W. B. Mycoplasma contamination in the 1000 Genomes Project. BioData Min. 7, 3 (2014)

  97. Lang, D., Zimmer, A. D., Rensing, S. A. & Reski, R. Exploring plant biodiversity: the Physcomitrella genome and beyond. Trends Plant Sci. 13, 542–549 (2008)

  98. Maere, S. et al. Modeling gene and genome duplications in eukaryotes. Proc. Natl Acad. Sci. USA 102, 5454–5459 (2005)

  99. Lockhart, P. J., Larkum, A. W. D., Steel, M. A., Waddell, P. J. & Penny, D. Evolution of chlorophyll and bacteriochlorophyll: the problem of invariant sites in sequence analysis. Proc. Natl Acad. Sci. USA 93, 1930–1934 (1996)

  100. Lockhart, P. J. et al. How molecules evolve in eubacteria. Mol. Biol. Evol. 17, 835–838 (2000)

  101. Pruitt, K. D., Tatusova, T. & Maglott, D. R. NCBI Reference Sequence (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res. 33, D501–D504 (2005)

  102. Zwickl, D. J. & Hillis, D. M. Increased taxon sampling greatly reduces phylogenetic error. Syst. Biol. 51, 588–598 (2002)

  103. Alvarez-Ponce, D., Lopez, P., Bapteste, E. & McInerney, J. O. Gene similarity networks provide tools for understanding eukaryote origins and evolution. Proc. Natl Acad. Sci. USA 110, E1594–E1603 (2013)

Acknowledgements

We thank the following funding agencies: the European Research Council grants 232975, 666053 (W.F.M.) and 281357 (G.L.; to T. Dagan); the Templeton Foundation grant 48177 (J.O.M.); the Open University of Israel Research Fund (E.H.-C.); the German-Israeli Foundation grant I-1321-203.13/2015 (E.H.-C., W.F.M.), the New Zealand BioProtection CoRE (P.J.L.); the German Academic Exchange Service PhD stipend 57076385 (C.K.); an Alexander von Humboldt Foundation fellowship (D.B.). Computational support of the Zentrum für Informations- und Medientechnologie at the Heinrich-Heine University is acknowledged.

Author information

Contributions

C.K., G.L., S.N.-S., E.H.-C., D.B., M.R., P.J.L., J.O.M., and W.F.M. designed experiments. C.K., G.L., S.N.-S., M.R., F.L.S., and E.H.-C. performed analyses. C.K., S.N.-S., F.L.S., P.J.L., D.B., E.H.-C., J.O.M., G.L., and W.F.M. wrote the paper.

Corresponding author

Correspondence to William F. Martin.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Extended data figures and tables

Extended Data Figure 1 Additional gene distribution patterns.

a, Distribution of ESCs. Each black tick indicates the presence of a cluster in a taxon. The 26,117 ESCs (x axis) from 55 eukaryotic genomes (Supplementary Table 1) are sorted according to their distribution across the six eukaryotic supergroups. b, Distribution of taxa in EPCs and monophyly of eukaryotes. Each black tick indicates the presence of a cluster in a taxon. The 2,585 EPCs (x axis) are separated into three sets according to the monophyly of eukaryotes and the results of the AUT and, within each set, are ordered according to their distribution across the six eukaryotic supergroups. Clusters where eukaryotes were resolved as non-monophyletic in the maximum likelihood tree tend to occur more frequently in bacterial taxa. Archaep., Archaeplastida; Opisth., Opisthokonta; Chl., Chloroplastida; Rho., Rhodophyta; Gla., Glaucophyta; Str., Stramenopila; De., Deinococcus-Thermus; oP., other Proteobacteria; Ch., Chlamydiae; Pl., Planctomycetes; Ve., Verrucomicrobia; Spi., Spirochaetae; The., Thermotogae; oB., other Bacteria. For abbreviations of eukaryotes, see Supplementary Table 1.

Extended Data Figure 2 Clustering, monophyly, and gene sharing.

a, b, Monophyly of eukaryotes in maximum likelihood trees, cluster size, and alignment quality. Cumulative frequency of clusters with different cluster size (a) or different HoT (ref. 72) column scores (b) is plotted for three sets of EPCs that differ in terms of the monophyly of eukaryotes in the maximum likelihood trees (monophyletic: resolved as monophyletic in the original tree; passed AUT: resolved as non-monophyletic in the original tree, but at least one alternative tree with eukaryote monophyly (see Methods) was as likely at P = 0.05 in an AUT; failed AUT: alternative trees were not as likely as the original tree where eukaryotes were resolved as non-monophyletic). One-sided Kolmogorov–Smirnov two-sample goodness-of-fit test (cluster size/HoT column scores): monophyletic versus passed AUT, 1.04 × 10^−13/7.9 × 10^−3; monophyletic versus failed AUT, 1.45 × 10^−61/2.04 × 10^−10; passed AUT versus failed AUT, 3.40 × 10^−13/4.00 × 10^−3. c, d, Prokaryotic monophyly and gene sharing. c, Proportion of trees showing monophyly for taxonomic group. Prokaryotic phyla and classes (Supplementary Tables 3 and 4) that are monophyletic in the reference trees and that have at least five taxa (genomes in archaea or species in bacteria) are plotted according to the number of taxa and the proportion of EPC trees with at least two sequences from a prokaryotic group where it forms a monophyletic group. The proportion of eukaryote monophyly trees is higher than that of any prokaryotic group, including those with many fewer taxa. d, Gene sharing between a prokaryotic group and other prokaryotes. Using the same procedure for the generation of EPCs, 55 genomes were randomly sampled from a group of bacteria and the number of clusters (EPCs) they shared with prokaryotes not from this group was counted. The average number of shared clusters was mapped for each taxonomic group with 55–150 genomes (error bar, s.d.; number of genomes in parentheses). For E. coli and the eukaryotes (shown for comparison), there was only one sample. Colour coding for taxonomic levels: red, phylum; blue, class; green, order; magenta, family; cyan, genus; orange, species.

Extended Data Figure 3 Effect of taxon sampling on eukaryote monophyly in phylogenetic trees.

After ten sequences (bold) were added to the original data set (EPC E1689_B206_A295), the relationships among Archaeplastida taxa (highlighted in green) changed from non-monophyly (a) to monophyly (b). Abbreviations are shown for eukaryotic sequences (Supplementary Table 2) and NCBI GI numbers for cyanobacterial sequences (Supplementary Table 3; RefSeq accessions are shown for the added sequences).

Extended Data Figure 4 Distribution of prokaryotic taxa in the sister group to eukaryotes, with EPCs sorted by eukaryotic supergroups.

Top: each black tick indicates the presence of a eukaryote taxon in one of the 2,585 EPCs. Bottom: each red tick indicates the presence of a prokaryote taxon in the sister group to eukaryotes in one of the 1,933 EPC maximum likelihood trees where eukaryotes were resolved to be monophyletic. The 2,585 EPCs, proteome size, and cluster size are as in Fig. 1. The number of EPCs present and the frequency of occurrence in the sister group to eukaryotes (‘clusters’) are shown for eukaryotes and prokaryotes, respectively. Archaep., Archaeplastida; Opisth., Opisthokonta; Chl., Chloroplastida; Rho., Rhodophyta; Gla., Glaucophyta; Str., Stramenopila; De., Deinococcus-Thermus; oP., other Proteobacteria; Ch., Chlamydiae; Pl., Planctomycetes; Ve., Verrucomicrobia; Spi., Spirochaetae; The., Thermotogae; oB., other Bacteria. For abbreviations of eukaryotes, see Supplementary Table 1.

Extended Data Figure 5 Distribution of prokaryotic taxa in the sister group to eukaryotes, with EPCs sorted by prokaryotic groups.

Top: each black tick indicates the presence of a eukaryote taxon in one of the 1,933 EPC maximum likelihood trees where eukaryotes were resolved to be monophyletic. Bottom: each red tick indicates the presence of a prokaryote taxon in the sister group to eukaryotes in one of those 1,933 EPC trees. The EPCs (x axis) are ordered according to the taxonomic groups to which the prokaryotes in the sister group to eukaryotes belong (separated into three blocks where only bacteria (1,586 EPCs), only archaea (314 EPCs), or both bacteria and archaea (33 EPCs) are found in the sister group). There are 16 bacterial groups (including ‘other Bacteria’; Firmicutes, Proteobacteria, and the PVC superphylum (Planctomycetes, Verrucomicrobia, and Chlamydiae) are regarded as single groups) and five archaeal groups (the five phyla). The number of EPCs present and the frequency of occurrence in the sister group to eukaryotes are shown for eukaryotes and prokaryotes, respectively. Archaep., Archaeplastida; Opisth., Opisthokonta; Chl., Chloroplastida; Rho., Rhodophyta; Gla., Glaucophyta; Str., Stramenopila; De., Deinococcus-Thermus; oP., other Proteobacteria; Ch., Chlamydiae; Pl., Planctomycetes; Ve., Verrucomicrobia; Spi., Spirochaetae; The., Thermotogae; oB., other Bacteria. For abbreviations of eukaryotes, see Supplementary Table 1.

Extended Data Figure 6 Distribution of taxa in the sister groups consisting purely of cyanobacteria, alphaproteobacteria, or archaea.

Each black tick indicates the presence of a prokaryotic taxon in the sister group to eukaryotes in an EPC tree. a–c, Distributions of taxa in all pure-cyanobacterial (a), pure-alphaproteobacterial (b), and pure-archaeal (c) sister groups. The clusters are ordered alphanumerically according to the eukaryotic cluster numbers (Supplementary Table 5), whereas for archaea (c) the taxa are further sorted by the five archaeal phyla.

Extended Data Figure 7 Comparison of sets of trees for single-copy genes in eukaryotic groups, with more inclusive criteria.

a–f, Cumulative distribution functions (y axis) for scores of minimal tree compatibility with the vertical reference data set (x axis). Values are number of species, sample sizes, and P values of the two-tailed Kolmogorov–Smirnov two-sample goodness-of-fit test in the comparison of the ESC (blue) data sets against the EPC (green) data set and a synthetic data set simulating one LGT (red). Dashed lines delineate the range of distributions in 100 replicates of random down-sampling. The criteria for tree inclusion were less stringent than those for Fig. 3 (see Methods).
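
The two-sample Kolmogorov–Smirnov comparison named in this legend rests on the largest vertical gap between two empirical cumulative distribution functions. The following is only a minimal sketch of that statistic, not the authors' pipeline: the function names `ecdf` and `ks_statistic` and the toy scores are hypothetical, and no P value is computed (a library routine such as SciPy's `scipy.stats.ks_2samp` could be used for that).

```python
from bisect import bisect_right


def ecdf(sorted_values, x):
    """Empirical CDF: fraction of values less than or equal to x."""
    return bisect_right(sorted_values, x) / len(sorted_values)


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: largest vertical gap between the two ECDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    # The gap can only change at observed values, so checking those suffices.
    return max(abs(ecdf(a, x) - ecdf(b, x)) for x in a + b)


# Toy compatibility scores (hypothetical, not taken from the paper).
esc_scores = [0.1, 0.4, 0.7]
epc_scores = [0.5, 0.6, 0.9]
d = ks_statistic(esc_scores, epc_scores)
print(f"KS statistic D = {d:.3f}")
```

A value of D near 0 means the two cumulative curves nearly coincide; a value near 1 means they barely overlap.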

Extended Data Figure 8 Overview of eukaryote gene content evolution.

a, Eukaryotic evolution by gene loss. Genome sizes (number of EPCs present) were mapped onto the eukaryotic reference tree. Ancestral genome size in each eukaryotic ancestral node was calculated using a loss-only model, with all EPCs in blocks A–C and those in blocks D and E (Fig. 1) entering the eukaryotic lineage via the plastid ancestor (green) or the eukaryote ancestor (wheat colour). Plastid-derived genes are not shown for the ancestral nodes within SAR and Hacrobia, because of current debates about the number and nature of secondary symbioses, but are indicated by the greenish shading. b, Endosymbiotic gene transfer network. The network connecting apparent gene donors to the common ancestor of eukaryotes and Archaeplastida is mapped onto the reference phylogeny (vertical edges) of bacteria (left), eukaryotes (middle), and archaea (right). Grey shading (white to black) in the prokaryote reference trees (70 for archaea and 32 for bacteria) indicates how often a branch associated with a particular node was recovered within the trees of individual genes that were concatenated for inferring the reference topology. Lateral edges indicate gene influx at the origin of eukaryotes and at the origin of plastids. Edge colour corresponds to the frequencies with which a prokaryotic group appears in the sister group to eukaryotes. The archaeal reference tree was rooted between euryarchaeotes and other taxa, and the bacterial tree with Thermotogae. Secondary endosymbiotic transfers are indicated in light green and red. That members of both the Crenarchaeota and the Euryarchaeota are implicated as host relatives is probably because of the small archaeon sample (refs 34–36).
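
The loss-only reconstruction in panel a can be illustrated with a small sketch. It assumes, as a simplification, that every gene family entered the lineage once at the root, so that any family present in at least one descendant must have been present in each ancestor above it; the legend's separate entry point for plastid-derived families is not modelled. The tree encoding, the function name `loss_only_ancestors`, and the toy taxa and gene sets are all hypothetical.

```python
from typing import Dict, Set, Tuple, Union

# A tree is either a leaf name or a tuple of subtrees (hypothetical encoding).
Tree = Union[str, Tuple["Tree", ...]]


def loss_only_ancestors(tree: Tree, leaf_genes: Dict[str, Set[str]],
                        out: Dict[Tree, Set[str]]) -> Set[str]:
    """Fill `out` with the gene set inferred at every node under a loss-only model.

    With no gains allowed after the root, an ancestor's gene set is the
    union of the gene sets of its children.
    """
    if isinstance(tree, str):
        genes = set(leaf_genes[tree])
    else:
        genes = set()
        for child in tree:
            genes |= loss_only_ancestors(child, leaf_genes, out)
    out[tree] = genes
    return genes


# Toy presence data (hypothetical, not from the paper).
leaf_genes = {
    "plantA": {"g1", "g2"},
    "plantB": {"g2", "g3"},
    "animalC": {"g1", "g4"},
}
tree = (("plantA", "plantB"), "animalC")

ancestors: Dict[Tree, Set[str]] = {}
loss_only_ancestors(tree, leaf_genes, ancestors)
sizes = {node: len(genes) for node, genes in ancestors.items()}
print(sizes[tree], sizes[("plantA", "plantB")])
```

Under this model ancestral genomes can only be as large as or larger than those of their descendants, which is why sparse present-day distributions are read as differential loss.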

Extended Data Figure 9 Apparent gene transfers and eukaryote–prokaryote sequence identities.

a, Patterns suggestive of LGT from prokaryotes inferred from EPC trees. All EPC trees were searched for phylogenetic patterns suggestive of gene acquisitions by the common ancestor of each eukaryote lineage within the six supergroups (see Methods). The size of each circle is proportional to the number of such putative acquisitions, with the total number of putative acquisitions shown for each supergroup. The colour shows the age of nodes according to a eukaryotic time tree (blue, younger than 800 million years; red, older than 800 million years). For the four lineages with an asterisk, phylogenetic patterns where SAR/Hacrobia are nested within a clade formed by Archaeplastida were also counted as putative acquisitions to take into account secondary plastid endosymbioses. The numbers of acquisitions without such patterns are indicated in parentheses (and shown as inner circles). b, Eukaryote–prokaryote sequence identities for genes apparently acquired more recently and more anciently in eukaryotes (a). The mean of the average pairwise identities is shown in parentheses. At P = 0.05, a two-sided Wilcoxon rank-sum test either did not reject the null hypotheses that the two sets of genes are not different or suggested the tip-specific eukaryotic genes are less similar to their prokaryotic homologues.

Extended Data Figure 10 Distribution of ESCs and EPCs across eukaryotes under different criteria.

Different thresholds were applied to find eukaryote clusters with prokaryote homologues, including BLAST local identity for each eukaryote–prokaryote hit (30% or 20%) and levels of best-hit correspondence (10–50%) for identifying reciprocal pairs of eukaryote and prokaryote clusters. Distributions of ESCs and EPCs are drawn as in Extended Data Fig. 1a and Fig. 1, respectively.

Supplementary information

Supplementary Table 1

List of 55 eukaryote genomes organized by six eukaryotic supergroups, sources of genome sequences, 28,702 eukaryotic protein clusters with at least two sequences and maximum-likelihood trees of eukaryote-specific clusters (ESCs) with at least four sequences. (XLSX 4107 kb)

Supplementary Table 2

List of 956,053 eukaryotic protein sequences in the sequence abbreviations used in this study and the original headers in the downloaded files. (XLSX 28596 kb)

Supplementary Table 3

List of 1,847 bacterial genomes, their taxonomical groupings and 102,089 clusters with at least five sequences, as well as a maximum likelihood reference tree based on 32 nearly universal single-copy genes. (XLSX 22467 kb)

Supplementary Table 4

List of 134 archaeal genomes, their taxonomical groupings and 11,992 clusters with at least five sequences. (XLSX 1303 kb)

Supplementary Table 5

List of 2,585 eukaryote-prokaryote clusters (EPCs). (XLSX 5328 kb)

Supplementary Table 6

Annotations of the functions of the 28,702 eukaryotic clusters and eukaryote monophyly in EPC trees. (XLSX 1659 kb)

Supplementary Table 7

Maximum-likelihood trees with at least four sequences reconstructed from eukaryote-prokaryote clusters (EPCs). (TXT 28189 kb)

Supplementary Table 8

Frequency of occurrence of prokaryotic taxa in the sister group to eukaryotes and a two-sided Wilcoxon signed rank test comparing the original frequencies and those after randomizations. (XLSX 68 kb)

Supplementary Table 9

BLAST analysis of bacterial, mitochondrial and plastid genomes against the nuclear genomes. (PDF 100 kb)

Cite this article

Ku, C., Nelson-Sathi, S., Roettger, M. et al. Endosymbiotic origin and differential loss of eukaryotic genes. Nature 524, 427–432 (2015). https://doi.org/10.1038/nature14963

  • DOI: https://doi.org/10.1038/nature14963
