Main

Birds (Aves) are the most diverse lineage of extant tetrapod vertebrates. They comprise over 10,000 living species2, and exhibit an extraordinary diversity in morphology, ecology, and behaviour3. Substantial progress has been made in resolving the phylogenetic history of birds. Phylogenetic analyses of both molecular and morphological data support the monophyletic Palaeognathae (the tinamous and flightless ratites) and Galloanserae (gamebirds and waterfowl) as successive, monophyletic sister groups to the Neoaves—a diverse clade including all other living birds4. Resolving neoavian phylogeny has proven to be a difficult challenge because this radiation was very rapid and deep in time, resulting in very short internodes4.

In the last decade, phylogenetic analyses of large, multilocus data sets have resulted in the proposal of numerous, novel neoavian relationships. For example, a clade consisting of diving and wading birds has been consistently recovered, as well as a large landbird clade in which falcons and parrots are successive sister groups to the perching birds4,5,6,7,8. Recently, phylogenetic analyses of 48 whole avian genomes resulted in the proposal of a novel phylogenetic resolution of the initial branching sequence within Neoaves1. Although this genomic study provided much needed corroboration of many neoavian clades, the limited taxon sampling precluded further insights into the evolutionary history of birds.

It has long been recognized that phylogenetic confidence depends not only on the number of characters analysed and their rate of evolution, but also on the number and relationships of the taxa sampled relative to the nodes of interest9,10,11. Theory predicts that sampling a single taxon that diverges close to a node of interest will have a far greater effect on phylogenetic resolution than will adding more characters11. Despite using an alignment of >40 million base pairs, sparse sampling of 48 species in the recent avian genomic analysis may not have been sufficient to confidently resolve the deep divergences among major lineages of Neoaves. Thus, expanded taxon sampling is required to test the monophyly of neoavian clades, and to further resolve the phylogenetic relationships within Neoaves.

Here, we present a phylogenetic analysis of 198 bird species and 2 crocodilians (Supplementary Table 1) based on loci captured using anchored enrichment12. Our sample includes species of 122 avian families in all 40 extant avian orders2, with denser representation of non-oscine birds (108 families) than of oscine songbirds (14 families). Effort was made to include taxa that would break up long phylogenetic branches, and provide the highest likelihood of resolving short internodes at the base of Neoaves11. We also sampled multiple species within groups whose monophyly or phylogenetic interrelationships have been controversial—that is, tinamous, nightjars, hummingbirds, turacos, cuckoos, pigeons, sandgrouse, mesites, rails, storm petrels, petrels, storks, herons, hawks, hornbills, mousebirds, trogons, kingfishers, barbets, seriemas, falcons, parrots, and suboscine passerines.

We targeted 394 loci centred on conserved anchor regions of the genome that are flanked by more variable regions12. We performed all phylogenetic analyses on a data set of 259 genes with the highest quality assemblies. The average locus was 1,524 bases in length (361–2,316 base pairs (bp)), and the total percentage of missing data was 1.84%. The concatenated alignment contained 394,684 sites. To minimize overall model complexity while accurately accounting for substitution processes, we performed a partition model sensitivity analysis with PartitionFinder13,14, and compared a complex partition model (one partition per locus) to a heuristically optimized (rclust) partition model. Phylogenetic informativeness (PI) approaches15,16 provided strong evidence that the phylogenetic utility of our data set was high, with low declines in PI profiles for individual loci, data set partitions, and the concatenated matrix (Supplementary Fig. 4). We estimated concatenated trees in ExaBayes17 and RAxML18 using a 75 partition model. Coalescent species trees were estimated with the gene tree summation methods in STAR19, NJst20, and ASTRAL21 from gene trees estimated with RAxML (see Methods).

Our concatenated Bayesian analyses resulted in a completely resolved, well supported phylogeny. All clades had a posterior probability (PP) of 1, except for a single clade including shoebill (Balaeniceps) and pelican (PP = 0.54) (Fig. 1). The concatenated maximum likelihood analysis recovered a single topology that was identical to the Bayesian tree except for three clades, all of which are far from the base of Neoaves: the relationships among pigeons; among skimmers, gulls, and terns; and among pelicans, shoebill, and waders (Supplementary Fig. 1). Almost all clades in the maximum likelihood tree were maximally supported with bootstrap scores (BS) of 1.00, but nine clades within Neoaves (including four of the most inclusive neoavian clades) received support <0.70 (Supplementary Fig. 1). Coalescent species tree analyses produced substantially different hypotheses for neoavian relationships (Supplementary Fig. 3), but most of the discordant clades received conspicuously lower bootstrap support values (0.07 < BS < 0.30). Quantifying the phylogenetic informativeness of individual loci15,16 revealed that these low support values were not due to homoplasy driven by saturation of nucleotide states, but rather to the low power of individual loci to resolve the entire range of internode lengths across the depth of the tree (Supplementary Figs 4 and 5; see Methods). This result was not unexpected. The low phylogenetic information content of individual genes at deep timescales has been demonstrated to impede phylogenetic resolution in a coalescent species tree framework22,23. Furthermore, when clades with <0.75 bootstrap support values in the species trees are collapsed, the resulting topology is exactly congruent with the concatenated Bayesian tree (except for the relationships of tinamous among palaeognaths; Supplementary Fig. 3). Although coalescent species trees account for incomplete lineage sorting, simulations show that species tree methods based on gene tree summation may not provide significantly better performance over concatenation methods22.

Figure 1: Phylogeny of birds.

Time-calibrated phylogeny of 198 species of birds inferred from a concatenated, Bayesian analysis of 259 anchored phylogenomic loci using ExaBayes17. Figure continues on the lower panel from the green arrow at the bottom of the top panel. Complete taxon data in Supplementary Table 1. Higher taxon names appear at right. All clades are supported with posterior probability (PP) of 1.0, except for the Balaeniceps–Pelecanus clade (PP = 0.54; clade 109). The five major, successive, neoavian sister clades are: Strisores (brown), Columbaves (purple), Gruiformes (yellow), Aequorlitornithes (blue), and Inopinaves (green). Background colours mark geological periods. Ma, million years ago; Ple, Pleistocene; Pli, Pliocene; Q., Quaternary. Clade numbers refer to the plot of estimated divergence dates (Supplementary Fig. 7). Fossil age-calibrated nodes are shown in grey. Illustrations of representative bird species30 are depicted by their lineages. See Supplementary Information for details and further discussion.

Our phylogeny identifies many new clades, and supports many phylogenetic relationships proposed in previous studies (see detailed phylogenetic discussion in the Supplementary Information). Congruent with all recent studies, the phylogeny places palaeognaths as the sister group to the rest of birds, and the flying tinamous (Tinamidae) within the flightless ratites. This tree, however, places tinamous as the sister group to cassowary and emu alone (Fig. 1, grey). The phylogeny of Galloanserae is exactly congruent with previous studies4 (Fig. 1, red).

Within the monophyletic Neoaves, we recover five major clades, each of which is the successive sister group to the remaining clades in the series (Fig. 1). The Strisores includes the nightjars and their nocturnal relatives with the diurnal swifts and hummingbirds (Fig. 1, brown). Four nocturnal lineages—nightjars, a neotropical oilbird-potoo clade, frogmouths, and owlet-nightjars—form successive sister groups to the diurnal swift and hummingbird clade.

The Columbaves is a novel clade that consists of two monophyletic groups recently identified by Jarvis et al.1 (Fig. 1, purple). A clade consisting of turacos, bustards, and cuckoos (Otidimorphae) is sister to a clade consisting of pigeons as the sister group to sandgrouse and mesites (Columbimorphae). The third neoavian clade consists of a well recognized monophyletic group of core gruiform birds (Gruiformes; Fig. 1, yellow), with interrelationships that are consistent with previous phylogenies4.

The Aequorlitornithes is a novel, comprehensive clade of waterbirds, including all shorebirds, diving birds, and wading birds (Fig. 1, blue). Within this group, the flamingos and grebes1,4,5,6 are the sister group to shorebirds, and the sunbittern and tropicbirds1,4,6 are the sister group to the wading and diving birds (Fig. 1, blue). Other interrelationships within these groups are extensively congruent with the results in ref. 4 and the work of others (see Supplementary Information).

The fifth major neoavian clade, which we name Inopinaves, is a very diverse landbird clade with the same composition as previously recognized (Telluraves)1,4,5,6, but with the enigmatic, neotropical hoatzin (Opisthocomus hoazin) as the sister group to all other landbirds (Fig. 1, green). The phylogeny of the landbirds shares many points of congruence with earlier hypotheses, including the relationships of seriemas, falcons, parrots, and perching birds1,4,5,6, and the interrelationships among oscine songbirds24. However, we find that hawks (Accipitriformes) are the sister group to a new clade including the rest of the landbirds, to be called Eutelluraves (see Supplementary Information).

Our divergence time analyses employed 19 phylogenetically and geologically well-constrained fossil calibrations (following recently proposed best practices25), documenting many deep divergences within the avian crown group (Fig. 1, grey nodes; see Supplementary Information). Our analysis supports an extremely rapid radiation of the avian crown group in the wake of the K–Pg mass extinction event (Fig. 1, Supplementary Figs 6 and 7). Although the post-K–Pg radiation hypothesis has long been strongly supported by the avian fossil record26,27, it has so far received little support from molecular divergence time analyses4,28. The tempo and mode of the extant avian radiation remain contentious. For example, an alternative calibration analysis including the fossil Vegavis did not support significantly different dates of divergence outside of the Galloanserae (see Supplementary Information and Supplementary Figs 10, 11, 12). Confident determination of the age of crown Aves will have to await discoveries of Mesozoic stem neognaths and palaeognaths, and detailed assessments of the influence of soft maximum bound parameterization on the age of the deepest avian divergences.

Our results indicate that the recent genome phylogeny1 may contain some erroneous relationships induced by long branch attraction from sparse taxon sampling. Maximum likelihood analysis of our sequence data pruned down to a phylogenetically equivalent subsample of 48 species produces relationships along the neoavian ‘backbone’ (Supplementary Fig. 8) that are entirely discordant with the phylogeny based on our full data set (Fig. 1). This reduced taxon analysis recovers some of the specific features of the recent genome phylogeny by Jarvis et al.1 (Supplementary Fig. 8): for example, the placement of the pigeons, mesites, and sandgrouse (a subclade of Columbea1) outside of the rest of the Neoaves. Differences in tree topology when taxa are excluded are to be expected if early internodes in Neoaves are very short. Adding taxa that have diverged near nodes of interest has been theoretically demonstrated to constrain the possible historical substitution patterns, and increase the accuracy of phylogenetic inference11. By increasing our taxon sampling to include all major avian lineages, we have minimized the possibility that additional taxon sampling alone will alter the relationships in our tree.

Jarvis et al.1 also identified a well supported clade consisting of the hoatzin (Opisthocomus) as the sister group to a crane (Grus) and a plover (Charadrius) (total evidence nucleotide tree, BS = 0.91, 0.96, respectively). However, Grus and Charadrius were the only species sampled from two very diverse neoavian orders: Gruiformes, 185 species; and Charadriiformes, 385 species2. Our results indicate that Opisthocomus is the most ancient bird lineage (~64 million years) consisting of only a single, extant species. Thus, the three taxa placed in this assemblage by Jarvis et al.1 comprise three of the most ancient, and under-sampled lineages within all birds, indicating the strong possibility of long branch attraction artefacts. By contrast, these same groups are represented by 26 species in our analysis, and they do not form an exclusive clade (Fig. 1).

In addition to providing a new backbone for comprehensive avian supertrees and comparative evolutionary analyses28, this new avian phylogeny supports many interesting hypotheses about avian evolution. This phylogeny upholds the hypothesis that the ancestor of the diurnal swifts and hummingbirds evolved from a clade that had been predominantly nocturnal for 10 million years. Although hummingbirds have acute near-ultraviolet vision29, the effect of extended ancestral nocturnality on the evolution of the visual system in this group of birds is unknown. Our findings also support the emerging pattern that landbirds evolved from a raptorial grade1. The sister group relationships of hawks to the rest of the landbirds, of owls to the diverse coraciimorph clade, and of seriemas and falcons to the parrots and passerines indicate the persistence of a raptorial ecology among ancestral landbirds. Lastly, the identification of a new, broadly comprehensive waterbird–shorebird clade indicates a striking, and previously unappreciated, level of evolutionary constraint on the ecological diversification of birds that will be exciting to investigate in the future.

Methods

Locus selection and probe design

Anchor loci described in ref. 12 were extended such that each contained approximately 1,350 bp. In some cases neighbouring loci were joined to form a single locus. Also, loci that performed poorly in ref. 12 were removed from the locus set. This process produced 394 loci (referred to as the version 2 vertebrate loci). Genome coordinates corresponding to these regions in the Gallus gallus genome (galGal3, UCSC genome browser) were identified and sequences corresponding to these regions were extracted (coordinates are available in the Zenodo archive (http://dx.doi.org/10.5281/zenodo.28343)). In order to improve the capture efficiency for passerines, we also obtained homologous sequences for Taeniopygia guttata. After aligning the Gallus and Taeniopygia sequences using MAFFT31, alignments were trimmed to produce the final probe region alignments (alignments available in the Zenodo archive), and probes were tiled at approximately 1.5 × tiling density (probe specification will be made available upon publication).

Data collection

Data were collected following the general methods of ref. 12 through the Center for Anchored Phylogenomics at Florida State University (http://www.anchoredphylogeny.com). Briefly, each genomic DNA sample was sonicated to a fragment size of 150–350 bp using a Covaris E220 focused-ultrasonicator with Covaris microTUBES. Subsequently, library preparation and indexing were performed on a Beckman-Coulter Biomek FXp liquid-handling robot following a protocol modified from ref. 32. One important modification is a size-selection step after blunt-end repair using SPRIselect beads (Beckman-Coulter; 0.9 × ratio of bead to sample volume). Indexed samples were then pooled at equal quantities (typically 12–16 samples per pool), and enrichments were performed on each multi-sample pool using an Agilent Custom SureSelect kit (Agilent Technologies), designed as specified above. After enrichment, the 12 enrichment pools were pooled in groups of three in equal quantities for sequencing on four PE150 Illumina HiSeq2000 lanes (three enrichment pools per lane). Sequencing was performed in the Translational Science Laboratory in the College of Medicine at Florida State University.

Data processing

Paired-read merging (Merge.java). Typically, between 50% and 75% of sequenced library fragments had an insert size between 150 bp and 300 bp. As 150 bp paired-end sequencing was performed, the majority of the paired reads overlap and thus should be merged before assembly. The overlapping reads were identified and merged following the methods of ref. 33. In short, for each degree of overlap for each read pair we computed the probability of obtaining the observed number of matches by chance, and selected the degree of overlap that produced the lowest probability, with a P value less than 10^−10 required to merge reads. When reads were merged, mismatches were reconciled using base-specific quality scores, which were combined to form the new quality scores for the merged read (see ref. 33 for details). Reads failing to meet the probability criterion were kept separate but still used in the assembly. The merging process produced three files: one containing merged reads and two containing the unmerged reads.
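The overlap test described above can be illustrated with a minimal Python sketch. It is not the Merge.java implementation: it assumes that two unrelated bases match with probability 0.25, that the second read has already been reverse-complemented, and that quality scores are ignored. The function names and the `min_overlap` parameter are hypothetical.

```python
from math import comb

def chance_p_value(n, k, p_match=0.25):
    """P(X >= k) for X ~ Binomial(n, p_match): k or more matches in n bases by chance."""
    return sum(comb(n, i) * p_match**i * (1 - p_match)**(n - i) for i in range(k, n + 1))

def best_overlap(read1, read2, min_overlap=10, max_p=1e-10):
    """Return (overlap, p_value) for the most significant overlap, or None if none passes.

    read2 is assumed to be reverse-complemented already, so that the end of
    read1 is compared directly with the start of read2.
    """
    best = None
    for overlap in range(min_overlap, min(len(read1), len(read2)) + 1):
        matches = sum(a == b for a, b in zip(read1[-overlap:], read2[:overlap]))
        p = chance_p_value(overlap, matches)
        if best is None or p < best[1]:
            best = (overlap, p)
    return best if best is not None and best[1] < max_p else None

def merge_reads(read1, read2, overlap):
    """Join the two reads, keeping the overlapping bases once (no quality reconciliation)."""
    return read1 + read2[overlap:]
```

Under these assumptions, a perfect overlap must span at least 17 bases to pass the 10^−10 threshold, because 0.25^16 is still larger than 10^−10.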

Assembly (Assembler.java). The reads were assembled into contigs using an assembler that makes use of both a divergent reference assembly approach to map reads to the probe regions and a de novo assembly approach to extend the assembly into the flanks. The reference assembler uses a library of spaced 20-mers derived from the conserved sites of the alignments used during probe design. A preliminary match was called if at least 17 of 20 matches existed between a spaced 20-mer and the corresponding positions in a read. Reads obtaining a preliminary match were then compared to an appropriate reference sequence used for probe design to determine the maximum number of matches out of 100 consecutive bases (all possible gap-free alignments between the read and the reference were considered). The read was considered mapped to the given locus if at least 55 matches were found. Once a read was mapped, an approximate alignment position was estimated using the position of the spaced 20-mer, and all 60-mers existing in the read were stored in a hash table used by the de novo assembler. The de novo assembler identifies exact matches between a read and one of the 60-mers found in the hash table. Simultaneously using the two levels of assembly described above, the three read files were traversed repeatedly until an entire pass through the reads produced no additional mapped reads.
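The two mapping criteria (at least 17 of 20 spaced positions, then at least 55 matches in 100 consecutive bases) can be sketched as follows. This is an illustration only, not Assembler.java: the spacing pattern is passed in as a list of offsets because the text does not specify it, and all function names are hypothetical.

```python
def spaced_matches(read, start, offsets, kmer):
    """Count spaced positions where the read agrees with the reference k-mer."""
    return sum(
        1 for off, base in zip(offsets, kmer)
        if start + off < len(read) and read[start + off] == base
    )

def preliminary_hits(read, offsets, kmer, min_matches=17):
    """Read positions where at least min_matches of the spaced bases agree."""
    span = offsets[-1] + 1
    return [s for s in range(len(read) - span + 1)
            if spaced_matches(read, s, offsets, kmer) >= min_matches]

def max_window_matches(read, ref, window=100):
    """Largest match count in `window` consecutive bases over all gap-free alignments."""
    best = 0
    for shift in range(-(len(read) - 1), len(ref)):
        lo = max(0, -shift)
        hi = min(len(read), len(ref) - shift)
        # flags[i] is True where read and reference agree on this diagonal
        flags = [read[i] == ref[i + shift] for i in range(lo, hi)]
        running = sum(flags[:window])
        best = max(best, running)
        for i in range(window, len(flags)):
            running += flags[i] - flags[i - window]  # slide the window by one base
            best = max(best, running)
    return best
```

In this sketch a read would be treated as mapped when `preliminary_hits` is non-empty for some library 20-mer and `max_window_matches(read, ref) >= 55`.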

For each locus, mapped reads were then grouped into clusters using 60-mer pairs observed in the reads mapped to that locus. In short, a list of all 60-mers found in the mapped reads was compiled, and the 60-mers were clustered if found together in at least two reads. The 60-mer clusters were then used to separate the reads into clusters for contig estimation. Relative alignment positions of reads within each cluster were then refined in order to increase the agreement across the reads. Up to one gap was also inserted per read if needed to improve the alignment. Note that given sufficient coverage and an absence of contamination, each single-copy locus should produce a single assembly cluster. Low coverage (leading to a break in the assembly), contamination, and gene duplication can all lead to an increased number of assembly clusters. A whole-genome duplication, for example, would increase the number of clusters to two per locus.
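One way to picture the 60-mer clustering step is a union-find over k-mers that co-occur in at least two reads. The sketch below is an assumption about how such a step could be written, not the authors' code. It counts all k-mer pairs per read, which is quadratic and only suitable for small examples, and it assigns each read to the cluster holding most of its k-mers (a hypothetical tie-breaking rule).

```python
from collections import Counter
from itertools import combinations

def kmers(read, k):
    """Sorted list of the distinct k-mers in a read."""
    return sorted({read[i:i + k] for i in range(len(read) - k + 1)})

def cluster_reads(reads, k=60, min_reads=2):
    """Return one cluster label per read (None for reads shorter than k)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Count in how many reads each pair of k-mers occurs together.
    pair_counts = Counter()
    for read in reads:
        pair_counts.update(combinations(kmers(read, k), 2))

    # Join k-mers seen together in at least min_reads reads.
    for (a, b), n in pair_counts.items():
        if n >= min_reads:
            parent[find(a)] = find(b)

    labels = []
    for read in reads:
        roots = Counter(find(km) for km in kmers(read, k))
        labels.append(roots.most_common(1)[0][0] if roots else None)
    return labels
```

With a small `k`, for example `cluster_reads(["ACGTAC", "ACGTAC", "TTGGCC", "TTGGCC"], k=4)`, the first two reads share one label and the last two share another.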

Consensus bases were called from assembly clusters as follows. For each site an unambiguous base was called if the bases present were identical or if the polymorphism of that site could be explained as sequencing error, assuming a binomial probability model with the probability of error equal to 0.1 and alpha equal to 0.05. If the polymorphism could not be explained as sequencing error, the ambiguous base was called that corresponded to all of the observed bases at that site (for example, ‘R’ was used if ‘A’ and ‘G’ were observed). Called bases were soft-masked (made lowercase) for sites with coverage lower than five. A summary of the assembly results is presented in a spreadsheet in the electronic data archive (http://dx.doi.org/10.5281/zenodo.28343; Prum_AssemblySummary_Summary.xlsx).
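The consensus rule can be sketched in Python under one reading of the text: the minority bases at a site are attributed to sequencing error when the binomial probability of seeing at least that many errors (error rate 0.1) is not below alpha = 0.05. That interpretation, the function name, and the handling of ties are assumptions; the IUPAC codes are standard.

```python
from collections import Counter
from math import comb

IUPAC = {
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D", frozenset("ACT"): "H",
    frozenset("ACG"): "V", frozenset("ACGT"): "N",
}

def call_base(column, error=0.1, alpha=0.05, min_cov=5):
    """Call a consensus base from the bases observed at one site of an assembly cluster."""
    counts = Counter(b for b in column.upper() if b in "ACGT")
    n = sum(counts.values())
    if n == 0:
        return "n"
    major, major_n = counts.most_common(1)[0]
    minor_n = n - major_n
    # Probability of observing minor_n or more erroneous bases among n reads.
    p = sum(comb(n, i) * error**i * (1 - error)**(n - i) for i in range(minor_n, n + 1))
    if minor_n == 0 or p >= alpha:
        call = major                      # polymorphism explainable as error
    else:
        call = IUPAC[frozenset(counts)]   # ambiguity code covering all observed bases
    return call.lower() if n < min_cov else call  # soft-mask low-coverage sites
```

For example, one G among ten reads is absorbed as error, whereas five A and five G reads yield the ambiguity code R.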

Contamination filtering (IdentifyGoodSeqsViaReadsMapped.r, GatherALLConSeqsWithOKCoverage.java). In order to filter out possible low-level contaminants, consensus sequences derived from very low coverage assembly clusters (<10 reads) were removed from further analysis. After filtering, consensus sequences were grouped by locus (across individuals) in order to produce sets of homologues.

Orthology (GetPairwiseDistanceMeasures.java, plotMDS5.r). Orthology was then determined for each locus as follows. First, a pairwise distance measure was computed for pairs of homologues. To compute the pairwise distance between two sequences, we computed the percent of 20-mers observed in the two sequences that were found in both sequences. Note that the list of 20-mers was constructed from consecutive 20-mers as well as spaced 20-mers (every third base), in order to allow increased levels of sequence divergence. Using the distance matrix, we clustered the sequences using a neighbour-joining algorithm, but allowing at most one sequence per species to be in a given cluster. Clusters containing fewer than 50% of the species were removed from downstream processing.
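The pairwise measure can be sketched as the percentage of k-mers, pooled over both sequences, that occur in both. The sketch below assumes that the pool is the union of the two k-mer sets and that spaced k-mers sample every third base; neither detail is confirmed beyond the description above, and the function names are hypothetical.

```python
def kmer_set(seq, k=20, step=3):
    """Consecutive k-mers plus spaced k-mers that sample every `step`-th base."""
    consecutive = {("c", seq[i:i + k]) for i in range(len(seq) - k + 1)}
    span = (k - 1) * step + 1
    spaced = {("s", seq[i:i + span:step]) for i in range(len(seq) - span + 1)}
    return consecutive | spaced

def shared_percent(a, b, k=20, step=3):
    """Percentage of the k-mers observed in either sequence that occur in both."""
    ka, kb = kmer_set(a, k, step), kmer_set(b, k, step)
    union = ka | kb
    return 100.0 * len(ka & kb) / len(union) if union else 0.0
```

A distance for neighbour-joining could then be taken as `100 - shared_percent(a, b)`, again as an assumption.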

Alignment (MAFFT). Sequences in each orthologous set were aligned using MAFFT v7.023b31 with “--genafpair” and “--maxiterate 1000” flags.

Alignment Trimming (TrimAndMaskRawAlignments3). The alignment for each locus was then trimmed/masked using the following procedure. First, each alignment site was identified as ‘good’ if the most common character observed was present in >40% of the sequences. Second, 20 bp regions of each sequence that contained <10 good sites were masked. Third, sites with fewer than 12 unmasked bases were removed from the alignment. Lastly, entire loci were removed if both outgroups or more than 40 taxa were missing. This filter yielded 259 trimmed loci containing fewer than 2.5% missing characters overall.
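The first three trimming steps can be sketched as below. Several details are assumptions rather than statements from the text: windows are taken as non-overlapping, a base counts as good for its sequence only when the site is good and the sequence carries the most common character, and masked bases are written as N. The function name is hypothetical and the defaults mirror the thresholds quoted above.

```python
from collections import Counter

def trim_alignment(seqs, good_frac=0.4, window=20, min_good=10, min_unmasked=12):
    """Mask poorly aligned windows per sequence, then drop sparsely covered sites."""
    ncol = len(seqs[0])

    # Step 1: a site is good if its most common character is in > good_frac of sequences.
    consensus = []
    for col in range(ncol):
        chars = [s[col] for s in seqs if s[col] not in "-N?"]
        base, count = Counter(chars).most_common(1)[0] if chars else (None, 0)
        consensus.append(base if count > good_frac * len(seqs) else None)

    # Step 2: mask windows of a sequence that contain fewer than min_good good sites.
    masked = []
    for s in seqs:
        row = list(s)
        for start in range(0, ncol, window):
            stop = min(start + window, ncol)
            good = sum(row[c] == consensus[c] for c in range(start, stop))
            if good < min_good:
                for c in range(start, stop):
                    if row[c] != "-":
                        row[c] = "N"
        masked.append(row)

    # Step 3: keep only sites with at least min_unmasked unmasked bases.
    keep = [c for c in range(ncol)
            if sum(r[c] not in "-N?" for r in masked) >= min_unmasked]
    return ["".join(r[c] for c in keep) for r in masked]
```

The locus-level filter (both outgroups or more than 40 taxa missing) would be applied after this step and is not shown.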

Model selection and phylogenetic inference

To minimize the overall model complexity while accurately accounting for substitution processes, we performed a partition-model sensitivity analysis with the development version of PartitionFinder v2.0 (ref. 13), sensu14, and compared a complex partition-model (one partition per gene) to a heuristically optimized (relaxed clustering with the RAxML option for accelerated model selection) partition-model using BIC. Based on a candidate pool of potential partitioning strategies that spanned a single partition for the entire data set to a model allowing each locus to represent a unique partition, the latter approach suggested that 75 partitions of our data set represented the best-fitting partitioning scheme, which reduced the number of necessary model parameters by 71%, and hugely decreased computation time.
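The parameter saving and the BIC comparison can be checked with a short sketch. It assumes 9 free substitution-model parameters per GTR+Γ partition (5 exchangeabilities, 3 base frequencies, 1 gamma shape) and ignores branch lengths, so it is a simplification of what PartitionFinder computes. The BIC formula is the standard k ln(n) − 2 ln L, and the function names are hypothetical.

```python
from math import log

GTR_GAMMA_PARAMS = 9  # assumed: 5 exchangeabilities + 3 base frequencies + 1 gamma shape

def substitution_params(n_partitions, per_partition=GTR_GAMMA_PARAMS):
    """Free substitution-model parameters when every partition has its own model."""
    return n_partitions * per_partition

def bic(log_likelihood, n_params, n_sites):
    """Bayesian information criterion; lower values indicate the preferred model."""
    return n_params * log(n_sites) - 2.0 * log_likelihood

# Going from one partition per locus (259) to 75 partitions.
reduction = 1 - substitution_params(75) / substitution_params(259)
print(f"parameter reduction: {100 * reduction:.0f}%")
```

Under this assumption the reduction comes out at about 71%, in line with the figure reported above.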

We analysed each individual locus in RAxML v8.0.20 (ref. 18), and then the concatenated alignment, using the two partitioning strategies identified above with both maximum likelihood and Bayesian based approaches in RAxML v8.0.20, and ExaBayes v1.4.2 9 (ref. 34). For each RAxML analysis, we executed 100 rapid bootstrap inferences and thereafter a thorough ML search using a GTR+Γ4 model of nucleotide substitution for each data set partition. Although this may potentially over-parameterize a partition with respect to substitution model, the influence of this form of model over-parameterization has been found to be negligible in phylogenetic inference35. For the Bayesian analyses, we ran four Metropolis-coupled ExaBayes replicates for 10 million generations, each with three heated chains, and sampling every 1,000 generations (default tuning and branch swap parameters; branch lengths among partitions were linked). Convergence and proper sampling of the posterior distribution of parameter values were assessed by checking that the effective sample sizes of all estimated parameters and branch lengths were greater than 200 in the Tracer v1.6 software36 (most were greater than 1,000), and by using the ‘sdsf’ and ‘postProcParam’ tools included with the ExaBayes package to ensure the average standard deviation of split frequencies and potential scale reduction factors across runs were close to zero and one, respectively. Finally, to check for convergence in topology and clade posterior probabilities, we summarized a greedily refined majority-rule consensus tree (default) from 10,000 post burn-in trees using the ExaBayes ‘consense’ tool for each run independently and then together. Analyses of the reduced data set referenced in the main text were conducted using the same partition-model as the full data set.

To explore variation in gene tree topology and to look for outliers that might influence combined analysis, we calculated pairwise Robinson-Foulds37 (RF) and Matching Splits (MS) tree distances implemented in TreeCmp38. We then visualized histograms of tree distances and multidimensional scaling plots in R, and estimated neighbour-joining ‘trees-of-trees’ in the Phangorn R package sensu lato39,40. Using RF and MS distances, outlier loci were identified as those that occurred in the top 10% of pairwise distances for >30 comparisons to other loci (10%) in the data set. We also identified putative outlier loci using the kdetrees.complete function of the kdetrees R package41. All three methods identified 13 of the same loci as potential outliers; however removal of these loci from the analysis had no effect on estimating topology or branch lengths.
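The distance-based outlier rule can be sketched as follows, given a symmetric matrix of pairwise tree distances between loci (for example RF or MS distances exported from TreeCmp). The exact percentile convention is not given in the text, so the cutoff used here (the smallest value among the top fraction of all pairwise distances, ties included) is an assumption, as is the function name.

```python
def outlier_loci(dist, top_frac=0.10, min_comparisons=30):
    """Indices of loci whose distances fall in the top fraction for > min_comparisons pairs.

    dist is a symmetric list-of-lists of pairwise tree distances between loci.
    """
    n = len(dist)
    pairs = sorted(dist[i][j] for i in range(n) for j in range(i + 1, n))
    n_top = max(1, int(len(pairs) * top_frac))
    cutoff = pairs[-n_top]  # smallest distance still inside the top fraction
    flagged = []
    for i in range(n):
        extreme = sum(1 for j in range(n) if j != i and dist[i][j] >= cutoff)
        if extreme > min_comparisons:
            flagged.append(i)
    return flagged
```

With the defaults, a locus is flagged when more than 30 of its comparisons to other loci fall within the top 10% of all pairwise distances.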

Coalescent species tree analyses

Although fully parametric estimation (for example, *BEAST, see ref. 42) of a coalescent species tree with hundreds of genes and hundreds of taxa is not currently possible, we estimated species trees using three gene-tree summation methods that have been shown to be statistically consistent under the multispecies coalescent model43. First, we used the STRAW web server44 to estimate bootstrapped species trees using the STAR19 and NJst20 algorithms (also available through STRAW). The popular MP-EST45 method cannot currently work for more than 50 taxa. STAR takes rooted gene trees and uses the average ranks of coalescence times19 to build a distance matrix from which a species tree is computed with the neighbour-joining method46. By contrast, NJst applies the neighbour-joining method to a distance matrix computed from average gene-tree internode distances, and relaxes the requirement for input gene trees to be rooted20.

We also summarized a species tree with the ASTRAL 4.7.6 algorithm. With simulated data, ASTRAL has been shown to outperform concatenation or other summary methods under certain amounts of incomplete lineage sorting21. For very large numbers of taxa and genes, ASTRAL uses a heuristic search to find the species tree that agrees with the largest number of quartet trees induced by the set of input gene trees. For analysis with ASTRAL, we also attempted to increase the resolution of individual gene trees (Supplementary Fig. 2) by generating supergene alignments using the weighted statistical binning pipeline of refs 47, 48 with a bootstrap score of 0.75 as a bin threshold.

STAR, NJst (not shown), and the binned ASTRAL (Supplementary Fig. 3) analysis produced virtually identical inferences when low support branches (<0.75) were collapsed, and differed only with respect to the resolution of a few branches. NJst resolved the Passeroidea (Fringilla plus Spizella) as the sister group to a paraphyletic sample of Sylvioidea (Calandrella, Pycnonotus, and Sylvia), while STAR does not resolve this branch. Comparing STAR/NJst to ASTRAL, we find five additional differences: (1) within tinamous, STAR/NJst resolves Crypturellus as sister to the rest of the tinamous, whereas ASTRAL resolves Crypturellus as sister to Tinamus (similar to ExaBayes/RAxML); (2) STAR/NJst resolves pigeons as sister to a clade containing Mesitornithiformes and Pteroclidiformes, while ASTRAL does not resolve these relationships; (3) STAR/NJst fails to resolve Oxyruncus and Myiobius as sister genera, while ASTRAL does (similar to RAxML/ExaBayes); (4) in STAR/NJst, bee-eaters (Merops) are resolved as the sister group to coraciiforms (congruent with ref. 4), while ASTRAL resolves bee-eaters as sister to the rollers (Coracias) (similar to RAxML/ExaBayes); (5) lastly, in STAR/NJst, buttonquail (Turnix) is resolved as sister to the most inclusive clade of Charadriiformes not including Burhinus, Charadrius, Haematopus, and Recurvirostra, while in ASTRAL, buttonquail is resolved as sister to a clade containing Glareola, Uria, Rynchops, Sterna, and Chroicocephalus (similar to RAxML/ExaBayes).

Although lower level relationships detected with concatenation are generally recapitulated in the species trees, few of the higher level, or interordinal, relationships are resolved. This lack of resolution of the gene-tree species-tree based inferences relative to the inferences based on concatenation is not surprising, as it is increasingly recognized that the phylogenetic information content required to resolve the gene-tree histories of individual loci becomes scant at deep timescales47. Despite our extensive taxon sampling and the slow rate of nucleotide substitution that characterizes loci captured using anchored enrichment12, no single locus was able to fully resolve a topology, and this lack of information will challenge the accuracy of any coalescent-based summary approach relative to concatenation49,50,51,52,53,54. Finally, all summation methods tested here assume a priori that the only source of discordance among gene trees is deep coalescence, and violations of this assumption may introduce systematic error in phylogeny estimation54.

Phylogenetic informativeness

Site-specific evolutionary rates, λi…j, were calculated for each locus using the program HyPhy55 in the PhyDesign web interface56 in conjunction with a guide chronogram generated by a nonparametric rate smoothing algorithm57 applied to our concatenated RAxML tree. Using these rates to predict whether an alignment will yield correct, incorrect, or no resolution of a given node, we quantified the probability of phylogenetically informative changes (ψ)16 contributing to the resolution of the earliest divergences in Neoaves. Estimates generated under a three character state model58 reveal that the majority of loci have a strong probability of ψ, and suggest a high potential for most loci and partitions containing multiple loci (assigned by PartitionFinder) to correctly resolve this internode. The potential for resolution as a consequence of phylogenetic signal is therefore high relative to the potential for saturation and misleading inference induced by stochastic changes along the subtending lineages (Supplementary Fig. 4a).

To assess the information content of the loci across the entire topology, we profiled their phylogenetic informativeness (PI)15 (Supplementary Fig. 4b). There was considerable variation in PI across loci (Supplementary Fig. 4). In all cases, the loci with the lowest values of ψ are characterized by substantially lower (60–90%) values of PI, rather than sharp declines in their PI profiles. The absence of a sharp decline in the PI profile suggests that a lack of phylogenetic information, rather than rapid increases in homoplasious sites, underlies low values of the probability of signal ψ59.

Because declines in PI can be attributed to increases in homoplasious site patterns59, we further assessed the phylogenetic utility of data set partitions by quantifying the ratio of PI at the most recent common ancestor of Neoaves to the PI at the most recent common ancestor of Aves (Supplementary Fig. 4c). Values of this ratio that are less than 1 correspond to a rise in PI towards the root. Values close to 1 correspond to fairly uniform PI. Values greater than 1 correspond to a decline in PI towards the root. Sixty-six out of 75 partitions demonstrated less than a 50% decline in PI, and only six partitions demonstrated a decline of PI greater than 75% (Supplementary Fig. 4c). As all but a few nodes in this study represent divergences younger than the crown of Neoaves, these ratios of PI suggest that the predicted impact of homoplasy on our topological inferences should be minimal.
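The ratio described above can be sketched as follows. The percentage decline is assumed here to be measured relative to the PI at the Neoaves node, and the tolerance used to call a ratio "close to 1" is a hypothetical choice; neither is specified in the text.

```python
def pi_ratio(pi_neoaves, pi_aves):
    """Ratio of PI at the Neoaves MRCA to PI at the Aves MRCA (the root)."""
    if pi_aves <= 0.0:
        raise ValueError("PI at the Aves MRCA must be positive")
    return pi_neoaves / pi_aves


def percent_decline(pi_neoaves, pi_aves):
    """Decline in PI from the Neoaves node to the root, as a percentage.

    Assumption: the decline is expressed relative to PI at the Neoaves node.
    Negative values indicate a rise in PI towards the root.
    """
    if pi_neoaves <= 0.0:
        raise ValueError("PI at the Neoaves MRCA must be positive")
    return 100.0 * (pi_neoaves - pi_aves) / pi_neoaves


def describe(pi_neoaves, pi_aves, tolerance=0.05):
    """Classify a partition by its ratio; `tolerance` is a hypothetical cut-off."""
    ratio = pi_ratio(pi_neoaves, pi_aves)
    if abs(ratio - 1.0) <= tolerance:
        return "uniform"
    return "rise" if ratio < 1.0 else "decline"
```

For instance, a partition with PI of 2.0 at the Neoaves node and 1.0 at the root has a ratio of 2 and, under the assumption above, a 50% decline towards the root.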

As PI profiles do not directly predict the impact of homoplasious site patterns on topological resolution16,60, we evaluated probabilities of ψ for focal nodes using both the concatenated data set and individual loci that span the variance in locus lengths. Concordant with expectations from the PI profiles, all quantifications strongly support the prediction that homoplasy will have a minimal impact on topological resolution for the concatenated data set across a range of tree depths and internode distances (ψ = 1.0 for all nodes), while individual loci vary in their predicted utility (Supplementary Fig. 4d). As the guide tree does not represent a true known tree, we additionally quantified ψ across a range of tree depths and internode distances to test if our predictions of utility are in line with general trends in the data. Concordant with our results above, the concatenated data set is predicted to be of high phylogenetic utility at all timescales (ψ = 1.0 for all nodes), while the utility of individual loci begins to decline for small internodes at deep tree depths (Supplementary Fig. 5).

Estimating a time-calibrated phylogeny

We estimated a time-calibrated tree with a node dating approach in BEAST 1.8.1 (ref. 42) that used 19 well-justified fossil calibrations phylogenetically placed by rigorous, apomorphy-based diagnoses (see the descriptions of avian calibration fossils in the Supplementary Information). We used a starting tree topology based on the ExaBayes inference (Fig. 1), and prior node age calibrations that followed a lognormal parametric distribution based on occurrences of fossil taxa. To prevent BEAST from exploring topology space and only allow estimation of branch lengths, we turned off the subtree-slide, Wilson–Balding, and narrow and wide exchange operators61,62. Finally, we applied a birth–death speciation model with default priors.

As rates of molecular evolution are significantly variable across certain bird lineages63,64,65, we applied an uncorrelated relaxed clock (UCLN) to each partition of the data set where rates among branches are distributed according to a lognormal distribution66. All dating analyses were performed without crocodilian outgroups to reduce the potential of extreme substitution rate heterogeneity to bias rate and consequent divergence time estimates of the UCLN model67.

All calibrations were modelled using soft maximum age bounds to allow for the potential of our data to overwhelm our user-specified priors68. Soft maximum bounds are the preferred method for assigning upper limits on the age of phylogenetic divergences69. As effective priors necessarily reflect interactions between user-specified priors, topology, and the branching model, they may not precisely reflect the user-specified priors70. To correct for this potential source of error, we carefully examined the effective calibration priors by first running the prepared BEAST XML without any nucleotide data (until all ESS values were above 200). We then iteratively adjusted our user-defined priors until all of the effective priors (as examined in the Tracer software) reflected the intended calibration densities. Finally, using the compare.phylo function in the Phyloch R package, we examined how the inclusion of molecular data influenced the divergence time estimates relative to the effective prior (Supplementary Fig. 9; see below).
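The ESS threshold of 200 used above can be illustrated with a simple autocorrelation-based estimate of effective sample size. This is a minimal sketch of one common estimator (the autocorrelation sum is truncated at the first non-positive lag), not a reimplementation of the estimator used by Tracer, and the simulated traces are hypothetical.

```python
import random


def effective_sample_size(trace):
    """Estimate ESS as n / (1 + 2 * sum of positive-lag autocorrelations).

    The sum is truncated at the first non-positive autocorrelation,
    a simple convention assumed here for illustration.
    """
    n = len(trace)
    mean = sum(trace) / n
    centred = [x - mean for x in trace]
    variance = sum(c * c for c in centred) / n
    if variance == 0.0:
        return float(n)
    rho_sum = 0.0
    for lag in range(1, n):
        acov = sum(centred[i] * centred[i + lag] for i in range(n - lag)) / n
        rho = acov / variance
        if rho <= 0.0:
            break
        rho_sum += rho
    return n / (1.0 + 2.0 * rho_sum)


# Hypothetical traces: independent draws versus a strongly autocorrelated chain.
rng = random.Random(1)
independent = [rng.gauss(0.0, 1.0) for _ in range(2000)]
correlated = [0.0]
for _ in range(1999):
    correlated.append(0.9 * correlated[-1] + rng.gauss(0.0, 1.0))

print("ESS, independent draws:", round(effective_sample_size(independent)))
print("ESS, correlated chain: ", round(effective_sample_size(correlated)))
```

A strongly autocorrelated chain yields far fewer effectively independent samples than its length suggests, which is why chains are run until the ESS of every parameter clears the chosen threshold.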

Defining priors

Our initial approach was to set a prior’s offset to the age of its associated fossil; the mean was then manually adjusted such that 95% of the calibration density fell more recently than the K–Pg boundary at 65 Ma (million years ago) (the standard deviation was fixed at 1 Ma). In general, priors constructed this way generated calibration densities that specified their highest density peak (their mode) about 3–5 million years older than the age of the offset.
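The adjustment of the mean can be sketched in closed form. The sketch assumes that the "mean" and "standard deviation" refer to the log-scale parameters (μ and σ) of an offset lognormal density; BEAST also allows the mean to be specified in real space, in which case the numbers would differ. The fossil age of 50 Ma is a hypothetical example.

```python
import math
from statistics import NormalDist


def lognormal_mu_for_soft_max(offset, soft_max, sigma=1.0, mass=0.95):
    """Log-scale mean such that `mass` of the density is younger than `soft_max`.

    Model assumed: node age = offset + LogNormal(mu, sigma), ages in Ma.
    """
    if soft_max <= offset:
        raise ValueError("soft maximum must be older than the offset")
    z = NormalDist().inv_cdf(mass)
    return math.log(soft_max - offset) - sigma * z


def lognormal_cdf(age, offset, mu, sigma):
    """Prior probability that the node is younger than `age`."""
    if age <= offset:
        return 0.0
    return NormalDist(mu, sigma).cdf(math.log(age - offset))


# Hypothetical calibration: a 50 Ma fossil with a soft maximum at 65 Ma.
mu = lognormal_mu_for_soft_max(offset=50.0, soft_max=65.0, sigma=1.0, mass=0.95)
print("log-scale mean:", round(mu, 3))
print("mass younger than 65 Ma:", round(lognormal_cdf(65.0, 50.0, mu, 1.0), 3))
```

Solving for μ directly gives a starting value; as described above, the effective prior still needs to be checked against the intended density after interaction with the tree prior.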

We applied a loose gamma prior to the node reflecting the most recent common ancestor of crown birds: we used an offset of 60.5 Ma (the age of the oldest known definitive, uncontroversial crown bird fossil; the stem penguin Waimanu), and adjusted the scale and shape of the prior such that 97.5% of the calibration density fell more recently than 86.5 Ma71 (see below and Supplementary Information for discussion of the >65 Ma putative crown avian Vegavis). This date (86.5 Ma) reflects the upper bound age estimate of the Niobrara Formation, one of many richly fossiliferous Mesozoic deposits exhibiting many crownward Mesozoic stem birds, without any trace of avian crown group representatives. The Niobrara, in particular, has produced hundreds of stem birds and other fragile skeletons, without yielding a single crown bird fossil, and therefore represents a robust choice for a soft upper bound for the root divergence of the avian crown71,72,73. Previous soft maxima employed for this divergence have arbitrarily selected the age of other Mesozoic stem avians (that is, Gansus yumenensis, 110 Ma) that are phylogenetically stemward of the Niobrara taxa28. Although the implementation of very ancient soft maxima such as the age of Gansus is often done in the name of conservatism, the extremely ancient divergence dates yielded by such analyses illustrate the misleading influence of assigning soft maxima that are vastly too old to be of relevance to the divergence of crown group birds74. However, this problem has been eliminated in some more recent analyses75.
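To illustrate how a gamma prior can be tuned to these two ages, the sketch below fixes the shape and solves for the scale by bisection. The shape actually used is not reported in this section, so the integer shape of 2 is purely hypothetical; restricting the shape to integers is a simplification that lets the cumulative density be written in closed form.

```python
import math


def erlang_cdf(x, shape, scale):
    """CDF of a gamma distribution with integer `shape` at distance x from the offset."""
    if x <= 0.0:
        return 0.0
    u = x / scale
    tail = math.exp(-u) * sum(u ** n / math.factorial(n) for n in range(shape))
    return 1.0 - tail


def scale_for_soft_max(offset, soft_max, shape, mass=0.975):
    """Scale such that `mass` of the offset gamma density is younger than `soft_max`."""
    x = soft_max - offset
    lo, hi = 1e-6, 1e6  # the CDF at x decreases monotonically as the scale grows
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if erlang_cdf(x, shape, mid) > mass:
            lo = mid  # too much mass below the bound: the prior can be wider
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Offset and soft maximum from the text; the shape of 2 is a hypothetical choice.
scale = scale_for_soft_max(offset=60.5, soft_max=86.5, shape=2)
print("scale:", round(scale, 3))
print("mass younger than 86.5 Ma:", round(erlang_cdf(86.5 - 60.5, 2, scale), 4))
```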

All of the fossil calibrations employed in our analysis represent neognaths; rootward divergences within Aves (for example the divergence between Palaeognathae and Neognathae, and Galloanserae and Neoaves) cannot be confidently calibrated due to a present lack of fossils representing the palaeognath, neognath, galloanserine, and neoavian stem groups. As such, the K–Pg soft bound was only applied to comparatively apical divergences within neognaths. Although the question of whether major neognath divergences occurred during the Mesozoic has been the source of controversy76,77,78, renewed surveys of Mesozoic sediments for definitive crown avians or even possible crown neoavians have been unsuccessful (with the possible exception of Vegavis; see Supplementary Information), and together with recent divergence dating analyses have cast doubt on the presence of neoavian subclades before the K–Pg mass extinction1,74,79. Further, recent work has demonstrated the tendency of avian divergence estimates to greatly exceed uninformative priors, resulting in spuriously ancient divergence dating results (for example, refs 28, 75, 76, 80). These results motivated our implementation of the 65 Ma soft bound for our neoavian calibrations.

Contrary to expectation, when we compared the effective prior on the entire tree to the final summary derived from the posterior distribution of divergence times (Supplementary Fig. 9), we found no overall trend of posterior estimated ages post-dating prior calibrations. In fact, the inclusion of our molecular data decreases the inferred ages of almost all of the deepest nodes in our tree. A similar result has been obtained for mammals by using large amounts of nuclear DNA sequences81. Future work investigating the interplay of the density of genomic sampling and the application of various calibration age priors will be indispensable for sensitivity analyses to help us further develop a robust timescale of avian evolution. However, the pattern of posterior versus prior age estimates observed in our study raises the prospect that the new class of data used in this study (that is, semi-conserved anchor regions) may exhibit some immunity to longstanding problems associated with inferring avian divergence times, such as systematically over-estimating the antiquity of extant avian clades.

Implementing BEAST and summarizing a final calibrated tree

In addition to making predictions about the phylogenetic utility of a locus or partition towards topological resolution, PI profiles have recently also been used to mitigate the influence of substitution saturation on divergence time estimates82. Given the variance in PI profile shapes for captured loci and their subsequent partition assignments (Supplementary Fig. 4c), and observations that alignments and subsets of data alignments characterized by high levels of homoplasy can mislead branch length estimation83,84, we limited our divergence time estimates to 36 partitions that did not exhibit a decline in informativeness towards the root of the tree. We ran BEAST on each partition separately until parameter ESS values were greater than 200 (most were greater than 1,000) to ensure adequate posterior sampling of each parameter value. After concatenating 10,000 randomly sampled post burn-in trees from each of these completed analyses, we summarized a final MCC tree with median node heights in TreeAnnotator v1.8.1 (ref. 42). Supplementary Fig. 6 shows the full, calibrated Bayesian tree (Fig. 1) with 95% HPD intervals on the node ages, and Supplementary Fig. 7 shows the distribution of estimated branching times, ranked by median age (using clade numbers from Fig. 1). All computations were carried out on 64-core PowerEdge M915 nodes on the Louise Linux cluster at the Yale University Biomedical High Performance Computing Center.
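The pooling of posterior trees across partition-specific analyses can be sketched as follows. The burn-in fraction is not reported in this section, so the 10% default is a hypothetical value, and sampling without replacement within each run is likewise an assumption. Trees are represented as strings (for example, Newick lines already read from each tree file).

```python
import random


def pool_posterior_trees(runs, burnin_fraction=0.1, n_per_run=10_000, seed=0):
    """Discard burn-in from each run, then pool an equal random sample from each.

    `runs` is a list of lists, one list of tree strings per completed analysis.
    """
    rng = random.Random(seed)
    pooled = []
    for index, trees in enumerate(runs):
        kept = trees[int(len(trees) * burnin_fraction):]
        if len(kept) < n_per_run:
            raise ValueError(f"run {index} has only {len(kept)} post burn-in trees")
        pooled.extend(rng.sample(kept, n_per_run))  # without replacement
    return pooled


# Toy demonstration with placeholder tree labels instead of real Newick strings.
runs = [[f"run{r}_tree{i}" for i in range(100)] for r in range(3)]
pooled = pool_posterior_trees(runs, burnin_fraction=0.1, n_per_run=20, seed=1)
print(len(pooled), "trees pooled")
```

Drawing the same number of trees from every partition gives each analysis equal weight in the pooled sample that is then passed to the tree summarization step.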

Data reporting

No statistical methods were used to predetermine sample size.