Abstract
Although reconstruction of the phylogeny of living birds has progressed tremendously in the last decade, the evolutionary history of Neoaves, a clade that encompasses nearly all living bird species, remains the greatest unresolved challenge in dinosaur systematics. Here we investigate avian phylogeny with an unprecedented scale of data: >390,000 bases of genomic sequence data from each of 198 species of living birds, representing all major avian lineages, and two crocodilian outgroups. Sequence data were collected using anchored hybrid enrichment, yielding 259 nuclear loci with an average length of 1,523 bases for a total data set of over 7.8 × 10^7 bases. Bayesian and maximum likelihood analyses yielded highly supported and nearly identical phylogenetic trees for all major avian lineages. Five major clades form successive sister groups to the rest of Neoaves: (1) a clade including nightjars, other caprimulgiforms, swifts, and hummingbirds; (2) a clade uniting cuckoos, bustards, and turacos with pigeons, mesites, and sandgrouse; (3) cranes and their relatives; (4) a comprehensive waterbird clade, including all diving, wading, and shorebirds; and (5) a comprehensive landbird clade with the enigmatic hoatzin (Opisthocomus hoazin) as the sister group to the rest. Neither of the two main, recently proposed Neoavian clades, Columbea and Passerea1, was supported as monophyletic. The results of our divergence time analyses are congruent with the palaeontological record, supporting a major radiation of crown birds in the wake of the Cretaceous–Palaeogene (K–Pg) mass extinction.
Main
Birds (Aves) are the most diverse lineage of extant tetrapod vertebrates. They comprise over 10,000 living species2, and exhibit an extraordinary diversity in morphology, ecology, and behaviour3. Substantial progress has been made in resolving the phylogenetic history of birds. Phylogenetic analyses of both molecular and morphological data support the monophyletic Palaeognathae (the tinamous and flightless ratites) and Galloanserae (gamebirds and waterfowl) as successive, monophyletic sister groups to the Neoaves—a diverse clade including all other living birds4. Resolving neoavian phylogeny has proven to be a difficult challenge because this radiation was very rapid and deep in time, resulting in very short internodes4.
In the last decade, phylogenetic analyses of large, multilocus data sets have resulted in the proposal of numerous, novel neoavian relationships. For example, a clade consisting of diving and wading birds has been consistently recovered, as well as a large landbird clade in which falcons and parrots are successive sister groups to the perching birds4,5,6,7,8. Recently, phylogenetic analyses of 48 whole avian genomes resulted in the proposal of a novel phylogenetic resolution of the initial branching sequence within Neoaves1. Although this genomic study provided much needed corroboration of many neoavian clades, the limited taxon sampling precluded further insights into the evolutionary history of birds.
It has long been recognized that phylogenetic confidence depends not only on the number of characters analysed and their rate of evolution, but also on the number and relationships of the taxa sampled relative to the nodes of interest9,10,11. Theory predicts that sampling a single taxon that diverges close to a node of interest will have a far greater effect on phylogenetic resolution than will adding more characters11. Despite using an alignment of >40 million base pairs, sparse sampling of 48 species in the recent avian genomic analysis may not have been sufficient to confidently resolve the deep divergences among major lineages of Neoaves. Thus, expanded taxon sampling is required to test the monophyly of neoavian clades, and to further resolve the phylogenetic relationships within Neoaves.
Here, we present a phylogenetic analysis of 198 bird species and 2 crocodilians (Supplementary Table 1) based on loci captured using anchored enrichment12. Our sample includes species of 122 avian families in all 40 extant avian orders2, with denser representation of non-oscine birds (108 families) than of oscine songbirds (14 families). Effort was made to include taxa that would break up long phylogenetic branches, and provide the highest likelihood of resolving short internodes at the base of Neoaves11. We also sampled multiple species within groups whose monophyly or phylogenetic interrelationships have been controversial—that is, tinamous, nightjars, hummingbirds, turacos, cuckoos, pigeons, sandgrouse, mesites, rails, storm petrels, petrels, storks, herons, hawks, hornbills, mousebirds, trogons, kingfishers, barbets, seriemas, falcons, parrots, and suboscine passerines.
We targeted 394 loci centred on conserved anchor regions of the genome that are flanked by more variable regions12. We performed all phylogenetic analyses on a data set of 259 genes with the highest quality assemblies. The average locus was 1,524 bases in length (361–2,316 base pairs (bp)), and the total percentage of missing data was 1.84%. The concatenated alignment contained 394,684 sites. To minimize overall model complexity while accurately accounting for substitution processes, we performed a partition model sensitivity analysis with PartitionFinder13,14, and compared a complex partition model (one partition per locus) to a heuristically optimized (rclust) partition model. Phylogenetic informativeness (PI) approaches15,16 provided strong evidence that the phylogenetic utility of our data set was high, with low declines in PI profiles for individual loci, data set partitions, and the concatenated matrix (Supplementary Fig. 4). We estimated concatenated trees in ExaBayes17 and RAxML18 using a 75-partition model. Coalescent species trees were estimated with the gene tree summation methods in STAR19, NJst20, and ASTRAL21 from gene trees estimated with RAxML (see Methods).
Our concatenated Bayesian analyses resulted in a completely resolved, well supported phylogeny. All clades had a posterior probability (PP) of 1, except for a single clade including shoebill (Balaeniceps) and pelican (PP = 0.54) (Fig. 1). The concatenated maximum likelihood analysis recovered a single topology that was identical to the Bayesian tree except for three clades, all of which are far from the base of Neoaves: the relationships among pigeons; among skimmers, gulls, and terns; and among pelicans, shoebill, and waders (Supplementary Fig. 1). Almost all clades in the maximum likelihood tree were maximally supported with bootstrap scores (BS) of 1.00, but nine clades within Neoaves (including four of the most inclusive neoavian clades) received support <0.70 (Supplementary Fig. 1). Coalescent species tree analyses produced substantially different hypotheses for neoavian relationships (Supplementary Fig. 3), but most of the discordant clades received conspicuously lower bootstrap support values (0.07 < BS < 0.30). Quantifying the phylogenetic informativeness of individual loci15,16 revealed that these low support values were not due to homoplasy driven by saturation of nucleotide states, but rather by the low power of individual loci to resolve the entire range of internode lengths across the depth of the tree (Supplementary Figs 4 and 5; see Methods). This result was not unexpected. The low phylogenetic information content of individual genes at deep timescales has been demonstrated to impede phylogenetic resolution in a coalescent species tree framework22,23. Furthermore, when clades with <0.75 bootstrap support values in the species trees are collapsed, the resulting topology is exactly congruent with the concatenated Bayesian tree (except for the relationships of tinamous among palaeognaths; Supplementary Fig. 3). 
Although coalescent species trees account for incomplete lineage sorting, simulations show that species tree methods based on gene tree summation may not provide significantly better performance over concatenation methods22.
Our phylogeny identifies many new clades, and supports many phylogenetic relationships proposed in previous studies (see detailed phylogenetic discussion in the Supplementary Information). Congruent with all recent studies, the phylogeny places palaeognaths as the sister group to the rest of birds, and the flying tinamous (Tinamidae) within the flightless ratites. This tree, however, places tinamous as the sister group to cassowary and emu alone (Fig. 1, grey). The phylogeny of Galloanserae is exactly congruent with previous studies4 (Fig. 1, red).
Within the monophyletic Neoaves, we recover five major clades, each of which is the successive sister group to the remaining clades in the series (Fig. 1). The Strisores includes the nightjars and their nocturnal relatives with the diurnal swifts and hummingbirds (Fig. 1, brown). Four nocturnal lineages—nightjars, a neotropical oilbird-potoo clade, frogmouths, and owlet-nightjars—form successive sister groups to the diurnal swift and hummingbird clade.
The Columbaves is a novel clade that consists of two monophyletic groups recently identified by Jarvis et al.1 (Fig. 1, purple). A clade consisting of turacos, bustards, and cuckoos (Otidimorphae) is sister to a clade consisting of pigeons as the sister group to sandgrouse and mesites (Columbimorphae). The third neoavian clade consists of a well recognized monophyletic group of core gruiform birds (Gruiformes; Fig. 1, yellow), with interrelationships that are consistent with previous phylogenies4.
The Aequorlitornithes is a novel, comprehensive clade of waterbirds, including all shorebirds, diving birds, and wading birds (Fig. 1, blue). Within this group, the flamingos and grebes1,4,5,6 are the sister group to shorebirds, and the sunbittern and tropicbirds1,4,6 are the sister group to the wading and diving birds (Fig. 1, blue). Other interrelationships within these groups are extensively congruent with the results in ref. 4 and the work of others (see Supplementary Information).
The fifth major neoavian clade, which we name Inopinaves, is a very diverse landbird clade with the same composition as previously recognized (Telluraves)1,4,5,6, but with the enigmatic, neotropical hoatzin (Opisthocomus hoazin) as the sister group to all other landbirds (Fig. 1, green). The phylogeny of the landbirds shares many points of congruence with earlier hypotheses, including the relationships of seriemas, falcons, parrots, and perching birds1,4,5,6, and the interrelationships among oscine songbirds24. However, we find that hawks (Accipitriformes) are the sister group to a new clade including the rest of the landbirds, to be called Eutelluraves (see Supplementary Information).
Our divergence time analyses employed 19 phylogenetically and geologically well-constrained fossil calibrations (following recently proposed best practices25), documenting many deep divergences within the avian crown group (Fig. 1, grey nodes; see Supplementary Information). Our analysis supports an extremely rapid radiation of the avian crown group in the wake of the K–Pg mass extinction event (Fig. 1, Supplementary Figs. 6 and 7). Although the post-K–Pg radiation hypothesis has long been strongly supported by the avian fossil record26,27, it has so far received little support from molecular divergence time analyses4,28. The tempo and mode of the extant avian radiation remains contentious. For example, an alternative calibration analysis including the fossil Vegavis did not support significantly different dates of divergence outside of the Galloanserae (see Supplementary Information and Supplementary Figs 10, 11, 12). Confident determination of the age of crown Aves will have to await discoveries of Mesozoic stem neognaths and palaeognaths, and detailed assessments of the influence of soft maximum bound parameterization on the age of the deepest avian divergences.
Our results indicate that the recent genome phylogeny1 may contain some erroneous relationships induced by long branch attraction from sparse taxon sampling. Maximum likelihood analysis of our sequence data pruned down to a phylogenetically equivalent subsample of 48 species produces relationships along the neoavian ‘backbone’ (Supplementary Fig. 8) that are entirely discordant with the phylogeny based on our full data set (Fig. 1). This reduced taxon analysis recovers some of the specific features of the recent genome phylogeny by Jarvis et al.1 (Supplementary Fig. 8): for example, the placement of the pigeons, mesites, and sandgrouse (a subclade of Columbea1) outside of the rest of the Neoaves. Differences in tree topology when taxa are excluded are to be expected if early internodes in Neoaves are very short. Adding taxa that have diverged near nodes of interest has been theoretically demonstrated to constrain the possible historical substitution patterns, and increase the accuracy of phylogenetic inference11. By increasing our taxon sampling to include all major avian lineages, we have minimized the possibility that additional taxon sampling alone will alter the relationships in our tree.
Jarvis et al.1 also identified a well supported clade consisting of the hoatzin (Opisthocomus) as the sister group to a crane (Grus) and a plover (Charadrius) (total evidence nucleotide tree, BS = 0.91, 0.96, respectively). However, Grus and Charadrius were the only species sampled from two very diverse neoavian orders: Gruiformes, 185 species; and Charadriiformes, 385 species2. Our results indicate that Opisthocomus is the most ancient bird lineage (∼ 64 million years) consisting of only a single, extant species. Thus, the three taxa placed in this assemblage by Jarvis et al.1 comprise three of the most ancient, and under-sampled lineages within all birds, indicating the strong possibility of long branch attraction artefacts. By contrast, these same groups are represented by 26 species in our analysis, and they do not form an exclusive clade (Fig. 1).
In addition to providing a new backbone for comprehensive avian supertrees and comparative evolutionary analyses28, this new avian phylogeny supports many interesting hypotheses about avian evolution. This phylogeny upholds the hypothesis that the ancestor of the diurnal swifts and hummingbirds evolved from a clade that had been predominantly nocturnal for ∼10 million years. Although hummingbirds have acute near-ultraviolet vision29, the effect of extended ancestral nocturnality on the evolution of the visual system in this group of birds is unknown. Our findings also support the emerging pattern that landbirds evolved from a raptorial grade1. The sister group relationships of hawks to the rest of the landbirds, of owls to the diverse coraciimorph clade, and of seriemas and falcons to the parrots and passerines indicate the persistence of a raptorial ecology among ancestral landbirds. Lastly, the identification of a new, broadly comprehensive waterbird–shorebird clade indicates a striking, and previously unappreciated, level of evolutionary constraint on the ecological diversification of birds that will be exciting to investigate in the future.
Methods
Locus selection and probe design
Anchor loci described in ref. 12 were extended such that each contained approximately 1,350 bp. In some cases neighbouring loci were joined to form a single locus. Also, loci that performed poorly in ref. 12 were removed from the locus set. This process produced 394 loci (referred to as the version 2 vertebrate loci). Genome coordinates corresponding to these regions in the Gallus gallus genome (galGal3, UCSC genome browser) were identified and sequences corresponding to this region were extracted (coordinates are available in the Zenodo archive (http://dx.doi.org/10.5281/zenodo.28343)). In order to improve the capture efficiency for passerines, we also obtained homologous sequences for Taeniopygia guttata. After aligning the Gallus and Taeniopygia sequences using MAFFT31, alignments were trimmed to produce the final probe region alignments (alignments available in the Zenodo archive), and probes were tiled at approximately 1.5 × tiling density (probe specification will be made available upon publication).
Data collection
Data were collected following the general methods of ref. 12 through the Center for Anchored Phylogenomics at Florida State University (http://www.anchoredphylogeny.com). Briefly, each genomic DNA sample was sonicated to a fragment size of ∼150–350 bp using a Covaris E220 focused-ultrasonicator with Covaris microTUBES. Subsequently, library preparation and indexing were performed on a Beckman-Coulter Biomek FXp liquid-handling robot following a protocol modified from ref. 32. One important modification is a size-selection step after blunt-end repair using SPRIselect beads (Beckman-Coulter; 0.9 × ratio of bead to sample volume). Indexed samples were then pooled at equal quantities (typically 12–16 samples per pool), and enrichments were performed on each multi-sample pool using an Agilent Custom SureSelect kit (Agilent Technologies), designed as specified above. After enrichment, the 12 enrichment pools were pooled in groups of three in equal quantities for sequencing on four PE150 Illumina HiSeq2000 lanes (three enrichment pools per lane). Sequencing was performed in the Translational Science Laboratory in the College of Medicine at Florida State University.
Data processing
Paired-read merging (Merge.java). Typically, between 50% and 75% of sequenced library fragments had an insert size between 150 bp and 300 bp. As 150 bp paired-end sequencing was performed, this means that the majority of the paired reads overlap and thus should be merged before assembly. The overlapping reads were identified and merged following the methods of ref. 33. In short, for each degree of overlap for each read we computed the probability of obtaining the observed number of matches by chance, and selected the degree of overlap that produced the lowest probability, with a P value less than 10^−10 required to merge reads. When reads are merged, mismatches are reconciled using base-specific quality scores, which were combined to form the new quality scores for the merged read (see ref. 33 for details). Reads failing to meet the probability criterion were kept separate but still used in the assembly. The merging process produces three files: one containing merged reads and two containing the unmerged reads.
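The overlap test described above can be illustrated with a minimal Python sketch. It is not the Merge.java implementation: the chance-match probability of 0.25, the use of an upper-tail binomial probability, the minimum overlap of 10 bases, and all function names are assumptions made for illustration, and the quality-score reconciliation of mismatches is omitted.

```python
from math import comb

def reverse_complement(seq: str) -> str:
    """Reverse-complement a DNA string (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def match_pvalue(matches: int, overlap: int, p_match: float = 0.25) -> float:
    """Probability of >= `matches` agreeing bases arising by chance in an
    overlap of `overlap` bases (binomial upper tail; p_match is assumed)."""
    return sum(comb(overlap, k) * p_match**k * (1 - p_match)**(overlap - k)
               for k in range(matches, overlap + 1))

def best_overlap(fwd: str, rev_rc: str, min_overlap: int = 10,
                 threshold: float = 1e-10):
    """Try every overlap between the end of `fwd` and the start of `rev_rc`
    (the reverse read, already reverse-complemented). Return (overlap, p)
    for the lowest chance probability, or None if it is not below threshold."""
    best = None
    for ov in range(min_overlap, min(len(fwd), len(rev_rc)) + 1):
        matches = sum(a == b for a, b in zip(fwd[-ov:], rev_rc[:ov]))
        p = match_pvalue(matches, ov)
        if best is None or p < best[1]:
            best = (ov, p)
    return best if best is not None and best[1] < threshold else None

# Hypothetical 50-base fragment sequenced as two 40-base reads.
fragment = "ACGTTGCAAGCTTCGGATCCATGGTACCGAGCTCGAATTCACTGGCCGTC"
forward_read = fragment[:40]
reverse_read = reverse_complement(fragment[10:])
result = best_overlap(forward_read, reverse_complement(reverse_read))
```

Read pairs for which `best_overlap` returns `None` would be kept as separate unmerged reads, as in the text.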
Assembly (Assembler.java). The reads were assembled into contigs using an assembler that makes use of both a divergent reference assembly approach to map reads to the probe regions and a de novo assembly approach to extend the assembly into the flanks. The reference assembler uses a library of spaced 20-mers derived from the conserved sites of the alignments used during probe design. A preliminary match was called if at least 17 of 20 matches exist between a spaced kmer and the corresponding positions in a read. Reads obtaining a preliminary match were then compared to an appropriate reference sequence used for probe design to determine the maximum number of matches out of 100 consecutive bases (all possible gap-free alignments between the read and the reference were considered). The read was considered mapped to the given locus if at least 55 matches were found. Once a read is mapped, an approximate alignment position was estimated using the position of the spaced 20-mer, and all 60-mers existing in the read were stored in a hash table used by the de novo assembler. The de novo assembler identifies exact matches between a read and one of the 60-mers found in the hash table. Simultaneously using the two levels of assembly described above, the three read files were traversed repeatedly until an entire pass through the reads produced no additional mapped reads.
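The two mapping thresholds described above (at least 17 of 20 spaced positions for a preliminary match, then at least 55 matches within 100 consecutive bases over all gap-free alignments) can be sketched as follows. This is an illustrative Python sketch rather than the Assembler.java code; the function names and the explicit `offsets` argument describing the spaced positions are assumptions, because the spacing pattern used by the assembler is not specified here.

```python
def spaced_kmer_matches(read: str, start: int, kmer: str, offsets) -> int:
    """Count agreeing bases between a spaced k-mer and the read positions
    start + offset (the spacing pattern `offsets` is an assumed input)."""
    return sum(1 for base, off in zip(kmer, offsets)
               if start + off < len(read) and read[start + off] == base)

def max_window_matches(read: str, ref: str, window: int = 100) -> int:
    """Maximum number of matching bases in any `window` consecutive aligned
    positions, over every gap-free alignment of read against ref."""
    best = 0
    # offset = reference index aligned to read[0]; negative offsets let the
    # read overhang the start of the reference.
    for offset in range(-(len(read) - 1), len(ref)):
        start_r = max(0, -offset)
        end_r = min(len(read), len(ref) - offset)
        hits = [read[i] == ref[i + offset] for i in range(start_r, end_r)]
        running = sum(hits[:window])
        best = max(best, running)
        for i in range(window, len(hits)):
            running += hits[i] - hits[i - window]  # slide the window by one
            best = max(best, running)
    return best

def is_mapped(read: str, ref: str, min_matches: int = 55,
              window: int = 100) -> bool:
    """Apply the 55-of-100 criterion from the text."""
    return max_window_matches(read, ref, window) >= min_matches

# Toy data: a 120-base reference and a read copied from it.
reference = "ACGT" * 30
read = reference[10:110]
offsets = list(range(0, 60, 3))               # hypothetical spacing pattern
kmer = "".join(read[5 + o] for o in offsets)  # spaced 20-mer taken at position 5
preliminary = spaced_kmer_matches(read, 5, kmer, offsets) >= 17
mapped = preliminary and is_mapped(read, reference)
```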
For each locus, mapped reads were then grouped into clusters using 60-mer pairs observed in the reads mapped to that locus. In short, a list of all 60-mers found in the mapped reads was compiled, and the 60-mers were clustered if found together in at least two reads. The 60-mer clusters were then used to separate the reads into clusters for contig estimation. Relative alignment positions of reads within each cluster were then refined in order to increase the agreement across the reads. Up to one gap was also inserted per read if needed to improve the alignment. Note that given sufficient coverage and an absence of contamination, each single-copy locus should produce a single assembly cluster. Low coverage (leading to a break in the assembly), contamination, and gene duplication, can all lead to an increased number of assembly clusters. A whole-genome duplication, for example, would increase the number of clusters to two per locus.
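The k-mer co-occurrence clustering described above can be sketched with a union-find structure. The sketch below is a simplified illustration, not the published code: it counts every pair of k-mers within a read (quadratic in read length), assigns each read to the cluster holding most of its k-mers, and omits the alignment refinement and gap insertion steps. All names are hypothetical.

```python
from collections import Counter
from itertools import combinations

def kmers(seq: str, k: int) -> set:
    """All distinct k-mers of a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def cluster_reads(reads, k: int = 60, min_shared_reads: int = 2):
    """Join two k-mers when they occur together in at least
    `min_shared_reads` reads, then group reads by k-mer cluster.
    Returns a sorted list of lists of read indices."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Sorting gives each k-mer pair a single canonical orientation.
    read_kmers = [sorted(kmers(r, k)) for r in reads]
    pair_counts = Counter()
    for ks in read_kmers:
        pair_counts.update(combinations(ks, 2))
    for (a, b), n in pair_counts.items():
        if n >= min_shared_reads:
            parent[find(a)] = find(b)

    clusters = {}
    for idx, ks in enumerate(read_kmers):
        roots = Counter(find(x) for x in ks)
        if roots:
            clusters.setdefault(roots.most_common(1)[0][0], []).append(idx)
    return sorted(clusters.values())

# Toy example with k = 4: two unrelated sequences, each covered by two reads.
toy_reads = ["ACGTAC", "ACGTAC", "GGCCTT", "GGCCTT"]
toy_clusters = cluster_reads(toy_reads, k=4)
```

Under this sketch a single-copy locus with adequate coverage yields one cluster, while a coverage break or a divergent paralogue yields additional clusters, which matches the expectation stated in the text.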
Consensus bases were called from assembly clusters as follows. For each site an unambiguous base was called if the bases present were identical or if the polymorphism of that site could be explained as sequencing error, assuming a binomial probability model with the probability of error equal to 0.1 and alpha equal to 0.05. If the polymorphism could not be explained as sequencing error, the ambiguous base was called that corresponded to all of the observed bases at that site (for example, ‘R’ was used if ‘A’ and ‘G’ were observed). Called bases were soft-masked (made lowercase) for sites with coverage lower than five. A summary of the assembly results is presented in a spreadsheet in the electronic data archive (http://dx.doi.org/10.5281/zenodo.28343; Prum_AssemblySummary_Summary.xlsx).
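A minimal sketch of this consensus rule is given below. The interpretation of the binomial test is an assumption: the non-majority bases at a site are treated as candidate errors, and the site is called unambiguously when the probability of seeing at least that many errors (error rate 0.1) is not below alpha (0.05). Function names are hypothetical, and the column is assumed to be non-empty and to contain only A, C, G, and T.

```python
from collections import Counter
from math import comb

# IUPAC nucleotide ambiguity codes, keyed by the sorted set of observed bases.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "AG": "R", "CT": "Y", "CG": "S", "AT": "W", "GT": "K", "AC": "M",
         "CGT": "B", "AGT": "D", "ACT": "H", "ACG": "V", "ACGT": "N"}

def binom_tail(m: int, n: int, p: float) -> float:
    """P(X >= m) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(m, n + 1))

def call_base(column: str, error: float = 0.1, alpha: float = 0.05,
              min_coverage: int = 5) -> str:
    """Call a consensus base for one alignment column of read bases."""
    counts = Counter(column)
    n = len(column)
    major, major_n = counts.most_common(1)[0]
    minor_n = n - major_n
    if minor_n == 0 or binom_tail(minor_n, n, error) >= alpha:
        base = major                           # explained by sequencing error
    else:
        base = IUPAC["".join(sorted(counts))]  # all observed bases
    # Soft-mask (lowercase) sites with coverage below the threshold.
    return base.lower() if n < min_coverage else base

calls = [call_base(col) for col in ("AAAAAAAAAA", "AAAAAAAAAG",
                                    "AAAAAGGGGG", "AAA")]
```

With ten reads, a single discordant base is compatible with a 10% error rate, whereas five discordant bases are not, so the third column receives the ambiguity code for A and G.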
Contamination filtering (IdentifyGoodSeqsViaReadsMapped.r, GatherALLConSeqsWithOKCoverage.java). In order to filter out possible low-level contaminants, consensus sequences derived from very low coverage assembly clusters (<10 reads) were removed from further analysis. After filtering, consensus sequences were grouped by locus (across individuals) in order to produce sets of homologues.
Orthology (GetPairwiseDistanceMeasures.java, plotMDS5.r). Orthology was then determined for each locus as follows. First, a pairwise distance measure was computed for pairs of homologues. To compute the pairwise distance between two sequences, we computed the percent of 20-mers observed in the two sequences that were found in both sequences. Note that the list of 20-mers was constructed from consecutive 20-mers as well as spaced 20-mers (every third base), in order to allow increased levels of sequence divergence. Using the distance matrix, we clustered the sequences using a neighbour-joining algorithm, but allowing at most one sequence per species to be in a given cluster. Clusters containing fewer than 50% of the species were removed from downstream processing.
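The shared 20-mer measure can be sketched as below. This illustrates the k-mer comparison only (the constrained neighbour-joining clustering is not shown), and it assumes that "percent of 20-mers observed in the two sequences that were found in both" means the size of the intersection divided by the size of the union of the two k-mer sets. Function names are hypothetical.

```python
def kmer_set(seq: str, k: int = 20, step: int = 3) -> set:
    """Consecutive k-mers plus spaced k-mers that take every `step`-th base.
    The two kinds are tagged so that they are never confused."""
    consecutive = {("consecutive", seq[i:i + k])
                   for i in range(len(seq) - k + 1)}
    span = (k - 1) * step + 1  # bases covered by one spaced k-mer
    spaced = {("spaced", seq[i:i + span:step])
              for i in range(len(seq) - span + 1)}
    return consecutive | spaced

def shared_kmer_percent(a: str, b: str, k: int = 20, step: int = 3) -> float:
    """Percentage of the k-mers seen in either sequence that occur in both."""
    ka, kb = kmer_set(a, k, step), kmer_set(b, k, step)
    union = ka | kb
    return 100.0 * len(ka & kb) / len(union) if union else 0.0

# Toy example with k = 4 so that short strings can be compared.
same = shared_kmer_percent("ACGTACGTAC", "ACGTACGTAC", k=4)
different = shared_kmer_percent("AAAAAAAAAA", "CCCCCCCCCC", k=4)
```

Because a spaced k-mer ignores two of every three positions, a pair of sequences that differ at scattered sites can still share spaced k-mers after their consecutive k-mers no longer match, which is the stated reason for including them.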
Alignment (MAFFT). Sequences in each orthologous set were aligned using MAFFT v7.023b31 with “–genafpair” and “–maxiterate 1000” flags.
Alignment Trimming (TrimAndMaskRawAlignments3). The alignment for each locus was then trimmed/masked using the following procedure. First, each alignment site was identified as ‘good’ if the most common character observed was present in >40% of the sequences. Second, 20 bp regions of each sequence that contained <10 good sites were masked. Third, sites with fewer than 12 unmasked bases were removed from the alignment. Lastly, entire loci were removed if both outgroups or more than 40 taxa were missing. This filter yielded 259 trimmed loci containing fewer than 2.5% missing characters overall.
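The first three trimming steps can be sketched as follows. Several details are assumptions rather than documented behaviour of TrimAndMaskRawAlignments3: a sequence is credited with a good site only when its own base equals the most common character at that site, the 20 bp regions are taken as non-overlapping blocks, gaps are excluded when finding the most common character, and masked bases are written as "N". The locus-level filter on missing taxa is not shown.

```python
from collections import Counter

def trim_alignment(seqs, good_frac=0.40, window=20, min_good=10,
                   min_unmasked=12, mask="N"):
    """Mask poorly matching regions per sequence, then drop sparse columns."""
    n_cols = len(seqs[0])

    # Step 1: a site is 'good' if its most common character is present in
    # more than good_frac of the sequences; store that character, else None.
    consensus = []
    for col in range(n_cols):
        counts = Counter(s[col] for s in seqs if s[col] != "-")
        if counts:
            base, n = counts.most_common(1)[0]
            consensus.append(base if n > good_frac * len(seqs) else None)
        else:
            consensus.append(None)

    # Step 2: mask blocks in which a sequence agrees with fewer than
    # min_good good sites.
    masked = []
    for s in seqs:
        chars = list(s)
        for start in range(0, n_cols, window):
            block = range(start, min(start + window, n_cols))
            good = sum(consensus[c] is not None and s[c] == consensus[c]
                       for c in block)
            if good < min_good:
                for c in block:
                    if chars[c] != "-":
                        chars[c] = mask
        masked.append(chars)

    # Step 3: keep only columns with at least min_unmasked unmasked bases.
    keep = [c for c in range(n_cols)
            if sum(row[c] not in ("-", mask) for row in masked) >= min_unmasked]
    return ["".join(row[c] for c in keep) for row in masked]

# Toy alignment with scaled-down thresholds (4-column blocks, 4 sequences).
toy = ["ACGTACGT", "ACGTACGT", "ACGTACGT", "TTTTACGT"]
masked_only = trim_alignment(toy, window=4, min_good=2, min_unmasked=3)
masked_and_trimmed = trim_alignment(toy, window=4, min_good=2, min_unmasked=4)
```

In the toy example the divergent start of the fourth sequence is masked; raising the column threshold to four unmasked bases then removes those four columns altogether.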
Model selection and phylogenetic inference
To minimize the overall model complexity while accurately accounting for substitution processes, we performed a partition-model sensitivity analysis with the development version of PartitionFinder v2.0 (ref. 13), sensu ref. 14, and compared a complex partition-model (one partition per gene) to a heuristically optimized (relaxed clustering with the RAxML option for accelerated model selection) partition-model using BIC. Based on a candidate pool of potential partitioning strategies that spanned a single partition for the entire data set to a model allowing each locus to represent a unique partition, the latter approach suggested that 75 partitions of our data set represented the best-fitting partitioning scheme, which reduced the number of necessary model parameters by 71%, and hugely decreased computation time.
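For reference, the Bayesian information criterion used to compare partitioning schemes is BIC = k ln(n) − 2 ln(L), where k is the number of free parameters, n the number of alignment sites, and L the maximized likelihood; the scheme with the lower BIC is preferred. The sketch below uses hypothetical function names and also checks the reported parameter reduction under the assumption that the number of model parameters scales with the number of partitions (259 per-locus partitions versus 75).

```python
from math import log

def bic(log_likelihood: float, n_params: int, n_sites: int) -> float:
    """Bayesian information criterion: k ln(n) - 2 ln(L). Lower is better."""
    return n_params * log(n_sites) - 2.0 * log_likelihood

def parameter_reduction(n_full: int, n_reduced: int) -> float:
    """Percentage reduction when moving from n_full to n_reduced partitions,
    assuming the same number of parameters per partition."""
    return 100.0 * (n_full - n_reduced) / n_full

# 259 one-per-locus partitions versus the 75-partition scheme.
reduction = parameter_reduction(259, 75)
```

The 184 partitions saved out of 259 amount to a reduction of about 71%, in agreement with the figure reported above.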
We analysed each individual locus in RAxML v8.0.20 (ref. 18), and then the concatenated alignment, using the two partitioning strategies identified above with both maximum likelihood and Bayesian based approaches in RAxML v8.0.20, and ExaBayes v1.4.2 (ref. 34). For each RAxML analysis, we executed 100 rapid bootstrap inferences and thereafter a thorough ML search using a GTR+Γ4 model of nucleotide substitution for each data set partition. Although this may potentially over-parameterize a partition with respect to substitution model, the influence of this form of model over-parameterization has been found to be negligible in phylogenetic inference35. For the Bayesian analyses, we ran four Metropolis-coupled ExaBayes replicates for 10 million generations, each with three heated chains, and sampling every 1,000 generations (default tuning and branch swap parameters; branch lengths among partitions were linked). Convergence and proper sampling of the posterior distribution of parameter values were assessed by checking that the effective sample sizes of all estimated parameters and branch lengths were greater than 200 in the Tracer v1.6 software36 (most were greater than 1,000), and by using the ‘sdsf’ and ‘postProcParam’ tools included with the ExaBayes package to ensure the average standard deviation of split frequencies and potential scale reduction factors across runs were close to zero and one, respectively. Finally, to check for convergence in topology and clade posterior probabilities, we summarized a greedily refined majority-rule consensus tree (default) from 10,000 post burn-in trees using the ExaBayes ‘consense’ tool for each run independently and then together. Analyses of the reduced data set referenced in the main text were conducted using the same partition-model as the full data set.
To explore variation in gene tree topology and to look for outliers that might influence combined analysis, we calculated pairwise Robinson-Foulds37 (RF) and Matching Splits (MS) tree distances implemented in TreeCmp38. We then visualized histograms of tree distances and multidimensional scaling plots in R, and estimated neighbour-joining ‘trees-of-trees’ in the Phangorn R package sensu lato39,40. Using RF and MS distances, outlier loci were identified as those that occurred in the top 10% of pairwise distances for >30 comparisons to other loci (∼10%) in the data set. We also identified putative outlier loci using the kdetrees.complete function of the kdetrees R package41. All three methods identified 13 of the same loci as potential outliers; however removal of these loci from the analysis had no effect on estimating topology or branch lengths.
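The distance-based outlier rule can be sketched as follows, given a symmetric matrix of pairwise gene-tree distances (RF or MS). This is an illustrative reading of the rule, not the analysis script: the top 10% is taken over all unordered pairs of loci, ties at the cutoff are broken arbitrarily, and the function name is hypothetical.

```python
def outlier_loci(dist, top_frac: float = 0.10, min_hits: int = 30):
    """Return indices of loci that take part in more than `min_hits` of the
    largest `top_frac` of pairwise tree distances."""
    n = len(dist)
    pairs = sorted(((dist[i][j], i, j)
                    for i in range(n) for j in range(i + 1, n)),
                   reverse=True)
    n_top = max(1, int(len(pairs) * top_frac))
    hits = [0] * n
    for _, i, j in pairs[:n_top]:
        hits[i] += 1
        hits[j] += 1
    return [i for i, h in enumerate(hits) if h > min_hits]

# Toy matrix: locus 4 is distant from every other locus.
toy_dist = [[0 if i == j else (10 if 4 in (i, j) else 1)
             for j in range(5)] for i in range(5)]
toy_outliers = outlier_loci(toy_dist, top_frac=0.5, min_hits=3)
```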
Coalescent species tree analyses
Although fully parametric estimation (for example, *BEAST, see ref. 42) of a coalescent species tree with hundreds of genes and hundreds of taxa is not currently possible, we estimated species trees using three gene-tree summation methods that have been shown to be statistically consistent under the multispecies coalescent model43. First, we used the STRAW web server44 to estimate bootstrapped species trees using the STAR19 and NJ-ST20 algorithms (also available through STRAW). The popular MP-EST45 method cannot currently work for more than ∼50 taxa. STAR takes rooted gene trees and uses the average ranks of coalescence times19 to build a distance matrix from which a species tree is computed with the neighbour-joining method46. By contrast, NJst applies the neighbour-joining method to a distance matrix computed from average gene-tree internode distances, and relaxes the requirement for input gene trees to be rooted20.
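The STAR distance can be illustrated with a small sketch. It assumes the ranking convention of the STAR method as we understand it: the root of each rooted gene tree receives a rank equal to the number of taxa, ranks decrease by one per level toward the tips, and the distance between two taxa is twice the average rank of their most recent common ancestor across gene trees. Trees are written as nested tuples, every tree is assumed to contain every taxon, and the neighbour-joining step is not shown. Function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def leaves(tree):
    """Leaf names of a rooted tree written as nested tuples of strings."""
    if isinstance(tree, str):
        return [tree]
    return [name for child in tree for name in leaves(child)]

def mrca_ranks(tree, rank, out):
    """Record, for each pair of leaves, the rank of their common ancestor."""
    if isinstance(tree, str):
        return
    child_leaves = [leaves(child) for child in tree]
    for left, right in combinations(child_leaves, 2):
        for a in left:
            for b in right:
                out[frozenset((a, b))] = rank
    for child in tree:
        mrca_ranks(child, rank - 1, out)

def star_distances(gene_trees):
    """Twice the average ancestor rank for every pair of taxa."""
    totals = defaultdict(float)
    for tree in gene_trees:
        ranks = {}
        mrca_ranks(tree, len(leaves(tree)), ranks)
        for pair, r in ranks.items():
            totals[pair] += r
    return {pair: 2.0 * total / len(gene_trees)
            for pair, total in totals.items()}

# Two conflicting three-taxon gene trees.
toy_trees = [(("A", "B"), "C"), (("A", "C"), "B")]
toy_star = star_distances(toy_trees)
```

In this toy case taxa B and C are never sisters, so they receive the largest distance, while the two conflicting groupings of A receive equal, smaller distances.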
We also summarized a species tree with the ASTRAL 4.7.6 algorithm. With simulated data, ASTRAL has been shown to outperform concatenation or other summary methods under certain amounts of incomplete lineage sorting21. For very large numbers of taxa and genes, ASTRAL uses a heuristic search to find the species tree that agrees with the largest number of quartet trees induced by the set of input gene trees. For analysis with ASTRAL, we also attempted to increase the resolution of individual gene trees (Supplementary Fig. 2) by generating supergene alignments using the weighted statistical binning pipeline of refs 47, 48 with a bootstrap score of 0.75 as a bin threshold.
STAR, NJst (not shown), and the binned ASTRAL (Supplementary Fig. 3) analysis produced virtually identical inferences when low support branches (<0.75) were collapsed, and differed only with respect to the resolution of a few branches. NJst resolved the Passeroidea (Fringilla plus Spizella) as the sister group to a paraphyletic sample of Sylvioidea (Calandrella, Pycnonotus, and Sylvia), while STAR does not resolve this branch. Comparing STAR/NJst to ASTRAL, we find five additional differences: (1) within tinamous, STAR/NJst resolves Crypturellus as sister to the rest of the tinamous, whereas ASTRAL resolves Crypturellus as sister to Tinamus (similar to ExaBayes/RAxML); (2) STAR/NJst resolves pigeons as sister to a clade containing Mesitornithiformes and Pteroclidiformes, while ASTRAL does not resolve these relationships; (3), STAR/NJst fails to resolve Oxyruncus and Myiobius as sister genera, while ASTRAL does (similar to RAxML/ExaBayes); (4), in STAR/NJst, bee-eaters (Merops) are resolved as the sister group to coraciiforms (congruent with ref. 4), while ASTRAL resolves bee-eaters as sister to the rollers (Coracias) (similar to RAxML/ExaBayes); (5) lastly, in STAR/NJst, buttonquail (Turnix) is resolved as sister to the most inclusive clade of Charadriiformes not including Burhinus, Charadrius, Haematopus, and Recurvirostra, while in ASTRAL, buttonquail is resolved as sister to a clade containing Glareola, Uria, Rynchops, Sterna, and Chroicocephalus (similar to RAxML/ExaBayes).
Although lower level relationships detected with concatenation are generally recapitulated in the species trees, few of the higher level, or interordinal, relationships are resolved. This lack of resolution of the gene-tree species-tree based inferences relative to the inferences based on concatenation is not surprising, as it is increasingly recognized that the phylogenetic information content required to resolve the gene-tree histories of individual loci becomes scant at deep timescales47. Despite our extensive taxon sampling and the slow rate of nucleotide substitution that characterizes loci captured using anchored enrichment12, no single locus was able to fully resolve a topology, and this lack of information will challenge the accuracy of any coalescent-based summary approach relative to concatenation49,50,51,52,53,54. Finally, all summation methods tested here assume a priori that the only source of discordance among gene trees is deep coalescence, and violations of this assumption may introduce systematic error in phylogeny estimation54.
Phylogenetic informativeness
Site-specific evolutionary rates, λi…j, were calculated for each locus using the program HyPhy55 in the PhyDesign web interface56 in conjunction with a guide chronogram generated by a nonparametric rate smoothing algorithm57 applied to our concatenated RAxML tree. Using these rates to predict whether an alignment will yield correct, incorrect, or no resolution of a given node, we quantified the probability of phylogenetically informative changes (ψ)16 contributing to the resolution of the earliest divergences in Neoaves. Estimates generated under a three character state model58 reveal that the majority of loci have a strong probability of ψ, and suggest a high potential for most loci and partitions containing multiple loci (assigned by PartitionFinder) to correctly resolve this internode. The potential for resolution as a consequence of phylogenetic signal is therefore high relative to the potential for saturation and misleading inference induced by stochastic changes along the subtending lineages (Supplementary Fig. 4a).
To assess the information content of the loci across the entire topology, we profiled their phylogenetic informativeness (PI)15 (Supplementary Fig. 4b). There was considerable variation in PI across loci (Supplementary Fig. 4). In all cases, the loci with the lowest values of ψ are characterized by substantially lower (60–90%) values of PI, rather than sharp declines in their PI profiles. The absence of a sharp decline in the PI profile suggests that a lack of phylogenetic information, rather than a rapid increase in homoplasious sites, underlies low values of the probability of signal ψ59.
Because declines in PI can be attributed to increases in homoplasious site patterns59, we further assessed the phylogenetic utility of data set partitions by quantifying the ratio of PI at the most recent common ancestor of Neoaves to the PI at the most recent common ancestor of Aves (Supplementary Fig. 4c). Values of this ratio that are less than 1 correspond to a rise in PI towards the root. Values close to 1 correspond to fairly uniform PI. Values greater than 1 correspond to a decline in PI towards the root. Sixty-six out of 75 partitions demonstrated less than a 50% decline in PI, and only six partitions demonstrated a decline in PI greater than 75% (Supplementary Fig. 4c). As all but a few nodes in this study represent divergences younger than the crown of Neoaves, these ratios of PI suggest that the predicted impact of homoplasy on our topological inferences should be minimal.
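The ratio described above can be illustrated with a minimal Python sketch. The function names, the tolerance used to call a ratio "roughly uniform", and the conversion of a ratio into a percentage decline (taken here as the drop in PI at the Aves node relative to the Neoaves node) are assumptions made for illustration, not definitions taken from the analysis itself.

```python
def pi_ratio(pi_neoaves: float, pi_aves: float) -> float:
    """Ratio of PI at the Neoaves crown node to PI at the Aves crown node."""
    if pi_aves <= 0:
        raise ValueError("PI at the Aves node must be positive")
    return pi_neoaves / pi_aves


def classify_ratio(ratio: float, tolerance: float = 0.05) -> str:
    """Interpret the ratio; the tolerance band is a hypothetical choice."""
    if ratio < 1 - tolerance:
        return "rise towards root"
    if ratio > 1 + tolerance:
        return "decline towards root"
    return "roughly uniform"


def percent_decline(ratio: float) -> float:
    """Assumed definition: percentage drop in PI from the Neoaves node
    to the Aves node, floored at zero when PI does not decline."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return max(0.0, (1.0 - 1.0 / ratio) * 100.0)


if __name__ == "__main__":
    # Hypothetical PI values for a single partition.
    r = pi_ratio(pi_neoaves=2.0, pi_aves=1.0)
    print(r, classify_ratio(r), percent_decline(r))
```

Under this assumed definition, a partition whose PI halves between the Neoaves node and the root has a ratio of 2 and a 50% decline.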
As PI profiles do not directly predict the impact of homoplasious site patterns on topological resolution16,60, we evaluated probabilities of ψ for focal nodes using both the concatenated data set and individual loci that span the variance in locus lengths. Concordant with expectations from the PI profiles, all quantifications strongly support the prediction that homoplasy will have a minimal impact on topological resolution for the concatenated data set across a range of tree depths and internode distances (ψ = 1.0 for all nodes), while individual loci vary in their predicted utility (Supplementary Fig. 4d). As the guide tree does not represent a true known tree, we additionally quantified ψ across a range of tree depths and internode distances to test whether our predictions of utility are in line with general trends in the data. Concordant with our results above, the concatenated data set is predicted to be of high phylogenetic utility at all timescales (ψ = 1.0 for all nodes), while the utility of individual loci begins to decline for small internodes at deep tree depths (Supplementary Fig. 5).
Estimating a time-calibrated phylogeny
We estimated a time-calibrated tree with a node dating approach in BEAST 1.8.1 (ref. 42) that used 19 well-justified fossil calibrations phylogenetically placed by rigorous, apomorphy-based diagnoses (see the descriptions of avian calibration fossils in the Supplementary Information). We used a starting tree topology based on the ExaBayes inference (Fig. 1), and prior node age calibrations that followed a lognormal parametric distribution based on occurrences of fossil taxa. To prevent BEAST from exploring topology space and to allow only the estimation of branch lengths, we turned off the subtree-slide, Wilson–Balding, and narrow and wide exchange operators61,62. Finally, we applied a birth–death speciation model with default priors.
As rates of molecular evolution are significantly variable across certain bird lineages63,64,65, we applied an uncorrelated relaxed clock (UCLN) to each partition of the data set where rates among branches are distributed according to a lognormal distribution66. All dating analyses were performed without crocodilian outgroups to reduce the potential of extreme substitution rate heterogeneity to bias rate and consequent divergence time estimates of the UCLN model67.
All calibrations were modelled using soft maximum age bounds to allow for the potential of our data to overwhelm our user-specified priors68. Soft maximum bounds are the preferred method for assigning upper limits on the age of phylogenetic divergences69. As effective priors necessarily reflect interactions between user-specified priors, topology, and the branching model, they may not precisely reflect the user-specified priors70. To correct for this potential source of error, we carefully examined the effective calibration priors by first running the prepared BEAST XML without any nucleotide data (until all ESS values were above 200). We then iteratively adjusted our user-defined priors until all of the effective priors (as examined in the Tracer software) reflected the intended calibration densities. Finally, using the compare.phylo function in the Phyloch R package, we examined how the inclusion of molecular data influenced the divergence time estimates relative to the effective prior (Supplementary Fig. 9; see below).
Defining priors
Our initial approach was to set a prior’s offset to the age of its associated fossil; the mean was then manually adjusted such that 95% of the calibration density fell more recently than the K–Pg boundary at 65 Ma (million years ago) (the standard deviation was fixed at 1 Ma). In general, priors constructed this way generated calibration densities that specified their highest density peak (their mode) about 3–5 million years older than the age of the offset.
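The adjustment described above can be sketched numerically. The following Python example solves for the log-scale mean of an offset lognormal density so that a chosen fraction of its mass falls younger than a soft maximum. It assumes that the fixed standard deviation of 1 is expressed on the log scale, which may differ from the parameterization actually used; the fossil age of 30 Ma and all function names are hypothetical.

```python
import math
from statistics import NormalDist


def lognormal_log_mean(offset_ma: float, soft_max_ma: float = 65.0,
                       log_sd: float = 1.0, mass: float = 0.95) -> float:
    """Log-scale mean such that `mass` of an offset lognormal density
    falls younger than `soft_max_ma` (assumes log-scale sd)."""
    if soft_max_ma <= offset_ma:
        raise ValueError("soft maximum must be older than the fossil offset")
    z = NormalDist().inv_cdf(mass)  # standard normal quantile
    return math.log(soft_max_ma - offset_ma) - z * log_sd


def quantile_age(offset_ma: float, log_mean: float,
                 log_sd: float, mass: float) -> float:
    """Age (Ma) below which `mass` of the calibration density lies."""
    z = NormalDist().inv_cdf(mass)
    return offset_ma + math.exp(log_mean + z * log_sd)


def mode_age(offset_ma: float, log_mean: float, log_sd: float = 1.0) -> float:
    """Age (Ma) of the highest-density peak of the offset lognormal."""
    return offset_ma + math.exp(log_mean - log_sd ** 2)


if __name__ == "__main__":
    fossil_age = 30.0  # hypothetical fossil age in Ma
    mu = lognormal_log_mean(fossil_age)
    print(mu, mode_age(fossil_age, mu), quantile_age(fossil_age, mu, 1.0, 0.95))
```

A closed-form solution such as this gives only a starting value for the user-specified prior; the effective prior still has to be checked by sampling without sequence data, because it also depends on the topology and the tree prior.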
We applied a loose gamma prior to the node reflecting the most recent common ancestor of crown birds: we used an offset of 60.5 Ma (the age of the oldest known definitive, uncontroversial crown bird fossil; the stem penguin Waimanu), and adjusted the scale and shape of the prior such that 97.5% of the calibration density fell more recently than 86.5 Ma71 (see below and Supplementary Information for discussion of the >65 Ma putative crown avian Vegavis). This date (86.5 Ma) reflects the upper bound age estimate of the Niobrara Formation, one of many richly fossiliferous Mesozoic deposits exhibiting many crownward Mesozoic stem birds, without any trace of avian crown group representatives. The Niobrara, in particular, has produced hundreds of stem birds and other fragile skeletons, without yielding a single crown bird fossil, and therefore represents a robust choice for a soft upper bound for the root divergence of the avian crown71,72,73. Previous soft maxima employed for this divergence have arbitrarily selected the age of other Mesozoic stem avians (that is, Gansus yumenensis, 110 Ma) that are phylogenetically stemward of the Niobrara taxa28. Although the implementation of very ancient soft maxima such as the age of Gansus is often done in the name of conservatism, the extremely ancient divergence dates yielded by such analyses illustrate the misleading influence of assigning soft maxima that are vastly too old to be of relevance to the divergence of crown group birds74. However, this problem has been eliminated in some more recent analyses75.
All of the fossil calibrations employed in our analysis represent neognaths; rootward divergences within Aves (for example the divergence between Palaeognathae and Neognathae, and Galloanserae and Neoaves) cannot be confidently calibrated due to a present lack of fossils representing the palaeognath, neognath, galloanserine, and neoavian stem groups. As such, the K–Pg soft bound was only applied to comparatively apical divergences within neognaths. Although the question of whether major neognath divergences occurred during the Mesozoic has been the source of controversy76,77,78, renewed surveys of Mesozoic sediments for definitive crown avians or even possible crown neoavians have been unsuccessful (with the possible exception of Vegavis; see Supplementary Information), and together with recent divergence dating analyses have cast doubt on the presence of neoavian subclades before the K–Pg mass extinction1,74,79. Further, recent work has demonstrated the tendency of avian divergence estimates to greatly exceed uninformative priors, resulting in spuriously ancient divergence dating results (for example, refs 28, 75, 76, 80). These results motivated our implementation of the 65 Ma soft bound for our neoavian calibrations.
Contrary to expectation, when we compared the effective prior on the entire tree to the final summary derived from the posterior distribution of divergence times (Supplementary Fig. 9), we found no overall trend of posterior estimated ages post-dating prior calibrations. In fact, the inclusion of our molecular data decreases the inferred ages of almost all of the deepest nodes in our tree. A similar result has been obtained for mammals by using large amounts of nuclear DNA sequences81. Future work investigating the interplay of the density of genomic sampling and the application of various calibration age priors will be indispensable for sensitivity analyses to help us further develop a robust timescale of avian evolution. However, the pattern of posterior versus prior age estimates observed in our study raises the prospect that the new class of data used in this study (that is, semi-conserved anchor regions) may exhibit some immunity to longstanding problems associated with inferring avian divergence times, such as systematically over-estimating the antiquity of extant avian clades.
Implementing BEAST and summarizing a final calibrated tree
In addition to making predictions about the phylogenetic utility of a locus or partition towards topological resolution, PI profiles have recently also been used to mitigate the influence of substitution saturation on divergence time estimates82. Given the variance in PI profile shapes for captured loci and their subsequent partition assignments (Supplementary Fig. 4c), and observations that alignments and subsets of data alignments characterized by high levels of homoplasy can mislead branch length estimation83,84, we limited our divergence time estimates to 36 partitions that did not exhibit a decline in informativeness towards the root of the tree. We ran BEAST on each partition separately until parameter ESS values were greater than 200 (most were greater than 1,000) to ensure adequate posterior sampling of each parameter value. After concatenating 10,000 randomly sampled post burn-in trees from each of these completed analyses, we summarized a final MCC tree with median node heights in TreeAnnotator v1.8.1 (ref. 42). Supplementary Fig. 6 shows the full, calibrated Bayesian tree (Fig. 1) with 95% HPD intervals on the node ages, and Supplementary Fig. 7 shows the distribution of estimated branching times, ranked by median age (using clade numbers from Fig. 1). All computations were carried out on 64-core PowerEdge M915 nodes on the Louise Linux cluster at the Yale University Biomedical High Performance Computing Center.
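The pooling step described above can be sketched as follows. This is an illustrative Python outline rather than the pipeline that was actually run: tree samples are represented as placeholder strings, and the burn-in fraction, sampling without replacement, and the function name are all assumptions.

```python
import random
from typing import List, Optional, Sequence


def pool_posterior_trees(runs: Sequence[Sequence[str]], n_per_run: int,
                         burnin_fraction: float = 0.1,
                         seed: Optional[int] = None) -> List[str]:
    """Discard burn-in from each run, draw `n_per_run` trees at random
    (without replacement), and concatenate the draws across runs."""
    if not 0.0 <= burnin_fraction < 1.0:
        raise ValueError("burnin_fraction must be in [0, 1)")
    rng = random.Random(seed)
    pooled: List[str] = []
    for trees in runs:
        start = int(len(trees) * burnin_fraction)  # first post burn-in index
        post_burnin = list(trees[start:])
        if len(post_burnin) < n_per_run:
            raise ValueError("not enough post burn-in trees in a run")
        pooled.extend(rng.sample(post_burnin, n_per_run))
    return pooled


if __name__ == "__main__":
    # Three hypothetical partition runs with 100 sampled trees each.
    runs = [[f"run{r}_tree{i}" for i in range(100)] for r in range(3)]
    pooled = pool_posterior_trees(runs, n_per_run=10,
                                  burnin_fraction=0.2, seed=1)
    print(len(pooled))
```

Drawing the same number of trees from every partition gives each run equal weight in the pooled sample that is then summarized as a single tree.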
Data reporting
No statistical methods were used to predetermine sample size.
Change history
12 October 2015
The Supplementary Table 1 file was uploaded on 12 October 2015 as it was omitted at the time of online publication.
27 October 2015
The PDF was replaced with a higher-resolution version on 27 October 2015.
References
Jarvis, E. D. et al. Whole-genome analyses resolve early branches in the tree of life of modern birds. Science 346, 1320–1331 (2014)
Gill, F. & Donsker, D. IOC World Bird List (v5.1) http://dx.doi.org/10.14344/IOC.ML.5.1 (2015)
Gill, F. B. Ornithology 2nd edn (W. H. Freeman and Co., 1995)
Hackett, S. J. et al. A phylogenomic study of birds reveals their evolutionary history. Science 320, 1763–1768 (2008)
Ericson, P. G. P. et al. Diversification of Neoaves: integration of molecular sequence data and fossils. Biol. Lett. 2, 543–547 (2006)
McCormack, J. E. et al. A phylogeny of birds based on over 1,500 loci collected by target enrichment and high-throughput sequencing. PLoS ONE 8, e54848 (2013)
Mayr, G. Paleogene Fossil Birds (Springer, 2009)
Mayr, G. Metaves, Mirandornithes, Strisores and other novelties: a critical review of the higher-level phylogeny of neornithine birds. J. Zool. Syst. Evol. Res. 49, 58–76 (2011)
Graybeal, A. Is it better to add taxa or characters to a difficult phylogenetic problem? Syst. Biol. 47, 9–17 (1998)
Heath, T. A., Hedtke, S. M. & Hillis, D. M. Taxon sampling and the accuracy of phylogenetic analyses. J. Syst. Evol. 46, 239–257 (2008)
Townsend, J. P. & Lopez-Giraldez, F. Optimal selection of gene and ingroup taxon sampling for resolving phylogenetic relationships. Syst. Biol. 59, 446–457 (2010)
Lemmon, A. R., Emme, S. A. & Lemmon, E. M. Anchored hybrid enrichment for massively high-throughput phylogenomics. Syst. Biol. 61, 727–744 (2012)
Lanfear, R., Calcott, B., Ho, S. Y. & Guindon, S. PartitionFinder: combined selection of partitioning schemes and substitution models for phylogenetic analyses. Mol. Biol. Evol. 29, 1695–1701 (2012)
Berv, J. S. & Prum, R. O. A comprehensive multilocus phylogeny of the neotropical cotingas (Cotingidae, Aves) with a comparative evolutionary analysis of breeding system and plumage dimorphism and a revised phylogenetic classification. Mol. Phylogenet. Evol. 81, 120–136 (2014)
Townsend, J. P. Profiling phylogenetic informativeness. Syst. Biol. 56, 222–231 (2007)
Townsend, J. P., Su, Z. & Tekle, Y. I. Phylogenetic signal and noise: predicting the power of a data set to resolve phylogeny. Syst. Biol. 61, 835–849 (2012)
Aberer, A. J., Kobert, K. & Stamatakis, A. ExaBayes: massively parallel Bayesian tree inference for the whole-genome era. Mol. Biol. Evol. 31, 2553–2556 (2014)
Stamatakis, A. RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics 30, 1312–1313 (2014)
Liu, L., Yu, L., Pearl, D. K. & Edwards, S. V. Estimating species phylogenies using coalescence times among sequences. Syst. Biol. 58, 468–477 (2009)
Liu, L. & Yu, L. Estimating species trees from unrooted gene trees. Syst. Biol. 60, 661–667 (2011)
Mirarab, S. et al. ASTRAL: genome-scale coalescent-based species tree estimation. Bioinformatics 30, i541–i548 (2014)
Tonini, J., Moore, A., Stern, D., Shcheglovitova, M. & Ortí, G. Concatenation and species tree methods exhibit statistically indistinguishable accuracy under a range of simulated conditions. PLOS Currents Tree of Life 1 http://dx.doi.org/10.1371/currents.tol.34260cc27551a527b124ec5f6334b6be (2015)
Mirarab, S., Bayzid, M. S. & Warnow, T. Evaluating summary methods for multi-locus species tree estimation in the presence of incomplete lineage sorting. Syst. Biol. http://dx.doi.org/10.1093/sysbio/syu063 (2014)
Barker, F. K., Cibois, A., Schikler, P., Felsenstein, J. & Cracraft, J. Phylogeny and diversification of the largest avian radiation. Proc. Natl Acad. Sci. USA 101, 11040–11045 (2004)
Parham, J. F. et al. Best practices for justifying fossil calibrations. Syst. Biol. 61, 346–359 (2012)
Longrich, N. R., Tokaryk, T. & Field, D. J. Mass extinction of birds at the Cretaceous–Paleogene (K–Pg) boundary. Proc. Natl Acad. Sci. USA 108, 15253–15257 (2011)
Feduccia, A. The Origin and Evolution of Birds 2nd edn (Yale Univ. Press, 1999)
Jetz, W., Thomas, G. H., Joy, J. B., Hartmann, K. & Mooers, A. O. The global diversity of birds in space and time. Nature 491, 444–448 (2012)
Goldsmith, T. H. Hummingbirds see near ultraviolet light. Science 207, 786–788 (1980)
del Hoyo, J., Elliott, A., Sargatal, J., Christie, D. A. & de Juana, E. Handbook of the Birds of the World Alive (Lynx Edicions, 2015)
Katoh, K. & Standley, D. M. MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol. Biol. Evol. 30, 772–780 (2013)
Meyer, M. & Kircher, M. Illumina sequencing library preparation for highly multiplexed target capture and sequencing. Cold Spring Harb. Protoc. http://dx.doi.org/10.1101/pdb.prot5448 (2010)
Rokyta, D. R., Lemmon, A. R., Margres, M. J. & Arnow, K. The venom-gland transcriptome of the eastern diamondback rattlesnake (Crotalus adamanteus). BMC Genomics 13, 312 (2012)
Misof, B. et al. Phylogenomics resolves the timing and pattern of insect evolution. Science 346, 763–767 (2014)
Dornburg, A., Santini, F. & Alfaro, M. E. The influence of model averaging on clade posteriors: an example using the triggerfishes (Family Balistidae). Syst. Biol. 57, 905–919 (2008)
Tracer. v1.6. http://beast.bio.ed.ac.uk/Tracer (2014)
Robinson, D. F. & Foulds, L. R. in Combinatorial Mathematics VI in Lecture Notes in Mathematics, Vol. 748 (eds Horadam, A. F. & Wallis, W. D.) Ch. 12 119–126 (Springer, 1979)
Bogdanowicz, D., Giaro, K. & Wróbel, B. TreeCmp: comparison of trees in polynomial time. Evol. Bioinform. 8, 475–487 (2012)
Nye, T. M. W. Trees of Trees: an approach to comparing multiple alternative phylogenies. Syst. Biol. 57, 785–794 (2008)
Schliep, K. P. phangorn: phylogenetic analysis in R. Bioinformatics 27, 592–593 (2011)
Weyenberg, G., Huggins, P. M., Schardl, C. L., Howe, D. K. & Yoshida, R. KDETREES: non-parametric estimation of phylogenetic tree distributions. Bioinformatics 30, 2280–2287 (2014)
Drummond, A. J., Suchard, M. A., Xie, D. & Rambaut, A. Bayesian phylogenetics with BEAUti and the BEAST 1.7. Mol. Biol. Evol. 29, 1969–1973 (2012)
Rannala, B. & Yang, Z. Bayes estimation of species divergence times and ancestral population sizes using DNA sequences from multiple loci. Genetics 164, 1645–1656 (2003)
Shaw, T. I., Ruan, Z., Glenn, T. C. & Liu, L. STRAW: species tree analysis web server. Nucleic Acids Res. 41, W238–W241 (2013)
Liu, L., Yu, L. & Edwards, S. A maximum pseudo-likelihood approach for estimating species trees under the coalescent model. BMC Evol. Biol. 10, 302 (2010)
Saitou, N. & Nei, M. The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol. Biol. Evol. 4, 406–425 (1987)
Mirarab, S., Bayzid, M. S., Boussau, B. & Warnow, T. Statistical binning enables an accurate coalescent-based estimation of the avian tree. Science 346, (2014)
Mirarab, S., Bayzid, M. S. & Warnow, T. Evaluating summary methods for multilocus species tree estimation in the presence of incomplete lineage sorting. Syst. Biol. (2014)
Bayzid, M. S. & Warnow, T. Naive binning improves phylogenomic analyses. Bioinformatics 29, 2277–2284 (2013)
DeGiorgio, M. & Degnan, J. H. Fast and consistent estimation of species trees using supermatrix rooted triples. Mol. Biol. Evol. 27, 552–569 (2010)
Kimball, R. T., Wang, N., Heimer-McGinn, V., Ferguson, C. & Braun, E. L. Identifying localized biases in large datasets: a case study using the avian tree of life. Mol. Phylogenet. Evol. 69, 1021–1032 (2013)
McCormack, J. E. et al. A phylogeny of birds based on over 1,500 loci collected by target enrichment and high-throughput sequencing. PLoS ONE 8, e54848 (2013)
Springer, M. S. & Gatesy, J. Land plant origins and coalescence confusion. Trends Plant Sci. 19, 267–269 (2014)
Tonini, J., Moore, A., Stern, D., Shcheglovitova, M. & Ortí, G. Concatenation and species tree methods exhibit statistically indistinguishable accuracy under a range of simulated conditions. PLOS Currents Tree of Life 1 (2015)
Pond, S. L. K. & Muse, S. V. in Statistical Methods in Molecular Evolution (ed. Nielsen, R.) 125–181 (Springer, 2005)
López-Giráldez, F. & Townsend, J. P. PhyDesign: an online application for profiling phylogenetic informativeness. BMC Evol. Biol. 11, 152 (2011)
Sanderson, M. A nonparametric approach to estimating divergence times in the absence of rate constancy. Mol. Biol. Evol. 14, 1218 (1997)
Simmons, M. P., Carr, T. G. & O’Neill, K. Relative character-state space, amount of potential phylogenetic information, and heterogeneity of nucleotide and amino acid characters. Mol. Phylogenet. Evol. 32, 913–926 (2004)
Townsend, J. P. & Leuenberger, C. Taxon sampling and the optimal rates of evolution for phylogenetic inference. Syst. Biol. 60, 358–365 (2011)
Klopfstein, S., Kropf, C. & Quicke, D. L. J. An evaluation of phylogenetic informativeness profiles and the molecular phylogeny of Diplazontinae (Hymenoptera, Ichneumonidae). Syst. Biol. 59, 226–241 (2010)
Drummond, A. J. & Bouckaert, R. R. Bayesian Evolutionary Analysis With BEAST (Cambridge Univ. Press, 2015)
Hsiang, A. Y. et al. The origin of snakes: revealing the ecology, behavior, and evolutionary history of early snakes using genomics, phenomics, and the fossil record. BMC Evol. Biol. 15, 87 (2015)
Phillips, M. J., Gibb, G. C., Crimp, E. A. & Penny, D. Tinamous and moa flock together: mitochondrial genome sequence analysis reveals independent losses of flight among ratites. Syst. Biol. 59, 90–107 (2010)
Pereira, S. L. & Baker, A. J. A mitogenomic timescale for birds detects variable phylogenetic rates of molecular evolution and refutes the standard molecular clock. Mol. Biol. Evol. 23, 1731–1740 (2006)
Nam, K. et al. Molecular evolution of genes in avian genomes. Genome Biol. 11, R68 (2010)
Drummond, A. J., Ho, S. Y. W., Phillips, M. J. & Rambaut, A. Relaxed phylogenetics and dating with confidence. PLoS Biol. 4, e88 (2006)
Dornburg, A. et al. Relaxed clocks and inferences of heterogeneous patterns of nucleotide substitution and divergence time estimates across whales and dolphins (Mammalia: Cetacea). Mol. Biol. Evol. 29, 721–736 (2012)
Yang, Z. & Rannala, B. Bayesian estimation of species divergence times under a molecular clock using multiple fossil calibrations with soft bounds. Mol. Biol. Evol. 23, 212–226 (2006)
Ho, S. Y. W. Calibrating molecular estimates of substitution rates and divergence times in birds. J. Avian Biol. 38, 409–414 (2007)
Heled, J. & Drummond, A. J. Calibrated tree priors for relaxed phylogenetics and divergence time estimation. Syst. Biol. 61, 138–149 (2012)
Benton, M. J. & Donoghue, P. C. J. Paleontological evidence to date the tree of life. Mol. Biol. Evol. 24, 26 (2007)
Clarke, J. A. Morphology, phylogenetic taxonomy, and systematics of Ichthyornis and Apatornis (Avialae: Ornithurae). Bull. Am. Mus. Nat. Hist. 286, 1–179 (2004)
Field, D. J., LeBlanc, A., Gau, A. & Behlke, A. D. B. Pelagic neonatal fossils support viviparity and precocial life history of Cretaceous mosasaurs. Palaeontology 58, 401–407 (2015)
Mayr, G. The age of the crown group of passerine birds and its evolutionary significance — molecular calibrations versus the fossil record. Syst. Biodivers. 11, 7–13 (2013)
Jetz, W. et al. Global distribution and conservation of evolutionary distinctness in birds. Curr. Biol. 24, 919–930 (2014)
Hedges, S. B., Parker, P. H., Sibley, C. G. & Kumar, S. Continental breakup and the ordinal diversification of birds and mammals. Nature 381, 226–229 (1996)
Benton, M. J. Early origins of modern birds and mammals: molecules vs. morphology. Bioessays 21, 1043–1051 (1999)
Hope, S. in Mesozoic Birds: Above the Heads of Dinosaurs (eds Chiappe, L. M. & Witmer, L. M.) 339–388 (Univ. of California Press, 2002)
Longrich, N. R., Tokaryk, T. & Field, D. J. Mass extinction of birds at the Cretaceous–Paleogene (K–Pg) boundary. Proc. Natl Acad. Sci. USA 108, 15253–15257 (2011)
Baker, A. J., Pereira, S. L. & Paton, T. A. Phylogenetic relationships and divergence times of Charadriiformes genera: multigene evidence for the Cretaceous origin of at least 14 clades of shorebirds. Biol. Lett. 3, 205–209 (2007)
dos Reis, M. et al. Phylogenomic datasets provide both precision and accuracy in estimating the timescale of placental mammal phylogeny. Proc. R. Soc. B 279, 3491–3500 (2012)
Dornburg, A., Townsend, J. P., Friedman, M. & Near, T. J. Phylogenetic informativeness reconciles ray-finned fish molecular divergence times. BMC Evol. Biol. 14, 169 (2014)
Brandley, M. C. et al. Accommodating heterogenous rates of evolution in molecular divergence dating methods: an example using intercontinental dispersal of Plestiodon (Eumeces) lizards. Syst. Biol. 60, 3–15 (2011)
Phillips, M. J. Branch-length estimation bias misleads molecular dating for a vertebrate mitochondrial phylogeny. Gene 441, 132–140 (2009)
Acknowledgements
The research was supported by W. R. Coe Funds from Yale University to R.O.P., and by NSF grants to A.R.L. and E.M.L. We thank the ornithology curators and staff of the following collections for granting research access to the invaluable avian tissue collections that made this work possible: American Museum of Natural History, Field Museum of Natural History, Royal Ontario Museum, University of Kansas Museum of Natural History and Biodiversity Research Center, University of Washington Burke Museum of Natural History, and Yale Peabody Museum of Natural History. We thank M. Kortyna and H. Ralicki for contributions to laboratory work, S. Gullapalli for computational assistance, and N. J. Carriero and R. D. Bjornson at the Yale University Biomedical High Performance Computing Center, which is supported by the NIH. Bird illustrations reproduced with permission from the Handbook of the Birds of the World Alive Online, Lynx Edicions, Barcelona30. The research was aided by discussions with R. Bowie, S. Edwards, I. Lovette, J. Musser, T. Near, and K. Zyskowski.
Author information
Contributions
R.O.P., J.S.B., A.R.L., and E.M.L. conceived of and designed the study. R.O.P. selected the taxa studied. A.R.L. selected the loci and designed the probes. J.S.B., A.R.L., and E.M.L. collected the data. J.S.B. and A.R.L. performed the phylogenetic analyses. A.D. and J.P.T. performed the phylogenetic informativeness, and signal and noise analyses. D.J.F. selected fossil taxa for calibration, and J.S.B., D.J.F., and A.D. designed and performed the dating analyses. R.O.P. wrote the paper with contributions from all other authors.
Ethics declarations
Competing interests
The authors declare no competing financial interests.
Additional information
Electronic data files and software are permanently archived at http://dx.doi.org/10.5281/zenodo.28343.
Supplementary information
Supplementary Information
This file contains Supplementary Text and Data, Supplementary References and Supplementary Figures 1-12. (PDF 3073 kb)
Supplementary Data
This file contains Supplementary Table 1. This file was uploaded on 12 October 2015 as it was omitted at the time of online publication. (PDF 377 kb)
About this article
Cite this article
Prum, R., Berv, J., Dornburg, A. et al. A comprehensive phylogeny of birds (Aves) using targeted next-generation DNA sequencing. Nature 526, 569–573 (2015). https://doi.org/10.1038/nature15697
DOI: https://doi.org/10.1038/nature15697