Main

Adaptive radiations are particularly informative for understanding the ecological and genetic basis of biodiversity1,2. This basis is best identified in young radiations, which represent the early stages of diversification, when phenotypic transitions between species are small and interpretable and extinctions are likely to have been minimal3. Darwin’s finches are a classic example of such a young adaptive radiation3,4. As they adapted to different food resources, they diversified in beak size and shape, feeding habits and diet4,5 (Extended Data Table 1). Unlike most other radiations, this one is entirely intact: none of the species has become extinct as a result of human activities4.

Fourteen of the currently recognized species evolved from a common ancestor in the Galápagos archipelago (Fig. 1a) in the past 1.5 million years according to mitochondrial DNA (mtDNA) dating6; a fifteenth species inhabits Cocos Island. The radiation proceeded rapidly as a result of strong isolation from the South American continent, generation of new islands by volcanic activity, climatic oscillations caused by the El Niño phenomenon, and sea level changes associated with glacial and interglacial cycles over the past million years that led to repeated alternations of island formation and coalescence7,8.

Figure 1: Sample locations and phylogeny of Darwin’s finches.

a, Geographical origin of samples; the letter after the species name is the abbreviation used for geographical origin. The map is modified from ref. 30. b, Maximum-likelihood trees based on all autosomal sites; all nodes with full local support on the basis of the Shimodaira–Hasegawa test are marked by asterisks. The colour code for groups of species applies to both panels. Taxa that deviate from classical taxonomy are underlined.

Traditional taxonomy of Darwin’s finches is based on morphology3, and has been largely supported by observations of breeding birds4,5 and genetic analysis6,9. However, the branching order of several recently diverged taxa is unresolved6 and genetic analysis of phylogeny has been limited to mtDNA and a few microsatellite loci. Some candidate genes for beak development are differentially expressed in species with different beak morphologies10,11,12, but the loci controlling genetic variation in beak diversity among Darwin’s finches remain to be discovered.

Here we report results from whole-genome re-sequencing of 120 individuals representing all Darwin’s finch species and two closely related tanagers, Tiaris bicolor and Loxigilla noctis13. For some species we collected samples from multiple islands (Fig. 1a). We comprehensively analyse patterns of intra- and interspecific genome diversity and phylogenetic relationships among species. We find widespread evidence of interspecific gene flow that may have enhanced evolutionary diversification throughout the phylogeny, and report the discovery of a locus with a major effect on beak shape.

Considerable nucleotide diversity

We generated approximately 10× sequence coverage per individual bird using 2 × 100 base-pair (bp) paired-end reads (Extended Data Fig. 1). Reads were aligned to the genome assembly of a female medium ground finch (G. fortis)14. We identified Z- and W-linked scaffolds on the basis of significant differences in read depth between males (ZZ) and females (ZW) (Supplementary Table 1) and generated a G. fortis mtDNA sequence through a combined bioinformatic and experimental approach. Stringent variant calling revealed approximately 45 million variable sites within or between populations. Genetic diversity within each population was considerable: nucleotide diversity ranged from 0.3 × 10⁻³ to 2.2 × 10⁻³ (Extended Data Table 2), similar to values reported for other bird populations15, including island populations of the zebra finch16. From these diversity estimates we inferred effective population sizes of Darwin’s finch species in the range 6,000–60,000 (Supplementary Text). Genetic variation was extensively shared among populations, particularly among ground and tree finches, with almost no fixed differences between species within each group (Extended Data Fig. 2).
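
Nucleotide diversity is the average, over all surveyed sites, of the probability that two randomly chosen chromosomes differ. A minimal Python sketch of that calculation follows; it assumes biallelic SNPs, diploid genotypes coded as alternate-allele counts (0, 1, 2) and no missing data, and the function names are hypothetical rather than part of the authors' pipeline.

```python
from typing import Sequence


def site_pi(alt_counts: Sequence[int]) -> float:
    """Unbiased heterozygosity at one biallelic site.

    alt_counts holds one diploid genotype per individual, coded as the
    number of alternate alleles it carries (0, 1 or 2).
    """
    n = 2 * len(alt_counts)  # number of sampled chromosomes
    if n < 2:
        raise ValueError("need at least one diploid individual")
    k = sum(alt_counts)  # alternate-allele count across chromosomes
    # Fraction of the n*(n-1)/2 chromosome pairs that differ at this site.
    return 2 * k * (n - k) / (n * (n - 1))


def nucleotide_diversity(variant_genotypes: Sequence[Sequence[int]],
                         n_callable_sites: int) -> float:
    """Mean per-site diversity over a region.

    Invariant sites contribute zero, so only variable sites are passed in;
    n_callable_sites is the total number of sites surveyed, variable or not.
    """
    return sum(site_pi(g) for g in variant_genotypes) / n_callable_sites
```

Dividing by all callable sites, not just variable ones, is what makes the result comparable to per-site values such as 0.3 × 10⁻³.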

Genome-based phylogeny

According to the classical taxonomy of Darwin’s finches, supported by morphological and mitochondrial (cytochrome b) data, warbler finches were the first to branch off, and ground and tree finches constitute the most recent major split3,6,9. Our maximum-likelihood phylogenetic tree based on autosomal genome sequences is generally consistent with current taxonomy but shows several interesting deviations (Fig. 1b). First, Geospiza difficilis, which occurs on six different islands, is polyphyletic, separating into three distinct groups: (1) populations occupying the highlands of Pinta, Santiago and Fernandina, (2) populations occupying the low islands of Wolf and Darwin in the northwest3,6,9 and (3) the population on Genovesa in the northeast. This is consistent with an earlier version of the taxonomy, in which these three groups were classified as distinct species on the basis of morphological differences17,18.

Second, Geospiza conirostris on Española showed the highest genetic similarity to another species, Geospiza magnirostris, whereas G. conirostris on Genovesa clustered with Geospiza scandens (Fig. 1b). Here, phenotypic similarity parallels genetic similarity; G. conirostris on Genovesa have a pointed beak similar to G. scandens, whereas those on Española have a blunt beak more similar to the beaks of G. magnirostris (Extended Data Fig. 3).

A network constructed from autosomal genome sequences indicates conflicting signals in the internal branches of ground and tree finches that may reflect incomplete lineage sorting and/or gene flow (Extended Data Fig. 3). The exact branching order of the most recently evolved ground and tree finches should be interpreted with caution, as it may change with additional sampling. Because our data revealed some important discrepancies with the phenotype-based taxonomy, we propose a revised taxonomy for the sharp-beaked ground finch (G. difficilis) and the large cactus finch (G. conirostris) (Supplementary Text and Extended Data Fig. 4), but we use the current names throughout the text.

We dated phylogenetic splits on the basis of genome divergence (Fig. 2a) and compared these estimates with those obtained using mtDNA (Extended Data Fig. 5a and Supplementary Text). We infer that the most basal split, between warbler finches (Certhidea sp.) and other finches, occurred about 900,000 years ago, and that the rapid radiations of ground and tree finches began around 100,000–300,000 years ago. Although these estimates are based on whole-genome data, they should be considered minimum times because they do not account for gene flow, which reduces genetic divergence between lineages after they split and therefore makes splits appear more recent than they are.

Figure 2: Population history.

a, Dating the nodes (in thousands of years) with confidence intervals (when applicable) in the phylogeny on the basis of divergence corrected for coalescence in ancestral populations; the topology is the representation of the inferred species tree from Fig. 1b. b, ABBA–BABA analysis of G. magnirostris, G. difficilis from Wolf and Pinta, and L. noctis. Number of sites supporting different trees is indicated both as a percentage and as actual numbers. The D statistic and corresponding Holm–Bonferroni-corrected P value are given for testing the null hypothesis of symmetry in genetic relationships. Finch heads are reproduced from ref. 5. How and Why Species Multiply: The Radiation of Darwin's Finches by Peter R. Grant & B. Rosemary Grant. Copyright © 2008 Princeton University Press. Reprinted by permission.

Extensive interspecies gene flow

The discrepancies between phylogenies based on morphology and genome sequences may be due to convergent evolution and/or interspecies gene flow. We found evidence of introgression from three sources: ABBA–BABA tests, discrepancies between phylogenetic trees based on autosomal and sex-linked loci, and mtDNA (Supplementary Text and Extended Data Fig. 5a).

First, the D statistic19 associated with the ABBA–BABA test was used to compare two populations of G. difficilis, from Pinta and Wolf, with G. magnirostris from Genovesa, using L. noctis as the outgroup; G. magnirostris also occurs on Wolf, but we lacked samples from that population. The analysis confirmed that G. difficilis on Wolf is more closely related to G. magnirostris than to G. difficilis on Pinta (Fig. 2b). But there is also evidence of gene flow between G. difficilis on Wolf and on Pinta (P = 5 × 10⁻¹¹³), because the substantial asymmetry in genetic relationships cannot be explained by incomplete lineage sorting alone. However, the D statistic does not distinguish admixture from ancestral subdivision19. We conclude that the closely related populations of G. difficilis on Wolf and Darwin constitute a species of mixed ancestry, in which most of the genome originates from G. magnirostris or a close relative (Supplementary Table 2), whereas a considerable proportion, possibly including genetic variants affecting phenotypic characters, is derived from G. difficilis. Similarly, G. difficilis on Genovesa is more closely related to the other ground and tree finches than to G. difficilis on Pinta, but we also found evidence of gene flow between these two groups previously classified as G. difficilis (P = 3 × 10⁻⁸⁷; Supplementary Table 2).
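
The D statistic contrasts two discordant allele patterns for populations P1, P2, P3 and an outgroup O on the tree (((P1, P2), P3), O): ABBA, where P2 and P3 share the derived allele, and BABA, where P1 and P3 share it. Under incomplete lineage sorting alone the two patterns are expected to be equally frequent, so D ≈ 0. The Python sketch below uses the allele-frequency form for polymorphic sites. It assumes derived-allele frequencies have already been estimated per population, and the function name is hypothetical.

```python
from typing import Iterable, Tuple


def patterson_d(freqs: Iterable[Tuple[float, float, float, float]]) -> float:
    """D statistic from derived-allele frequencies (p1, p2, p3, pO) per site.

    Tree assumed: (((P1, P2), P3), O). ABBA = P2 and P3 share the derived
    allele; BABA = P1 and P3 share it.
    """
    abba = baba = 0.0
    for p1, p2, p3, p_out in freqs:
        abba += (1 - p1) * p2 * p3 * (1 - p_out)
        baba += p1 * (1 - p2) * p3 * (1 - p_out)
    if abba + baba == 0:
        raise ValueError("no ABBA or BABA signal in the supplied sites")
    return (abba - baba) / (abba + baba)
```

A positive D indicates excess allele sharing between P2 and P3, a negative D excess sharing between P1 and P3. Which populations fill the P1 to P3 slots for a given comparison is a choice of the analyst, and a nonzero D alone cannot separate admixture from ancestral structure.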

We next investigated gene flow involving the populations of G. conirostris on Genovesa and Española, which appear as separate species in our phylogenetic analysis. The ABBA–BABA analysis confirmed that G. conirostris on Española shows a closer genetic relationship to G. magnirostris than to G. conirostris on Genovesa (Extended Data Fig. 6a), but also provided evidence for gene flow between G. conirostris on Española and G. conirostris on Genovesa, which may explain some of their phenotypic similarities and their previous classification as a single species.

Given the evidence of relatively recent hybridization, we explored the possibility of more ancient hybridization between warbler finches (Certhidea fusca and Certhidea olivacea) and other finches. ABBA–BABA analysis provided evidence for gene flow between C. fusca and the other finches (P = 7 × 10⁻¹⁹⁹; Extended Data Fig. 6b). This pattern of gene flow was apparent for all non-warbler finches, implying that it occurred before the radiation of the non-warbler finches (Supplementary Table 2).

The trees based on autosomal (Fig. 1b) and Z-linked sites (Extended Data Fig. 5b) are not completely congruent. The tree based on Z-linked polymorphisms indicates that G. difficilis on the highlands of Pinta, Fernandina and Santiago is more closely related to Platyspiza crassirostris and emerged before the Cocos finch split off from the ground and tree finches, whereas the autosomal tree indicates the reverse order for the emergence of the two species. This discrepancy can potentially be explained by gene flow between G. difficilis and tree and ground finches after the Cocos finch became reproductively isolated from the finches on the Galápagos, gene flow that affected Z-linked and autosomal loci to different degrees. In closely related species, interspecies sharing of sequence polymorphisms is commonly more extensive at autosomal loci than at sex-linked loci20. This interpretation of the phylogenetic status of G. difficilis (highland group) is supported by the trees based on mtDNA and on W-linked sites (Extended Data Fig. 5), which suggest that G. difficilis diverged from the ancestor of other ground and tree finches before the emergence of the Cocos finch.

Finally, our analysis of demographic history using the pairwise sequentially Markovian coalescent (PSMC) model21 was consistent with extensive interspecies gene flow among the ground finches, as they have maintained larger effective population sizes than the other species (Supplementary Text and Extended Data Fig. 6c, d).

A major locus controlling beak shape

The most striking morphological difference among Darwin’s finches concerns beak shape (Extended Data Fig. 3). We performed a genome-wide scan using populations that are closely related but differ in beak morphology: G. magnirostris and G. conirostris on Española have blunt beaks, whereas G. conirostris on Genovesa and G. difficilis on Wolf have pointed beaks. We computed fixation indices (FST) between the two groups in non-overlapping 15-kilobase (kb) windows, Z-transformed the FST distribution (ZFST) and examined windows with ZFST > 6 for gene content (Fig. 3a). Among the 15 most significant regions, six harboured genes previously associated with craniofacial and/or beak development in mammals or birds, including calmodulin (CALM)11, goosecoid homeobox (GSC)22, retinol dehydrogenase 14 (RDH14)23, ALX homeobox 1 (ALX1)24,25, fibroblast growth factor 10 (FGF10)26 and forkhead box C1 (FOXC1)27. A previous study demonstrated differential expression of CALM between finches with different beak types11. Two other studies reported differential expression of bone morphogenetic protein 4 (BMP4)10,12, but we did not observe elevated ZFST values in the vicinity of this locus, suggesting that its differential expression is controlled by other loci.
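
Z-transforming window FST standardizes each window against the genome-wide mean and standard deviation, so a threshold of 6 means six standard deviations above the mean. The Python sketch below assumes window FST values are already computed and keyed by a window label; it uses the sample standard deviation, which is an assumption because the text does not say which form was used. Function names are hypothetical.

```python
from statistics import mean, stdev
from typing import Dict, List


def z_transform(window_fst: Dict[str, float]) -> Dict[str, float]:
    """Standardize window F_ST values to mean 0 and (sample) standard deviation 1."""
    values = list(window_fst.values())
    mu, sd = mean(values), stdev(values)
    return {window: (fst - mu) / sd for window, fst in window_fst.items()}


def outlier_windows(window_fst: Dict[str, float], threshold: float = 6.0) -> List[str]:
    """Windows whose Z-transformed F_ST exceeds the threshold, highest first."""
    z = z_transform(window_fst)
    hits = [w for w, score in z.items() if score > threshold]
    return sorted(hits, key=lambda w: z[w], reverse=True)
```

Because the transformation is relative to the whole genome, the same raw FST can be an outlier in one comparison and unremarkable in another with higher background differentiation.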

Figure 3: A major locus controlling beak shape.

a, Genome-wide FST screen comparing G. magnirostris and G. conirostris (Española), which have blunt beaks, with G. conirostris (Genovesa) and G. difficilis (Wolf), which have pointed beaks. The y axis shows ZFST values. b, Nucleotide diversities in the ALX1 region. The 240-kb region showing high homozygosity in blunt-beaked species is highlighted. Red and blue in b–d denote blunt and pointed beak haplotypes, respectively. c, Neighbour-joining haplotype tree of the ALX1 region. Haplotypes from heterozygous birds (see text) are shown in yellow. The estimated time since divergence (± confidence interval) of blunt and pointed beak haplotypes is given in thousands of years. d, Upper panel: genotypes at 335 SNPs showing complete fixation between ALX1 haplotypes associated with blunt (B) and pointed (P) beaks. Middle panel: classification of the alleles associated with blunt beaks at the 335 SNPs as derived or ancestral on the basis of the allelic state in the outgroup. Lower panel: PhastCons35 scores (on the basis of human, mouse and finch alignments) for the 335 SNP sites. TFBS, transcription factor binding sites. e, Linear regression analysis of beak-shape scores among G. fortis individuals on Daphne Major Island classified according to ALX1 genotype; the distribution of pointedness in each class is shown as a boxplot; n = 62; F = 17.7, adjusted R² = 0.22. Differences in six individual body and beak size traits were not significant (all P > 0.05).

The most striking finding was a 240-kb region with high ZFST values, including the window with the highest ZFST score (9.46) overall (Fig. 3a, b). The region overlaps part of LRRIQ1 (leucine-rich repeats and IQ motif containing 1), the entire ALX1 gene and about 130 kb downstream of ALX1. No previous report indicates that LRRIQ1 has a role during development in vertebrates. By contrast, ALX1 is an excellent candidate for variation in beak morphology. It encodes a paired-type homeodomain protein that plays a crucial role in the development of structures derived from craniofacial mesenchyme, the first branchial arch and the limb bud24, and in the migration of cranial neural crest cells, which is highly relevant to beak development25. Loss of ALX1 in humans causes disruption of early craniofacial development24.

All individuals in the blunt beak category were homozygous for a blunt beak-associated haplotype (denoted B), except one heterozygous G. conirostris individual from Española. Furthermore, except for one heterozygous bird from Genovesa, all 19 G. difficilis individuals not included in the FST scan were homozygous for a pointed beak haplotype (P), consistent with their phenotypic appearance (sharp-beaked ground finches). This is notable because genome-wide, G. difficilis on Wolf, Darwin and Genovesa are all more closely related to the blunt-beaked G. magnirostris than to the pointed-beaked G. difficilis from Pinta (Fig. 2b).

A phylogenetic tree based on this region revealed a deep divergence between the B and P haplotypes that must have occurred soon after the split between warbler finches and other Darwin’s finches (Fig. 3c). Apart from the blunt-beaked G. magnirostris and G. conirostris on Española, all individuals except three were homozygous for P haplotypes, the remaining three being heterozygous. The two G. fortis from Daphne Major Island were both homozygous, but for different haplotypes (BB and PP; Fig. 3c). The short branch lengths among B haplotypes are consistent with a selective sweep. There were 335 fixed differences between the B and P haplotypes (Fig. 3d, upper panel), which we assigned as derived or ancestral on the basis of comparison with the outgroup sequence (L. noctis). Derived alleles on the B haplotype were aggregated in the vicinity of ALX1, including the downstream region (Fig. 3d, middle panel). Furthermore, 8 of these 335 fixed differences occurred at conserved sites, and the B haplotype carried the derived allele at seven of them (Fig. 3d, lower panel). Four derived alleles occurred at sites corresponding to transcription factor binding sites in the human genome28. Two other changes constitute missense mutations (L112P and I208V) at ALX1 amino-acid residues that are highly conserved among birds and mammals (Extended Data Fig. 7), and ‘Sorting Intolerant From Tolerant’ (SIFT)29 analysis classified both as damaging (score 0.03 for both). The ratio of non-synonymous to synonymous substitutions between the P and B alleles is high (2/1 = 2.00) compared with the ratio observed between the ancestral P allele and orthologous zebra finch (2/14 = 0.14) and human (21/122 = 0.17) sequences, suggesting that one or both of these missense mutations are non-neutral.

That ALX1 is polymorphic in G. fortis (Fig. 3c, d, upper panel) is particularly interesting, because field observations have shown considerable diversity in beak shape in this species5,30. We genotyped an additional 62 G. fortis birds from Daphne Major Island for a diagnostic single nucleotide polymorphism (SNP) and observed a significant association with beak shape (P = 8.8 × 10⁻⁵; Fig. 3e). PP homozygotes tended to have proportionately long, pointed beaks, BB homozygotes proportionately deep, blunt beaks, and heterozygotes (BP) intermediate beak shapes. We also compared haplotype frequencies between G. fortis on Daphne Major Island and G. fortis on Santa Cruz, which have larger and blunter beaks on average31, possibly as a result of introgressive hybridization with G. magnirostris4,5. The B haplotype was more frequent on Santa Cruz than on Daphne Major (0.74, n = 21 birds, versus 0.49, n = 62 birds; P = 0.007, Fisher’s exact test).

Natural selection on beak size and shape of G. fortis on Daphne Major Island has led to evolutionary change in the past few decades5,30. Moreover, genetic variation in beak shape has been increased through introgressive hybridization5,30 with two species of Geospiza, scandens and fuliginosa, that have relatively pointed beaks. Therefore we expect hybrids and backcrosses in the G. fortis population to have a relatively high frequency of the P haplotype. We genotyped an additional 25 G. fortis at ALX1, added them to the sample of 62 (Methods) and compared the haplotype frequencies in eight hybrids (including backcrosses) and 79 non-hybrids. ALX1-P had a frequency of 0.75 among hybrids, and 0.44 among the others, which is statistically significant in the expected direction (P = 0.03, Fisher’s exact test). Thus, ALX1-P alleles introduced by introgressive hybridization most probably contributed to evolution of more pointed beaks in 1987 following natural selection as a result of a change in food supply in the 1985–86 drought30.
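
Both haplotype-frequency comparisons above reduce to a 2×2 table of haplotype counts (rows: the two groups of birds; columns: B and P haplotypes). The Python sketch below implements the two-sided Fisher's exact test in its usual form: it sums the hypergeometric probabilities of every table with the same margins that is no more probable than the observed one. Treating the two haplotypes of each diploid bird as independent counts is an assumption of this sketch, and the text does not give the underlying counts, so any inputs are the reader's own. The function name is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows could be two groups of birds (e.g. hybrids vs non-hybrids) and
    columns the counts of P and B haplotypes in each group.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def table_prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = table_prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one;
    # the relative tolerance treats floating-point near-ties as ties.
    p = sum(q for q in map(table_prob, range(lo, hi + 1)) if q <= p_obs * (1 + 1e-7))
    return min(1.0, p)
```

Where SciPy is available, `scipy.stats.fisher_exact` with its default two-sided alternative should give the same P value for the same table.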

Discussion

Our revised and dated phylogeny of Darwin’s finches shows that the adaptive radiation took place in the past million years, with a rapid accumulation of species recently (Supplementary Text). We have genomically characterized the entire radiation, which has revealed a striking connection between past and present evolution. Evidence of introgressive hybridization, which has been documented as a contemporary process, is found throughout the radiation. Hybridization has given rise to species of mixed ancestry, in the past (this study) and the present30. It has influenced the evolution of a key phenotypic trait: beak shape. Similar introgressive hybridization affecting an adaptive trait (mimicry) has been described in Heliconius butterflies32. The degree of continuity between historical and contemporary evolution is unexpected because introgressive hybridization plays no part in traditional accounts of adaptive radiations of animals1,2. For young radiations it complements the better-known role of natural selection.

Charles Darwin first noted the diversity in beak shapes among the finches on Galápagos. Our genomic study has now revealed some of the underlying genetic variation explaining this diversity. A polygenic basis for beak diversity is indicated by our discovery of about 15 regions with strong genetic differentiation between groups of finches with blunt or pointed beaks. We present evidence that the ALX1 locus contributes to beak diversity, within and among species. The derived ALX1-B haplotype associated with blunt beaks has a long evolutionary history (hundreds of thousands of years), because its origin predates the radiation of vegetarian, tree and ground finches (Fig. 3c). This haplotype is fixed or nearly fixed in two ground finches with blunt beaks, G. magnirostris and G. conirostris on Española, and it co-segregates with variation in beak shape in G. fortis. As previously documented in domestic animals33 and natural populations34, the haplotype might have evolved by accumulating both coding and regulatory changes affecting ALX1 function. Natural selection and introgression affecting this locus have contributed to the diversification of beak shapes among Darwin’s finches and hence to their expanded utilization of food resources on Galápagos.

Methods

Study samples

No statistical methods were used to predetermine sample size. Blood samples from a total of 120 birds, captured in mist nets and then released, were collected on FTA papers and stored at −70 °C until DNA preparation. These included all 15 species of Darwin’s finches currently present on the Galápagos and Cocos Island, and two closely related tanagers from Barbados used as outgroups13. The name of each species, the island where it was sampled and the number of individuals sampled per species are given in Extended Data Table 2; phenotype descriptions of each species are in Extended Data Table 1.

Whole-genome sequencing

DNA was isolated from pieces of FTA paper using the DNeasy Tissue Kit (QIAGEN). Each DNA sample was uniquely tagged with a sequence index during multiplexed library preparation. The libraries (average fragment size about 400 bp) were sequenced on Illumina HiSeq 2000 sequencers, generating 2 × 100 bp paired-end reads. Sequencing was targeted at approximately 10× coverage per bird.

Reference genome assembly

Sequence reads were aligned to the genome assembly of a female medium ground finch (G. fortis)14. This draft genome assembly has a size of 1.07 Gb with scaffold N50 size of 5.2 Mb and contig N50 size of 30 kb. The annotation of the genome included a total of 16,286 protein-coding genes.

In addition, because a complete mtDNA sequence was not previously available for any Darwin’s finch, we also generated an assembly of the mtDNA genome sequence. For this, we first mapped all reads from one G. fortis individual against the zebra finch (Taeniopygia guttata) mtDNA. All the aligned reads were locally reassembled using SOAPdenovo36, and the gaps between the contigs were then filled by Sanger sequencing to generate a single mtDNA genome sequence 16.8 kb in length.

Sequence alignment and variant calling

The short sequence reads (2 × 100 bp) were quality checked using FastQC (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/). We then used BWA37 (version 0.6.2) with default parameters to map the genomic reads from each individual against the reference genome assembly. The alignments were further checked for PCR duplicates using Picard (http://picard.sourceforge.net/). We used the Genome Analysis Toolkit (GATK)38 for base quality recalibration, insertion/deletion (INDEL) realignment, and SNP and INDEL discovery and genotyping across all 120 samples simultaneously, following the GATK best-practice recommendations39,40.

Raw variant calls were quality-filtered with an in-house pipeline that excluded a variant as low quality unless it satisfied all of the following cut-offs: SNP quality > 100, base quality > 30, mapping quality > 50, haplotype score < 10, Fisher strand bias < 60, mapping quality rank sum > −4.0, read position rank sum > −2.0, quality by depth > 2.0, minimum depth (summed over all 120 samples) > 125 and maximum depth (summed over all 120 samples) < 1,875. These parameters are explained in detail in the GATK user manual39. The cut-offs were chosen on the basis of the distribution of each parameter in the raw variant calls generated by the GATK UnifiedGenotyper module. Missing and low-quality genotypes were then imputed separately for each population using BEAGLE (version 3.3.2)41. In total, we retained 44,753,624 variable sites. Variant calling in mtDNA used the same BWA and GATK pipeline and identified 1,429 variable sites. We calculated average nucleotide diversity separately for the autosomes, chromosomes Z and W, and the mtDNA genome to estimate the amount of genetic variation in each population in different parts of the genome.
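
The hard filters listed above amount to a conjunction of per-site threshold checks. The Python sketch below applies them to a site represented as a dictionary of annotation values. Keys follow VCF/GATK annotation names where one exists (QUAL, MQ, HaplotypeScore, FS, MQRankSum, ReadPosRankSum, QD, DP); "BaseQ" is a hypothetical key standing in for the base-quality criterion, and failing a site whose annotation is missing is an assumption, not something stated in the text.

```python
from typing import Dict, Mapping, Tuple

# Cut-offs transcribed from the text as (operator, threshold).
# "BaseQ" is a hypothetical key; the others follow VCF/GATK annotation names.
RULES: Dict[str, Tuple[str, float]] = {
    "QUAL": (">", 100.0),           # SNP quality
    "BaseQ": (">", 30.0),           # base quality (hypothetical key)
    "MQ": (">", 50.0),              # mapping quality
    "HaplotypeScore": ("<", 10.0),
    "FS": ("<", 60.0),              # Fisher strand bias
    "MQRankSum": (">", -4.0),
    "ReadPosRankSum": (">", -2.0),
    "QD": (">", 2.0),               # quality by depth
    "DP": (">", 125.0),             # depth summed over all 120 samples
}
MAX_TOTAL_DEPTH = 1875.0


def passes_filters(site: Mapping[str, float]) -> bool:
    """True only if the site meets every cut-off."""
    for key, (op, cutoff) in RULES.items():
        value = site.get(key)
        if value is None:
            return False  # assumption: a missing annotation fails the site
        if op == ">" and not value > cutoff:
            return False
        if op == "<" and not value < cutoff:
            return False
    return site["DP"] < MAX_TOTAL_DEPTH
```

Keeping the thresholds in one table makes it easy to re-derive them from the empirical distributions of each annotation, as the text describes.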

Identification of scaffolds from chromosomes Z and W

The medium ground finch genome assembly contains 27,239 scaffolds that are not assigned to chromosomes. We used the MultiSV package to identify scaffolds belonging to chromosomes Z and W by comparing the read depth for each scaffold between 85 males and 35 females. This analysis identified 133 scaffolds belonging to chromosome Z, with a total length of 67,176,652 bp (Supplementary Table 1a), and 662 scaffolds belonging to chromosome W, with a total length of 643,111 bp (Supplementary Table 1b).
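
The read-depth logic behind this step is simple: autosomal scaffolds have similar normalized depth in both sexes, Z-linked scaffolds have roughly half the male depth in ZW females, and W-linked scaffolds have essentially no depth in ZZ males. The Python sketch below encodes that logic with hypothetical fixed thresholds. It is not the MultiSV method, which tests for significant depth differences, and it assumes each individual's depth has already been divided by that individual's genome-wide mean.

```python
from statistics import mean
from typing import Sequence, Tuple


def classify_scaffold(male_depths: Sequence[float],
                      female_depths: Sequence[float],
                      w_male_fraction: float = 0.1,
                      z_ratio_range: Tuple[float, float] = (0.35, 0.65)) -> str:
    """Label a scaffold 'W', 'Z' or 'autosomal' from normalized read depths.

    Depths are per individual, already divided by that individual's
    genome-wide mean depth. Both thresholds are hypothetical.
    """
    m = mean(male_depths)
    f = mean(female_depths)
    if f > 0 and m <= w_male_fraction * f:
        return "W"  # essentially absent from ZZ males
    if m > 0 and z_ratio_range[0] <= f / m <= z_ratio_range[1]:
        return "Z"  # ZW females carry one copy, ZZ males two
    return "autosomal"
```

A statistical test across individuals, as used in the study, is more robust than fixed cut-offs for short or repetitive scaffolds with noisy depth.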

Estimation of genetic distance and phylogeny reconstruction

We used PLINK (version 1.07)42 to calculate genetic distance (on the basis of proportion of alleles identical by state) for all pairs of individuals separately for autosomes and the Z chromosome. We used the neighbour-net method of SplitsTree4 (http://www.splitstree.org/) to compute the phylogenetic network from genetic distances. We used FastTree to infer approximately maximum-likelihood phylogenies with standard parameters for nucleotide alignments of variable positions in the data set (http://meta.microbesonline.org/fasttree/). FastTree computes local support values with the Shimodaira–Hasegawa test.
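
Identity by state counts, at each SNP genotyped in both individuals, how many of the two alleles match (2 − |g₁ − g₂| for genotypes coded 0, 1, 2), averaged over SNPs. The Python sketch below computes that proportion and a symmetric 1 − IBS distance matrix of the kind used as input to a distance-based network; missing genotypes are represented as None, and the function names are hypothetical rather than PLINK's.

```python
from typing import List, Optional, Sequence

Genotype = Optional[int]  # 0, 1 or 2 copies of one allele; None if missing


def ibs_distance(g1: Sequence[Genotype], g2: Sequence[Genotype]) -> float:
    """1 minus the proportion of alleles identical by state over shared SNPs."""
    shared = compared = 0
    for a, b in zip(g1, g2):
        if a is None or b is None:
            continue  # skip SNPs missing in either bird
        shared += 2 - abs(a - b)  # 2, 1 or 0 alleles identical by state
        compared += 2
    if compared == 0:
        raise ValueError("no SNPs genotyped in both individuals")
    return 1 - shared / compared


def distance_matrix(genotypes: List[Sequence[Genotype]]) -> List[List[float]]:
    """Symmetric pairwise IBS distance matrix."""
    n = len(genotypes)
    return [[ibs_distance(genotypes[i], genotypes[j]) for j in range(n)] for i in range(n)]
```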

ABBA–BABA analysis

Gene flow and the extent of admixture were analysed by testing for asymmetry in the frequencies of the two discordant gene trees in a three-population phylogeny rooted with an outgroup, using the D statistic43 as implemented for polymorphic sites19. Each D statistic was converted to a Z score by dividing it by its standard error, which was estimated with a block jackknife. Blocks of 40,000 variable sites for autosomes and 10,000 for the Z chromosome were used in the jackknife to account for linkage disequilibrium, yielding 1,027 and 291 blocks, respectively. The Z scores were translated into two-sided P values, which were Holm–Bonferroni-corrected44 for multiple testing (the ith smallest of m P values being multiplied by the number of tests remaining, m − i + 1) across all 1,768 possible tests in the phylogeny and the two tests with pooled species (Supplementary Table 2).
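
The testing procedure above has three steps: a delete-one-block jackknife for the standard error of D, conversion of Z = D/SE to a two-sided normal P value, and Holm's step-down adjustment. The Python sketch below assumes ABBA and BABA sums have already been accumulated per block of variable sites; function names are hypothetical. `math.erfc` returns 0.0 once |Z| exceeds roughly 38, the limit of double precision.

```python
import math
from typing import List, Sequence, Tuple


def jackknife_d(blocks: Sequence[Tuple[float, float]]) -> Tuple[float, float, float]:
    """D, its delete-one-block jackknife standard error, and Z = D / SE.

    blocks holds (ABBA sum, BABA sum) for each block of variable sites.
    """
    total_abba = sum(a for a, _ in blocks)
    total_baba = sum(b for _, b in blocks)
    d = (total_abba - total_baba) / (total_abba + total_baba)
    n = len(blocks)
    # Recompute D with each block left out in turn.
    loo = []
    for a, b in blocks:
        rest_abba, rest_baba = total_abba - a, total_baba - b
        loo.append((rest_abba - rest_baba) / (rest_abba + rest_baba))
    loo_mean = sum(loo) / n
    se = math.sqrt((n - 1) / n * sum((x - loo_mean) ** 2 for x in loo))
    if se == 0:
        raise ValueError("jackknife standard error is zero")
    return d, se, d / se


def two_sided_p(z: float) -> float:
    """Two-sided P value for a standard normal Z score."""
    return math.erfc(abs(z) / math.sqrt(2))


def holm_adjust(pvalues: Sequence[float]) -> List[float]:
    """Holm step-down adjustment: the i-th smallest of m P values is
    multiplied by (m - i + 1), capped at 1 and kept monotone."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):  # rank is 0-based, so factor is m - rank
        running_max = max(running_max, min(1.0, (m - rank) * pvalues[idx]))
        adjusted[idx] = running_max
    return adjusted
```

Using blocks of many thousands of sites, as in the text, keeps linked sites together so that the jackknife variance is not underestimated.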

Mutation rates

We used the following previously reported estimated mutation rates for nuclear and mtDNA: nuclear DNA, 2.04 × 10⁻⁹ per site per year, estimated from the synonymous mutation rate on the Darwin’s finches’ lineage since the split from zebra finch45; mtDNA, a fossil-calibrated divergence rate of 2.1% per million years for bird cytochrome b sequences46.

Estimation of effective population size

Effective population sizes (Ne) were calculated from Watterson’s θ (ref. 47) estimated across the whole genome and the nuclear mutation rate given above. Fluctuations in Ne were inferred using PSMC21 with ‘64*1’ as the time-interval parameter pattern. Plots were scaled assuming a mutation rate of 1.02 × 10⁻⁸ per site per generation (the per-year nuclear rate multiplied by the generation time) and a generation time of 5 years (ref. 48).
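
Watterson's estimator divides the number of segregating sites S by the number of surveyed sites L and the harmonic number a_n = Σ 1/i (i = 1 to n − 1) for n sampled chromosomes. Under the standard neutral model θ = 4Neμ, so Ne = θ/(4μ) with μ per generation; with the rates above, μ = 2.04 × 10⁻⁹ × 5 = 1.02 × 10⁻⁸ per site per generation. The Python sketch below shows this arithmetic; function names are hypothetical.

```python
def watterson_theta(segregating_sites: int, n_sites: int, n_chromosomes: int) -> float:
    """Watterson's estimator of theta per site."""
    if n_chromosomes < 2:
        raise ValueError("need at least two chromosomes")
    a_n = sum(1.0 / i for i in range(1, n_chromosomes))  # harmonic number a_n
    return segregating_sites / (n_sites * a_n)


def effective_population_size(theta_per_site: float,
                              mu_per_year: float = 2.04e-9,
                              generation_time_years: float = 5.0) -> float:
    """N_e = theta / (4 * mu), with mu expressed per site per generation."""
    mu_per_generation = mu_per_year * generation_time_years  # 1.02e-8 with the defaults
    return theta_per_site / (4 * mu_per_generation)
```

For example, a per-site θ of 0.3 × 10⁻³ gives Ne ≈ 7,350 and 2.2 × 10⁻³ gives Ne ≈ 53,900. These inputs are the diversity range quoted in the main text, used here only to illustrate the arithmetic.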

Dating the nodes in the phylogeny and demographic history

Times of population splits were calculated from the genetic distance between the two subtrees descending from each node, corrected for the time to coalescence in the ancestral population49 and converted to years using the mutation rate. Confidence intervals were estimated from the standard deviation of genetic distances estimated from the pairwise species comparisons. We estimated the time of divergence between the blunt and pointed ALX1 haplotypes by estimating the average pairwise difference at this locus between species containing all blunt and all pointed haplotypes and correcting for mutation rate. G. fortis and heterozygous individuals were excluded. Cytochrome b sequences were used to date the mtDNA phylogeny, in which the most recently evolved ground finches (that is, G. magnirostris, conirostris, scandens, fortis, fuliginosa and difficilis on Genovesa) were treated as one population, with diversities averaged across species, because they did not form monophyletic groups according to species.
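
One common way to implement the ancestral-coalescence correction is to subtract the mean within-group diversity of the two descendant groups from their between-group divergence d_XY (net divergence) and convert the result to years, T = (d_XY − (π_X + π_Y)/2) / (2μ). The Python sketch below assumes that form; the correction actually used follows ref. 49 and may differ in detail, and the function name is hypothetical.

```python
def split_time_years(d_xy: float, pi_x: float, pi_y: float,
                     mu_per_year: float = 2.04e-9) -> float:
    """Split time from net divergence, assuming mutations accumulate on both lineages."""
    # Subtract mean within-group diversity as a proxy for ancestral diversity
    # (coalescence that predates the split).
    net_divergence = d_xy - (pi_x + pi_y) / 2
    if net_divergence < 0:
        raise ValueError("between-group divergence is below within-group diversity")
    # Divergence accrues along two lineages, hence the factor of 2.
    return net_divergence / (2 * mu_per_year)
```

Passing zero for both diversities gives the uncorrected conversion d_XY/(2μ), which corresponds to dating a haplotype split from the average pairwise difference alone.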

To elucidate and display the demographic history of Darwin’s finches we used the pairwise sequentially Markovian coalescent (PSMC) model, which infers fluctuations in effective population size over evolutionary time from a single genome sequence21.

Signatures of selection for beak diversification

We scanned the whole genome in non-overlapping 15-kb windows to identify regions with increased genetic divergence (FST) between species with blunt and pointed beaks. We used VCFtools version 0.1.11 (ref. 50) to calculate FST. The genomic windows with high ZFST (>6) were analysed for gene content.
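
VCFtools computes the Weir and Cockerham estimator. As a simpler illustration of windowed FST, the Python sketch below swaps in Hudson's estimator instead, summing per-site numerators and denominators within non-overlapping 15-kb windows and taking their ratio, which keeps low-frequency sites from dominating the window value. It will not reproduce VCFtools output exactly; allele frequencies and sample sizes (numbers of chromosomes) are assumed precomputed, and function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

WINDOW_BP = 15_000  # non-overlapping window size from the text


def hudson_terms(p1: float, n1: int, p2: float, n2: int) -> Tuple[float, float]:
    """Per-site numerator and denominator of Hudson's F_ST.

    p1, p2 are allele frequencies; n1, n2 are numbers of sampled chromosomes.
    """
    num = (p1 - p2) ** 2 - p1 * (1 - p1) / (n1 - 1) - p2 * (1 - p2) / (n2 - 1)
    den = p1 * (1 - p2) + p2 * (1 - p1)
    return num, den


def windowed_fst(sites: Iterable[Tuple[str, int, float, int, float, int]]
                 ) -> Dict[Tuple[str, int], float]:
    """Ratio-of-sums F_ST per (scaffold, window index).

    Each site is (scaffold, 1-based position, p1, n1, p2, n2).
    """
    nums: Dict[Tuple[str, int], float] = defaultdict(float)
    dens: Dict[Tuple[str, int], float] = defaultdict(float)
    for scaffold, pos, p1, n1, p2, n2 in sites:
        key = (scaffold, (pos - 1) // WINDOW_BP)
        num, den = hudson_terms(p1, n1, p2, n2)
        nums[key] += num
        dens[key] += den
    return {key: nums[key] / dens[key] for key in nums if dens[key] > 0}
```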

ALX1 genotyping in additional samples

A TaqMan SNP genotyping assay (Life Technologies) was designed for one SNP (A/C at nucleotide position 517,149 in scaffold JH739921) diagnostic for the ALX1 haplotypes associated with blunt and pointed beaks. A standard TaqMan allelic discrimination assay was run on an Applied Biosystems 7900HT real-time PCR instrument. The association between individual genotypes and beak-shape measurements was evaluated using standard linear regression in R.
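
The association test can be sketched as an ordinary least-squares regression of a beak-shape score on genotype. The Python sketch below assumes additive coding (number of B alleles: PP = 0, BP = 1, BB = 2), which the text does not state. It reports the slope, R², adjusted R² and the F statistic on 1 and n − 2 degrees of freedom, the same quantities R's `lm()` summary gives for a single numeric predictor. The function name is hypothetical.

```python
from typing import Dict, Sequence


def additive_regression(genotypes: Sequence[int], scores: Sequence[float]) -> Dict[str, float]:
    """OLS of a beak-shape score on genotype coded as B-allele count (0, 1, 2)."""
    n = len(genotypes)
    if n != len(scores) or n < 3:
        raise ValueError("need matching inputs with at least three birds")
    mean_x = sum(genotypes) / n
    mean_y = sum(scores) / n
    sxx = sum((x - mean_x) ** 2 for x in genotypes)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(genotypes, scores))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    sst = sum((y - mean_y) ** 2 for y in scores)  # total sum of squares
    sse = sum((y - intercept - slope * x) ** 2 for x, y in zip(genotypes, scores))  # residual
    r2 = 1 - sse / sst
    return {
        "slope": slope,
        "intercept": intercept,
        "r2": r2,
        "adj_r2": 1 - (1 - r2) * (n - 1) / (n - 2),
        "f": (sst - sse) / (sse / (n - 2)),  # F on 1 and n - 2 degrees of freedom
    }
```

Coding genotype as a three-level factor instead would test for any difference among classes, at the cost of one extra degree of freedom.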

Comparison of ALX1 protein sequences among vertebrates

The ALX1 protein sequence for G. fortis was downloaded from NCBI (XP_005421635). This G. fortis protein represents the pointed allele and was edited to create a blunt counterpart by introducing the two amino-acid substitutions (L112P and I208V). ALX1 protein sequences from other species were collected from predicted orthologues of the chicken ALX1 gene in Ensembl51, including representative species from teleosts, reptiles, birds and mammals. The protein sequences were aligned using MUSCLE52 (version 3.8.31) with default settings, and the multiple sequence alignment was viewed and edited using Jalview29,53. The functional consequences of the amino-acid substitutions were predicted using SIFT29, with the multiple sequence alignment (excluding the blunt allele) as input. Both substitutions received SIFT scores of 0.03, below the 0.05 threshold under which substitutions are predicted to be damaging. Both predictions were flagged as low confidence because of limited divergence in the alignment. However, because we sampled orthologues from a diverse set of species across which ALX1 is considerably conserved, we argue that these predictions can be viewed with greater confidence. Protein domains were predicted with InterProScan54 using the G. fortis ALX1 protein sequence.

Functional annotation of SNPs

NCBI’s genome annotation for the G. fortis assembly (GeoFor1) was downloaded from NCBI’s FTP server (ftp://ftp.ncbi.nlm.nih.gov/genomes/Geospiza_fortis/) in GFF format. The annotation was filtered to include only genes annotated with a coding sequence (13,949 genes with 16,365 transcripts) before using it to build a local SnpEff (version 3.4) database55. The SnpEff database was subsequently used to annotate all detected sequence variants among the Darwin’s finches with putative functional effects according to categories defined in the SnpEff manual. The upstream and downstream categories are regions within 5,000 bp in the respective direction of an annotated gene. SnpEff allows SNPs to be included in multiple categories; for example, a SNP may be intronic in one gene and a synonymous change in another gene residing in the intron of the first gene.
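
SnpEff's upstream and downstream categories are defined relative to the strand of each gene, so a variant on the low-coordinate side of a minus-strand gene counts as downstream. The Python sketch below shows only that flank logic for the 5,000-bp window mentioned above, using a hypothetical `Gene` record with 1-based inclusive coordinates. It does not reproduce SnpEff's other effect categories or its implementation, but it does show how one position can receive labels from several genes.

```python
from typing import List, NamedTuple, Set

FLANK_BP = 5_000  # upstream/downstream window from the text


class Gene(NamedTuple):
    name: str
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    strand: str  # "+" or "-"


def flank_categories(pos: int, genes: List[Gene]) -> Set[str]:
    """Label a position as within, upstream of or downstream of each nearby gene."""
    labels: Set[str] = set()
    for g in genes:
        if g.start <= pos <= g.end:
            labels.add("GENE:" + g.name)
            continue
        left = g.start - FLANK_BP <= pos < g.start   # lower coordinates than the gene
        right = g.end < pos <= g.end + FLANK_BP      # higher coordinates than the gene
        if left or right:
            # On the minus strand, the 5' (upstream) side is at higher coordinates.
            upstream = left if g.strand == "+" else right
            labels.add(("UPSTREAM:" if upstream else "DOWNSTREAM:") + g.name)
    return labels or {"INTERGENIC"}
```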