Main

Independents constitute 80–95% of male ruffs and strive to defend territories on leks1,2,3,7. Independent males show a spectacular diversity in the color of their ruff and head tufts (Fig. 1a). Satellites are slightly smaller than independents, usually have a white ruff and white head tufts (Fig. 1a) and constitute 5–20% of males7,8. Satellites are non-territorial and display submissive behavior, allowing independent males to dominate them at leks (Fig. 1a, middle). Independents clearly recognize satellites as a different kind of male and behave differently with satellites than they do with other independents (see URLs for a link to a video showing the reproductive strategies of the three male morphs). Both independents and satellites may benefit from their interaction by attracting females7. The faeder is a rare third morph (<1% of male ruffs) mimicking females by its smaller size and female-like plumage1,9,10 (Fig. 1a). These disguised males appear on the leks where they attempt to gain access to females that are ready to mate.

Figure 1: A 4.5-Mb inversion associated with the satellite and faeder morphs in male ruff.

(a) Phenotypic diversity among male ruffs. The middle photograph shows an independent male dominating a satellite. The photographs were provided by Ola Jennersten (left and middle) and F.W. (right). (b) Genome-wide screen for genetic differentiation between independents and satellites using normalized FST values (ZFST) calculated in 15-kb windows. The pattern of differentiation in the associated 4.5-Mb region on scaffold 28 is highlighted below. (c) Neighbor-joining trees for independent and satellite males based on the 4.5-Mb associated region (left) and the rest of the genome (right). (d) The 4.5-Mb inversion disrupts CENPN. (e) Diagnostic test for the inversion: design and genotype results. Primer binding sites are indicated by red arrows, and predicted PCR products are shown below as gray boxes. (f) Conserved synteny between chicken chromosome 11 and ruff scaffold 28 based on an independent male; colored blocks represent individual genes.

A high-quality genome assembly was established using genomic DNA from an independent male kept at the Helsinki zoo in Finland. We estimated the genome size to be 1.23 Gb and generated 139.3 Gb of Illumina HiSeq 2000 sequencing data using fragment libraries with insert sizes ranging from 250 bp to 20 kb (Supplementary Figs. 1, 2, 3 and Supplementary Table 1). The N50 scaffold size was as high as 10.0 Mb (Supplementary Table 2).

We generated 8× genome coverage on the basis of 2 × 125-bp paired-end reads from 15 independent and nine satellite males, all from a single location. A screen based on the fixation index (FST) comparing independents and satellites identified a single highly differentiated 4.5-Mb region on scaffold 28 (Fig. 1b). Independents and satellites clustered as two genetically distinct groups in a phylogenetic tree based on this region (Fig. 1c, left). In contrast, there was no significant differentiation between these groups when the tree was constructed on the basis of the rest of the genome (Fig. 1c, right). We hypothesized that the large region of strong differentiation might reflect the presence of an inversion. We used BreakDancer11 to screen for structural changes and identified a 4.5-Mb inversion present in satellites and perfectly overlapping the differentiated region (Fig. 1d). PCR-based sequencing confirmed a proximal breakpoint at 5.8 Mb and a distal breakpoint at 10.3 Mb and identified a 2,108-bp insertion of a repetitive sequence at the distal breakpoint. A diagnostic test (Fig. 1e) showed that all satellites were heterozygous for the inversion and all 112 independents except five were homozygous for the wild-type sequence (Table 1); the latter five were heterozygous for the inversion and most likely reflect phenotype misclassifications in the field. The inversion was also found among adult females and young birds (Table 1). The Independent allele clearly represents the ancestral state, as the inversion disrupts conserved synteny among birds (Fig. 1f).

Table 1 Results of a diagnostic test for the presence of a 4.5-Mb inversion associated with alternative mating strategies in ruff

The single faeder in our material was also heterozygous for the 4.5-Mb inversion (Table 1). We sequenced this individual to 30× coverage, and FST analysis indicated striking genetic differentiation between the faeder and both independents and satellites within the inverted region (Fig. 2a). The differentiation between the faeder and independents was equally strong across the 4.5-Mb region, whereas the pattern of differentiation between the faeder and satellites was a mirror image of the pattern between satellites and independents (Fig. 2a). We phased haplotypes using Beagle12 and constructed haplotype trees separately for region A showing high FST between satellites and independents and for region B showing low FST between satellites and independents (Fig. 2a,c). In region A, the Satellite and Faeder chromosomes were closely related and divergent from Independent chromosomes. In contrast, in region B, the Satellite and Independent chromosomes were more closely related, whereas the Faeder chromosome was divergent (Fig. 2c, bottom). Because we only had access to a single faeder, we genotyped the entire material (>200 birds) using two SNPs diagnostic for the Faeder chromosome. The Faeder chromosome should not be present in independent or satellite males but should occur at a low frequency in adult females and young birds. The results confirmed this prediction, as we identified the Faeder chromosome in only a single adult female (Supplementary Table 3). This female was clearly an outlier with regard to body size (Supplementary Fig. 4), consistent with the observation that females heterozygous for the Faeder allele are smaller than other females6.

Figure 2: The Faeder allele and structural changes associated with the 4.5-Mb inversion.

(a) Genetic differentiation among independent, satellite and faeder males based on FST analysis. (b) Box plots showing estimated divergence times between the Independent and Satellite chromosomes for regions A and B, calculated using the substitution rates estimated for 48 different bird genomes18. The central rectangle spans the first to third quartiles of the distribution, and the 'whiskers' above and below the box show the maximum and minimum estimates. The line inside the rectangle shows the median. MYA, million years ago. (c) Phylogenetic trees for haplotypes carried by independent, satellite and faeder males for region A (red) and region B (blue) showing different patterns of differentiation. Hap, haplotype. (d) Three deletions (blue) present in satellites and faeders within the 4.5-Mb inversion are all located in the vicinity of HSD17B2 and SDR42E1. Two duplications (magenta) are only present in faeder. (e) The 5.2-kb deletion shows 50% reduced sequence coverage in satellites (top); the deletion encompasses evolutionarily conserved sequences among birds18 (bottom). (f) Proposed evolutionary history of the three alleles underlying the mating system in ruff.

We examined the genetic consequences of the inversion and searched for candidate mutations that might contribute to phenotypic differences. This is challenging because the inverted region contains about 90 genes (Supplementary Fig. 5 and Supplementary Table 4). First, we note that the inversion disrupts the CENPN gene (encoding centromere protein N) (Fig. 1d). The inversion may be recessive lethal, as data from human cells13 and zebrafish14 with mutations in the orthologous gene show that CENPN inactivation has severe deleterious effects. In fact, ruff pedigree data have confirmed that the inversion is recessive lethal15. Birds heterozygous for the Satellite allele must have about 5% higher fitness to compensate for the lethality of the homozygote and maintain an allele frequency of about 5%. Second, we identified a large number of missense mutations present on Satellite and/or Faeder chromosomes (Supplementary Table 5). Third, BreakDancer11 and depth-of-coverage analysis identified three deletions ranging in size from 3.3 to 17.6 kb (Fig. 2d) present in the heterozygous condition in all satellites and in the faeder but not in independent males (Supplementary Table 3). Two of these (5.2 kb and 17.6 kb in length) delete evolutionarily conserved sequences (Fig. 2e and Supplementary Fig. 6), and all three cluster in the vicinity of HSD17B2 (hydroxysteroid (17-β) dehydrogenase 2) and SDR42E1 (short-chain dehydrogenase/reductase family 42E, member 1) (Fig. 2d). HSD17B2 and SDR42E1 both have important roles in the metabolism of sex hormones: HSD17B2 catalyzes conversion of the 17β-hydroxy forms of estrogen and androgens (including testosterone and dihydrotestosterone) into their less active 17-keto forms16. We postulate that one or more of these deletions constitute cis-acting regulatory mutations that alter the expression pattern of HSD17B2 and/or SDR42E1 and contribute to phenotypic differences among male morphs.
We identified two deletions and two duplications that were unique to the faeder and may thus contribute to the faeder phenotype (Supplementary Fig. 7).
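
The relationship between a heterozygote fitness advantage of about 5% and an allele frequency of about 5% for a recessive-lethal inversion can be illustrated with a simple one-locus model. The following is a minimal sketch and not the authors' calculation: it assumes random mating, discrete generations and relative fitnesses of 1 for birds lacking the inversion, 1 + s for heterozygotes and 0 for inversion homozygotes. The function names are illustrative.

```python
def next_frequency(q, s):
    """Advance the inversion allele frequency q by one generation.

    Assumed relative fitnesses (not taken from the paper):
      no inversion = 1, heterozygote = 1 + s, inversion homozygote = 0.
    """
    p = 1.0 - q
    # Inversion homozygotes contribute nothing to the mean fitness.
    mean_fitness = p * p + 2.0 * p * q * (1.0 + s)
    # Only heterozygotes transmit the inversion to the next generation.
    return p * q * (1.0 + s) / mean_fitness


def equilibrium_frequency(s):
    """Closed-form fixed point of next_frequency: q* = s / (1 + 2s)."""
    return s / (1.0 + 2.0 * s)


def iterate(q0, s, generations):
    """Apply next_frequency repeatedly, starting from frequency q0."""
    q = q0
    for _ in range(generations):
        q = next_frequency(q, s)
    return q


if __name__ == "__main__":
    s = 0.05  # assumed heterozygote advantage
    print(f"closed form: {equilibrium_frequency(s):.4f}")
    print(f"iterated:    {iterate(0.01, s, 2000):.4f}")
```

Under these assumptions, s = 0.05 gives an equilibrium frequency of s/(1 + 2s), roughly 0.045, which is in line with the stated figure of about 5%.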

There is a striking diversity in plumage color among male ruffs (Fig. 1a). One of the most obvious candidate genes for variation in pigmentation, MC1R (encoding melanocortin 1 receptor), is located within the inverted region at position 10.2 Mb (Supplementary Fig. 5). Whole-genome sequencing showed that satellites are heterozygous for four derived MC1R missense mutations (encoding p.Val105Leu, p.Arg149His, p.His207Arg and p.Arg303Trp) at residues that are conserved among birds and mammals (Fig. 3). We performed Sanger sequencing of the single MC1R exon in all satellites, the faeder and a subset of the independents. This analysis confirmed complete association between the Satellite allele and these four missense mutations, whereas the faeder was heterozygous for the variant encoding p.His207Arg and three other missense mutations (Supplementary Table 6). The p.His207Arg substitution most likely has functional consequences because the same variant is associated with light color in the red-footed booby17. We propose that the MC1R allele on Satellite chromosomes, possibly together with altered metabolism of sex hormones, underlies the white color of ornamental feathers in satellites. To be causal, this allele must have a dominant effect, as satellites are always heterozygous. This implies a dominant-negative effect or, more likely, a combination of regulatory and coding changes leading to overexpression of a variant form of MC1R specifically in ornamental feathers. The latter mechanism would explain why satellite males, despite their spectacular light color during the breeding season (Fig. 1a), are almost indistinguishable from independents outside the breeding season and why females carrying the Satellite allele are not markedly lighter in color than other females.

Figure 3: Four amino acid substitutions encoded by MC1R missense mutations associated with the Satellite allele.

Missense variants detected in satellites are represented by red circles, and a selected set of variants associated with plumage variation in other birds29 is represented by yellow circles. Lower left, the amino acids associated with the Independent allele are conserved among birds and mammals.

We estimated the time since divergence of the Satellite and Independent alleles on the basis of region A, showing high FST between the two types of males, to 3.87 ± 0.15 million years ago (Fig. 2a,b), using the nucleotide divergence (1.4%) and estimated mutation rates for birds18. A similar estimate of 4.09 ± 0.16 million years ago was obtained for divergence of the Faeder and Independent alleles on the basis of sequence divergence for the entire 4.5-Mb region. There can be no recombination between the Satellite and Faeder chromosomes, as the Faeder/Satellite genotype is not viable. Furthermore, an inversion is expected to cause suppression of recombination within the inverted region between the wild-type and mutant alleles. Comparison of the faeder and independents across the inversion is consistent with this lack of recombination, as FST values are equally strong across the 4.5-Mb region (Fig. 2a). Remarkably, the strong differentiation between satellites and independents is disrupted in two regions (1.5 and 0.7 Mb in size) that show FST values that are much lower than those for other segments of the inversion but still markedly higher than the background level (Fig. 2a). We postulate that the Satellite chromosome arose by one or two rare recombination events between an Independent and a Faeder-like chromosome (Fig. 2f). This would explain why the pattern of genetic differentiation between the Faeder and Satellite chromosomes is a mirror image of the differences between the Independent and Satellite chromosomes (Fig. 2a). A similar rare recombination event between an inversion and a wild-type chromosome created a third allele at the Rose-comb locus in chicken19. Our model predicts that the pairwise genetic distances between the Satellite and Independent chromosomes for region B should all be equal, as they reflect divergence since the recombination event happened. Our data are consistent with this hypothesis (Supplementary Table 7). 
Similarly, the pairwise genetic distances between the Satellite and Faeder chromosomes in region A are also equal (Supplementary Table 7). We estimated that this recombination event occurred 520,000 ± 20,000 years ago, on the basis of sequence divergence (0.2%) between the Satellite and Independent chromosomes in region B (Fig. 2b). We constructed 5-kb and 10-kb mate-pair libraries from one satellite male, but analysis of these data did not identify any additional inversions in the vicinity of the 4.5-Mb inversion.

The genetic basis for the satellite and faeder morphs constitutes a combination of genetic changes that have accumulated within the inverted region over a period of about 3.8 million years. This resembles the situation in white-throated sparrows in which at least two pericentric inversions involving about 100 Mb are associated with altered plumage (brown-and-tan stripes versus white-and-black stripes on the crown) and altered territorial and parental behavior20,21. A recent study suggested that sequence differences in the promoter of ESR1 (encoding estrogen receptor α) are causally related to behavioral differences between morphs22. When an inversion is associated with a complex phenotype, it is challenging to pinpoint causal mutations because many sequence polymorphisms within the inverted region show an equally strong association to the phenotype. An additional complication in ruff is that recessive variants located within the inverted region will never be exposed to purifying selection because homozygosity for the inversion is lethal. Only sequence variants on the Satellite and Faeder chromosomes that show some degree of dominance can contribute to phenotypic differences among the morphs. We propose that the inversion itself caused the first phenotypic effects that constituted the starting point for an evolutionary process eventually resulting in the current mating system in ruff. An inversion may cause phenotypic effects as a result of changes affecting the coding sequence or the regulation of genes, primarily in the vicinity of the breakpoints, as illustrated by the Rose-comb inversion in chicken where translocation of MNR2 leads to ectopic expression of the MNR2 transcription factor and altered comb development19. Similarly, translocation of CYB5B, located at the breakpoint at 10.3 Mb in ruff (Fig. 1d), may cause altered expression of this gene that has a role in the biosynthesis of glucocorticoids and sex steroids23. 
Furthermore, it appears highly plausible that one or more of the three deletions in the near vicinity of HSD17B2 and SDR42E1 lead to altered metabolism of testosterone and other steroids, which may affect both behavior and plumage. In fact, independents have higher circulating levels of testosterone, whereas satellites and faeders have higher concentrations of androstenedione15. A possible explanation for this difference is that upregulation and/or ectopic expression of HSD17B2 lead to conversion of testosterone into androstenedione in individuals heterozygous for the Satellite or Faeder allele, a model consistent with the dominant inheritance of these alleles. This is a testable hypothesis because it predicts allelic imbalance in HSD17B2 expression in satellites and faeders.

Our study has demonstrated how an inversion followed by subsequent accumulation of several adaptive changes within the inverted region led to the evolution of a spectacular mating system comprising three alleles at a single locus maintained by balancing selection. The presence of an inversion that allowed the evolution of a non-recombining 'supergene' (ref. 24) was critical for this process. Other examples where inversions are associated with complex phenotypes include mimicry in butterflies25,26,27 and colony organization in fire ants28.

Methods

Sample collection and DNA extraction.

Blood was collected from the wing vein of a captive male ruff (further referred to as the reference individual) kept at the Helsinki zoo, Finland, and mixed with EDTA as an anticoagulant. DNA was immediately isolated using a standard salt precipitation method. The quantity and quality of the sample were evaluated using the Qubit dsDNA BR assay (Life Technologies) as well as by pulsed-field gel electrophoresis (CHEF Mapper XA, Bio-Rad).

The samples for whole-genome resequencing and further genetic analysis were collected from a ruff population that was studied during the breeding seasons of 1990–2002 on the island of Gotland in the Baltic Sea (57° 10′ N, 18° 20′ E)30,31,32. In each year, males and females were caught on leks using cannon nets. In addition, females and newly hatched young were caught on nests. Individuals were ringed, and morphological measurements including tarsus length (in mm) and wing length (in mm; maximum chord) were collected (all by F.W.). A blood sample was drawn from the wing or brachial vein of each bird and later used for DNA extraction. The birds were released after handling. Birds were caught and handled according to ethical permits and permissions guiding Swedish research and animal welfare (Stockholms Södra Djurförsöksetiska Nämnd S54-99).

Male strategies were determined from plumage pattern and from behavioral observations at leks for ringed independents and satellites7. Most males (>90%) in nuptial plumage were scored as independents or satellites from their plumage alone. Whether female-colored birds were faeder males could only be preliminarily determined from morphological measurements and later verified through sexing using a DNA test33.

Genome assembly.

We generated 139.32 Gb of high-quality next-generation sequencing data with fragment lengths ranging from 250 bp to 20 kb for the reference individual using Illumina HiSeq 2000 sequencing (Supplementary Table 1). We estimated the ruff genome size using k-mer analysis to be about 1.23 Gb (Supplementary Fig. 1), suggesting genome sequence coverage of 113.7-fold. Using SOAPdenovo (version v2.04)34, we obtained an assembly spanning 1.25 Gb, with contig N50 and scaffold N50 sizes of 106.46 kb and 10.00 Mb, respectively (Supplementary Table 2); 96.3% of the assembly was non-gap sequences. The distribution of sequencing depth, calculated on the basis of reads from all sequencing libraries, and the distribution of GC content are presented in Supplementary Figures 2 and 3, respectively.
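
For readers unfamiliar with the N50 statistic reported above, the sketch below shows how it is conventionally computed from a list of contig or scaffold lengths. It is a generic illustration with toy lengths, not the SOAPdenovo implementation, and the function name is an assumption.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    account for at least half of the total assembled length.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence downwards until half the total is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


if __name__ == "__main__":
    toy_scaffolds = [2, 2, 2, 3, 3, 4, 8, 8]  # toy lengths, arbitrary units
    print(n50(toy_scaffolds))
```

In the toy example, the two longest sequences already cover half of the total length of 32, so the N50 is 8.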

Annotation of the region associated with an inversion.

A preliminary annotation for the inverted region was generated with the Maker package (version 2.31-8)35. We first composed a set of high-confidence reference sequences from UniProt by selecting bird proteins that were classified as full length and supported by either proteomics or transcriptomics (74,138 sequences). As additional input, we collected 454 sequencing data available from the SRA under accession SRA049313. Reads were assembled with the Trinity package (release 2014-07-07)36 into 16,746 transcripts. Finally, to improve the accuracy of the annotation process, we modeled new repeat sequences from a preliminary genome assembly and used this library in combination with a curated repeat library for vertebrates included in the RepeatMasker package. From these data, we generated two complementary gene builds: one based directly on the aligned sequences to most accurately reflect the evidence data ('evidence build') and a second set of gene models seeded from ab initio (de novo) gene predictions generated by the chicken reference profile model, included with the Augustus package (version 2.7)37. Both builds were compared and reconciled for the target region using the WebApollo curation platform38. Functional annotation of candidate transcript models was performed through similarity searches39 against the UniProt/SWISS-PROT reference protein set (downloaded May 2014) in combination with the prediction of functional motifs and domains via the InterProScan package (release 5.7-48.0)40.

Whole-genome resequencing and SNP calling.

Sequencing libraries (average fragment size of about 500 bp) were constructed for 15 independents, nine satellites and one faeder, and 2 × 125-bp paired-end reads were generated using Illumina HiSeq 2000 sequencers. For the independents and satellites, we generated 8-fold coverage based on high-quality reads, after strict filtering of low-quality and adaptor-contaminated reads, and 30-fold coverage was generated for the single faeder male. The genomic reads were mapped against the ruff genome assembly using Burrows-Wheeler Aligner (BWA)41 (version 0.6.2) with default parameters. PCR duplicates were filtered from the alignments using Picard. Further, we performed base quality recalibration and indel realignment using the Genome Analysis Toolkit (GATK)42 and performed SNP discovery and genotyping across the 25 samples according to GATK best-practices recommendations43,44. Low-quality SNP calls were filtered out by an in-house filtering pipeline that excluded a SNP if it did not satisfy the threshold of a combination of various quality parameters (for example, SNP quality, base quality, mapping quality, haplotype score, Fisher strand bias, minimum read depth and maximum read depth). Thresholds were chosen on the basis of the distribution of each of these parameters from the raw variant calls.

Genome-wide screen for genetic differentiation among male morphs.

We divided the genome into non-overlapping 15-kb windows and estimated the genetic divergence (FST) between independents and satellites using VCFtools (v.0.1.11)45. The 4.5-Mb region in scaffold 28 that showed strong genetic differentiation between independents and satellites was further subdivided into smaller windows (5 kb in length) to refine the pattern of differentiation. A similar analysis was carried out when comparing data from the single faeder male and the other morphs.
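
Figure 1b reports normalized FST values (ZFST) per window. A common way to normalize windowed FST is a Z-transformation against the genome-wide mean and standard deviation; the sketch below assumes that definition, which the text does not spell out. The per-window values are made up for illustration, and in practice they would come from the VCFtools windowed output.

```python
from statistics import mean, pstdev


def z_transform(window_fst):
    """Z-transform per-window FST values: (FST - mean) / standard deviation.

    Uses the population standard deviation over all windows (an assumption).
    """
    mu = mean(window_fst)
    sigma = pstdev(window_fst)
    if sigma == 0:
        raise ValueError("all windows have identical FST; Z-scores are undefined")
    return [(value - mu) / sigma for value in window_fst]


def outlier_windows(window_fst, threshold):
    """Return indices of windows whose ZFST is at or above the threshold."""
    z_scores = z_transform(window_fst)
    return [index for index, z in enumerate(z_scores) if z >= threshold]


if __name__ == "__main__":
    # Made-up FST values for eight consecutive windows.
    toy_fst = [0.01, 0.02, 0.00, 0.01, 0.45, 0.02, 0.01, 0.00]
    print(outlier_windows(toy_fst, threshold=2.0))
```

With the toy values, only the fifth window (index 4) exceeds a ZFST of 2.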

Phylogenetic analysis.

We used PLINK46 to calculate pairwise genetic distances between individuals. These distances were used to generate neighbor-joining trees with PHYLIP47. Phased haplotypes for the 4.5-Mb inversion region were generated using Beagle12, and these haplotypes were used to estimate Jukes-Cantor corrected nucleotide distances among the Independent, Satellite and Faeder alleles with PHYLIP47. The net frequency of nucleotide substitutions (dA) was calculated according to the method of Nei48, and the time since divergence (t) of alleles was calculated as t = dA/(2λ), where λ is the genomic substitution rate. As a substitution rate in ruff is not yet available, we used the substitution rates estimated for each of 48 other bird genomes18 to calculate a confidence interval for t, and the data are presented as a box plot.
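
The divergence-time calculation described above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' pipeline: the Jukes-Cantor correction and Nei's net divergence are written in their textbook forms, the function names are invented for this example, and the substitution rates in the usage section are placeholders rather than the rates estimated for the 48 bird genomes.

```python
import math
from statistics import quantiles


def jukes_cantor(p):
    """Jukes-Cantor corrected distance from the proportion p of differing sites."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75)")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def net_divergence(d_xy, d_x, d_y):
    """Net divergence dA: between-group distance minus the mean within-group distance."""
    return d_xy - 0.5 * (d_x + d_y)


def divergence_time(d_a, rate):
    """t = dA / (2 * lambda), with the rate in substitutions per site per year."""
    return d_a / (2.0 * rate)


def time_summary(d_a, rates):
    """Summarize divergence times over a set of rates, as for a box plot."""
    times = sorted(divergence_time(d_a, rate) for rate in rates)
    q1, q2, q3 = quantiles(times, n=4)
    return {"min": times[0], "q1": q1, "median": q2, "q3": q3, "max": times[-1]}


if __name__ == "__main__":
    placeholder_rates = [1.5e-9, 1.8e-9, 2.1e-9, 2.4e-9]  # hypothetical values
    summary = time_summary(0.014, placeholder_rates)  # dA of 1.4%, as in region A
    for key, years in summary.items():
        print(f"{key}: {years / 1e6:.2f} million years")
```

Evaluating t once per rate and summarizing the resulting distribution mirrors the box-plot presentation described in the text, with the spread reflecting uncertainty in λ only.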

Structural variants.

We used the paired-end sequence data for detection of structural variants with BreakDancer11. Information on read pairs that mapped with unexpected separation distances or orientation was used to predict inversions, insertions, duplications and deletions. A two-sided Fisher's exact test was used to identify structural variants showing significant frequency differences between morphs. Sequence alignments around the detected structural variants were also manually inspected using the Integrative Genomics Viewer (IGV)49 to exclude false positives. Normalized read coverage was compared between morphs to check the consistency of the deletions predicted by BreakDancer11. We also generated 5-kb and 10-kb mate-pair data from a single satellite individual to confirm the structural variants detected using paired-end data.
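
The frequency comparison between morphs can be illustrated with a small, dependency-free implementation of the two-sided Fisher's exact test. This is a sketch only: the function name is illustrative, the two-sided rule used here (summing all tables no more probable than the observed one) is one common convention, and the example table simply assumes a variant carried by all nine sequenced satellites and none of the 15 sequenced independents.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows are morphs; columns are carriers and non-carriers of the variant.
    The p-value sums the hypergeometric probabilities of all tables with the
    same margins that are no more probable than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denominator = comb(row1 + row2, col1)

    def probability(x):
        # Probability that x of the col1 carriers fall in the first row.
        return comb(row1, x) * comb(row2, col1 - x) / denominator

    observed = probability(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # The small tolerance guards against floating-point ties.
    return sum(
        probability(x)
        for x in range(lowest, highest + 1)
        if probability(x) <= observed * (1.0 + 1e-9)
    )


if __name__ == "__main__":
    # Assumed example: 9/9 satellites and 0/15 independents carry the variant.
    p_value = fisher_exact_two_sided(9, 0, 0, 15)
    print(f"p = {p_value:.3g}")
```

For this fully segregating table the p-value equals 1/C(24, 9), about 7.6 × 10^-7, which is the smallest value attainable with these sample sizes.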

Functional annotation of genetic variants.

We used SNPeff (v.3.4)50 to annotate the genetic variants and categorized the variants into coding (synonymous and nonsynonymous), upstream/downstream and intronic/intergenic classes.

Diagnostic PCR tests, Sanger sequencing and SNP genotyping.

PCR was used to amplify the regions around the identified inversion and deletion breakpoints and the MC1R coding sequence; all primer sequences are given in Supplementary Table 8. Amplified fragments were either analyzed by agarose gel electrophoresis or subjected to Sanger sequencing using standard methods. DNA sequences were analyzed using CodonCode Aligner 5.1.4 software. Four TaqMan SNP genotyping assays (Life Technologies) diagnostic for the Satellite or Faeder haplotype were designed (Supplementary Table 8). Standard TaqMan allele discrimination assays were performed using an Applied Biosystems 7900 HT real-time PCR instrument.

URLs.

NCBI Sequence Read Archive (SRA), http://www.ncbi.nlm.nih.gov/sra/; RepeatMasker, http://repeatmasker.org/; Picard, http://broadinstitute.github.io/picard/; video of ruff reproductive strategies, http://www.scilifelab.se/research/scientific-highlights/ruff.

Accession codes.

The Illumina reads have been submitted to the Sequence Read Archive (SRA) under accession SRA266458. The assembly and annotation are available under accession PRJNA281024. DNA sequences reported in this manuscript have been submitted to GenBank under accessions KT202232–KT202235 and KT428875.