Introduction

Malaria parasites are major human pathogens associated with 300–660 million clinical cases worldwide and 0.5–3 million deaths each year, mostly among children under the age of 5 years living in sub-Saharan Africa. Human malaria is caused by four species of parasitic protozoa of the genus Plasmodium: P. falciparum, P. vivax, P. malariae and P. ovale. Their complex life cycle comprises several morphologically and antigenically distinct stages (Supplementary Figure 1 online).

The extensive polymorphism of surface antigens is one of the major reasons why immunity to malaria, in contrast to most viral and bacterial infections, develops only after repeated infections with the same species (Ferreira et al, 2004). Not surprisingly, genes encoding these antigens are under strong diversifying selection (Hughes, 1992; Hughes and Hughes, 1995; Escalante et al, 1998), whereas synonymous nucleotide sites and non-coding nuclear DNA sequences, which are thought to be selectively neutral, are mostly conserved (Hartl, 2004). Understanding the mechanisms generating variation in malaria surface antigens is essential for designing immunization strategies to circumvent the emergence of novel polymorphisms (Hartl et al, 2002).

There are two distinct forms of antigenic diversity in malaria parasites: (a) the classical genetic mechanisms of mutation and recombination create allelic polymorphism, the existence of multiple genetically stable alternative forms of antigen-coding genes; (b) the sequential expression of alternate forms of an antigen by the same clonal parasite lineage leads to antigenic variation, which characterizes some antigens (such as PfEMP-1 and rifins) exported by the parasite to the surface of infected red blood cells (Ferreira et al, 2004). Blood-stage surface antigens of P. falciparum may show a specific pattern of allelic polymorphism in which all observed alleles are clearly divided into two allelic classes, such that alleles within a class are much less divergent than are alleles from different classes. Such dimorphic lineages of alleles are present in the genes coding for P. falciparum merozoite surface proteins MSP-1 (Tanabe et al, 1987; Miller et al, 1993) and MSP-2 (Snewin et al, 1991; Ferreira and Hartl, 2006) (Figure 1). They have also been suggested to occur in MSP-3 (McColl and Anders, 1997) and MSP-6 (Pearce et al, 2004) (Figure 1), and EBA-175, a 175-kDa parasite molecule that binds to sialic acid and glycophorin A during red blood cell invasion (Ware et al, 1993). The Msp-1 locus of P. vivax also comprises highly divergent and presumably ancient alleles (Putaporntip et al, 2006). However, the extensive recombination between different lineages has generated a mosaic pattern in which allelic dimorphism is difficult to discern (Putaporntip et al, 2002).

Figure 1

Schematic representation of genes encoding four major malarial surface antigens, MSP-1, MSP-2, MSP-3 and MSP-6, of Plasmodium falciparum. Length in amino acids is indicated. Msp-1 alleles of P. falciparum may be divided into 17 blocks according to levels of sequence divergence (amino-acid similarity of 87–97% in conserved blocks, 65–77% in semiconserved blocks and 13–38% in variable blocks). The sequences of each block may be grouped into one of two allelic lineages, K1 and MAD20 (Tanabe et al, 1987). Block 2 represents an exception to dimorphism, as an apparently non-repetitive version (known as RO33) is found in addition to K1-type and MAD20-type alleles. Naturally occurring Msp-1 alleles are mosaics generated by meiotic recombination between dimorphic lineages (Tanabe et al, 1987), but such recombination events do not occur in 3.7 kb of DNA sequence between blocks 6 and 16 (Tanabe et al, 1987; Miller et al, 1993). Throughout this paper, we refer to the region between blocks 6 and 16, where recombination is suppressed, as the dimorphic segment of the Msp-1 gene. Most diversity in MSP-2 occurs in its central repeats (block 3), which are flanked by variable non-repetitive blocks 2 and 4 and conserved blocks 1 and 5 (Snewin et al, 1991). Two allelic lineages (FC27 and 3D7) are found in natural parasite populations. Repeats in the FC27 family consist of 1–4 copies of a relatively conserved 32-mer motif followed by 0–5 copies of a 12-mer motif. Even more variation occurs in block 3 of 3D7-type alleles: they differ according to the sequence, length (2–10 amino acids) and number of copies of GSA-rich repetitive motifs, the sequence of non-repetitive domains that flank the repeats and the number of threonine residues at the 3′ end of block 3. Hybrid alleles with sequences derived from both dimorphic lineages were found in only 3.1% of the 448 Msp-2 alleles so far sequenced (Ferreira and Hartl, 2006). Most divergence in P. falciparum Msp-3 alleles is found in the central region of the gene, which contains three blocks of sequence coding for Ala-X-X-Ala-X-X-X heptad repeats (where Ala=alanine and X is any residue), but the 5′ and 3′ ends of the gene are conserved (McColl and Anders, 1997; see Supplementary Figure 3 online). The central region of Msp-6 may or may not contain three domains (respectively 15-, 24- and 129-bp long), which are present in K1-type and P. reichenowi alleles but absent from 3D7-type alleles. Additional sequence divergence is found in the domains flanking the repeats, which contain degenerate GAAGAT/GAAACA and GAAGAA(AAA/G) repeat motifs (Pearce et al, 2004; Supplementary Figure 2 online).

This review examines patterns of sequence diversity at dimorphic loci coding for merozoite surface antigens of P. falciparum putatively involved in red blood cell invasion and immune evasion. We review available sequence data, previous hypotheses for the origin and maintenance of dimorphism, and the difficulties of explaining the available data under these proposals. We conclude by offering several possible explanations for dimorphism, which we think worthy of further theoretical and empirical exploration.

Genealogy of dimorphic allelic lineages

As recently described for single-nucleotide polymorphisms across the human genome (Zhang et al, 2003), strictly neutral evolution may yield two classes of widely divergent alleles or haplotypes. This is because the basal branches of genealogies of neutrally evolving alleles are longer than the terminal branches (Hudson, 1990; Figure 2a), which may create the impression of allelic dimorphism. However, allelic dimorphism in malarial antigens represents a deviation from neutral expectation in four ways.

  1. As expected for genes evolving under balancing selection, the genealogy of malaria surface antigen genes is usually deeper (Figure 2b; Hughes, 1992; Conway and Baum, 2002; Polley et al, 2005) than that of other nuclear genes within the species (Rich and Ayala, 2000; Tanabe et al, 2004; Hartl, 2004), as balancing selection tends to retain different allelic variants that otherwise would have been eliminated by random genetic drift.

  2. As in other cases of balancing selection, levels of divergence within allelic classes are much lower than those between allelic classes (Figure 2b). At various loci evolving under balancing selection, such as MHC genes, the S locus of self-incompatibility in plants, and the b1 mating-type locus of the fungus Coprinus cinereus, different allelic classes often diverged millions of years ago, but there is often little or no variation within allelic classes (reviewed by Richman, 2000). Levels of diversity within allelic families at dimorphic P. falciparum loci are higher than those observed in most other cases of balancing selection, but the divergence between allelic classes is still orders of magnitude greater than the divergence within allelic classes.

  3. As with other loci under balancing selection with clearly defined allelic classes, recombinants between different allelic classes are either rare or not observed at all (Stein et al, 1991). The long-term persistence of two highly divergent allelic classes suggests that recombination between them has been suppressed over millions of years (see ‘Structure of dimorphic loci’).

  4. Genealogies of dimorphic malarial antigens vary from other cases of balancing selection in that there are exactly two different allelic classes, rather than three or more. The striking feature of dimorphic malarial loci is a single extremely deep internal branch dividing all known alleles into two distinct allelic classes (Figure 2c). Allelic dimorphism therefore represents a deviation from the expected shape of the genealogy in which basal branches are disproportionally long, with extensive sequence divergence between two allelic classes.

Figure 2

Allelic genealogies under neutral evolution, standard balancing selection and allelic dimorphism. (a) Neutral evolution. The time to the last common ancestor of the sample (TLCA) is expected to be four times the average time to the common ancestor of pairs of individuals that diverged more recently than the common ancestor. (b) Balancing selection with four allelic classes (white circles). TLCA is much deeper than for case (a). Divergences within allelic classes are much more shallow than divergences between allelic classes. As in the absence of balancing selection, the time to the last common ancestor of the sample is expected to be four times the average time to the common ancestor of allelic classes that diverged more recently than the common ancestor. (c) Allelic dimorphism. As with standard balancing selection, TLCA is much deeper than in the absence of balancing selection. In this case, there are exactly two allelic classes, and TLCA is much deeper than four times the average divergence within allelic classes.

Defining allelic dimorphism: the one-fourth rule

We used a standard neutral coalescent approach (reviewed by Hudson, 1990) to derive a simple test of whether allelic dimorphism represents a significant departure from neutral expectations. If we divide a sample into two maximally diverged allelic classes (for instance, by minimizing within-class pairwise divergence), the expected average time to the common ancestor of two individuals within each class can be shown to be one-fourth of the expected time to the last common ancestor of the entire sample (TLCA) (Figure 2c). In other words, the average degree of divergence between alleles from the same class should be roughly one-fourth the average divergence between alleles from opposite classes (see Box 1). Results of numeric simulations to explore the behaviour of this metric are provided in Supplementary Materials online.
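The one-fourth expectation can be checked numerically. The Python sketch below is not the authors' simulation (their results are in the Supplementary Materials). It assumes a standard Kingman coalescent with constant population size, takes the two lineages present just before the final coalescence as the two allelic classes (a simplification of the minimum-divergence partition described above), and measures time in coalescent units. All function names and default parameters are illustrative.

```python
import random


def simulate_genealogy(n, rng):
    """Simulate one neutral coalescent genealogy for n sampled alleles.

    Returns (tlca, within_sum, within_pairs): the time to the last common
    ancestor, the summed coalescence times of all pairs that fall in the
    same basal class, and the number of such pairs.
    """
    # Each lineage is (number of descendant alleles,
    #                  summed pairwise coalescence times among them).
    lineages = [(1, 0.0) for _ in range(n)]
    t = 0.0
    while len(lineages) > 2:
        k = len(lineages)
        # With k lineages, the next coalescence occurs at rate k(k-1)/2.
        t += rng.expovariate(k * (k - 1) / 2.0)
        i, j = rng.sample(range(k), 2)
        (size_a, sum_a), (size_b, sum_b) = lineages[i], lineages[j]
        # Every pair with one allele on each side coalesces at time t.
        merged = (size_a + size_b, sum_a + sum_b + size_a * size_b * t)
        lineages = [x for idx, x in enumerate(lineages) if idx not in (i, j)]
        lineages.append(merged)
    # The two remaining lineages are treated as the two allelic classes.
    (size_a, sum_a), (size_b, sum_b) = lineages
    within_sum = sum_a + sum_b
    within_pairs = size_a * (size_a - 1) // 2 + size_b * (size_b - 1) // 2
    tlca = t + rng.expovariate(1.0)  # final coalescence of two lineages
    return tlca, within_sum, within_pairs


def one_fourth_ratio(n=20, replicates=20000, seed=1):
    """Mean within-class coalescence time divided by mean TLCA."""
    rng = random.Random(seed)
    total_tlca = total_within = total_pairs = 0.0
    for _ in range(replicates):
        tlca, within_sum, within_pairs = simulate_genealogy(n, rng)
        total_tlca += tlca
        total_within += within_sum
        total_pairs += within_pairs
    return (total_within / total_pairs) / (total_tlca / replicates)


if __name__ == "__main__":
    print(one_fourth_ratio())
```

With within-class pairs pooled over replicates, the ratio should settle close to 0.25 under these assumptions. A locus whose observed ratio is far below that value shows a basal branch deeper than neutrality predicts.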

Box 1 The one-fourth rule

As under strong balancing selection all branches of the genealogy are proportionally longer, the expected relative length of each branch is the same as under the standard coalescent model (Takahata, 1990). Therefore, the one-fourth rule distinguishes dimorphism (Figure 2c) from both standard neutral coalescence models (Figure 2a) and general balancing selection models with more than two alleles (Figure 2b).

This rule defines a minimum condition under which a locus should be considered dimorphic. That is, a locus should not be considered dimorphic if there is more than one-fourth as much within-class as between-class divergence. The utility of this metric is illustrated by the analysis of two P. falciparum antigens, EBA-175 and MSP-3, which have been reported to exhibit allelic dimorphism (Ware et al, 1993; Conway and Baum, 2002) but do not pass the one-fourth rule (Supplementary Material online).
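Applied to sequence data, the rule reduces to comparing two averages. The sketch below is one possible reading of it, assuming aligned sequences of equal length, class labels supplied by the user, and the uncorrected proportion of differing sites as the divergence measure. The function names and the toy sequences are hypothetical and are not taken from the analyses reported here.

```python
from itertools import combinations


def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def passes_one_fourth_rule(seqs, labels):
    """Return (passes, mean_within, mean_between) for a two-class sample.

    A locus passes when mean within-class divergence is no more than
    one-fourth of mean between-class divergence.
    """
    within, between = [], []
    for (s1, l1), (s2, l2) in combinations(zip(seqs, labels), 2):
        (within if l1 == l2 else between).append(p_distance(s1, s2))
    mean_within = sum(within) / len(within)
    mean_between = sum(between) / len(between)
    return mean_within <= 0.25 * mean_between, mean_within, mean_between


# Toy alignment (hypothetical): two alleles in each of two classes.
toy_seqs = ["AAAAAAAAAA", "AAAAAAAAAT", "CCCCCCCCCC", "CCCCCCCCCG"]
toy_labels = ["A", "A", "B", "B"]
print(passes_one_fourth_rule(toy_seqs, toy_labels))
```

A real analysis would need a distance corrected for multiple substitutions and a principled way of assigning alleles to classes; the sketch only illustrates the comparison itself.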

Structure of dimorphic loci

Allelic dimorphism at the Msp-1 and Msp-2 loci of P. falciparum (Figure 1) has been confirmed by extensive sampling of alleles (Tanabe et al, 1987; Snewin et al, 1991; Miller et al, 1993; Ferreira and Hartl, 2006). The limited available data also suggest that the central region of P. falciparum MSP-6 is dimorphic (Pearce et al, 2004; Supplementary Figure 2 online). Across the 148 amino-acid dimorphic region, nucleotide divergence between the K1 and 3D7 classes is 35 times higher than that within each class (6.4 vs 0.18%). As only eight Msp-6 sequences are available, further sampling is necessary to confirm these findings. Interestingly, although MSP-1 and MSP-6 are functionally related (they form a protein complex that is thought to be involved in red blood cell invasion), there is no clear linkage disequilibrium between the allelic status of these loci in different strains (Pearce et al, 2004).

The dimorphic domains of MSP-1, MSP-2 and MSP-6 are bracketed between conserved domains (Figure 1). This pattern is very likely due to recombination between dimorphic domains, which are evolving under balancing selection, and adjacent coding regions, which are not evolving under balancing selection. A single recombination between allelic classes per generation is sufficient to homogenize flanking regions in a randomly mating population (Hudson, 1990). Thus, persistence of dimorphism implies a suppression of recombination within dimorphic domains, either owing to a lack of meiotic crossovers or selection against recombinants. As the dimorphic domains of MSP-1 (blocks 6–16) contain significant stretches of conserved sequence (Figure 1), which would afford ample opportunity for occasional recombination, the absence of recombinants (Tanabe et al, 1987; Miller et al, 1993) seems very likely to be due to selective disadvantage. Suppression of recombination between alleles encoding highly divergent antigenic variants is the expected outcome if variant-specific immunity plays a major role in parasite clearance (Ferreira et al, 2004); parasites expressing hybrid variants of surface proteins, with antigenic determinants derived from both major dimorphic types, would be recognized by hosts previously exposed to either dimorphic type, reducing their fitness (McKenzie et al, 2001). Recombination creates a clear distinction between non-recombining dimorphic domains (which are quite likely to closely correspond to immunodominant domains under balancing selection) and the conserved flanking regions, which have been homogenized by recombination events (Figure 1).

Dates of divergence between dimorphic lineages

Different P. falciparum isolates exhibit vastly different alleles at the Msp-1 locus, with amino-acid sequence divergence between alleles exceeding 80% in some domains. Based on protein sequence divergence, Hughes (1992) estimated the time of divergence between the major Msp-1 families, MAD20 and K1, as 35 million years ago (Mya), using as outgroups the rodent parasites P. yoelii and P. chabaudi (presumed to have diverged coincident with the rodent-primate split, 80 Mya). More recently, Polley et al (2005) provided an estimate of 27 Mya, using protein-sequence comparisons of orthologous loci in species likely to have diverged both before and after that date. As both estimates are based on alignments of highly divergent sequences, which have apparently experienced a large number of insertion and deletion events since their divergence, aligned residues may not actually reflect homologous relationships, leading to inaccurate estimated rates of protein evolution (Rich and Ayala, 2000; Hartl et al, 2002).

One way to avert such problems is to limit consideration to domains of dimorphic loci that are sufficiently conserved to allow confidence about residue homology in alignments. We compared blocks 6–16 from the P. falciparum Msp-1 alleles MAD20 and K1 (3.7 kb of DNA sequence) with the single published Msp-1 sequence of the closely related primate parasite P. reichenowi (Polley et al, 2005), but considered only those codons for which three-way identity of the upstream two residues and the downstream two residues made confident assignment of residue homology possible. This identified 179 amino-acid positions. There was synonymous divergence between K1 and P. reichenowi and between K1 and MAD20 at three and 17 positions, respectively. Assuming a P. falciparum-P. reichenowi divergence at 5–7 Mya, this is consistent with a K1-MAD20 divergence 28.3–39.7 Mya. There were nonsynonymous divergences between K1 and P. reichenowi and between K1 and MAD20 at 16 and 34 positions, respectively, consistent with a K1-MAD20 divergence at 10.6–14.9 Mya.
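The date arithmetic in the preceding paragraph amounts to scaling the calibration date by a ratio of difference counts. The short Python sketch below makes that step explicit. It assumes a constant substitution rate along both comparisons and uses the raw counts given in the text without any correction for multiple hits; the function name is illustrative.

```python
def scale_divergence(class_diffs, outgroup_diffs, calibration_mya):
    """Scale a calibration date range by the ratio of between-class
    differences to differences against the outgroup."""
    ratio = class_diffs / outgroup_diffs
    low, high = calibration_mya
    return ratio * low, ratio * high


# Assumed P. falciparum-P. reichenowi split of 5-7 Mya, as in the text.
CALIBRATION = (5.0, 7.0)

# Synonymous sites: 17 K1-MAD20 differences, 3 K1-P. reichenowi differences.
syn = scale_divergence(17, 3, CALIBRATION)
# Nonsynonymous sites: 34 K1-MAD20 differences, 16 K1-P. reichenowi differences.
nonsyn = scale_divergence(34, 16, CALIBRATION)

print("synonymous: %.1f-%.1f Mya" % syn)
print("nonsynonymous: %.1f-%.1f Mya" % nonsyn)
```

With the counts quoted above, the two ranges agree with the 28.3–39.7 and 10.6–14.9 Mya figures to within rounding. The threefold gap between them illustrates how sensitive such estimates are to the small number of sites involved.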

Analogous calculations were made for Msp-2 (117 amino-acid positions aligned), for which the P. reichenowi ortholog has been published (Dubbeld et al, 1998), and Msp-6, for which we assembled an Msp-6 contig using sequence reads from the P. reichenowi genome sequencing project database (Supplementary Figure 2 online). The number of synonymous nucleotide replacements is consistent with 0.46–0.64 million years of divergence of P. falciparum Msp-2 alleles, whereas the number of nonsynonymous nucleotide replacements is consistent with a slightly more recent divergence, 0.34–0.46 Mya (Ferreira and Hartl, 2006). In 92 amino-acid positions of three-way alignment within the potentially dimorphic region of Msp-6, there are four synonymous and 28 nonsynonymous differences between the two classes in P. falciparum, and 0 synonymous and 24 nonsynonymous differences between the P. reichenowi allele and the most closely related P. falciparum allele. This suggests a between-class divergence of Msp-6 coincident with or slightly before the speciation event. Although uncertainties remain as to the exact dates of between-class divergence, owing to the small numbers of sites available for these analyses, we can be confident of qualitative differences among MSP-1 (well before the speciation event), MSP-2 (well after the speciation event) and MSP-6 (roughly coincident with the speciation event).

Evolutionary history of dimorphic domains in MSP-1

The dimorphic region of MSP-1 spans blocks 6–16, with both generally conserved and highly divergent domains (Figure 1). Whether the conserved regions are deeply diverged from each other but conserved by functional constraints or more recently homogenized by recombination (as with the 5′ and 3′ regions of various dimorphic loci including Msp-1) is unclear (Polley et al, 2005). Table 1 gives the observed number of synonymous and nonsynonymous differences across the 17 Msp-1 blocks. In all dimorphic domains but block 12, MAD20 is the most divergent sequence, consistent with a deep MAD20-K1 divergence across the entire region. The conserved blocks 11 and 13 show a clear preponderance of changes in MAD20, relative to K1 and the single P. reichenowi allele, whereas block 12 shows four differences in P. reichenowi (Figure 3). In addition, there are two differences in MAD20 and one in K1 at sites that did not pass our strict homology filter but which are nonetheless clearly homologous (Figure 3).

Table 1 Comparison of synonymous and nonsynonymous nucleotide differences at Msp-1 alleles of P. reichenowi (Pr) and the K1 and MAD20 dimorphic types of P. falciparum

Figure 3

Alignment of blocks 11–13 of Msp-1 alleles for P. reichenowi (PrMSP1) and P. falciparum isolates K1 and MAD20. In the last line of the alignment, signs indicate columns with conserved residues (asterisks), columns with residues with roughly the same size and hydropathy (colons) and columns with residues with either similar size or hydropathy (periods). Nonsynonymous and synonymous (underlined) differences, as determined at the nucleotide level, are shown in bold, and summarized on the phylogenetic trees. Whereas blocks 11 and 13 show a clear majority of divergent sites at which MAD20 differs from P. reichenowi and K1, block 12 shows a slight excess of P. reichenowi-P. falciparum differences. This could reflect homogenization between the P. falciparum alleles by recombination or, as argued in the text, statistical fluctuation.

The relatively low K1-MAD20 divergence in block 12 could reflect a post-speciation homogenization (recombination) event, or could simply be due to random sampling. To test whether such a pattern deviates from random expectations, we simulated nine P. reichenowi differences out of 45 total differences (as found across blocks 11–13) and found a stretch of seven consecutive differences containing four P. reichenowi differences (as in block 12) in 2956 out of 10 000 runs (P=0.30). As the pattern observed in block 12 is not strong evidence for a homogenization event, sequence data suggest that no recombination between K1 and MAD20 alleles has occurred over the entire dimorphic region since the P. falciparum-P. reichenowi speciation event, and quite possibly since the initial MAD20-K1 divergence.
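A randomization of this kind can be sketched in a few lines of Python. The version below is a reconstruction under stated assumptions, not the authors' procedure, which is not described in detail: it treats the 45 differences as an ordered series, shuffles the positions of the nine P. reichenowi differences uniformly at random, and counts a run as a hit when any window of seven consecutive differences contains at least four of them. The function names, the "at least" criterion and the seed are all illustrative choices.

```python
import random


def has_dense_window(labels, window=7, threshold=4):
    """True if any run of `window` consecutive differences contains at
    least `threshold` outgroup-specific (True) differences."""
    count = sum(labels[:window])
    if count >= threshold:
        return True
    # Slide the window one position at a time, updating the count.
    for i in range(window, len(labels)):
        count += labels[i] - labels[i - window]
        if count >= threshold:
            return True
    return False


def window_test(total=45, outgroup=9, window=7, threshold=4,
                runs=10000, seed=0):
    """Fraction of random orderings that contain a dense window."""
    rng = random.Random(seed)
    labels = [True] * outgroup + [False] * (total - outgroup)
    hits = 0
    for _ in range(runs):
        rng.shuffle(labels)
        hits += has_dense_window(labels, window, threshold)
    return hits / runs


if __name__ == "__main__":
    print(window_test())
```

Because the windowing rule and random number stream are assumptions, the fraction returned need not match the 2956 out of 10 000 reported above exactly; the sketch is meant only to show the logic of the test.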

Previous hypotheses about allelic dimorphism in Plasmodium

Previous hypotheses for the origin of dimorphism in P. falciparum are reproduced in Box 2 (Supplementary Material online). Here, we outline the reasons why these hypotheses are not likely to provide a general explanation for dimorphism in P. falciparum surface antigens.

Intraspecific population division

In their classic paper unveiling the dimorphic structure of Msp-1 in P. falciparum, Tanabe et al (1987) hypothesized that the extensive sequence divergence in variable blocks of MSP-1 resulted from adaptation of parasites to local host populations. However, as the MAD20 and K1 alleles diverged several million years ago, such divergence would reflect adaptation to highly structured ancient host populations, before the chimpanzee-human split. The maintenance of dimorphism after the speciation event is an unlikely result, as there is no evidence that dimorphic lineages differ in functional properties, such as the ability to bind human red blood cells of different phenotypes or mediate invasion through different pathways (Binks and Conway, 1999).

Recent interspecific introgression

Alternatively, the divergence between dimorphic classes could reflect an ancient speciation event, with more recent introgression between species leading to the presence of highly divergent alleles in modern P. falciparum. This hypothesis does not require dimorphic alleles to be under strong selection pressure. There are several reasons why we consider this hypothesis unlikely.

  a) The divergence between allelic families should reflect the time of speciation between P. falciparum and the other species, but the dates of divergence between allelic families vary across dimorphic loci. Dimorphism at each locus would therefore need to reflect separate speciation and introgression events, a surprising scenario for which no direct evidence is known.

  b) As no close relative of P. falciparum other than P. reichenowi is currently known, there are no obvious donors for introgression events leading to dimorphism in MSP-1, MSP-2 or MSP-6.

  c) In the absence of balancing selection, the success of an introgressed allele is expected to be similar to that of a neutral variant or likely less, as an allele unaccustomed to a new genetic background might not be functionally equivalent to the native copy. The fraction of introgressed alleles that would go on to be highly represented in the population would therefore be on the order of 1/N, and the amount of time during which significant polymorphism would be seen would be similar to the neutral coalescent time, nearly 100 000 years for P. falciparum (Tanabe et al, 2004). A locus would therefore only be expected to exhibit apparent dimorphism for a small fraction of the time, unless the rate of introgression was very high. Rates of introgression high enough to explain multiple dimorphic loci would be expected to have homogenized the two species' genomes, a result which has not been found.

  d) If introgression in the absence of strong balancing selection is responsible for the presence of highly divergent dimorphic alleles, any locus should have a roughly equal chance of showing a dimorphic structure. Instead, there is a clear bias towards surface antigens being dimorphic, or more generally polymorphic (Volkman et al, 2002).

  e) Alternatively, introgression could explain the origin of allelic dimorphism, with subsequent balancing selection maintaining the two allelic classes. However, if balancing selection is capable of maintaining multiple allelic classes for a long time, there is no reason to invoke an introgression event to explain the high level of diversity. An initial mutational event could provide a selective differentiation between alleles, with subsequent additional differentiating (as well as neutral) mutations leading to ever greater allelic divergence.

Gene duplication

A third hypothesis suggests that allelic families evolved as paralogs, owing to gene duplication events, followed by the deletion of one of the copies (Hartl et al, 2002). There are three reasons why accumulating data have rendered this hypothesis unlikely.

  a) The dimorphic alleles are present in a single copy in all parasite strains so far examined. As the postulated decrease in selection for multiple copies would be coincident with a relatively recent increase in transmission and an associated increase in parasite population size, a very large number of (nearly) neutrally selected, independent unequal crossover events would be necessary to explain the lack of known two-copy genotypes at any of the dimorphic loci.

  b) If unequal crossover events are sufficiently common that all available alleles have lost one gene copy, three or more copies should also be observed.

  c) Both copies would need to be maintained in the ancestral lineage for an extremely long time, then subsequently lost both in P. reichenowi and P. falciparum. If increased transmission explains copy number reduction, a similar increase in transmission must be both postulated in chimpanzees and denied in millions of years of prior hominid evolution.

Idiosyncrasies of mutational patterns

Rich and Ayala (2000) argued that allelic dimorphism was a likely outcome of the evolution of heterogeneous repetitive regions. In the case of Msp-2, the common P. reichenowi-P. falciparum ancestor appears to contain representatives of two repeat types, each of which dominates one of the two observed allelic classes in P. falciparum. If evolution of these regions proceeds by expansion and deletion of each region, the outcome of the evolution could be the extinction of one or the other repeat class in each allele, yielding allelic dimorphism. However, it is not clear why both repeat classes should be retained in P. reichenowi. This model may explain dimorphism in MSP-2 (Figure 4), but not in blocks 6–16 of MSP-1, which are not clearly repetitive.

Figure 4

A model for the origin of dimorphic lineages at the Msp-2 locus of P. falciparum (Rich and Ayala, 2000). The common ancestry between the GSA-rich repeat motifs found in P. reichenowi Msp-2 and one (3D7) of the dimorphic lineages of P. falciparum MSP-2 is revealed by the first repeat homology region, RHR1. Both repeat motifs appear to have originated from the proliferation of a GGTGCT hexamer present in the ancestral sequence. The other repeat homology region shown here, RHR3, comprises a 10-bp motif, shared by alleles of both species. This motif is present in a single copy in P. reichenowi Msp-2 and in two copies (that flank the 12-mer repeats) in FC27-type alleles of P. falciparum Msp-2. These 12-mer repeats, however, are absent from 3D7-type alleles and P. reichenowi Msp-2. Redrawn from an unpublished figure by Stephen M Rich (University of Massachusetts, Amherst, USA), with permission.

Population bottleneck

Rich and Ayala (2000) also examined the hypothesis that allelic dimorphism at Msp-1 of P. falciparum could reflect a recent population bottleneck from a formerly more diverse P. falciparum population, with only two distantly related alleles surviving. However, although this hypothesis offers a plausible explanation for the observation of dimorphism at a single locus, it does not predict the recurrent observation of exactly two allelic families at some loci, and thus fails to offer a general explanation for allelic dimorphism.

Possible classes of hypotheses about dimorphism

Ongoing work on patterns of intraspecific diversity in P. falciparum has identified a recurrent pattern of allelic dimorphism across loci. Whereas early hypotheses for the origin of dimorphism offered ad hoc explanations for a particular case of dimorphism, our current understanding of dimorphic loci and the commonalities of patterns of polymorphism across loci suggest a more general explanation. We expect that the ultimate explanation will include some subset of the following three factors.

Population structure

Theoretical and empirical work shows that population bottlenecks can lead to reduction in levels of allelic diversity both at loci evolving under balancing selection and other loci, but no model that we are aware of predicts reduction to exactly two allelic classes. Some evidence suggests that, under strong balancing selection, highly divergent alleles preferentially survive bottlenecks, a pattern which might ultimately lead to the maintenance of only a few highly divergent allelic classes (reviewed by Richman, 2000). Again, why the number of classes should so frequently be two is not clear. In particular, the effect of recurrent population size fluctuation should be explored.

Selective relationship between alleles

Models of allelic genealogies under balancing selection are dependent on the assumed mode of balancing selection. For instance, the nearly equal frequencies of different allelic types for the self-incompatibility S locus of flowering plants suggest straightforward frequency-dependent selection, whereas the dearth of diversity within allelic types suggests selective equivalence between alleles of the same type. The mode of selection between alleles for Plasmodium surface antigens has sometimes been assumed to be similar, with simple frequency-dependent selection determining frequencies between allelic types but without selective distinction between alleles of the same dimorphic type. However, allelic families of some dimorphic loci exhibit considerable diversity, which might reflect selective differences within families. Moreover, the consistent observation of a predominance of MAD20-type over K1-type alleles at Msp-1 in most malaria-endemic areas (Conway, 1997) might suggest a more complicated relationship. The possibility that different modes of balancing selection might drive dimorphism should be explored further.

Mutational rates and patterns

Different assumptions about rates of creation of functionally distinct alleles should be further explored. If the rate of mutation to possible alternative allelic types is influenced by current allelic structure, there could be a high probability of observing only two alleles. Such explanations might be particularly important if rates of mutation to selectively differentiable alleles were very low. However, it is not clear why such a pattern should be found across otherwise unrelated loci.

Conclusions

Ongoing research has illuminated the patterns of allelic dimorphism in blood-stage surface antigens of P. falciparum. Patterns across loci show both commonalities (deep allelic genealogies over a well-defined region of the gene, with recombination apparently homogenizing other areas) and differences (total genealogy depth, mode of mutation, observed allelic frequencies).

Immediate further work should address four issues: (1) New general hypotheses about the origins of allelic dimorphism should be developed and tested. (2) Further sequence data should confirm or reject the existence of allelic dimorphism at P. falciparum Msp-6. (3) Genome-wide studies of additional P. falciparum strains should probe the prevalence and character of allelic dimorphism. (4) Genomic and case studies of additional Plasmodium parasites should investigate the generality across species of allelic dimorphism.