The challenges to global food security are complex and compounding. Our growing population and changing dietary expectations are projected to increase demand on food systems for at least the next four decades1–5, outpacing forecasted crop yield gains6. Limitations in land, water and other natural resource inputs, competition for arable soils with non-food crops and other land uses, soil degradation, climate change and the need to minimize harmful impacts on ecosystem services and biodiversity further constrain production potential3,4,7,8. Although gains in food availability may partially be obtained through dietary change and food waste reduction1,3, increases in the productivity, resilience and sustainability of current agricultural systems are clearly necessary5. Key to this sustainable intensification is the use of novel genetic diversity in plant breeding to produce crop varieties containing traits such as drought and heat tolerance, increased pest and disease resistance, and input use efficiency9–11.

As sources of new genetic diversity, crop wild relatives (the wild cousins of cultivated plant species) have been used for many decades for plant breeding, contributing a wide range of beneficial agronomic and nutritional traits12–17. Their utilization is expected to increase as a result of ongoing improvements in information on species and their diversity and advances in breeding tools16,18. However, this expectation is based on the assumption that crop wild relatives will be readily available for research and plant breeding, which requires their conservation as germplasm accessions in gene banks as well as functioning mechanisms to enable access to this diversity10,11. Preliminary assessments of the comprehensiveness of conservation of wild relatives in gene banks have suggested substantial gaps19,20, and wild populations of a range of species are threatened by the conversion of natural habitats to agriculture, urbanization, invasive species, mining, climate change and/or pollution21–23. A concerted effort devoted to improving the conservation and availability of crop wild relatives for crop improvement is thus timely both for biodiversity conservation and for food security objectives24, as the window of opportunity to resolve these deficiencies will not remain open indefinitely20,22.

We conducted a detailed analysis of the extent of representation of the wild relatives of 81 crops in gene banks equipped to provide access to these genetic resources to the global research and breeding community. The crops include major and minor cereals, root and tuber crops, oilcrops, vegetables, fruits, forages and spices, chosen on the basis of their importance to food security, income generation and sustainable agricultural production (Supplementary Table 1). We first modelled the geographic distributions of a total of 1,076 unique crop wild relative taxa from 76 genera and 24 plant families (Supplementary Table 2). We then compared the potential geographic and ecological diversity encompassed in these distributions to that which is currently accessible in gene banks25. To aid conservation strategies, we assigned each taxon a final priority score (FPS), on a scale from zero to ten, for further collecting from its natural habitats to increase its representation in gene banks. The FPS was created by averaging each taxon's assessed current representation in gene banks with regard to overall number of accessions, geographic diversity and ecological diversity. High priority for further collecting was assigned to taxa where FPS ≥ 7 (that is, very little or no current representation in gene banks); medium priority where 5 ≤ FPS < 7; low priority where 2.5 ≤ FPS < 5; and taxa with FPS < 2.5 were considered sufficiently represented. Finally, we identified geographic hotspots where considerable richness of high-priority wild relative taxa is concentrated. Such sites represent particularly valuable targets, both for efficient collecting for ex situ conservation in gene banks and for in situ conservation in protected areas.
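The scoring and thresholds described above can be illustrated with a minimal Python sketch. The function names are hypothetical, and the sketch assumes that the three component scores are already expressed on the priority scale (zero to ten, with ten meaning no current representation in gene banks); the actual computation of the components is described in the Supplementary Information and is not reproduced here.

```python
from statistics import mean


def final_priority_score(accession_gap, geographic_gap, ecological_gap):
    """Average three component scores into an FPS.

    Assumption: each component is on a 0-10 scale where 10 means
    no current representation in gene banks.
    """
    scores = (accession_gap, geographic_gap, ecological_gap)
    if any(not 0 <= s <= 10 for s in scores):
        raise ValueError("each score must lie between 0 and 10")
    return mean(scores)


def priority_category(fps):
    """Map an FPS to the collecting-priority categories given in the text."""
    if fps >= 7:
        return "high"
    if fps >= 5:
        return "medium"
    if fps >= 2.5:
        return "low"
    return "sufficiently represented"
```

For instance, a taxon with hypothetical component scores of 10, 9 and 8 would receive an FPS of 9 and fall in the high-priority category.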

Results

The distributions of crop wild relatives were modelled to occur on all continents except Antarctica, and throughout most of the tropics, subtropics and temperate regions, except the most arid areas and polar zones (Fig. 1). The greatest richness of taxa was modelled in the Mediterranean, Near East and southern Europe, South America, Southeast and East Asia, and Mesoamerica, with up to 84 taxa overlapping in a single 25 km² grid cell. These richness hotspots largely align with traditionally recognized centres of crop diversity26, although the analysis also identified a number of less well-recognized areas, for example central and western Europe, the eastern USA, southeastern Africa and northern Australia, which also contain considerable richness. Hotspots in tropical and subtropical areas also largely aligned with zones recorded as possessing high richness of endemic flora and fauna, and experiencing exceptional degrees of loss of habitat27. Temperate regions identified under the same criteria, for example the California and Cape Floristic Provinces, southwestern Australia, central Chile and New Zealand, had considerably less overlap with areas rich in crop wild relatives.

Figure 1: Crop wild relative taxon richness map.

The map displays overlapping potential distribution models for assessed crop wild relatives. Dark red indicates greater overlap of potential distributions of taxa, that is, where greater numbers of crop wild relative taxa occur in the same geographic area.

Wild relative taxa as a class of plant genetic resources were found to be critically under-represented in gene banks. For 313 (29.1% of total) taxa associated with 63 crops, no germplasm accessions exist at all, and a further 257 taxa are represented by fewer than ten accessions. A total of 765 (71.1%) taxa were ranked as high priority for further collecting from their natural habitats, 148 (13.8%) as medium priority, 118 (11.0%) as low priority and only 45 (4.2%) as currently sufficiently represented in gene banks (Supplementary Table 2). The mean FPS across all taxa (7.9 ± 2.5; mean ± s.d.) fell well within the high-priority range (Fig. 2). Lack of geographic and ecological representation in gene banks contributed significantly to most of the high FPS values, whereas less extreme gaps were generally evident in the total numbers of accessions conserved (Supplementary Fig. 1).
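As a quick arithmetic check, the reported percentages can be recomputed from the taxon counts and the total of 1,076 assessed taxa. The short Python sketch below uses only the figures stated in the text; the variable and function names are illustrative.

```python
TOTAL_TAXA = 1076  # unique crop wild relative taxa assessed

# Number of taxa per collecting-priority category, as reported in the text.
counts = {
    "high": 765,
    "medium": 148,
    "low": 118,
    "sufficiently represented": 45,
}


def share(count, total=TOTAL_TAXA):
    """Percentage of all assessed taxa, rounded to one decimal place."""
    return round(100 * count / total, 1)


shares = {category: share(n) for category, n in counts.items()}

# Taxa with no germplasm accessions at all.
no_accessions_share = share(313)

# Taxa with no accessions or with fewer than ten accessions.
few_or_no_accessions = 313 + 257
```

The four categories sum to 1,076, and the 313 taxa without accessions together with the 257 taxa holding fewer than ten accessions amount to 570 taxa, slightly more than half of those assessed.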

Figure 2: Collecting and conservation priorities for crop wild relatives by associated crop.

Black circles represent the FPS for further collecting for wild relative taxa, with larger grey circles representing the average FPS across taxa per crop gene pool. The blue straight vertical line represents the mean FPS across all crop wild relative taxa within all crop gene pools.

An analysis of wild relatives grouped by their associated crop (that is, by crop gene pool) revealed that 72% of the crop gene pools had been assigned to high priority for further collecting (as an average of FPS scores across associated wild relative taxa), and thus require urgent conservation action (Fig. 2). These included the gene pools of commodity crops of critical importance to global food supplies and/or agricultural production, for example sugarcane (9.2 ± 1.6), sugar beet (8.1 ± 1.6) and maize (6.9 ± 2.1), as well as important food security staples such as banana and plantain (9.4 ± 0.8), cassava (9.0 ± 1.6), sorghum (8.8 ± 1.0), yams (8.5 ± 2.9), cowpea (8.4 ± 1.7), sweet potato (8.4 ± 1.7), pigeon pea (8.4 ± 1.1), millets (8.4 ± 2.7) and groundnut (7.6 ± 1.8) (Fig. 3 and Supplementary Table 1). High priority was also assigned to the gene pools of numerous crops important for smallholder income generation in the tropics (for example, cacao and papaya) and minor crops increasing in popularity because of their nutritional qualities (quinoa), as well as various other important fruits (for example, grape, apple, watermelon, orange and mango), oilcrops (rapeseed) and forages (alfalfa) possessing considerable numbers of wild related taxa. Although all gene pools contained taxa with considerable conservation concerns, the wild relatives of fruits, forages, sugar crops, starchy roots and vegetables were those assessed as least well represented in gene banks (Supplementary Fig. 2). Average FPS values across all wild relatives per crop type were 8.8 ± 1.8 for fruits, 8.7 ± 1.7 for forages, 8.6 ± 1.6 for sugar crops, 8.2 ± 2.3 for starchy roots, 8.1 ± 2.4 for vegetables, 7.2 ± 2.6 for pulses, 7.1 ± 2.3 for oilcrops, 7.1 ± 1.9 for spices and 6.4 ± 3.1 for cereals.
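The per-gene-pool summaries reported above (mean ± s.d. of FPS across the wild relative taxa of each crop) can be sketched as a simple grouping step. This is an illustrative Python sketch: the function names are hypothetical, and the use of the sample standard deviation is an assumption, as the text does not state which estimator was used.

```python
from collections import defaultdict
from statistics import mean, stdev


def gene_pool_summary(taxon_scores):
    """Summarize FPS per crop gene pool.

    taxon_scores: iterable of (crop, fps) pairs, one per wild relative taxon.
    Returns {crop: (mean FPS, s.d.)}; the s.d. is the sample standard
    deviation (an assumption) and is 0.0 for gene pools with a single taxon.
    """
    by_crop = defaultdict(list)
    for crop, fps in taxon_scores:
        by_crop[crop].append(fps)
    summary = {}
    for crop, values in by_crop.items():
        sd = stdev(values) if len(values) > 1 else 0.0
        summary[crop] = (mean(values), sd)
    return summary


def share_high_priority(summary, threshold=7.0):
    """Fraction of gene pools whose mean FPS reaches the high-priority threshold."""
    high = sum(1 for avg, _ in summary.values() if avg >= threshold)
    return high / len(summary)


# Hypothetical scores for three made-up gene pools (not data from the study).
example = [
    ("crop_a", 9.0), ("crop_a", 8.0), ("crop_a", 10.0),
    ("crop_b", 4.0), ("crop_b", 6.0),
    ("crop_c", 3.0),
]
example_summary = gene_pool_summary(example)
```

Applied to the full set of taxon scores in Supplementary Table 2, a calculation of this kind would yield the share of gene pools classed as high priority.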

Figure 3: Collecting priorities for crop wild relatives and the importance of associated crops.

The priority scale displays the average FPS across wild relatives per crop. The mean importance class of associated crops displays the significance of crops averaged across global food supplies and agricultural production metrics (see Supplementary Methods). For both axes, the scale is zero to ten, with ten representing the highest priority for further collecting/most important crop. The size of crop gene pool circles denotes the number of wild relative taxa per crop, ranging from 1 (faba bean) to 135 (cassava).

None of the 81 assessed crop gene pools demonstrated an average FPS across its wild relatives that would permit its categorization as sufficiently well represented in gene banks (Fig. 2). The wild relatives of four crops were assessed as fairly well represented, that is, as low current priority for further collecting: the gene pools of wheat (3.7 ± 2.4), grass pea (3.7 ± 2.0), chickpea (4.2 ± 2.6) and tomato (4.5 ± 1.9). Wheat and tomato, along with medium-priority crop gene pools such as sunflower (6.3 ± 2.2), rice (6.6 ± 2.5) and potato (6.7 ± 2.6), have a long history of use of wild relatives in crop improvement9,13 and benefit from relatively extensive germplasm collections. Other crop gene pools determined as low priority (grass pea and chickpea) have few wild relatives, and these generally present restricted distributions that have been fairly well sampled. However, specific taxa were assessed as under-represented in gene banks even within these low-priority gene pools. For example, five taxa related to wheat were assessed as medium or high priority, one taxon related to grass pea as medium priority, three taxa related to chickpea as medium priority and six taxa related to tomato as medium or high priority (Supplementary Table 2).

Proposed hotspots for further collecting for high-priority crop wild relatives were identified across the world's tropical, subtropical and temperate regions, with the most critical gaps identified in the Mediterranean, Near East, and southern and western Europe; Southeast and East Asia; and South America (Fig. 4). Up to 43 wild relative taxa (main map in Fig. 4) associated with up to 23 crops (inset map in Fig. 4) may potentially be collected within a single 25 km² grid cell.

Figure 4: Proposed hotspots for further collecting activities for high-priority crop wild relatives.

The map displays geographic regions where high-priority crop wild relative taxa are expected to occur and have not yet been collected and conserved in gene banks. The inset map shows gaps for under-represented taxa by crop gene pool. Dark red indicates greater overlap of potential distributions of under-represented taxa, where greater numbers of under-represented crop wild relative taxa occur in the same geographic area. For the inset map, greater numbers indicate greater overlap of taxa associated with various crops.
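The overlap counts underlying such maps can be sketched as a per-cell tally of taxa and of their associated crops. The Python sketch below is illustrative only: the data structures and names are hypothetical, and it assumes that each taxon's set of grid cells has already been restricted to areas where the taxon is modelled to occur but has not yet been collected. Dropping the priority filter would give a general richness count of the kind shown in Fig. 1.

```python
from collections import defaultdict


def hotspot_counts(distributions, crop_of, priority_of):
    """Count high-priority taxa and their associated crops per grid cell.

    distributions: {taxon: set of grid-cell ids}, assumed to be limited
                   to uncollected parts of the modelled distribution.
    crop_of:       {taxon: associated crop gene pool}
    priority_of:   {taxon: priority category}
    Returns two dicts keyed by cell: number of taxa, number of crops.
    """
    taxa_in_cell = defaultdict(set)
    crops_in_cell = defaultdict(set)
    for taxon, cells in distributions.items():
        if priority_of[taxon] != "high":
            continue  # only high-priority taxa contribute to the hotspot map
        for cell in cells:
            taxa_in_cell[cell].add(taxon)
            crops_in_cell[cell].add(crop_of[taxon])
    taxa_counts = {cell: len(taxa) for cell, taxa in taxa_in_cell.items()}
    crop_counts = {cell: len(crops) for cell, crops in crops_in_cell.items()}
    return taxa_counts, crop_counts


# Hypothetical example with four made-up taxa on a small grid.
distributions = {
    "taxon_1": {(0, 0), (0, 1)},
    "taxon_2": {(0, 1)},
    "taxon_3": {(0, 1), (1, 1)},
    "taxon_4": {(1, 1), (2, 2)},
}
crop_of = {"taxon_1": "crop_a", "taxon_2": "crop_a",
           "taxon_3": "crop_b", "taxon_4": "crop_b"}
priority_of = {"taxon_1": "high", "taxon_2": "high",
               "taxon_3": "high", "taxon_4": "low"}
taxa_counts, crop_counts = hotspot_counts(distributions, crop_of, priority_of)
```

In this toy example, cell (0, 1) holds three high-priority taxa associated with two crops, whereas cell (2, 2) is absent from both results because only a low-priority taxon occurs there.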

Discussion

Our results demonstrate that crop wild relatives are currently under-represented in gene banks and that a systematic effort to improve the comprehensiveness of their conservation is critically needed. These findings are remarkable given the extensive efforts, particularly in the past half century, by international, regional and national initiatives to conserve the broad diversity of important agricultural crops11,20. Achieving the comprehensive conservation of crop genetic resources ex situ has been constrained by technical as well as political and funding challenges in recent decades11, and these constraints are most acute for wild taxa, which are less well researched than crop species and often more difficult to conserve and to utilize11,20,24. Addressing conservation gaps globally for crop wild relatives, a goal that is specifically targeted in recent major international agreements (the United Nations' Sustainable Development Goals and the Strategic Plan for Biodiversity28), will require substantial investment and extensive international collaboration. The high spatial resolution of these results is already informing such initiatives24 and can be useful to the development of further efforts.

Here we outline priorities for collecting wild relatives on the basis of their current representation in gene banks (Fig. 2 and Supplementary Table 2), and also provide an assessment of the relative importance to global food supplies and production systems worldwide of their associated crops (Fig. 3 and Supplementary Fig. 2), as well as additional information regarding the contribution of crops to food security and sustainable agriculture (Supplementary Table 1). We recommend filling gaps in ex situ conservation first for the wild relatives of crops significant to these criteria, for example rice, maize, sugarcane, cassava, potato, bananas and plantains, sorghum, millets, sweet potato, yams, groundnut, cowpea and pigeon pea.

To further refine these priorities, additional information and filters are needed. These include incorporating knowledge of threats to populations due to habitat modification, climate change and other impacts. Preliminary field surveys and threat analyses for under-represented taxa are therefore urgently needed. We note that extensive expert evaluations of the results generally confirmed the robustness of our species distribution models and conservation prioritizations but also clearly emphasized the need to address urgent threats to the survival of many crop wild relative populations (Supplementary Fig. 3). Realistic strategies for field collecting and subsequent ex situ conservation resulting in an increased availability of germplasm for plant breeding also require negotiating policy governing germplasm collecting and exchange29,30, assessing field work risks (for example, war and civil strife in regions with high levels of diversity of wild relatives), coordinating timing of field work to maximize the collection of viable seeds and other propagules, prioritizing target crop gene pools based on the interest of the breeding community in utilizing wild germplasm, and determining the relative difficulty of maintenance of targeted wild germplasm in gene banks. Although the seeds of most wild relatives can be maintained under standard conditions for long-term conservation ex situ, some wild relatives produce recalcitrant seeds or do not produce seeds at all. Such wild relatives may require more expensive approaches (for example, in vitro or cryopreservation), and particularly for such taxa alternative conservation strategies such as the establishment of in situ conservation reserves may be more effective.

Despite an extensive effort to compile occurrence records from more than 400 different data sources, the wild relatives of a number of important agricultural crops (namely coffee, tea and avocado) were not assessed because of the lack of sufficient accessible data. We also note that a number of agricultural crops are not currently known to possess closely related wild relatives, including taro (Colocasia esculenta), coconut (Cocos nucifera) and date palm (Phoenix dactylifera). Improvements in the generation and accessibility of taxonomic, relatedness and geographic information on wild relatives19,31 may permit conservation assessments for some of these gene pools in the future.

The combination of the sampling, geographic and ecological representativeness scores used to determine the extent of conservation of the wild relatives of important agricultural crops in gene banks represents an efficient methodology for prioritizing taxa across crop gene pools, given wide variations in the potential diversity encompassed in each taxon and the general absence of molecular data for such species. The sampling representativeness score permitted an indication of the total number of germplasm accessions estimated as sufficient to represent a taxon, relative to the known extent of the taxon and utilizing all gene bank and reference data regardless of whether geographical coordinates are available. Geographic and ecological variation metrics were used as proxies for genetic diversity and potential functional adaptation to diverse environments, based on the assumption that the genetic composition of plant species varies across geographic range and is associated with adaptation to different ecological conditions32. The increasing power and decreasing costs of direct measures of diversity in genomes may make significant future refinements of priorities achievable10. However, further collecting is still needed for a very large number of wild relatives in order to assemble sufficient samples to perform such genetic assessments and to help resolve taxonomic and gene pool assignment uncertainties33.
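To make the logic of the three representativeness scores concrete, the following Python sketch expresses each as a simple ratio of what is conserved to what is potentially available, and then converts it to the priority scale. This is an assumed, simplified form for illustration only: the exact formulas are given in the Supplementary Information and are not reproduced here, and all function and parameter names are hypothetical.

```python
def representativeness(conserved, potential):
    """Scale the conserved fraction of a quantity to 0-10 (10 = fully represented).

    Assumption: representation is a simple ratio, capped at 1.
    """
    if potential <= 0:
        raise ValueError("potential must be positive")
    return 10 * min(conserved / potential, 1.0)


def gap_scores(accessions, all_records,
               area_sampled, area_modelled,
               classes_sampled, classes_modelled):
    """Return (sampling, geographic, ecological) scores on the priority scale.

    Each score is 10 minus the assumed representativeness, so 10 means
    no representation in gene banks and 0 means full representation.
    """
    sampling = representativeness(accessions, all_records)
    geographic = representativeness(area_sampled, area_modelled)
    ecological = representativeness(classes_sampled, classes_modelled)
    return tuple(10 - score for score in (sampling, geographic, ecological))
```

Under these assumptions, a taxon with no gene bank accessions scores ten on all three components, whereas a taxon whose accessions cover its full modelled range and all of its ecological classes scores zero on the geographic and ecological components.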

Methods

Methods used for gathering data, modelling, analyses and the associated references are available in the Supplementary Information.

Interactive maps displaying occurrence data coordinates, potential distribution models, further collecting priority maps and collecting priority categories for the crop wild relatives analysed are available at http://www.cwrdiversity.org/distribution-map/. Occurrence data used for this analysis are available at http://www.cwrdiversity.org/checklist/cwr-occurrences.php. Further information on expert evaluations of the gap analysis is available at http://www.cwrdiversity.org/expert-evaluation/.