Main

Vascular plants contribute countless services to people1,2,3,4,5, by sustaining ecosystem functioning and through direct use for food, medicine, construction, utensils and cultural activities. Many of these contributions are threatened by global anthropogenic changes6. To prevent further loss of plant species and ecosystem services, we urgently need to (1) identify plant species at risk and (2) define spatial conservation priorities that capture multiple aspects of plant diversity and its contributions to humanity.

Threatened plant species are best identified using expert-validated extinction risk assessments following the methodology of the International Union for the Conservation of Nature (IUCN) Red List of Threatened Species (hereafter, the Red List). Red List assessments are a gold standard and, as such, a cornerstone of conservation planning7,8,9. However, the Red List assessment process is time-consuming and lack of resources has so far resulted in gaps and biases in the taxonomic and spatial coverage of the Red List. For instance, only 14% of vascular plant species have been assessed (versus 80% of vertebrates10), with biases towards trees and North American species11,12,13. A complementary and more extensive resource (ThreatSearch14) integrates Red List assessments with other kinds of published extinction risk assessments15. However, this still only covers ~30% of vascular plant species12. Accurate and rapid extinction risk assessment approaches are therefore urgently needed to inform global conservation efforts and to facilitate regular re-assessments12,16.

Two approaches for accelerating extinction risk assessments have been noted: ‘criteria explicit’ and ‘category predictive’17. The former is the automated generation of preliminary assessments based on subsets of Red List criteria18,19. Such an approach has the advantage of directly implementing Red List criteria but can overpredict risk in some cases19. In the category predictive approach, extinction risk is predicted on the basis of statistical associations found between Red List categories and other variables (for example, species distribution range size), therefore only implicitly relying on the Red List criteria. This approach often uses machine learning (ML)20,21,22,23 and its performance is sensitive to model parametrization and dataset properties, which create uncertainty around the predictions24. However, ML has been shown to perform well on a range of plant taxa and geographic scales21,23,25 and prediction uncertainty can be addressed by performing sensitivity analyses and by comparing predictions generated by models with different strengths and weaknesses26. We hereafter refer to Red List and ThreatSearch assessments as ‘published extinction risk assessments’, to make the distinction between them and ‘extinction risk predictions’ obtained using ML approaches.

To define conservation priorities, published extinction risk assessments or predictions can be combined with measures of species evolutionary distinctiveness (how much a species contributes to the phylogenetic diversity of its clade27) and functional distinctiveness (how functionally distant a species is from other species, on average28). Combining multiple aspects of biodiversity into a single prioritization criterion has the potential to increase conservation success by capturing more biodiversity features and associated contributions to people29,30. So far, however, studies combining taxonomic, functional and phylogenetic diversity have been geographically restricted31,32 and/or focused on vertebrates33,34,35,36. As a result, there is an urgent need to extend such studies to vascular plants given their myriad contributions to people and their importance for progress towards the United Nations 2030 Sustainable Development Goals (https://sdgs.un.org/2030agenda) and post-2020 targets of the Global Biodiversity Framework of the Convention on Biological Diversity (CBD) (https://www.cbd.int/conferences/post2020).

Integrative plant conservation approaches can also benefit from the inclusion of information on plant uses (ethnobotanical knowledge). This is especially relevant because it remains unclear how well plant uses and other aspects of nature’s contributions to people are captured by measures of phylogenetic and functional diversity37. Plant uses can be underpinned by genetic and functional traits, resulting in correlations between uses and certain features or phylogenetic placements38,39,40,41. However, plant use is also driven by plant availability and trait-independent cultural practices37, thus these correlations are not always apparent42. In this context, integrating plant use data into plant conservation studies is needed to evaluate risks associated with these culturally important uses. Finding a relationship between uses and genetic and functional traits may in turn help to identify potential substitutes for threatened used species because phylogenetically and/or functionally closer species are likely to be better substitutes than randomly chosen species41. So far, this idea is poorly explored and plant use data remain largely untapped in large-scale conservation studies (but see recent studies focusing on New Guinea and Brazil43,44), hindering the global conservation of plants that support human livelihoods.

This study integrates phylogenetic, functional and ethnobotanical information to provide a global conservation survey of palms (Arecaceae), a keystone plant family. Palms are among the most economically important plants in the world, with hundreds of wild species providing essential contributions to millions of people (for example, food, medicine and construction)37,45 and even supporting large-scale industries (for example, rattan products46). They are important components of vulnerable and biodiverse ecosystems such as tropical rainforests47. Their unique functional traits provide shelter and food for animals48,49 and deliver essential contributions to ecosystems50,51,52. Palm phylogenetics, functional traits and uses are well-documented compared to many other tropical plant families49,53,54,55,56 but they have never been analysed together from a global conservation perspective. Currently, extinction risk assessments are published on the Red List for 797 (31%)57 of the family’s ~2,500 species58,59, with palms from Africa and especially Madagascar being particularly well represented60,61. This number increases to 61% when including global assessments published on ThreatSearch14. However, it decreases to 23% (Red List) and 34.5% (ThreatSearch) when only including assessments published in the last decade (hereafter ‘recent’), an age above which assessments are considered outdated by the Red List16 (https://www.iucnredlist.org/assessment/process (accessed 16 September 2021)). The last global assessment of palms was published 25 years ago, falling well outside this time window62 and a recent global analysis of vascular plant Red List assessments concluded that extinction risk may be overestimated in the palm family due to representation biases (for example, in favour of threatened species and/or species from Madagascar)12.

Here, we integrate multiple aspects of palm diversity to advance conservation planning and research by (1) quantifying levels of extinction risk among evolutionarily distinct, functionally distinct and used palm species, (2) identifying global priority regions for palm conservation that capture these three aspects and (3) exploring the resilience of palm uses. To quantify risk, we draw on recently published extinction risk assessments of palms to train, test and evaluate 48 ML models covering a broad spectrum of strategies to address dataset gaps and imbalances (Extended Data Fig. 1, Supplementary Tables 1 and 2, Supplementary Note and Supplementary Fig. 1). We then predict the extinction risk of 1,381 palm species using the model with the highest balanced accuracy (82%; Supplementary Table 1) and we combine these predictions with 508 recently published extinction risk assessments (≤10 years old). This ‘total evidence’ approach allows us to provide a global extinction risk overview for 75% of the world’s palm species and to identify priority regions for palm conservation on the basis of levels of risk among evolutionarily distinct, functionally distinct and used palm species. Finally, we develop a new approach based on phylogenetic, functional and ecological information to estimate to what degree threatened used species may be substituted by co-occurring non-threatened species for providing similar services or products (Methods). Our work has practical implications for the conservation of the economically and ecologically important palm family and provides new understanding of how integrative approaches can enhance sustainable biodiversity use.
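For readers unfamiliar with the metric, balanced accuracy is the mean of sensitivity and specificity, so a model cannot score well simply by predicting the majority class. The following is a minimal Python sketch of that definition for a binary threatened/non-threatened classification. The function name and the toy labels are hypothetical and are not taken from the study's models or data.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary labels (True = threatened)."""
    tp = sum(t and p for t, p in zip(y_true, y_pred))
    tn = sum((not t) and (not p) for t, p in zip(y_true, y_pred))
    pos = sum(y_true)
    neg = len(y_true) - pos
    if pos == 0 or neg == 0:
        raise ValueError("both classes must be present")
    return 0.5 * (tp / pos + tn / neg)


# Hypothetical toy labels, for illustration only
observed = [True, True, True, True, False, False]
predicted = [True, True, True, False, False, True]
score = balanced_accuracy(observed, predicted)
print(round(score, 3))
```

Here sensitivity is 3/4 and specificity is 1/2, so the balanced accuracy is 0.625 even though plain accuracy would be 4/6.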

Results

Palm diversity and uses at risk

We found that over half of the world’s palm species may be threatened and that scenarios incorporating ML estimates differ from extrapolations based solely on published assessments (Fig. 1a). Our most accurate model classified 703 palm species as threatened (Supplementary Table 3). Adding these predictions to the 353 species assessed as threatened in recently published assessments, we found that 56% of the 1,889 palm species included in the total evidence approach were threatened. In contrast, 69–80% of palm species with published assessments appear to be threatened, depending on whether only recent or all published assessments are included (Fig. 1a and Supplementary Table 3).
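The total evidence percentage can be retraced from the counts reported in the text. The short Python sketch below only reproduces that arithmetic; the variable names are illustrative.

```python
# Counts reported in the text
ml_predicted_species = 1381      # species with ML extinction risk predictions
recent_assessed_species = 508    # species with recent (<=10 years old) published assessments
ml_threatened = 703              # classified as threatened by the most accurate model
assessed_threatened = 353        # assessed as threatened in recent published assessments

total_species = ml_predicted_species + recent_assessed_species
total_threatened = ml_threatened + assessed_threatened
pct_threatened = 100 * total_threatened / total_species

print(total_species, total_threatened, round(pct_threatened))  # 1889 1056 56
```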

Fig. 1: Percentages of threatened palm species in different diversity and use categories.

a, Comparison between global estimates obtained with or without ML predictions. Triangles, estimates obtained by combining recent published assessments and ML predictions from the most accurate model (total evidence approach). Circles, extrapolations based on published assessments only: grey, all published assessments; black, only recent (that is ≤10 years old) published assessments. b, Regional percentages of threatened species according to the total evidence approach. c, Difference between regional percentages of threatened species obtained with and without ML predictions. d, Regional variation in percentages of threatened species, for each diversity and use category, under the total evidence approach. For all boxplots in b and d, each dot represents one region with at least one palm species of the category considered, the bold line represents the median value, the box spans values from the first to the third quartile and the lines outside the box extend until the smallest and largest values, no further than 1.5 times the distance between the first and third quartiles. To help visualization when many regions have 0% of threatened species, dots for these regions are not shown but instead the number of regions with 0% of threatened species is indicated.

We then considered threat levels among evolutionarily distinct, functionally distinct and used palms. Used palms were those with at least one recorded use among ten use categories; we also considered four main use categories separately: ‘culture’, ‘food’, ‘medicine’ and ‘utensils, tools and construction’ (Methods). According to the total evidence approach, 455 (48%) of the evolutionarily distinct, 447 (47.5%) of the functionally distinct and 185 (29%) of the used species were threatened. Percentages of threatened species were higher among species used for food and utensils, tools and construction than for culture or medicine (Fig. 1a and Supplementary Table 3). In all categories, total evidence estimates were lower than estimates based only on published assessments, especially when the latter included old assessments. The greatest difference was observed for used American species (13% versus 41–72%; Extended Data Fig. 2a and Supplementary Table 3).

Globally, palms occur in 227 out of 369 level-3 botanical countries (hereafter ‘regions’; Methods). Applying the total evidence approach, the percentage of threatened species per region ranged from zero to 100%, with mean values of 21% and 9% for ‘species-rich’ (>10 species) and ‘species-poor’ (≤10 species) regions, respectively (Fig. 1b and Supplementary Table 4). The percentage of threatened species per region correlated positively with the number of species in species-rich regions (Pearson correlation coefficient r = 0.37; P = 0.00022) but not in species-poor regions (Pearson’s r = 0.06; P = 0.5; Extended Data Fig. 2b). The percentages of threatened species per region estimated using the total evidence approach were usually lower than those estimated only from recent published assessments, except for 11 species-poor and 29 species-rich regions (the latter including Sri Lanka, Vietnam, Sulawesi, New Guinea, Paraguay, Cuba, Brazil North, Mauritius, Democratic Republic of the Congo and Gabon), where the percentages of threatened species were 4 to 33 points higher with the total evidence approach (see red areas on map in Fig. 1c and Supplementary Table 4).

The percentage of threatened palm species per region varied between the different measures of diversity and between use categories (Fig. 1d and Supplementary Table 4). When considering species-rich regions only, the median regional percentage of threatened species among evolutionarily distinct species was higher than for functionally distinct and used species. Furthermore, median regional percentages of threatened species in the utensils, tools and construction and food categories were higher than in the culture and medicine categories (Fig. 1d). There was no such pattern in species-poor regions, as most had no threatened species, regardless of the diversity measure or use category considered (Fig. 1d). Despite these variations, regional percentages of threatened species among evolutionarily distinct, functionally distinct and used species were significantly positively correlated (Pearson’s r = 0.58–0.84 depending on the categories tested; P < 2.2 × 10⁻¹⁶; Extended Data Fig. 2b), reflecting the fact that 80% of threatened used species were evolutionarily and/or functionally distinct (Supplementary Table 5). However, variations between categories were still sufficient to result in different region rankings based on the percentage of threatened species in each category (Extended Data Fig. 3).

Priority regions for palm conservation and research

Regional variation in extinction risk among evolutionarily distinct, functionally distinct and used species suggested that basing conservation prioritization on a single diversity measure or use category may miss risks associated with other categories. To identify priority regions for palm conservation while accounting for this variation, we therefore scored regions on the basis of their proportion of threatened species among evolutionarily distinct and/or functionally distinct and/or used species (hereafter referred to as ‘species of interest’; Fig. 2a). Under the total evidence approach, there were 25 regions where ≥40% of species of interest were threatened, including ten species-rich regions: Madagascar (199 palm species in total), New Guinea (280 species), the Philippines (130 species), Hawaii (34 species), Borneo (291 species), Jamaica (12 species), Vietnam (113 species), Vanuatu (21 species), New Caledonia (43 species), Sulawesi (62 species; Fig. 2a,b and Supplementary Table 4). In addition to these ‘top-priority’ regions for conservation, 36 lower priority regions (including 25 species-rich regions) had 20–39% of their species of interest potentially threatened, while the remaining 164 regions (including 58 species-rich regions) had <20% of their species of interest potentially threatened (Fig. 2a,b and Supplementary Table 4). All species-rich regions identified as top priorities with the total evidence approach were also top priorities when considering only recent and/or all published assessments (Supplementary Table 4). However, the ranking of the regions differed between approaches, with New Guinea, Vietnam and Vanuatu appearing among the top ten priorities only when applying the total evidence approach (Extended Data Fig. 3).
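The tiering described above can be sketched as a simple threshold rule on the percentage of threatened species of interest. This is a minimal Python illustration, not the authors' code: the function name, the tier labels and the example regions are hypothetical, and it assumes that values between 39% and 40% fall in the lower tier.

```python
def priority_tier(n_threatened, n_of_interest):
    """Classify a region by the share of its species of interest that are threatened."""
    if n_of_interest <= 0:
        raise ValueError("region has no species of interest")
    pct = 100 * n_threatened / n_of_interest
    if pct >= 40:
        return "top priority"
    if pct >= 20:
        return "lower priority"
    return "below 20%"


# Hypothetical regions: (threatened species of interest, all species of interest)
regions = {"Region A": (45, 90), "Region B": (6, 24), "Region C": (1, 30)}
tiers = {name: priority_tier(*counts) for name, counts in regions.items()}
print(tiers)
```

Scoring on percentages rather than counts is what allows species-poor regions to appear among the priorities alongside palm-rich ones.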

Fig. 2: Priority regions for palm conservation and research.

a, Regional percentages of threatened species among species of interest. Percentages are based on recent (that is, ≤10 years old) published assessments and ML predictions (total evidence approach). Species of interest are evolutionarily distinct and/or functionally distinct and/or used by humans. b, Relationship between regional lack of extinction risk information and level of extinction risk among species of interest. Regions identified here as conservation priorities are those where ≥40% of species of interest are threatened, while research priorities have ≥20 species without extinction risk information under the total evidence approach. Only priority regions with >10 species are named (details on other regions can be found in Supplementary Table 4). All regions with at least one palm species of interest are represented (n = 225). The association between the variables was measured using Pearson’s product moment correlation coefficient in a two-sided Pearson’s correlation test. r is Pearson’s correlation coefficient and P is the associated P value.

To explore whether priority regions for conservation were also among the least well-studied regions, the number of species lacking extinction risk information under the total evidence approach was calculated for each region (Fig. 2b and Supplementary Table 4). This highlighted priorities for research that were not among the top priorities for conservation, namely Peninsular Malaysia, Cuba, Sumatra, Thailand and India, which all had ≥20 species without extinction risk information (Fig. 2b). There was a weak but significant positive correlation between the number of species lacking extinction risk information and the percentage of threatened species of interest (Pearson’s r = 0.25; P = 0.00014), highlighting that some priority regions for conservation were also priorities for research. This was the case for New Guinea, Borneo, the Philippines and Sulawesi (Fig. 2b).

Potential alternatives to threatened used species

To explore the resilience of palm uses, we calculated the number of species that could serve as a potential alternative for each threatened used species. A species was considered a potential alternative if it was from the same region and biome as the threatened used species and if it was significantly phylogenetically and/or functionally close to it (Methods and Discussion). Importantly, this exploration of the replaceability of used species only addresses their direct use by people, without necessarily capturing their other contributions or importance for ecosystem functioning. The global replaceability of threatened used species appeared high, with a median of 8 potential alternatives and 1–84 potential alternatives identified for 91% of the species (Fig. 3a and Supplementary Table 5). However, 77 (42%) threatened used species had at most 5 alternatives, including 16 (9%) that completely lacked alternatives (Fig. 3a). Species that could be substituted usually had alternatives in all their regions of occurrence, except for nine species (5%) that had alternatives in some but not all of these regions (Extended Data Fig. 4a). Utensils, tools and construction had the highest number of threatened species for which no potential alternative could be found (11 species) but this category and culture appeared generally more resilient than food and medicine on the basis of median numbers of potential alternatives per threatened used species (Extended Data Fig. 4b). The global potential of non-threatened species for serving as substitutes for threatened used species tended to be restricted, suggesting a lack of use redundancy among palm species. Indeed, median and maximal numbers of substituted species per non-threatened species were only 1 and 33, respectively, and 268 (32%) species did not qualify as a potential substitute for any threatened used species (Fig. 3b).
Species with the highest replaceability were among those with intermediate stem volumes and the smallest fruits, mostly in subfamilies Ceroxyloideae and Calamoideae (dark red, Extended Data Fig. 4c). As expected from an approach relying on phylogenetic and functional distances, species with high potential for serving as substitutes tended to occupy the same area of the trait space as the most replaceable threatened used species and to cluster with them in the palm phylogeny (dark blue, Extended Data Fig. 4c).
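The rule for identifying potential alternatives can be illustrated with a small Python sketch. It is not the authors' implementation: the `Species` class, the species names and the `close_pairs` set are hypothetical, co-occurrence is simplified to sharing at least one (region, biome) pair, and the statistical test for phylogenetic and/or functional closeness is assumed to have been run beforehand and is represented only by precomputed flags.

```python
from dataclasses import dataclass, field


@dataclass
class Species:
    name: str
    threatened: bool
    used: bool
    # (region, biome) pairs in which the species occurs
    ranges: set = field(default_factory=set)


def potential_alternatives(target, pool, close_pairs):
    """Non-threatened species sharing a region and biome with `target`
    and flagged as phylogenetically and/or functionally close to it."""
    return [
        s.name for s in pool
        if s.name != target.name
        and not s.threatened
        and s.ranges & target.ranges
        and frozenset((s.name, target.name)) in close_pairs
    ]


# Hypothetical species and closeness flags, for illustration only
sp_a = Species("sp_a", True, True, {("R1", "moist forest")})
sp_b = Species("sp_b", False, False, {("R1", "moist forest"), ("R2", "moist forest")})
sp_c = Species("sp_c", False, True, {("R2", "moist forest")})
sp_d = Species("sp_d", False, False, {("R1", "moist forest")})
pool = [sp_a, sp_b, sp_c, sp_d]
close_pairs = {frozenset(("sp_a", "sp_b")), frozenset(("sp_a", "sp_c"))}

# Replaceability: number of potential alternatives per threatened used species
replaceability = {
    s.name: len(potential_alternatives(s, pool, close_pairs))
    for s in pool if s.threatened and s.used
}
print(replaceability)
```

In this toy example only `sp_b` qualifies for `sp_a`: `sp_c` is close but does not co-occur, and `sp_d` co-occurs but is not close.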

Fig. 3: Availability of potential alternatives for threatened used palm species.

a, Global replaceability of threatened used species. The replaceability of a species is defined as the number of potential alternatives identified for that species. b, Global substitution potential of non-threatened species. The potential for substitution of a species is defined as the number of threatened used species that may be substituted by that species. c, Resilience of threatened uses across regions. Use resilience is assessed by looking at the median number of potential alternative species per threatened used species for the region and by looking at the percentage of threatened used species with at least one potential alternative in the region. Each dot is a region with at least one threatened used palm species. Numbers indicate the number of identical dots overlapping. Region names are indicated in regular font for regions with >5 threatened used species and in bold for species-rich (>10 species) regions identified as top priorities for conservation in this study (Fig. 2).

There were 92 regions with threatened used species and their median number of potential alternatives per threatened used species varied from 0 to 63 (Fig. 3c and Supplementary Table 4). There was a positive correlation between median number of alternatives and total number of palm species in the region (Pearson’s r = 0.84; P = 2.2 × 10⁻¹⁶; Extended Data Fig. 5). Consistently, regions with the lowest median numbers of alternatives were spread across the tropics and subtropics, while regions with the highest median numbers of alternatives were concentrated in palm-rich areas such as South-East Asia and North-East South America (Fig. 3c). In two-thirds of the regions, potential alternatives could be identified for all threatened used species, while 30 regions (including 18 species-rich regions) lacked alternatives for some or all of their threatened used species (Fig. 3c, Extended Data Fig. 5 and Supplementary Table 4). Among species-rich regions identified as conservation priorities in the previous section, most had a low number of threatened used species (<5) and low median numbers of alternatives (<20; Fig. 3c and Supplementary Table 4). The Philippines and Madagascar stood out as high-risk and low-resilience priority regions with 41 and 25 threatened used species and median numbers of alternatives of only 10 and 2, respectively. In contrast, Borneo had only 12 threatened used species but the highest resilience, while New Guinea appeared as a high-risk and intermediate-resilience region, with 25 threatened used species and a median number of alternatives of 19 (Fig. 3c).

Discussion

Our multidimensional assessment of the extinction risk faced by palms and their contributions to people shows that (1) over 1,000 palm species may be threatened with extinction, (2) most priority regions for palm conservation and research are in South-East Asia and the Pacific and (3) alternatives (based on morphological similarity and relatedness) may be available for most threatened used species, albeit with regional variations in use resilience.

ML predictions allowed us to obtain extinction risk information for almost three times more species than when using only published assessments. These ML predictions were essential to provide a global view of extinction risk that was less biased than when extrapolating from published assessments alone (Fig. 1a,c). Furthermore, ML allowed us to correct the bias of the Red List towards threatened species12, which for palms was most pronounced in the Americas. The greater magnitude of this bias when considering assessments >10 years old (Fig. 1a and Extended Data Fig. 2a) probably results from data accumulation and guidelines development over the years, which enable better manual assessments (and predictions) today than a decade ago57.

Our results and data are policy-relevant: they can help accelerate Red List assessments by making it possible to (1) rapidly list as Least Concern63 the species identified as non-threatened by our models and (2) prioritize the manual Red Listing of the 703 species classified as threatened by our most accurate model. Moreover, as further data and published assessments become available, our workflow can be re-used to update palm extinction risk predictions and to re-evaluate conservation priorities.

While ML extinction risk predictions have their limitations19,24,26, our tests suggest that the results presented in this study are robust to geographical or taxonomic biases in the ML training and test datasets and to the omission of extinction risk predictors (Methods, Supplementary Note and Supplementary Fig. 1). The main limitation in this study comes from the number of occurrence data points used. Indeed, our ML predictions revealed hundreds of likely threatened species, in some cases based on as few as one occurrence record (Supplementary Table 5). On the one hand, this lack of data may have biased predictions towards higher probabilities of being threatened. On the other hand, many such poorly documented species predicted to be threatened may be truly rare and/or threatened and should therefore at least be considered priorities for research (Supplementary Note and Supplementary Fig. 2). The collection and curation of additional palm occurrence data will be essential to further improve the accuracy of ML predictions and to publish assessments for these species. Meanwhile, our lists of potentially threatened species include information on the underpinning data (Supplementary Table 5), thereby paving the way for developing research and conservation strategies so that palms can continue to provide ecosystem services and underpin livelihoods.

Integrating extinction risk evidence with phylogenetic, functional and ethnobotanical data enabled us to identify global geographic priorities for palm conservation and research that account for multiple aspects of palm diversity (Fig. 2). Even when using the regional percentage rather than the number of threatened species of interest as a prioritization criterion, palm-rich regions such as New Guinea, Madagascar and Borneo emerge as priorities. This is probably due to a combination of factors including the high species diversity and small range size of many palms (and other plants)64 in these regions, combined with high pressures for land use change in some of them65,66,67. Although some of the priority regions we identified were already suspected to be important for palm research and/or conservation62, this study identified New Guinea, Vietnam and Vanuatu as newly emerging among the top ten priority regions and provides a much-needed update for palm research and conservation projects globally. Importantly, by focusing on percentages rather than numbers of threatened species of interest, we shed light on 15 species-poor regions in which ≥40% of palm species of interest are probably threatened (Fig. 2 and Supplementary Table 4). Although palm species are not numerous in these regions, single palm species can be of high importance locally62 and our results will underpin further investigation of the threats they face in these regions.

Global and regional proportions of threatened used species were relatively low (Fig. 1). However, this remains concerning because many species and uses may be threatened locally even if they are not threatened globally61 and because many plant uses are not easily replaceable. For instance, species recorded as belonging to the same use category (for example, medicine) may have different and non-interchangeable uses (for example, for different ailments68). Even co-occurring species providing apparently identical contributions to people may not be interchangeable, for example, if they can be used at different times of the year69,70. On the other hand, many different species are used for similar purposes throughout the world54,71, suggesting that some threatened used species may be replaceable by others. Our approach to evaluating regional use resilience combines phylogenetic, functional and ecological information and specifically addresses replaceability. However, it will need to be benchmarked against data on the transferability of uses and used species between communities and between regions, notably in the presence of fine-scale ecological heterogeneity. This will make it possible to identify the most suitable phylogenetic and functional distance thresholds and to select additional factors for classifying species as potential alternatives. Such additional drivers of a species’ potential for substitution may include cultural priorities and practices, knowledge exchange between communities and regions, the ecology of threatened species and their potential alternatives, and the resilience of the latter to climate change72. By excluding these factors, our estimates of used species replaceability are probably too optimistic, as suggested by our finding that 91% of threatened used species may have alternatives (Fig. 3a).
Yet, about a third of the regions comprising threatened used species appear to lack alternatives for some or all of them, highlighting instances of potentially low use resilience even under such a conservative approach (Fig. 3c and Extended Data Fig. 5). Furthermore, it is important to remember that many threatened used species are also functionally and/or evolutionarily distinct41, so their loss could negatively impact ecosystems and humanity, even if they were found to be replaceable in terms of their direct use by people. Mitigating or adapting to the loss of used species will be best achieved by their users themselves, so that local priorities and knowledge can be fully accounted for. Our list of candidate alternative species per region (Supplementary Table 6) sets an optimistic baseline that can underpin community-led benchmarking and conservation actions, for example, to assess whether alternative species are already locally used or whether they are as useful as predicted.

On a broader scale, the methods outlined in this paper urgently need to be tested on other plant groups which also contribute to supporting ecosystem functioning and people’s well-being and livelihoods. This will enable the rapid identification of a greater diversity of species most at risk and the development of effective and culturally relevant conservation strategies to enhance ecosystem health and the sustainable use of plants globally.

Methods

Machine learning predictions of extinction risk

Species sampling and cleaning of spatial occurrence data

We aimed to sample all the 2,510 palm species recognized at the time of the study73. However, since predictors for species conservation status can be obtained more precisely from occurrence data than from species presence/absence records at the region level, our machine learning analyses only included species with at least one valid occurrence record. The few palm species known only from cultivation were kept in the dataset as they represent a negligible fraction of all species. Global spatial occurrence data for 7,469 palm names with a Global Biodiversity Information Facility (GBIF) key out of the 7,570 published palm names73 were sourced from GBIF (derived dataset GBIF.org https://doi.org/10.15468/dd.at82kf) using the R package rgbif v.0.9.9 (ref. 74). Another 14,169 occurrence data points were obtained from herbarium specimen records from the database of the Royal Botanic Gardens, Kew (UK) and 106 from the database of the Naturalis Biodiversity Center, Leiden (the Netherlands). Each occurrence point was assigned to one of the 2,510 accepted palm species names73, or discarded, depending on whether the name associated with the occurrence point in GBIF could be unambiguously matched to an accepted name. Occurrence records were cleaned on the basis of the GBIF coordinate issue flags and using the R package CoordinateCleaner v.1.0–7 (ref. 75). Obvious issues such as wrong coordinate signs were corrected, while coordinates falling into marine areas, cities, province or country centroids or biodiversity institutions were removed. 
We also removed records with zero-value coordinates, with a coordinate uncertainty >100 km, with coordinates inconsistent with the country assignment or falling outside the reported native distribution range of the species, records of species considered extinct in the wild (native ranges and extinction status both following Plants Of the World Online76) and records collected before 1945 (when the precision of geolocalization devices was poor), following recommendations from the authors of CoordinateCleaner (https://ropensci.github.io/CoordinateCleaner/articles/Cleaning_GBIF_data_with_CoordinateCleaner.html). Duplicated occurrence records were omitted. In total, 1,820 species (72.5%) had at least one clean occurrence record and could thus be used in the ML analyses (Extended Data Fig. 1). Cleaned occurrence data can be found in Supplementary Table 7. Additional R packages used for cleaning occurrence data points included devtools v.1.13.5, tidyverse v.1.3.1, countrycode v.1.1, maps v.3.3.0, maptools v.0.9-2, rworldmap v.1.3-6 and sp v.1.2-7 (refs. 77,78,79,80,81,82,83).
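The record-level filters described above can be illustrated with a minimal Python sketch. The study itself used CoordinateCleaner in R; the class and function names below are hypothetical, only three of the filters (zero coordinates, uncertainty >100 km, collection before 1945) plus duplicate removal are shown, and keeping records with unreported uncertainty or year is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Occurrence:
    """Hypothetical minimal occurrence record."""
    species: str
    lat: float
    lon: float
    uncertainty_km: Optional[float] = None  # None when not reported
    year: Optional[int] = None              # None when not reported


def passes_filters(rec: Occurrence,
                   max_uncertainty_km: float = 100.0,
                   min_year: int = 1945) -> bool:
    """Apply three of the filters described in the text to one record."""
    if rec.lat == 0 or rec.lon == 0:        # zero-value coordinates
        return False
    if rec.uncertainty_km is not None and rec.uncertainty_km > max_uncertainty_km:
        return False
    if rec.year is not None and rec.year < min_year:
        return False
    return True


def clean(records: List[Occurrence]) -> List[Occurrence]:
    """Keep records passing the filters and drop exact coordinate duplicates."""
    seen = set()
    kept = []
    for rec in records:
        key = (rec.species, rec.lat, rec.lon)
        if passes_filters(rec) and key not in seen:
            seen.add(key)
            kept.append(rec)
    return kept
```

Filters that need external data (country polygons, native ranges, institution locations) are deliberately left out of this sketch.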

Choice of extinction risk predictors

The classification of species into different extinction risk categories in the Red List is based on population size, trends in population sizes (for example, loss of habitat or declines due to species exploitation), species range size (for example, restricted and fragmented) and habitat quality (for example, impact of pests and invasive species)57. Predictors providing information on a species’ range size, namely extent of occurrence (EOO) and area of occupancy (AOO), were selected as they are more readily available than those relating to declines or population size. Two additional range-based metrics, that is, number of subpopulations and number of locations (definitions18 provided in Supplementary Table 8), were included as they can be useful predictors, even if not explicitly aligned with IUCN criteria. Coarse-scale distribution data have also been shown to be useful predictors of extinction risk for plants21; we therefore also included the number of level-3 botanical countries occupied. Level-3 botanical countries are biogeographical units defined by the World Geographical Scheme for Recording Plant Distributions84 to reflect political country boundaries while taking into account botanical tradition and botanical heterogeneity within and between political countries85. For simplicity, we refer to them as ‘regions’ hereafter and in the main text. Detailed data on habitat quality and species exploitation are rarely available in a form that can be used in automated extinction risk assessments. However, the influence of habitat quality on a species is likely to be less important if the species has a large ecological amplitude, and forest species such as most palms are more likely to undergo population decline if they occur in areas strongly impacted by humans, especially through deforestation. 
Accordingly, we included eight further predictors in our analyses, that is climatic amplitude in terms of temperature, climatic amplitude in terms of precipitation, average temperature seasonality, average precipitation seasonality, number of ecoregions86 occupied (defined in Supplementary Table 8), human impact, human density and forest loss. All 13 predictors are described with their sources in Supplementary Table 8.

Species value calculations for each predictor

Species EOO and AOO were calculated from the occurrence data using the R package rCAT v.0.1.6 (ref. 87) or the R package ConR v.1.3 (ref. 18) when species only had two occurrence data points available. EOO could not be estimated for species with one occurrence point, while AOO could be calculated for all species analysed. Numbers of ecoregions and of regions occupied were obtained from the source datasets84,88,89 (Supplementary Table 8) using custom R scripts relying on the R packages sf v.0.9, doParallel v.1.0.16, foreach v.1.5, httr v.1.4.2, jsonlite v.1.7.2 and progress v.1.2.2 (refs. 81,90,91,92,93,94). The Human Footprint Index (used to estimate human impact), human population density, temperature seasonality, precipitation seasonality, minimal temperature of the coldest month, maximal temperature of the warmest month, average precipitation of the driest month and average precipitation of the wettest month were obtained from global raster layers95,96 (Supplementary Table 8) and derived at 10 min resolution to match the precision uncertainty in the coordinates of occurrence records, while reducing computational burden. For each species, we extracted predictor values at each occurrence location and averaged them to obtain one value per species. The temperature and precipitation data were used to calculate temperature amplitude (average minimal temperature of the coldest month/ average maximal temperature of the warmest month) and precipitation amplitude (average precipitation of driest month/ average precipitation of wettest month), respectively. These indices account for cases where seasonality is low at each occurrence point for a species but high when considering all occurrence records. For forest loss, values assigned to species represented the proportion of species occurrence points found in areas that experienced forest loss between 2001 and 201897. The predictors were rescaled to range between 0 and 1 to improve the estimation of the models’ parameters. 
All data geoprocessing was performed using the R packages raster v.3.1-5, gdalUtils v.2.0.3.2, rgdal v.1.5-8, maps v.3.3.0, dplyr v.1.0.7, tidyr v.1.1.4, tidyverse v.1.3.1 and plyr v.1.8.6 (refs. 80,82,98,99,100,101,102,103).
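As an illustration of the species-level calculations described above, the following Python sketch computes an amplitude index, the forest-loss proportion and the 0 to 1 rescaling. Function names are hypothetical, and reading the slash in the amplitude definitions as a ratio of across-occurrence averages is an assumption of this sketch.

```python
from statistics import mean
from typing import List, Sequence


def amplitude(low_values: Sequence[float], high_values: Sequence[float]) -> float:
    """Ratio of the mean 'low extreme' to the mean 'high extreme' across a
    species' occurrence points (for example, driest-month over wettest-month
    precipitation). The ratio reading is an assumption."""
    return mean(low_values) / mean(high_values)


def forest_loss_proportion(in_loss_area: Sequence[bool]) -> float:
    """Proportion of a species' occurrence points falling in forest-loss areas."""
    return sum(in_loss_area) / len(in_loss_area)


def rescale01(values: Sequence[float]) -> List[float]:
    """Min-max rescaling of one predictor across species to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant predictor: avoid division by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]
```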

Delimitation of training and test subsets

We obtained all available extinction risk assessments for palms first from the Red List104 and then from the global assessments collated in the ThreatSearch database of Botanic Gardens Conservation International105 when no assessment was available in the Red List. ThreatSearch assessments were only considered global if they were specified as such (assessments of unknown scope were not considered to be global). Species with assessments made before 2008 were considered unassessed to ensure an up-to-date training set and ‘data deficient’ (DD) species were also considered unassessed. Among the 1,820 species with occurrence data, 439 assessed species had a non-DD extinction risk assessment from 2008 or later (321 from the Red List and 118 from ThreatSearch) and were available to train or test (see below) the models used to predict the extinction risk of the remaining 1,381 unassessed species with occurrence data. However, assessed species were heavily biased towards the genus Dypsis from Madagascar60. We therefore randomly subsampled Dypsis species to balance their proportion in the assessed group with their proportion in the unassessed group, resulting in only two Dypsis species left in the assessed group. After removing most Dypsis, the taxonomic and geographic representation of the assessed and unassessed groups became more similar, although a small degree of geographic imbalance remained (Supplementary Fig. 2). The resulting group of 300 assessed species comprised 130 ‘non-threatened’ species and 170 ‘threatened’ species. The non-threatened category included species classified as ‘least concern’ (LC) in the Red List or as not threatened in ThreatSearch104,105. The threatened category included species classified in the Red List as critically endangered (CR), endangered (EN), vulnerable (VU) or near threatened (NT) or in ThreatSearch as threatened, near threatened or possibly threatened (Supplementary Table 5). 
These steps were performed using the R packages rredlist v.0.7.0, stringr v.1.4.0 and stringi v.1.7.5 (refs. 106,107,108).

To evaluate model performance, we divided this dataset of 300 representative assessed species into a training set comprising 225 species (75%) and a test set comprising the 75 (25%) remaining species. The training set was used for model parameterization, while the test set was only used to assess model performance, independent of the model parameterization process. To increase its representativeness, the test set was built iteratively by first randomly choosing 15 assessed species and adding 60 assessed species sequentially so that each added species would be as dissimilar as possible to the species already present in the test set, based on their extinction risk predictor values (Supplementary Table 5). This was done using the function maxDissim from the R package caret v.6.0 (ref. 109) with the default settings after preliminary tests indicated no effect of changing these settings on the representativeness of the test set. Details of the datasets are provided in Extended Data Fig. 1.
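The iterative construction of the test set can be sketched as a greedy maximum-dissimilarity selection. This is a simplified Python illustration, not the caret implementation: it assumes Euclidean distance on the rescaled predictors and scores each candidate by its minimum distance to the species already selected; the function name and data layout are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Predictors = Tuple[float, ...]


def greedy_max_dissim(seed: Dict[str, Predictors],
                      pool: Dict[str, Predictors],
                      n_add: int) -> List[str]:
    """Sequentially add the pool species whose nearest already-selected
    species is farthest away (max-min criterion)."""
    selected = dict(seed)
    remaining = dict(pool)
    added = []
    for _ in range(min(n_add, len(remaining))):
        best = max(
            remaining,
            key=lambda name: min(
                math.dist(remaining[name], values) for values in selected.values()
            ),
        )
        selected[best] = remaining.pop(best)
        added.append(best)
    return added
```

In the study's setting, `seed` would hold the 15 randomly chosen species and `n_add` would be 60.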

Addressing missing or biased data and correlated predictors

Three extinction risk predictors had missing values for some species: EOO (390 species), human footprint (4 species) and forest loss (9 species). The knnImputation function of the R package DMwR v.0.4.1 (ref. 110) was used to fill missing values by averaging the values of the species’ five nearest neighbours in terms of extinction risk predictors. Extinction risk predictors showed various degrees of correlation between each other (predictor redundancy), so ML analyses were run once without considering correlation among predictors and once after removing the predictors with a correlation coefficient >0.75 by using the ‘cutoff’ option of the preProcess function in the caret package109.
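A minimal Python sketch of nearest-neighbour imputation follows. It is not the DMwR implementation: it assumes Euclidean distance, an unweighted mean of the neighbours' values, and that all columns other than the one being imputed are complete; the function name is hypothetical.

```python
import math
from statistics import mean
from typing import List, Optional

Row = List[Optional[float]]


def knn_impute(rows: List[Row], col: int, k: int = 5) -> List[Row]:
    """Fill missing values (None) in column `col` with the mean of that
    column over the k nearest rows that have it observed."""
    donors = [r for r in rows if r[col] is not None]
    filled_rows = []
    for r in rows:
        if r[col] is not None:
            filled_rows.append(list(r))
            continue

        def dist(donor: Row, target: Row = r) -> float:
            # Distance computed on every predictor except the one imputed.
            return math.sqrt(sum(
                (a - b) ** 2
                for i, (a, b) in enumerate(zip(target, donor)) if i != col
            ))

        nearest = sorted(donors, key=dist)[:k]
        new_row = list(r)
        new_row[col] = mean(d[col] for d in nearest)
        filled_rows.append(new_row)
    return filled_rows
```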

To account for the imbalance of the extinction risk categories in the training set, each ML analysis was performed first without any resampling and then repeated once with downsampling of the majority extinction risk category (down), once with upsampling of the minority category (up) and once with a method synthesizing new data for the minority category using the synthetic minority oversampling technique (smote)111.
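The down- and upsampling strategies can be sketched in Python as follows. The function name is hypothetical, the random seed is arbitrary, and SMOTE is omitted because it synthesizes new data points rather than resampling existing ones.

```python
import random
from collections import defaultdict
from typing import List, Sequence


def resample(labels: Sequence[str], how: str, seed: int = 0) -> List[int]:
    """Return row indices of a class-balanced training set.

    how="down": sample every class down to the size of the smallest class.
    how="up":   sample smaller classes with replacement up to the largest.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    sizes = [len(idx) for idx in by_class.values()]
    picked = []
    if how == "down":
        target = min(sizes)
        for idx in by_class.values():
            picked.extend(rng.sample(idx, target))
    elif how == "up":
        target = max(sizes)
        for idx in by_class.values():
            picked.extend(idx + rng.choices(idx, k=target - len(idx)))
    else:
        raise ValueError("how must be 'down' or 'up'")
    return sorted(picked)
```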

Representation biases in the training and test sets were evaluated by plotting histograms of each extinction risk predictor for each dataset (unassessed, training and test) and calculating their degree of intersection with the function intersect.dist of the R package HistogramTools112. All intersections were high (≥0.81 and ≥0.95 for the intersection between the unassessed and test data) but visual inspection of the histogram overlaps revealed that human footprint, human population density and the four temperature and precipitation predictors had some parts of their distributions under-represented in both the training and test sets (Supplementary Fig. 2). We therefore trained ML models either including or excluding these predictors.

Taken together, these sensitivity analyses represented 16 different models for a given ML method (2 (with/without strongly correlated predictors) × 4 (resampling strategy) × 2 (with/without predictors with representation biases)). We combined these with three different ML methods (see below), giving a total of 48 models. A summary of the approach is provided in Extended Data Fig. 1. In addition to the above-cited R packages, evaluating and visualizing the representativeness, imbalance and redundancy of the datasets also relied on lattice v.0.20-40, ggplot2 v.3.3, gridExtra v.2.3, mosaic v.1.6.0, proxy v.0.4-23, plyr v.1.8.6, scales v.1.1 and UBL v.0.0.6 (refs. 98,113,114,115,116,117,118,119).
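The full-factorial design can be enumerated in a few lines of Python. The option labels below are hypothetical shorthand for the settings described in the text.

```python
from itertools import product

# Hypothetical labels for the settings described in the text.
CORRELATION = ("all predictors", "correlated predictors removed")
RESAMPLING = ("none", "down", "up", "smote")
REPRESENTATION = ("all predictors", "under-represented predictors removed")
METHODS = ("RF", "RFt", "NN")


def model_grid():
    """Enumerate every combination of method and sensitivity setting."""
    return [
        {"method": m, "correlation": c, "resampling": r, "representation": b}
        for m, c, r, b in product(METHODS, CORRELATION, RESAMPLING, REPRESENTATION)
    ]
```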

Choice of machine learning method and model tuning

We first used a random forest algorithm to build the predictive models using the R package randomForest120 through implementation in the R package caret109. This method is hereafter referred to as the ‘RF method’. We fitted each model on the training set using ten repeats of tenfold cross-validation and calculated the average Kappa121 for the model on the basis of the training data. This was repeated for each candidate value of the ‘mtry’ parameter to find the number of extinction risk predictors considered at each node split that gave the best Kappa for a given model. Kappa was chosen because it has previously been shown to perform better than other metrics for unbalanced datasets (definition below). All integers between one and the number of independent extinction risk predictors available were tested (up to 13, depending on whether predictors with shifted representation or high correlation were included; see above).

During training and testing of the models, binary classification of the species into threatened and non-threatened was based on their estimated probability of being threatened, with a probability threshold of 0.5 between classes. However, this approach may be problematic with unbalanced training sets because the probabilities may be skewed towards the most represented class. For each trained random forest model, we therefore also estimated the probability threshold that jointly maximized specificity and sensitivity (see below). This method is hereafter referred to as the ‘RFt method’. This was done by modifying the original RF method so that the class probability threshold was considered as a parameter to tune (among 20 evenly spaced values ranging from 0 to 1), in addition to mtry. The parameters were tuned by looking for the mtry and threshold combination whose specificity and sensitivity were closest to the theoretical perfect values of 1. This additional performance indicator was called distance to perfect model (DPM, below). This approach was adapted from an example described in the caret manual (section 13.8 of https://topepo.github.io/caret/index.html).
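Threshold tuning by distance to the perfect model can be illustrated with the Python sketch below. It assumes that DPM is the Euclidean distance between (sensitivity, specificity) and (1, 1) and that the 20 candidate thresholds are evenly spaced including both end points; both are assumptions of this sketch, the function names are hypothetical and mtry tuning is left out.

```python
import math
from typing import Sequence, Tuple


def sens_spec(probs: Sequence[float], truth: Sequence[int],
              threshold: float) -> Tuple[float, float]:
    """Sensitivity and specificity when 'threatened' (1) is predicted for
    probabilities greater than or equal to the threshold."""
    tp = fn = tn = fp = 0
    for p, y in zip(probs, truth):
        predicted = 1 if p >= threshold else 0
        if y == 1:
            tp += predicted
            fn += 1 - predicted
        else:
            fp += predicted
            tn += 1 - predicted
    sens = tp / (tp + fn) if tp + fn else 0.0
    spec = tn / (tn + fp) if tn + fp else 0.0
    return sens, spec


def dpm(sens: float, spec: float) -> float:
    """Distance to the perfect model (sensitivity = specificity = 1)."""
    return math.hypot(1.0 - sens, 1.0 - spec)


def best_threshold(probs: Sequence[float], truth: Sequence[int],
                   n_thresholds: int = 20) -> float:
    """Pick the candidate threshold with the smallest DPM."""
    grid = [i / (n_thresholds - 1) for i in range(n_thresholds)]
    return min(grid, key=lambda t: dpm(*sens_spec(probs, truth, t)))
```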

To compare very different methods, we also trained a neural network with a single internal layer using the same cross-validation approach as for the random forest. This method is hereafter referred to as the ‘NN method’. The number of neurons in the internal layer (size parameter) and the weight decay were tuned by finding the size and decay values maximizing Kappa, among the following values—size: 1, 2, 3, 5, 7, 9, 11, 13; decay: 0.005, 0.01, 0.05, 0.1, 0.5, 1, 2, 5, 8, 10. This was performed using the R package nnet v.7.3-13 (ref. 122) through implementation in caret. Each network was run for 1,000 iterations, which allowed all runs to reach convergence.

All 48 random forest and neural network models are listed in Supplementary Table 1, together with their dataset and parameter specifications. Additional packages required to perform the model training included gdata v.2.18.0, ranger v.0.12.1, e1071 v.1.7-3 and RANN v.2.6.1 (refs. 123,124,125,126).

Model performance and prediction choice

To select the model to use for our final analyses, the performance of the 48 models was assessed on the test set by using each model to predict the extinction risk of the test species, comparing the predictions to the published (observed) extinction risks and calculating nine performance indicators: (1) area under the receiver operating characteristic curve (AUC)—how well a model distinguishes threatened species from non-threatened ones (ranges between 0 and 1, with 1 indicating perfect discriminatory power); (2) sensitivity, the percentage of threatened species correctly classified; (3) specificity, the percentage of non-threatened species correctly classified; (4) DPM, measuring how far the model is from a perfect classification where sensitivity = specificity = 1; (5) accuracy, the percentage of correct classifications; (6) balanced accuracy, calculated by averaging sensitivity and specificity; (7) Cohen’s Kappa coefficient, comparing the accuracy of the model relative to that of a random classification based only on class frequencies (ranges between −1 and 1, with 1 indicating a perfect classifier); (8) precision for the positive class (hereafter ‘precision’)—the percentage of species predicted to be threatened being indeed threatened; and (9) precision for the negative class (hereafter ‘negative precision’)—calculating the percentage of species predicted as non-threatened being indeed non-threatened. In addition, we also attributed a weight to each model on the basis of its balanced accuracy and binarily coded predictions as 0 when the species was predicted to be non-threatened and 1 when it was predicted to be threatened. We then estimated a weighted average binary prediction across models and estimated the performance of this averaging method using the above indicators.
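Several of these indicators follow directly from the confusion matrix, as the Python sketch below illustrates (AUC and DPM are omitted). Function names are hypothetical; the weighted average corresponds to the model-averaging step, and the rule for converting that average back into a class is not shown because the text does not specify it.

```python
from typing import Dict, Sequence


def indicators(tp: int, fn: int, tn: int, fp: int) -> Dict[str, float]:
    """Performance indicators, with 'threatened' as the positive class."""
    n = tp + fn + tn + fp
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / n
    # Agreement expected by chance, from the observed class frequencies.
    expected = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / n ** 2
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "balanced_accuracy": (sensitivity + specificity) / 2,
        "kappa": (accuracy - expected) / (1 - expected),
        "precision": tp / (tp + fp),
        "negative_precision": tn / (tn + fn),
    }


def weighted_average_prediction(predictions: Sequence[int],
                                weights: Sequence[float]) -> float:
    """Weighted mean of 0/1 predictions across models for one species,
    with weights such as each model's balanced accuracy."""
    return sum(p * w for p, w in zip(predictions, weights)) / sum(weights)
```

For example, a hypothetical test set with 40 true positives, 10 false negatives, 20 true negatives and 5 false positives gives an accuracy of 0.8 and a Kappa of 4/7.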

Performance indicator values for all models are provided in Supplementary Table 1 and show that, although all three methods (RF, RFt and NN) performed well, RF and RFt models performed more similarly to each other than to NN models, regardless of the performance indicator considered. No model scored the highest for all indicators, reflecting trade-offs between sensitivity and specificity. Within a given ML method, performance was more affected by resampling strategy and extinction risk predictor representativeness than by predictor redundancy due to the existence of correlations between predictors (Supplementary Table 1). The model with the highest balanced accuracy (82%) was used to predict the extinction risk of the 1,381 unassessed species. This model was a random forest with optimized class probability threshold and an upsampling strategy, from which correlated and shifted predictors were removed (Supplementary Table 1). Details of the limitations of the different models tested and their robustness to data imbalances or shortages are provided in the Supplementary Note.

Drivers of wrong predictions and predictor importance

We used the framework of Shapley additive explanations (SHAP)127 to interrogate the behaviour of our selected model. We calculated SHAPs for all test set predictions made by the model, using the implementation in the R package fastshap v.0.0.7 (ref. 128). We used these SHAPs to generate explanations for overall model behaviour by calculating predictor importance as the mean absolute SHAP value of each predictor across all test set predictions. We also visualized the distribution of SHAP values to compare the partial dependence of test set predictions on each extinction risk predictor. We used explanations of individual predictions to highlight the prediction pathway of species that the model predicted incorrectly. For comparison, we also calculated the predictor importance with vip v.0.3.2 (ref. 129) by randomly shuffling the values of each predictor in turn and calculating the resulting decrease in accuracy of the test set predictions. We repeated this process 1,000 times per predictor and reported the importance as the mean decrease in accuracy. Although the permutation-based importance should be consistent with the SHAP-based importance, permutation-based predictor importance gives an indication of each predictor’s contribution to the accuracy of a model, while SHAP-based importance indicates the average contribution of each predictor to the predicted values themselves. The results and implications of these analyses are provided in Supplementary Note and Supplementary Fig. 1. Additional R packages used to visualize model performance and behaviour included glm2 v.1.2.1, reshape2 v.1.4.4, pROC v.1.18.0, ggplot2 v.3.3, gplots v.3.0.3, here v.1.0.1, readr v.1.3.1, readxl v.1.3.1, purrr v.0.3.4, ggforce v.0.3.1, patchwork v.1.0.0, glue v.1.3.1, writexl v.1.2, scales v.1.1, plyr v.1.8.6, dplyr v.1.0.7, stringr v.1.4.0, tidyr v.1.1.4, ggfortify v.0.4.8, randomForest v.4.6-14 and ggpubr v.0.2.5 (refs. 98,99,100,107,114,118,130,131,132,133,134,135,136,137,138,139,140,141,142,143,144).
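The permutation-based importance can be sketched in Python as the mean drop in accuracy after shuffling one predictor. This is a generic illustration, not the vip implementation; `model` is any hypothetical callable mapping a row of predictors to a class, and the default number of repeats is lower than the 1,000 used in the study.

```python
import random
from typing import Callable, List, Sequence


def accuracy(model: Callable[[List[float]], int],
             X: Sequence[List[float]], y: Sequence[int]) -> float:
    """Share of rows for which the model predicts the observed class."""
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(model: Callable[[List[float]], int],
                           X: Sequence[List[float]], y: Sequence[int],
                           col: int, repeats: int = 100, seed: int = 0) -> float:
    """Mean decrease in accuracy after shuffling predictor `col`."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    drops = []
    for _ in range(repeats):
        shuffled = [row[col] for row in X]
        rng.shuffle(shuffled)
        permuted = [row[:col] + [v] + row[col + 1:] for row, v in zip(X, shuffled)]
        drops.append(baseline - accuracy(model, permuted, y))
    return sum(drops) / repeats
```

A predictor the model never uses gets an importance of exactly zero under this definition.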

Identification of conservation priorities

Species use and evolutionary and functional distinctiveness

We classified palm species as being ‘of interest’ (Results) on the basis of trait data retrieved from PalmTraits49 and of use data retrieved from the World Checklist of Useful Plant Species54 and the literature37,145,146. A species was considered of interest if it was used by humans and/or had a higher evolutionary and/or functional distinctiveness than the median distinctiveness. Species were considered to be used by humans if they had at least one use recorded among the following categories: ‘animal food’, ‘culture’, ‘environmental’, ‘fuel’, ‘gene sources’, ‘food’, ‘medicine’, ‘toxic’, ‘utensils, tools and construction’ and ‘other uses’ (category matching between the different data sources is described in Supplementary Table 9). This category scheme was adopted because it generally follows a widely used classification147 for palm uses while accounting for the fact that the utensils and tools and construction categories used in that classification were merged in some of our data sources (Supplementary Table 9).

To calculate species evolutionary distinctiveness27 (ED), the ed.calc function of the R package caper v.1.0.1 (ref. 148) was applied to 750 species-level palm-wide phylogenetic trees. These trees were obtained from a study56 presenting multiple Bayesian phylogenetic analyses of all palm species described at the time, relying on a compilation of morphological, genetic and taxonomic data. The trees we used correspond to the post-burnin posterior distribution of trees generated by their ‘unconstrained’ analysis with a taxonomic backbone following modifications by ref. 59. The ED value of each species was then obtained by averaging the values obtained for that species in the 750 trees. Ninety-three species recognized by our taxonomic backbone were absent from the trees, even after using the World Checklist of Selected Plant Families73 to identify synonyms, and therefore lacked ED values. A few species from the original trees were considered synonyms in our taxonomic backbone, so their ED values were averaged (this happened 38 times, involving mostly two but up to seven synonyms). While these two problems could bias the ED values of the species remaining in the tree, this bias is likely to be limited because species lacking ED values were spread throughout the phylogenetic trees and represented only 3.7% of the total diversity of the family. We therefore consider that our ED calculations are of sufficient precision to be used for our purpose, which relies on splitting ED values into two categories (higher versus lower than the median) rather than relying on exact ED values.
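To make the ED calculation and the median split concrete, the Python sketch below computes ED on a single toy tree. It assumes the 'fair proportion' definition (each branch length shared equally among the tips descending from it); the tree encoding and function names are hypothetical, and the averaging across the 750 trees is not shown.

```python
from statistics import median
from typing import Dict, List, Set


def fair_proportion_ed(parent: Dict[str, str], length: Dict[str, float],
                       tips: List[str]) -> Dict[str, float]:
    """ED per tip. `parent` maps each non-root node to its parent and
    `length` maps each non-root node to the branch leading to it."""
    n_desc: Dict[str, int] = {}
    for tip in tips:  # count descendant tips of every node
        node = tip
        while node is not None:
            n_desc[node] = n_desc.get(node, 0) + 1
            node = parent.get(node)
    ed = {}
    for tip in tips:
        total, node = 0.0, tip
        while node in length:  # stop at the root, which has no branch
            total += length[node] / n_desc[node]
            node = parent[node]
        ed[tip] = total
    return ed


def above_median(ed: Dict[str, float]) -> Set[str]:
    """Species whose ED is strictly higher than the median ED."""
    cutoff = median(ed.values())
    return {species for species, value in ed.items() if value > cutoff}
```

For the toy tree ((A:1,B:1):2,C:3), A and B each receive 1 + 2/2 = 2 and C receives 3, so the ED values sum to the total branch length of 7.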

To calculate functional distinctiveness28 (sensu ref. 28 (p. 1,366) ‘average functional distance of a species to the other species in the community’, with the community here being the global palm species diversity) the ‘distinctiveness’ function of the R package funrar28 v.1.4.0 was used with the Gower distance option. The traits extracted from the PalmTraits49 database comprised: maximum stem height, maximum stem diameter, maximum leaf number, maximum blade length, average fruit length, average fruit width, presence of spines, growth form coded as all possible combinations between climbing and/or acaulescent and/or erect, stem habit coded as all possible combinations between solitary and/or clustering and vegetation stratum coded as all possible combinations between understorey and/or canopy. All continuous trait variables were log-transformed and rescaled to range between 0 and 1. There were 85 species with no trait data at all and 435 species lacking more than three traits. When excluding the former 85 species, the trait data had 21% of missing values. These missing data were estimated by averaging the values from their 11 nearest neighbours in the trait space using the kNN and weightedMean functions of the R packages VIM v.6.1.1 and laeken v.0.5.2 (refs. 149,150). For categorical traits, missing values were obtained by taking the most represented category among the 11 neighbours. Traits for the 85 species with only missing values were not imputed and these species were therefore not considered to be functionally distinct.
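The distinctiveness index quoted above (average functional distance of a species to all others) can be sketched in Python with a simple Gower distance. This is not the funrar implementation: it assumes equal trait weights, no missing values and equal species weights; function names are hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Union

Trait = Union[float, str]


def gower(a: Sequence[Trait], b: Sequence[Trait],
          ranges: Sequence[Optional[float]]) -> float:
    """Gower distance: range-scaled absolute difference for numeric traits,
    simple mismatch (0 or 1) for categorical traits, averaged over traits."""
    parts = []
    for x, y, trait_range in zip(a, b, ranges):
        if trait_range is None:  # categorical trait
            parts.append(0.0 if x == y else 1.0)
        else:
            parts.append(abs(x - y) / trait_range)
    return sum(parts) / len(parts)


def distinctiveness(traits: Dict[str, Sequence[Trait]]) -> Dict[str, float]:
    """Mean Gower distance of each species to every other species."""
    names = list(traits)
    n_traits = len(traits[names[0]])
    ranges: List[Optional[float]] = []
    for j in range(n_traits):
        column = [traits[s][j] for s in names]
        if isinstance(column[0], str):
            ranges.append(None)
        else:
            ranges.append((max(column) - min(column)) or 1.0)
    return {
        s: sum(gower(traits[s], traits[o], ranges) for o in names if o != s)
        / (len(names) - 1)
        for s in names
    }
```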

Proportions of threatened species and priority regions

Four datasets were produced to explore the influence of using ML predictions on estimates of the proportions of threatened species and on conservation prioritization (Extended Data Fig. 1). The first dataset (all published) only contained extinction risk information for species with published assessments on the Red List or on ThreatSearch as described above. The second dataset (recently published) was a subset of the all published dataset including only extinction risk information for species with assessments published from 2008 onwards, thereby following the IUCN guidelines that >10-year-old assessments need updating16 (https://www.iucnredlist.org/assessment/process accessed 16 September 2021). The third dataset (total evidence) combined the extinction risk information from the recently published dataset with the most accurate ML extinction risk predictions for the species with spatial occurrence data that were either unassessed, assessed as DD or with old assessments. A fourth dataset was produced by combining the ‘total evidence’ dataset with information on the extinction risk of an additional 521 species for which we did not have spatial occurrence data or recent published assessments (Extended Data Fig. 1). This information was gathered from old published assessments for 296 species or produced as a quick ‘expert guess’ based on our palm expertise for the remaining 225 species. This dataset was only used as an attempt to provide an even more complete view of palms at risk accounting for very poorly documented taxa (Supplementary Note). To remain conservative, this last dataset was not used in the prioritization or use resilience analyses and proportions of threatened species estimated from it are not discussed in the main text. 
For each dataset, the proportions of threatened species in the evolutionarily distinct, functionally distinct and used species (distinguishing four use categories; Results) were calculated for the world and separately for the America (longitude ∈ (−180°, −25°)), Africa/West Asia (longitude ∈ (−25°, 68°)) and East Asia/Pacific (longitude ∈ (68°, 180°)) regions. This splitting scheme was chosen because there are no palm species that have native ranges spanning more than one of these areas76. The same analysis was then performed at the region level (for each botanical country84, see above) on the all published, recently published and total evidence datasets.

In addition to separating proportions of threatened species among evolutionarily distinct, functionally distinct or used species, we also calculated proportions of threatened species among species fitting in at least one of these categories (species of interest). This was calculated for each region and used as a criterion for conservation prioritization (regions with higher proportions of threatened species among species of interest were considered higher priorities). Using proportions instead of number of threatened species of interest allowed us to account for regions where few but valuable species may be threatened and it did not prevent species-rich regions from being captured in the top priorities (Results). However, for comparison, we repeated the ranking with only regions comprising more than ten palm species. These analyses were performed on the all published, recently published and total evidence datasets, to assess if priority regions changed when using ML predictions compared to published assessments only. To estimate the robustness of the results to the fact that some species had their EOO imputed (previous section), proportions of threatened species and region ranks were also inferred on the basis of the total evidence dataset after excluding species that had their EOO imputed. The results are presented in Supplementary Tables 3 and 4 and compared to the main results in the Supplementary Note and Supplementary Fig. 3.
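The prioritization criterion can be illustrated with the Python sketch below, which ranks regions by the proportion of threatened species among species of interest. The record layout and function name are hypothetical, each species is assumed to appear once per region it occupies, and ties are left in input order.

```python
from collections import defaultdict
from typing import Iterable, List, Tuple

# (region, species, threatened, of_interest)
Record = Tuple[str, str, bool, bool]


def rank_regions(records: Iterable[Record],
                 min_species: int = 0) -> List[Tuple[str, float]]:
    """Rank regions by the proportion of threatened species among species
    of interest, keeping regions with more than `min_species` species."""
    species_in = defaultdict(set)
    n_interest = defaultdict(int)
    n_threatened_interest = defaultdict(int)
    for region, species, threatened, of_interest in records:
        species_in[region].add(species)
        if of_interest:
            n_interest[region] += 1
            if threatened:
                n_threatened_interest[region] += 1
    scores = {
        region: n_threatened_interest[region] / n_interest[region]
        for region in n_interest
        if len(species_in[region]) > min_species
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Calling it with `min_species=10` would correspond to the comparison restricted to regions with more than ten palm species.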

Potential alternatives for threatened used species

Rationale

Traits and genes underpin plant uses to a certain extent37, so data on plant morphology and phylogenetic placement may help to predict if a species could be used for the same use as another species. Although phylogenetic signal in plant use may vary from non-existent to very strong depending on the uses or the taxa considered, accumulating evidence suggests that plant uses are phylogenetically clustered to some extent38,39,40,41,42,151, including in the palm family37. Our data corroborated these findings, as seven out of ten use categories, including the four most represented (food, utensils, tools and construction, medicine and culture), showed some degree of phylogenetic signal based on Fritz and Purvis’s D statistic tests152. The tests were applied to the 750 above-mentioned palm phylogenetic trees obtained from ref. 56 and to a maximum clade credibility consensus of these trees (Supplementary Data) obtained with Tree Annotator (part of BEAST v.1.10.2; ref. 153) and results are provided in Supplementary Tables 9 and 10. The relationship between plant uses and functional traits is more difficult to characterize but previous work on palms has shown that some uses are correlated to morphological traits37.

As a first attempt to explore how these theoretical and empirical relationships can help in predicting regional use resilience, we used functional trait data, phylogenetic information and ethnobotanical information to identify non-threatened species that may be suitable substitutes for a threatened used species occurring in the same region. To account for the fact that species may not be easily moved across different ecological settings, we further restricted the search for potential alternatives to species occurring in the same biome as the threatened used species to be replaced. Species were therefore considered as potential alternatives only if they fulfilled the following four conditions: (1) being assessed or predicted as non-threatened, (2) occurring in the same region as the species to be replaced, (3) being known to occur in at least one of the biomes occupied by the species to be replaced and (4) being significantly close to the species to be replaced in terms of phylogenetic and/or functional distance. Being significantly phylogenetically or functionally close in this context was understood as having a phylogenetic or functional distance between the species that was less than or equal to the median of the pairwise distances between species in the considered biome of the considered region minus the standard deviation of these distances. We chose to use local thresholds defined for each region and biome combination for two reasons. First, they account for the fact that species from one biome may not easily be found or grown in another biome. Second, using local phylogenetic and functional distance thresholds instead of global thresholds better reflects the reality of communities who need to find alternatives among the species that are available in their region/biome and who may therefore choose an alternative on the basis of its similarity to the species to be replaced, regardless of whether more similar alternatives occur elsewhere. 
The search for alternatives was done once using phylogenetic distances and once using functional distances, and the union of both lists of alternatives thereby identified was used because it provided a more conservative (optimistic) view of region use resilience and species replaceability across uses. Moreover, more stringent thresholds than the median distance minus standard deviation could be tested and would result in smaller lists of potential alternative species and lower regional use resilience estimates. We chose to use a threshold providing a relatively optimistic view to obtain a baseline from which more pessimistic scenarios can be envisioned and because larger lists of potential alternatives may facilitate the identification of realistic alternatives at the community level. To estimate the robustness of these lists to the fact that some species had their EOO imputed (see above), the search for potential alternatives, estimations of species replaceability and regional use resilience analyses were also performed based on the total evidence dataset after excluding species that had their EOO imputed. The results are presented in Supplementary Tables 4 and 5 and compared to the main results in the Supplementary Note.
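The local threshold rule can be sketched in Python as follows. The sketch assumes that the species pool has already been restricted to one region and biome combination, that the pool includes the species to be replaced, and that the sample standard deviation is used; function names and the distance lookup keyed by species pairs are hypothetical.

```python
from itertools import combinations
from statistics import median, stdev
from typing import Dict, FrozenSet, List, Sequence

Distances = Dict[FrozenSet[str], float]


def local_threshold(pool: Sequence[str], dist: Distances) -> float:
    """Median of the pairwise distances in the pool minus their standard
    deviation (sample standard deviation: an assumption of this sketch)."""
    pairwise = [dist[frozenset(pair)] for pair in combinations(pool, 2)]
    return median(pairwise) - stdev(pairwise)


def alternatives(target: str, pool: Sequence[str], dist: Distances,
                 threatened: Dict[str, bool]) -> List[str]:
    """Non-threatened species of the pool whose distance to the target is
    less than or equal to the local threshold."""
    threshold = local_threshold(pool, dist)
    return [
        s for s in pool
        if s != target
        and not threatened[s]
        and dist[frozenset((target, s))] <= threshold
    ]
```

Running the function once with phylogenetic and once with functional distances and taking the union of the two lists would mirror the procedure described in the text.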

Data used to search for potential alternatives

Phylogenetic distances were calculated as the sum of the branch lengths linking each pair of species, averaged over a sample of 100 trees chosen randomly from the above-mentioned 750 palm phylogenetic trees from ref. 56. This was done with the function ‘cophenetic.phylo’ from the R package ape v.5.0 (ref. 154) and the trees sampled are listed in the scripts (Data availability). For species not in the tree, the distance was obtained as the median distance between congeneric species. Only two species that were not in the tree and had no congeneric species could not be assigned distances and had to be excluded from the calculations: Sabinaria magnifica Galeano & R.Bernal and Wallaceodoxa raja-ampat Heatubun & W.J.Baker. Functional distances were calculated on the basis of three traits that were found to be associated with palm use in a previous study: maximum leaf (blade) length, fruit volume and stem volume (π × r × r × h; with r being the stem radius, derived from maximum stem diameter and h being the maximum stem height)37. The traits were obtained from the trait data matrix with imputed missing values used above for the calculation of functional distinctiveness values, and the function ‘compute_dist_matrix’ from the R package funrar v.1.4.1 (ref. 28) was used to calculate the distance between species in this three-dimensional trait space. The above-mentioned 85 species that could not have their traits imputed were then added to the distance matrix and the distances between these species and the rest of the species were obtained as the median distances between congeneric species. Phylogenetic and functional distances were log-transformed and rescaled to range between 0 and 1. The biome(s) occupied by each species was obtained from the World Checklist of Vascular Plants155, accessed in February 2021, and consisted of a categorical variable with six categories (‘desert or dry shrubland’, ‘montane tropical’, ‘seasonally dry tropical’, ‘subtropical’, ‘temperate’ and ‘wet tropical’). Two species (Acoelorrhaphe wrightii and Dypsis declivium) lacked biome data and had to be excluded from the analysis. Trait and biome data are provided in Supplementary Table 5.
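The data preparation steps described above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the R workflow itself. All function names and toy values are hypothetical, the log transform assumes strictly positive distances between distinct species, and the congeneric rule is read here as "the median of the distances from the known congeners of the missing species to the other species", which is one possible interpretation of the text.

```python
import math
from statistics import mean, median


def stem_volume(max_diameter, max_height):
    """Cylinder volume: pi * r * r * h, with r derived from the maximum stem diameter."""
    radius = max_diameter / 2.0
    return math.pi * radius * radius * max_height


def average_over_trees(per_tree):
    """Average each pairwise distance over a sample of trees (one dict per tree)."""
    return {pair: mean(tree[pair] for tree in per_tree) for pair in per_tree[0]}


def log_rescale(dist):
    """Log-transform the distances, then rescale them linearly to the range 0 to 1."""
    logged = {pair: math.log(d) for pair, d in dist.items()}
    lo, hi = min(logged.values()), max(logged.values())
    return {pair: (v - lo) / (hi - lo) for pair, v in logged.items()}


def impute_from_congeners(missing, other, genus_of, dist):
    """Median distance from the congeners of `missing` to `other` (assumed reading)."""
    values = [d for pair, d in dist.items() if other in pair
              for sp in pair - {other} if genus_of[sp] == genus_of[missing]]
    return median(values) if values else None  # None: no congener available


# Toy example with two sampled trees and three species in the trees.
ab, ac, bc = frozenset(("sp_a", "sp_b")), frozenset(("sp_a", "sp_c")), frozenset(("sp_b", "sp_c"))
trees = [{ab: 2.0, ac: 4.0, bc: 8.0}, {ab: 4.0, ac: 8.0, bc: 12.0}]
genus_of = {"sp_a": "genus_1", "sp_b": "genus_1", "sp_c": "genus_2", "sp_x": "genus_1"}

mean_dist = average_over_trees(trees)
# sp_x is absent from the trees; borrow its distance to sp_c from its congeners.
imputed = impute_from_congeners("sp_x", "sp_c", genus_of, mean_dist)
scaled = log_rescale(mean_dist)
volume = stem_volume(0.2, 10.0)
```

In the sketch the imputation is done before the log transform and rescaling. The text does not state the order of these two steps, so this ordering is also an assumption.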

Maps and graphs presenting all above results were obtained using the R packages ggplotify v.0.0.4, aplot v.0.0.3, stringr v.1.4.0, ggrepel v.0.8.1, GGally v.1.4.0, hrbrthemes v.0.6.0, rgdal v.1.5-8, scales v.1.1, reshape2 v.1.4.4, dplyr v.1.0.7, gridExtra v.2.3, ggpubr v.0.2.5, ggstance v.0.3.3, ggtree v.2.4.1, hash v.2.2.6.1, ggplot2 v.3.3 and plyr v.1.8.6 (refs. 98,100,101,107,115,118,131,144,156,157,158,159,160,161,162,163,164). All analyses were performed with R v.4.0.2 in RStudio165,166.

Limitations and guidelines for interpreting the results

Traits and genes are not the only drivers of species use, so we may have over- or under-estimated the availability of potential alternatives in some cases. Ideally, the search for alternatives should consider cultural practices and preferences but these data are currently not available at a global scale. In addition, the occurrence of the species to be replaced and its potential substitute in different subregions or different ecological conditions could be an obstacle to their interchangeability. If these differences were not captured by the use of biome data, the availability of potential alternative species may have been overestimated. Another concern may be that we do not know whether a species reported to be used somewhere is (or could be) used in the same way throughout its distribution range. However, to our knowledge, there is no reason to assume that a species could not be used in a certain way in a region just because its use there is not yet known. In fact, communities are constantly experimenting with new species, as evidenced by the widespread use of non-native species in local pharmacopoeias71 and by a higher likelihood of naturalization in plants with economic value39. Finally, there may be cases where extremely functionally and phylogenetically distant species could successfully be used as substitutes for each other in some regions. These will be missed by the search for alternative species as implemented here, so our results should not be interpreted as evidence that species that were not identified as potential alternatives cannot be useful. Overall, our results illustrate a potential for replacement that will have to be ground-checked and discussed with communities. Until then, they remain useful as an (optimistic) estimate of the regional resilience of palm uses.

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.