Introduction

Tropical mesic savannas, found at intermediate rainfall levels, are predominantly characterized by a mix of a continuous layer of C4 grasses and varying densities of fire-tolerant, shade-intolerant tree species with open tree crowns1. Conceptually, resource-based drivers (climate and soil) and disturbance-based drivers (fire and herbivory) are known to regulate tree–grass dynamics in savannas2,3,4,5. However, the relative influence of resource-based and disturbance-based drivers across the savanna biome has been heavily debated6,7, with limited insights from south and southeast Asia. In India, multiple lines of recent evidence, including both palaeoecological data and recent discoveries of endemic species, suggest that many of its regions are indeed ancient savanna ecosystems8,9,10,11. However, although India’s savanna biome potentially covers a vast geography8, the distribution and drivers of tree–grass ratios within the biome are yet to be examined. This information is crucial in the context of tree planting initiatives (for economic and restoration purposes) for climate change mitigation and of issues such as woody encroachment that tend to have negative consequences for the savanna biome12,13,14,15,16.

South and southeast Asian tropical savannas have historically been underrecognized8,17. Many regions of mesic savannas in India have physiognomies that are woody enough to be mistaken for closed canopy forests8. Colonial practices of managing land for forestry18 have left a legacy of misconception about these wooded ecosystems, such that areas with low and medium tree cover are considered degraded forests19 (relative to high tree cover in closed canopy forests). Consequently, many Indian savannas (classified as degraded forests) have become easy targets for afforestation20 to meet India’s ambitious tree planting targets21. Hence, appropriate ecosystem restoration planning that goes beyond tree planting requires information about the distribution of these savannas. Furthermore, a better understanding of the functional ecologies of the tree and grass components in savannas can help to tailor strategies for conservation and restoration of this biome.

Tree cover is commonly used to differentiate between closed canopy forests and open canopy savannas, accounting for biome-specific vegetation composition and dynamics3,22,23. The distinct dichotomy of low tree cover in savannas and high tree cover in forests has been found in Africa, South America and Australia3,23,24,25. A similar pattern has also been confirmed in southeast Asia, where multimodality of tree cover is scale dependent; i.e. distinct at the landscape scale compared to the regional scale17. However, uncertainty remains about the extent and patterns of tree cover across forests and savanna biomes in south Asia, particularly India.

Broadly, resource-based and disturbance-based drivers regulate the outcomes of tree–grass competition in savannas. From the macroecological and biogeographic perspectives, climate, specifically rainfall and its seasonal distribution, is a key determinant of tree cover3,26. Evidence from global studies and from African savannas suggests that climatic drivers constrain the maximum potential tree cover by regulating tree growth and mortality27,28,29. Savannas (low tree cover) tend to be found in areas with low rainfall and/or high rainfall seasonality23,26, implying that water use-related physiological processes control tree cover. Furthermore, soil texture, which mediates soil moisture content30, also shapes tree–grass dynamics31. In sandy soils, water percolates to lower depths, favouring deep-rooted trees, while clayey soils, which hold moisture near the surface, favour shallow-rooted grasses32,33. However, at the global scale, there is limited insight into the relationship between severity of water stress and distribution of tree cover in savannas34, with no insights from India.

Fire and herbivory are also key regulators of tree cover, and maintain savannas by preventing complete canopy closure35,36. Typically, fires are extensive in the tropical savanna biome due to the seasonal environment and abundance of highly flammable C4 grasses1. Fire, when frequent, can regulate tree cover by preventing tree saplings from escaping the ‘fire trap’37. Conversely, fire suppression has the potential to release seedlings from this trap and increase tree cover. Herbivory can also be an influential factor: wild grazing can increase tree cover by suppressing grasses, potentially leading to closed canopy forests. Conversely, wild browsing and mixed feeding may suppress tree sapling growth, thereby limiting tree recruitment and resulting in relatively open savannas38,39,40. This is also likely the case with domestic livestock, the dominant herbivores in most tropical open ecosystems comprising savannas, grasslands and woodlands41. For example, experimental evidence from African savannas in Kenya shows that moderate cattle grazing does not have any effect on tree sapling recruitment, but intense levels of grazing increase tree sapling growth42,43. In multiple experiments in southern Africa, woody biomass density declined under sustained goat browsing44,45. However, despite India’s long history of fire suppression46 and high densities of domestic livestock, there have been few investigations of the role of fire and domestic livestock herbivory in driving tree cover in Indian savannas.

In this study, we examine the distribution of savannas and forests across India and the drivers of that distribution, using the metric of tree cover. To the best of our knowledge, the role of these drivers in determining the distribution of forests and savannas across India is not established. This information is crucial, given the historical and contemporary context of fire suppression46 and ongoing pressures of tree planting in non-forest biomes for climate change mitigation19. Furthermore, we advance previous global and regional analyses by incorporating the role of herbivory, specifically by domestic livestock, which traditionally has not been included in remote sensing-based analyses of tree cover due to lack of data3,17,23.

First, we plotted remotely sensed current tree cover and site data of Indian savanna vegetation against maximum climate water deficit (MCWD), a measure of severity of water stress. We hypothesized that there will be multimodality of tree cover across the water stress gradient as shown globally3,23. Second, we tested the hypothesis that MCWD regulates tree cover such that tree cover increases as MCWD decreases, using a quantile-based generalized additive model. Then, we quantified the shortfall of the current tree cover from the climatic maximum potential. Third, using generalized additive models, we assessed the relationship between this shortfall in tree cover and remotely sensed information about resource and disturbance-based drivers discussed above and drivers of topography and anthropogenic pressure. Broadly, we hypothesized that shortfall decreases in sandy soils and with grazing herbivory, while it increases with frequent and intense fires. Finally, we explored the relationship between shortfall and topography and anthropogenic pressure because both these factors are known to locally influence tree cover.

Results

Distribution of current tree cover against MCWD

We estimated the current tree cover as the mean annual tree cover (2000–2020) across the study area (Fig. S1) using the MOD44B VCF product at 250 m spatial resolution47. We calculated climatic information including MCWD using the TerraClimate monthly climate and climatic water balance product48 (see Supplementary Methods and Table S3 for details). For a stratified sample of points, percent tree cover across a gradient of MCWD showed an upper limit that increased with decreasing MCWD, with sites at any given MCWD showing the entire range of tree cover below this upper limit (Fig. 1). Contrary to patterns reported from other tropical regions3,23, tree cover in regions with intermediate rainfall did not have two distinct peaks of high tree cover with low tree cover between the two peaks (Figs. 1, S2 and S4 (percent tree cover–mean annual precipitation (MAP) plot)). Using a non-hierarchical clustering analysis in combination with the percent tree cover at the locations where we know savanna vegetation occurs9, we delineated four zones of tree cover: (1) a low tree cover zone (MCWD > 1209 mm) in which median percent tree cover is ~0.7% (uncoloured); (2) a medium tree cover zone (MCWD 845–1209 mm) with median percent tree cover up to ~9.4% (light mustard); (3) a mixed tree cover zone (MCWD 486–845 mm) with a wide range of percent tree cover ~0–78% (median percent tree cover ~40%) (lavender); (4) a predominantly high tree cover zone (MCWD < 486 mm) with maximum percent tree cover ~82% (median percent tree cover ~57%) (light green). The locations of the savanna vegetation were mainly in the medium and mixed tree cover zones (Fig. 1). According to the Köppen classification49, the low and high tree cover zones fall into the dry arid desert and tropical monsoon climate zones, respectively. The two zones of medium and mixed tree cover are found in the dry winter tropical savanna and dry semi-arid or steppe climate zones, respectively.
The use of the site location of savanna vegetation helped us overcome the challenge of uncertainty of the MOD44B VCF product in delineating forests and savannas50,51.
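The zone boundaries reported above can be expressed as a simple lookup. The following is a minimal sketch, not the clustering analysis used in the study: the function name `tree_cover_zone` is hypothetical, MCWD is taken as a positive deficit magnitude in mm as in Fig. 1, and the assignment of values falling exactly on a boundary is an assumption.

```python
def tree_cover_zone(mcwd_mm: float) -> str:
    """Map an MCWD magnitude (mm of deficit) to one of the four zones in Fig. 1.

    Thresholds (486, 845 and 1209 mm) come from the text; how exact boundary
    values are assigned is an assumption made for this illustration.
    """
    if mcwd_mm < 0:
        raise ValueError("expected the deficit magnitude in mm (>= 0)")
    if mcwd_mm > 1209:
        return "low"      # deficit beyond 1209 mm
    if mcwd_mm > 845:
        return "medium"   # 845-1209 mm deficit
    if mcwd_mm > 486:
        return "mixed"    # 486-845 mm deficit
    return "high"         # 0-486 mm deficit


# The four inset landscapes in Fig. 1 (MCWD 912, 680, 887 and 1059 mm)
for deficit in (912, 680, 887, 1059):
    print(deficit, tree_cover_zone(deficit))
```

Under these assumptions, all four inset examples fall in the medium or mixed zones, in line with the figure caption.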

Fig. 1: Percent tree cover across a gradient of maximum climate water deficit (MCWD) (black dots, n = 8554/9600, i.e. 89.1% of total sampled data).

Percent tree cover at locations of savanna vegetation9 is shown as bright orange diamonds. There are four tree cover zones: a low tree cover zone beyond 1209 mm deficit (uncoloured), a medium tree cover zone at 845–1209 mm deficit (light mustard), a mixed tree cover zone at 486–845 mm deficit (lavender) and a high tree cover zone at 0–486 mm deficit (light green). Inset images, clockwise from top left, are examples of the landscape at sampled points in the medium and mixed tree cover zones: MCWD 912 mm and one observed fire; MCWD 680 mm and no fire; MCWD 887 mm and no fire; and MCWD 1059 mm and 7 observed fires. See Fig. S5 for geography of the coloured zones and Fig. S7 for information about the distribution of all drivers across the points in the coloured zones. All inset images are courtesy of Google Earth Image © 2021 Airbus.

We used Google imagery to better understand the reasons for the wide range of tree cover in the mixed tree cover zone, which contains evidence of savanna herbaceous vegetation. We found distinct vegetation patterns along topographical features of ridges and valleys. Also, the presence of paths through the vegetation implies anthropogenic disturbances (Fig. 1). Hence, we included topographical indices of Heat Load Index (HLI) and Topographic Position Index (TPI) and human population density as additional drivers of tree cover. HLI and TPI characterize the land surface in terms of the steepness of slope of the surface. HLI has a range of 0 to 1, where values close to 0 indicate north-easterly cool regions and values close to 1 indicate south-westerly warm regions. TPI has a range of −1 to 1, where low values represent sheltered parts and high values indicate exposed areas of the landscape. Hence, both indices are indicative of direction of incident solar radiation and land drainage52. We used human population density as a proxy for anthropogenic pressures.

In terms of geography, the low tree cover zone stretched across north-western India, including the desert areas of the states of Rajasthan and Gujarat in western India (MAP range: 112.8–1095 mm). The high tree cover zone occurred in the north-eastern region of India and the Malabar coast in the south-western states of Karnataka and Kerala (Fig. S5c; MAP range: 1100–4893 mm). The medium tree cover zone with savanna understorey vegetation stretched across the Deccan plateau, the Central Highlands and the western semi-arid regions of Gujarat, Rajasthan and north-western Madhya Pradesh (Fig. S5a; MAP range: 152.6–3008 mm). The mixed tree cover zone with savanna understorey vegetation mainly separated into two regions: inland from the east coast (northern Coromandel coast) in the states of Odisha, Jharkhand and Chhattisgarh and along the western coast, parallel to the central Western Ghats (Fig. S5b; MAP range: 566–3771 mm).

Estimation of climatic maximum potential tree cover and potential tree cover shortfall

We hypothesized a clear ceiling for the maximum percent tree cover across the four zones of tree cover, implying the influence of MCWD on the maximum current tree cover possible (Fig. 1). We tested this hypothesis by estimating the climatic maximum potential tree cover with a 95th percentile smooth additive quantile regression model of percent tree cover against MCWD, fitted to the stratified sample points (n = 8554, shown as black points in Fig. 1) (deviance explained = 90.8%; Fig. S6). We then predicted the climatic maximum potential tree cover across the study area. Climatic maximum potential tree cover was the highest in the north-eastern regions (~70–82%), the central and southern sections of the Western Ghats (~72–79%), and along the foothills of the Himalayas in the north (~60–80%). It was the lowest in the desert and semi-arid zones of western India (~1–10%) and increased in the Central Indian Highlands region (~35–50%) (Fig. 2a).
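The idea of a climatic ceiling on tree cover can be illustrated with a much cruder stand-in for the quantile model: taking the 95th percentile of tree cover within fixed-width MCWD bins. This is only a sketch under stated assumptions; it is not the smooth additive quantile regression used in the study, and the function names, the 100 mm bin width and the linear-interpolation percentile are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def percentile(values, q):
    """q-th percentile (0-100) with linear interpolation between order statistics."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("percentile of an empty sequence is undefined")
    position = (len(ordered) - 1) * q / 100.0
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


def binned_ceiling(mcwd_mm, tree_cover_pct, bin_width_mm=100.0, q=95.0):
    """Approximate the tree cover ceiling as the q-th percentile per MCWD bin.

    Returns {lower edge of bin in mm: q-th percentile of percent tree cover}.
    A binned percentile is a step-function approximation, not a smooth fit.
    """
    groups = defaultdict(list)
    for deficit, cover in zip(mcwd_mm, tree_cover_pct):
        groups[int(deficit // bin_width_mm)].append(cover)
    return {
        index * bin_width_mm: percentile(covers, q)
        for index, covers in sorted(groups.items())
    }


# Synthetic example: two MCWD bins with different upper limits of tree cover
deficits = [50] * 101 + [150] * 101
covers = list(range(101)) + [c / 2 for c in range(101)]
print(binned_ceiling(deficits, covers))
```

A smooth quantile model avoids the arbitrary bin edges of this sketch and allows prediction at any MCWD value, which is what mapping the ceiling across a study area requires.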

Fig. 2: Climatic maximum potential tree cover (%) and Potential tree cover shortfall.

The climatic maximum potential tree cover is the maximum tree cover based on the maximum climate water deficit (MCWD), fitted with a 95th percentile smooth additive quantile regression model. It is the highest in north-eastern India and the stretch of the Western Ghats along the western coast of peninsular India (a). Potential tree cover shortfall is a measure of the convergence or divergence of the current tree cover relative to the climatic maximum potential tree cover. It has a range of 0–1, where values close to 1 indicate divergence of current tree cover from the climatic maximum potential tree cover (including where there is 0% current tree cover) (shades of red), and values close to 0 indicate convergence (shades of blue). Note that all areas with MCWD greater than 1209 mm, predominantly in north-western India, have been excluded in (b), as these areas have zero tree cover and are areas of caution for tree-planting activities. The above analyses were completed at 250 m spatial resolution. Note that potential tree cover shortfall has been aggregated by factor = 2 for display purposes (b). Study area (coloured in both maps) was delineated by excluding built-up land, water bodies, aquaculture, salt pans, mangroves, permanent wetlands, snow and ice and all forms of agriculture and by excluding all land area with elevation greater than 1200 m (see Fig. S1).

We calculated the divergence of current tree cover from the maximum possible tree cover, hereafter referred to as ‘shortfall’, as the difference between 1 and the ratio of current tree cover to climatic maximum potential tree cover. Shortfall values close to 1 indicate maximum divergence of current tree cover from the climatic maximum potential tree cover and shortfall values close to 0 indicate convergence of current tree cover with the climatic maximum potential tree cover. Shortfall was high in central regions including the Deccan plateau (~0.3–0.6), Central Highlands (~0.2–0.7) and northern section of the Western Ghats (~0.2–0.6). It was low in the north-eastern regions (~0.1–0.3) and southern and central sections (~0.05–0.2) of the Western Ghats. The shortfall was the highest in the desert areas of north-western India due to zero or negligible current tree cover relative to the small climatic maximum potential tree cover. Hence, we excluded these areas when mapping shortfall to avoid confusion about the possibility of restoring tree cover in these desert areas (Fig. 2b).
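The shortfall definition above amounts to a one-line calculation per pixel. The sketch below is illustrative only: the function name is hypothetical, and clamping the result to the stated 0–1 range (for pixels where current cover exceeds the 95th percentile ceiling) is an assumption rather than a documented step of the study.

```python
def tree_cover_shortfall(current_pct: float, potential_pct: float) -> float:
    """Shortfall = 1 - (current tree cover / climatic maximum potential tree cover).

    Both inputs are percent tree cover. Clamping to [0, 1] is an assumption
    made here so that values stay within the range reported in the text.
    """
    if potential_pct <= 0:
        raise ValueError("climatic maximum potential tree cover must be positive")
    shortfall = 1.0 - current_pct / potential_pct
    return min(1.0, max(0.0, shortfall))


print(tree_cover_shortfall(40, 80))  # half of the climatic potential is realised
print(tree_cover_shortfall(0, 50))   # no current tree cover: maximum divergence
```

Because the measure is a ratio, a pixel with negligible current cover and a very small potential still scores close to 1, which is why the desert areas show the highest shortfall.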

Evaluation of drivers of potential tree cover shortfall

Considering the sampled points in the medium and mixed tree cover zones, we analysed how much of the shortfall can be explained by environmental drivers in these two zones that contain savanna vegetation. We modelled the relationship between shortfall and sand fraction, burn frequency and fire intensity and herbivory pressure of buffalo, sheep and goat, HLI, TPI and human population density (deviance explained = 50.4%; Table S1). The explanatory powers of the model using training and testing data were 45.4% and 45.1%, respectively, while the root mean square errors of the model using training and testing data were 39.6% and 39.2%, respectively.

We analysed the partial effects of the drivers on shortfall using a generalized additive model. Shortfall decreased linearly with increase in sand fraction. However, we interpret this result cautiously, considering the uncertainty of the effect (Fig. 3c). Shortfall increased significantly up to HLI = 0.87, beyond which it decreased, i.e. shortfall was least in cool north-easterly facing regions (HLI ~ 0), increased linearly with increase in incident solar radiation and decreased beyond a threshold towards warm south-westerly facing regions (HLI ~ 1) (Fig. 3f). There was a similar threshold-based relationship between shortfall and TPI: shortfall was the least in the bottom of valleys but increased in lower slopes and flat areas (up to TPI = 0) and then decreased in middle slopes, upper slopes, and ridges (Fig. 3g). Shortfall increased linearly with increasing human population density (Fig. 3e). Interestingly, there was a threshold-based relationship between herbivory pressure of sheep and shortfall. For low levels of sheep herbivory pressure, shortfall decreased with increasing herbivory. However, beyond 1 tonne km−2 year−1, the shortfall significantly increased (Fig. 3d). Lastly, there was a significant relationship between shortfall and burn frequency and fire intensity. However, considering the highly skewed information of burn frequency (0–13 fires in 2000–2020) and fire intensity (0–54.5 MW of all fires in 2012–2020), we concluded the trends to be unreliable (Fig. 3a, b).

Fig. 3: Partial effect plots of only significant drivers of tree cover shortfall at the sampled points in the medium and mixed tree cover zones (n = 2977).

Each plot shows the additive contribution of the respective driver to the linear component of tree cover shortfall, with the shaded area indicating 95% confidence intervals. Within each plot, the density plot (grey) shows the distribution of the respective driver. Note that the data used in the model and the density plots for fire intensity, fire frequency, sheep dry matter intake and population density are log-transformed (see Supplementary Methods for details). Due to the skewed data of fire intensity and fire frequency, these trends are unreliable even though they are significant (a, b). There was a decreasing trend in shortfall with increase in sand fraction (c). For high sheep pressure, the shortfall increased steeply (d). There was a linear increase in shortfall with anthropogenic pressures (e). Lastly, a threshold response of shortfall was observed with the topographic variables Heat Load Index and Topographic Position Index (f, g).

Discussion

To the best of our knowledge, this is the first India-wide study examining the distribution and drivers of savanna and forest biomes using a remotely sensed tree cover metric. First, we demonstrate that there are four distinct zones of tree cover: low, medium and high tree cover zones, where MCWD, a measure of severity of water stress, regulates tree cover, and a mixed tree cover zone, where MCWD does not have any influence on tree cover. The medium and mixed tree cover zones have savanna vegetation (Fig. 1). Second, using the tested relationship between tree cover and MCWD, we predicted the climatic maximum potential tree cover (Fig. 2a). We then quantified the shortfall of current tree cover from the climatic maximum potential. This shortfall is high in the central regions of the Deccan Plateau and is low in north-east India and specific sections of the Western Ghats (Fig. 2b). Third, in the medium and mixed tree cover zones, we show that shortfall decreases with increase in soil sand fraction (Fig. 3c). High levels of grazing and anthropogenic pressures increase the shortfall (Fig. 3d, e). At the coarse scale of our analyses, we demonstrate the distinct role of topography in regulating shortfall (Fig. 3f, g). However, we could not ascertain a relationship between fire and shortfall due to limited data on fires (Fig. 3a, b).

Distribution of tree cover driven by severity of water stress (MCWD)

The low and high tree cover zones clearly delineate the desert and tropical forest biomes respectively53, while the biomes in the zones characterized by medium and mixed tree cover are less clear. The unclear biome status in these two zones is consistent with the lack of bimodality in tree cover, since bimodality is typical of co-existing forest and savanna biomes54. However, as shown in this study, these zones contain evidence of herbaceous life forms of graminoids and forbs that are characteristic of savannas9, indicating the co-existence of both savannas and forests in these zones. South Asian savannas include dipterocarp savannas, mixed savannas, fine leaved and spiny savannas and pine savannas with dominant tree clades spanning the entire range of tree canopy cover and physiognomies similar to forest trees8. Similarly, different forest types, especially those of secondary regrowth due to intensive and extensive land use and land cover changes, have a wide range of tree canopy cover. This wide range of tree cover across the variety of forest and savanna formations is difficult to assess using coarse-scale remotely sensed products, contributing to the lack of bimodality and clear biome delineation. Furthermore, it is difficult to interpret the bistability of forest and savanna states in the medium and mixed tree cover zones, due to the uncertainties of the MOD44B VCF product used to estimate percent tree cover50,51. We suggest the use of independent tree cover estimates with additional information about the tree species as validation of the co-existence of savannas and forests. Nonetheless, we suspect that forests and savannas coexist in the medium and mixed tree cover zones at fine spatial scales, i.e. at the landscape scale, such as in southeast Asia17 and the forest–grassland mosaics in north-eastern India55.

Overall, the sigmoidal response of current tree cover to MCWD confirms that severity of water stress is an important driver of savanna biome distribution. This is evident in the clear delineation of the desert and forest biomes at high and low water stress, respectively, and in the remaining two zones containing savanna herbaceous vegetation. The medium tree cover zone (median tree cover up to ~9%) is in relatively drought-prone areas and the mixed tree cover zone (~0–80% tree cover) is in relatively less drought-prone areas, akin to findings globally about the distribution of the savanna biome and the possibility of alternative biome states with forests36,56,57. This is because water stress drives differences in survival and growth rates of forest and savanna trees and the savanna herbaceous layer2. Mechanistically, trees and grasses coexist in savannas by partitioning limited resources such as water, i.e. there is hydrologically driven resource competition2. Additionally, climate variability and climate-driven disturbances (including water resource availability) cause demographic bottlenecks at various life stages of trees, thereby maintaining tree–grass coexistence2. Hence, predicted changes in water stress due to climate change58 are likely to drive future changes in the distributions of savanna and forest in south Asia59.

Topography and anthropogenic pressures drive tree cover shortfall

Topography regulates tree–grass dynamics in the medium and mixed tree cover zones containing savanna vegetation. In areas with water availability, forest trees persist irrespective of terrain, i.e. tree cover converges with its climatic maximum in hilly terrain (including upper slopes and ridges) and valleys as shown. Indeed, our findings here agree with those of similar studies across tropical South America, which show that areas with high rainfall have more forests than savannas irrespective of the depth of the water table60. Low hill slopes might have alternating periods of waterlogging and drought stress amplified by rainfall seasonality; however, savannas can withstand these stresses as shown in tropical South America56, corroborating the increasing trend in shortfall in low slopes and flat areas shown in this study. Furthermore, temperature gradients and the resulting heterogeneity in energy distribution created by topography explain the pattern of shortfall61, with shortfall being less in north-easterly and south-westerly facing areas. We posit that the interactive effects of seasonality and aspect explain the threshold response of tree cover shortfall to HLI, with increased seasonality limiting tree cover in a wide range of terrain aspects. Clearly, interactive effects of topography and rainfall seasonality regulate tree and grass co-existence60,62,63. These effects can be further explored by assessing tree–grass dynamics across topographic and elevational gradients62, in turn accounting for water depth60.

Anthropogenic pressures such as small-scale clearing and fuel wood collection limit tree cover64, explaining the linear increase in shortfall from the climatic maximum potential shown. Hence, in areas where forest and savanna biomes coexist in India, anthropogenic pressure is a regulator of tree cover, consistent with regional evidence from Africa24 and South America65.

Resource and disturbance factors regulate tree cover shortfall

Within the medium and mixed tree cover zones, sandy soils allow tree cover to reach the climatic maximum potential, i.e. decrease in shortfall, similar to findings in Africa27,31,66 and southeast Asia17. By increasing drainage, high sand content can drain moisture in shallow soil layers but can lead to an accumulation of water in deeper layers accessed by deep-rooted forest trees2,33,67. However, the interactive effects of soil moisture and rainfall seasonality can have varying effects on tree cover across forests and savannas when considering rainfall intensity31. Lastly, a better understanding of the functional rooting characteristics of all life forms of forests and savannas can yield interesting insights about the factors responsible for the distribution of forests and savannas. Root traits such as rooting depth of forest and savanna trees and savanna grasses can explain the extent to which varying water depths are being tapped across different soil types and rainfall regimes.

Herbivory directly affects tree cover shortfall and hence regulates forest and savanna biome distribution in the medium and mixed tree cover zones. Generally, at moderate densities, grazers such as sheep can increase tree growth rates by consuming grasses1,4,28,40, explaining the initial decreasing trend in shortfall. However, at intense levels of grazing, high densities of sheep limit tree recruitment into adult tree size classes, thereby promoting savanna grasses39,40. Complementary to livestock, it would be valuable to understand the impacts of wild herbivores on biome distribution, considering the relatively distinct and last remnant populations of wild herbivores in south and southeast Asia68. This is crucial because functional changes in herbivore communities, i.e. from wild native herbivores to domestic livestock, can have differential impacts on the distribution of forests and savannas69 and in turn on ecosystem functions such as accumulation of soil carbon70.

Although there is a significant and decreasing trend in tree cover shortfall with increasing fire frequency in the medium and mixed tree cover zones, these results are difficult to interpret for multiple reasons. First, the information about fire in the medium and mixed tree cover zones is highly skewed. The fire return interval in south Asia is significantly shorter and the fires are of smaller spatial extent and of reduced intensity, compared to vegetation fires in Africa and possibly South America71,72. For example, in 2001–2017, 16% of the African savanna area burned as opposed to 1.6% in south and southeast Asia73. Additionally, small fires accounted for ~90% of the burned area in south and southeast Asia versus 30% in Africa74. Hence, small and less intense fires might continue to maintain or even increase tree cover. Second, even though we have used complementary satellite-derived information about fire intensity and frequency, the small extent and reduced intensity of fires in India might be difficult to detect accurately in remotely sensed products75,76,77. Third, reduced fire activity has been reported in savannas and grasslands, mainly because of fire suppression associated with agricultural expansion and intensification globally78 and in savannas of South America79. India’s extensive and intensive historic and current conversion of land, especially for agriculture80,81, might have altered fire regimes, contributing to uncertainty in detection and in the consequent impacts on vegetation.

India’s complex historic and contemporary practices of land management for forests, fire suppression and ongoing anthropogenic pressures, including high densities of domestic livestock and ambitious tree-planting initiatives, are important factors to consider for appropriate conservation and restoration of its savannas. Here, considering these factors, we show that there are intermediate zones of tree cover with savanna vegetation, indicating possible co-existence of forests and savannas. Furthermore, once climatic drivers of tree cover are accounted for, topography, soil texture (sandy soils), anthropogenic pressures and herbivory by domestic sheep regulate tree cover in the intermediate tree cover zones. This information can help us to go beyond the ‘forest centric’ approaches to ecosystem restoration13,82. For example, our delineation of intermediate zones of tree cover containing savanna vegetation and of the low tree cover zone can inform the design of ‘no-go’ areas for tree planting schemes. When considering woody encroachment, insights from our study in India about the role of intense sheep herbivory in reducing tree cover provide useful evidence for the design of appropriate livestock grazing management regimes to control the biomass of palatable dominant trees, thereby maintaining healthy tree–grass dynamics and the persistence of the savanna biome. Hence, by understanding the distribution of the savanna biome and the drivers of its life forms (tree and grass components), we can design nuanced ecosystem restoration strategies that go beyond simplistic tree-planting initiatives in the UN Decade on Ecosystem Restoration.

Methods

Study area delineation and point sampling strategy

We considered the entire terrestrial area of India, from which we excluded land uses and covers of built-up area, croplands (including irrigated croplands and abandoned and current jhum, a shifting agriculture practice in India), aquaculture, mangroves, salt pans, permanent wetlands and snow and ice at the native resolutions of 60 and 100 m83,84, which we then resampled to 250 m (see Supplementary Methods). Additionally, we excluded all areas with elevation greater than 1200 m, where low temperatures might limit tree growth85, an effect that is beyond the scope of this study (Fig. S1). We stratified the study area into 150 mm bins of mean annual precipitation (MAP) (n = 32) to account for the wide range of MAP across the study area (0 to >4000 mm; Fig. S2a). We sampled 300 random points in each bin (total n = 9600), for an unbiased representation of tree cover across the precipitation gradient (Fig. S1). We excluded 10.9% of the sampled points because comprehensive information on the resource and disturbance drivers considered was lacking, leaving n = 8554 points.
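The stratified sampling described above can be sketched as follows. This is a minimal illustration, not the study's geospatial workflow: the function name, the `(point_id, map_mm)` input format, the fixed seed and the handling of bins with fewer candidates than requested are all assumptions.

```python
import random
from collections import defaultdict


def stratified_sample(points, bin_width_mm=150.0, per_bin=300, seed=0):
    """Draw up to `per_bin` random points from each MAP bin.

    `points` is an iterable of (point_id, map_mm) pairs (hypothetical format).
    Returns {bin index: list of sampled point ids}; bin k covers
    [k * bin_width_mm, (k + 1) * bin_width_mm).
    """
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    bins = defaultdict(list)
    for point_id, map_mm in points:
        bins[int(map_mm // bin_width_mm)].append(point_id)

    sample = {}
    for index in sorted(bins):
        members = bins[index]
        # Assumption: bins with fewer candidates than requested are taken whole.
        sample[index] = rng.sample(members, min(per_bin, len(members)))
    return sample


# Synthetic candidates spread evenly over ten 150 mm MAP bins
candidates = [(i, (i % 10) * 150 + 10) for i in range(5000)]
drawn = stratified_sample(candidates)
print({index: len(ids) for index, ids in drawn.items()})
```

Sampling an equal number of points per bin keeps sparsely represented rainfall regimes from being swamped by the most common ones, which is the stated purpose of the stratification.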

Processing of tree cover, resource and disturbance-based drivers

We used the MOD44B.006 MODIS Terra Vegetation Continuous Fields product to extract percent tree cover. This tree cover product is advantageous because its information is available over a long time period (2000 onwards) and because its spatial resolution appropriately discriminates regional patterns of land uses and covers86, as opposed to tree cover products that are available only for certain time snapshots and at finer spatial resolution. This product has also been used in studies of forest and savanna biome distribution at the global3,23,29 and regional scales17,24,55. Current tree cover was calculated as the mean of the annual percent tree cover (2000–2020) across the study area.

For the resource-based drivers of climate, we first processed information about vapour pressure deficit, MAP, dry season length, mean annual aridity and maximum cumulative water deficit (MCWD) from TerraClimate48. After accounting for correlated variables, we retained MAP and MCWD for further analyses. MCWD is a measure of absolute seasonality87 that provides a nuanced metric of dry season severity and the resulting water stress88 and reliably predicts forest–climate thresholds89. MCWD is calculated as the most negative value of the cumulative difference between precipitation and potential evapotranspiration88,90. For soil, we used SoilGrids250m v2.091 estimates of sand fraction at 0–30 cm soil depth.
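As a worked illustration of the MCWD definition above, the sketch below accumulates the monthly difference between precipitation and potential evapotranspiration and keeps the most negative value. The reset of the running deficit to zero whenever the balance turns positive is a common convention that is assumed here, not stated in the text; the function name, the starting month and the monthly values are likewise hypothetical.

```python
def mcwd(precip_mm, pet_mm):
    """Maximum cumulative water deficit (mm) over one run of months.

    Assumption: the running deficit is reset to zero whenever the
    cumulative balance becomes positive; the text above does not spell
    this convention out.
    """
    if len(precip_mm) != len(pet_mm):
        raise ValueError("precipitation and PET series must have equal length")
    deficit = 0.0
    most_negative = 0.0
    for p, pet in zip(precip_mm, pet_mm):
        # Add this month's balance; a surplus cannot push the deficit above zero.
        deficit = min(0.0, deficit + p - pet)
        most_negative = min(most_negative, deficit)
    return most_negative


# Hypothetical monthly series (mm) for a strongly seasonal site.
precip = [10, 5, 0, 0, 20, 150, 300, 280, 200, 90, 30, 15]
pet = [90, 100, 130, 150, 160, 120, 100, 100, 100, 100, 90, 85]
site_mcwd = mcwd(precip, pet)
```

More negative values indicate a more severe dry season; a site where precipitation exceeds potential evapotranspiration in every month has an MCWD of zero.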

We calculated fire and herbivory pressure as the disturbance drivers. For fire, we calculated metrics of both intensity and frequency because there is evidence that both fire parameters affect the open tree canopy structure that is characteristic of savanna trees25,92. Fire frequency was calculated as the number of times a pixel in the study area burned (2000–2021) using the MCD64A1.006 MODIS burned area monthly global product. We used fire radiative power (megawatts (MW)) derived from VIIRS 375 m active fire products93 as a proxy for fire intensity. We followed a two-step process to estimate herbivory pressure across the study area. First, herbivory pressure was calculated as the dry matter intake of buffalo, goats, sheep, cattle and horses by integrating livestock census data94 with the metabolic demand of each herbivore type as per animal biomass allometric relationships95. Second, to reduce the dimensionality of the five types of herbivory pressure at each sampled point and to account for the effect traits of the functional groups of browsers, grazers and mixed feeders, we conducted a principal component analysis on the herbivory pressure data. Based on the loadings on the two axes, we retained the herbivory pressure of buffalo, sheep and goats for further analyses (Fig. S8 and Table S2).

Distribution of tree cover, use of savanna herbaceous vegetation information, delineation of zones of tree cover and visual inspection of sampled points to include additional drivers of tree cover

We evaluated the distribution of percent tree cover against MCWD (Fig. 1) and MAP (Figs. S2 and S4) of the final sample points, as is typically done in studies of forest and savanna biome distribution3,17,23,60. Relative to percent tree cover–MAP (Figs. S2 and S4), there was a clearer pattern in the distribution of percent tree cover against MCWD, which we used for further visualization and interpretation. We used non-hierarchical clustering analyses, specifically the partitioning around medoids technique, to delineate any zones of tree cover, owing to the robustness of this technique to outliers96. We identified four clusters because the total within-cluster sum of squares error was low with four clusters and dropped only negligibly when more clusters were added (Fig. S3). We then extracted the minimum and maximum values of MCWD of each of the four clusters to be the thresholds of the zones.
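The clustering and threshold extraction can be illustrated with the sketch below. It uses a simple alternating k-medoids procedure, which is a simplification of the partitioning around medoids swap algorithm and is named as such; the function names, starting medoids, unscaled Euclidean distance and toy points are assumptions made for illustration only.

```python
import math


def k_medoids(points, initial_medoids, max_iter=100):
    """Alternating k-medoids on 2-D points (a simplification of PAM).

    `initial_medoids` holds indices into `points`. Returns the medoid
    indices and a cluster label for every point.
    """
    medoids = list(initial_medoids)
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest medoid.
        labels = [
            min(range(len(medoids)), key=lambda c: math.dist(p, points[medoids[c]]))
            for p in points
        ]
        # Update step: the new medoid is the member with the smallest
        # summed distance to the other members of its cluster.
        new_medoids = []
        for c in range(len(medoids)):
            members = [i for i, lab in enumerate(labels) if lab == c]
            if not members:  # keep the old medoid if a cluster empties
                new_medoids.append(medoids[c])
                continue
            best = min(
                members,
                key=lambda i: sum(math.dist(points[i], points[j]) for j in members),
            )
            new_medoids.append(best)
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, labels


def zone_thresholds(points, labels):
    """Minimum and maximum MCWD (first coordinate) within each cluster."""
    zones = {}
    for (mcwd_value, _tree_cover), lab in zip(points, labels):
        low, high = zones.get(lab, (mcwd_value, mcwd_value))
        zones[lab] = (min(low, mcwd_value), max(high, mcwd_value))
    return zones


# Hypothetical (MCWD in mm, percent tree cover) pairs forming three groups.
toy = [(-900, 5), (-850, 8), (-800, 6),
       (-500, 30), (-450, 35), (-480, 28),
       (-100, 70), (-50, 80), (-80, 75)]
medoids, labels = k_medoids(toy, initial_medoids=[0, 3, 6])
thresholds = zone_thresholds(toy, labels)
```

Because each cluster centre is an actual observation rather than a mean, a few extreme points shift the centres less than they would in k-means, which is the robustness property cited above.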

However, remotely sensed tree cover products, including the one used in this study, do not provide information about savanna understorey vegetation in the zones. Since Indian savanna trees have physiognomies very similar to those of closed canopy forests8, we included information about the savanna herbaceous vegetation to better delineate the forest and savanna biomes. We used the locations of graminoids (species n = 5 annuals and n = 8 perennials), forbs (species n = 8 annuals and n = 55 perennials) and woody species (n = 22) from Nerlekar et al.9 present within the study area (47.5% of the complete data published in Nerlekar et al.9). We acknowledge that these data9 are not exhaustive, and the lack of additional information about the locations of Indian savanna woody and grassy species is a limitation. At the locations of the savanna vegetation, we extracted the percent tree cover and included it in the percent tree cover–MCWD distribution plot (Fig. 1).

We qualitatively assessed the tree cover of 160 randomly sampled points across the tree cover zones using Google Earth imagery. We saw clear evidence of anthropogenic activities such as paths and abandoned agriculture. We also noted topographical relief features such as ridges and valleys with varying vegetation. Hence, we decided to include anthropogenic pressure and topography as additional drivers of savanna and forest biome distribution in India. We extracted human population density for 202097 as a proxy of anthropogenic pressure. We characterized topography using the heat load index (HLI) and the topographic position index (TPI). Both indices characterize the land surface in terms of the steepness of slope of the surface. HLI was calculated as per McCune et al.52, while TPI was calculated as per De Reu et al.98.
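As an illustration of one of the two topographic indices, the sketch below computes a topographic position index as the elevation of a cell minus the mean elevation of its surrounding cells. The square window, its radius, the exclusion of the centre cell and the toy elevation grid are assumptions for illustration; the neighbourhood actually used in the study is not given in this section.

```python
def tpi(elevation, row, col, radius=1):
    """Topographic position index of one cell.

    Computed as the cell elevation minus the mean elevation of the cells
    in a square window around it (centre cell excluded). The window shape
    and radius are illustrative assumptions.
    """
    n_rows, n_cols = len(elevation), len(elevation[0])
    neighbours = []
    for r in range(max(0, row - radius), min(n_rows, row + radius + 1)):
        for c in range(max(0, col - radius), min(n_cols, col + radius + 1)):
            if (r, c) != (row, col):
                neighbours.append(elevation[r][c])
    return elevation[row][col] - sum(neighbours) / len(neighbours)


# Hypothetical 3 x 3 elevation grid (m) with a single raised centre cell.
dem = [[100, 100, 100],
       [100, 130, 100],
       [100, 100, 100]]
centre_tpi = tpi(dem, 1, 1)   # raised cell: positive value
corner_tpi = tpi(dem, 0, 0)   # cell below its neighbourhood mean: negative value
```

Positive values mark cells that sit above their surroundings (ridges and hilltops), negative values mark cells below them (valleys), and values near zero indicate flat ground or a constant slope.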

Estimation of climatic maximum potential tree cover and potential tree cover shortfall

We tested the relationship between percent tree cover and MCWD and estimated the climatic maximum potential tree cover using a 95th percentile smooth additive quantile regression model, as implemented in the qgam R package99 (deviance explained = 90.8%; Fig. S6a). We mapped the model predictions to the study area as the climatic maximum potential tree cover (Fig. 2a). Diagnostic plots were used to check that approximately 5% of the sampled points fall above the fitted quantile (Fig. S6b).
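A calibration check of this kind amounts to counting the share of observations on each side of the fitted quantile curve. The sketch below shows that count on toy data; the function name and the values are hypothetical, and it does not reproduce the diagnostics of the qgam package.

```python
def fraction_at_or_below(observed, fitted):
    """Share of observations lying at or below their fitted quantile value."""
    if len(observed) != len(fitted):
        raise ValueError("observed and fitted must have equal length")
    if not observed:
        raise ValueError("at least one observation is required")
    n_below = sum(1 for obs, fit in zip(observed, fitted) if obs <= fit)
    return n_below / len(observed)


# Toy data: tree cover values 1..100 against a flat fitted quantile of 95.
toy_observed = list(range(1, 101))
toy_fitted = [95] * 100
toy_coverage = fraction_at_or_below(toy_observed, toy_fitted)
```

A share that departs markedly from the target quantile level would suggest that the fitted curve sits too high or too low over part of the MCWD range.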

We calculated potential tree cover shortfall as the difference between 1 and the ratio of current tree cover and the climatic maximum potential tree cover across the study area. Values close to 1 indicate divergence of current tree cover from the climatic maximum potential tree cover (i.e. tree cover is well below the potential) and values close to 0 indicate convergence of current tree cover with the climatic maximum potential tree cover (Fig. 2b).
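The shortfall calculation can be written out as a short sketch. The function name and example values are hypothetical, and the handling of pixels where the potential tree cover is zero, or where current tree cover exceeds the potential, is not described in this section and is left open here.

```python
def potential_tree_cover_shortfall(current_pct, potential_pct):
    """Shortfall = 1 - (current tree cover / climatic maximum potential tree cover)."""
    if potential_pct <= 0:
        # Assumption: such pixels are rejected rather than assigned a value.
        raise ValueError("potential tree cover must be positive")
    return 1.0 - current_pct / potential_pct


# Hypothetical pixels (percent tree cover).
far_below_potential = potential_tree_cover_shortfall(20.0, 80.0)   # 0.75
at_potential = potential_tree_cover_shortfall(80.0, 80.0)          # 0.0
```

A pixel with 20% current tree cover where the climate could support 80% has a shortfall of 0.75, whereas a pixel already at its climatic maximum has a shortfall of 0.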

Evaluation of drivers of potential tree cover shortfall

Using the sampled points in the medium and mixed tree cover zones (final size n = 2977), we used generalized additive models (GAMs) to determine the relationship between potential tree cover shortfall and sand fraction, all disturbance-based drivers, topographical drivers and anthropogenic pressure (Eq. 1). We used GAMs because of their flexibility in accommodating non-linear relationships and non-Gaussian data, and to account for the spatial autocorrelation structure in the sampled points100. We reduced overfitting of the models in three ways: (1) penalizing the cubic regression smoothing function for each driver, (2) using a double penalty approach implemented through the select argument in the mgcv R package101 and (3) using the restricted maximum likelihood criterion for convergence102. We partitioned the data into 80% training and 20% testing portions. We used the beta family distribution with a logit link in the analyses and specified five or fewer knots in the basis function for each driver. Diagnostic plots of residuals were assessed, and the model was validated against the testing data by comparing estimates of root mean square error and explanatory power.

$$\begin{aligned} \log \frac{\mathrm{PTCS}}{1-\mathrm{PTCS}} = {} & \alpha + s_{1}\left(\mathrm{fire\ radiative\ power}\right) + s_{2}\left(\mathrm{fire\ frequency}\right) \\ & + s_{3}\left(\mathrm{fire\ radiative\ power}\right)\left(\mathrm{fire\ frequency}\right) + s_{4}\left(\mathrm{sand\ fraction}\right) \\ & + s_{5}\left(\mathrm{buffalo\ dry\ matter\ intake}\right) + s_{6}\left(\mathrm{goat\ dry\ matter\ intake}\right) \\ & + s_{7}\left(\mathrm{sheep\ dry\ matter\ intake}\right) + s_{8}\left(\mathrm{human\ population\ density}\right) \\ & + s_{9}\left(\mathrm{HLI}\right) + s_{10}\left(\mathrm{TPI}\right) + s_{11}\left(x\right)\left(y\right) \end{aligned}$$
(1)

where PTCS stands for potential tree cover shortfall, α is the intercept and s_i(driver_i) denotes the smooth non-parametric function of each driver.
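Two mechanical pieces of this workflow can be sketched without fitting the GAM itself: mapping a linear predictor from the logit scale of Eq. 1 back to a shortfall value, and scoring predictions on a held-out 20% of the points with root mean square error. The function names, random seed and toy values below are assumptions; the smooth terms themselves were fitted with mgcv in R and are not reproduced here.

```python
import math
import random


def inverse_logit(eta):
    """Map a linear predictor on the logit scale back to the (0, 1) PTCS scale."""
    return 1.0 / (1.0 + math.exp(-eta))


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle the rows and split them into training and testing portions."""
    rng = random.Random(seed)  # the seed is an illustrative assumption
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Splitting 2977 point indices gives 2382 training and 595 testing points.
train_ids, test_ids = train_test_split(range(2977))

# Hypothetical hold-out check: logit-scale predictions against observed PTCS.
toy_eta = [0.0, 1.0, -1.0]
toy_observed = [0.5, 0.7, 0.3]
toy_rmse = rmse(toy_observed, [inverse_logit(e) for e in toy_eta])
```

Comparing the error on the testing portion with that on the training portion gives a simple indication of whether the penalized smooths generalize beyond the points used to fit them.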

Please refer to the Supplementary Information for additional details about the resource, disturbance, topographical and anthropogenic pressures processed (Table S3) and about the methods, figures and tables. All figures and maps were created using ggplot103 and tmap104 R packages. All analyses were completed in R statistical software105 unless specified otherwise.