Introduction

Half of global photosynthesis is performed by single-celled organisms in the oceans (phytoplankton)1. Phytoplankton form the base of the ocean food chain, making them central players in both global carbon cycling and ocean ecosystem function. Assessing how phytoplankton and, more generally, ocean microbial ecosystems respond to changes in climate and impact biogeochemical cycling requires integrating the combined effects of molecular, physiological, ecological, and evolutionary dynamics. These micro-scale processes must then be scaled up in order to assess the global-scale impact of climatic shifts. Numerical models provide an invaluable tool for addressing this challenge2. However, the current generation of models aimed at capturing and predicting global marine microbial ecology and biogeochemical cycling does not incorporate plastic or evolutionary shifts in trait relationships and thus is missing a critical process.

The timescales of climate change and evolution have the potential to be similar for phytoplankton, given their short generation time (on the order of days) and high standing genetic diversity3,4,5,6,7. In fact, experimental evolution studies have shown that phytoplankton can adapt rapidly to environmental shifts by changing their trait values. Trait changes can occur either immediately through reversible (plastic) trait change, even in the absence of pre-existing genetic variation8,9, or over dozens to hundreds of generations through heritable change (such as genetic mutations, heritable epigenetics, or plastic modifications). Both traits and the relationships between them can shift10,11,12. Thus, it is critical to consider the impact of both short-term or reversible (usually plastic) and long-term or irreversible (usually evolutionary) phenotype shifts as we formulate predictions for how the ocean carbon cycle and ecosystem structure will shift as the climate warms.

Care must be taken when selecting the appropriate model for predicting evolved phytoplankton phenotypes and the resulting impact on biogeochemical cycling. Purely statistical models can synthesize large amounts of data but can fail outside present-day conditions13. Current state-of-the-art ocean biogeochemical models (i.e., classic Nutrient-Phytoplankton-Zooplankton-Detritus models14) have been developed to provide insight into the current rates of biogeochemical cycling, and how these rates might vary on seasonal and interannual timescales. These models are not well suited for studying the impact of evolution as they do not resolve complex phenotypes and thus cannot constrain their evolution. In our view, trait-based models provide the most promising framework for studying how shifting environmental conditions will impact phytoplankton phenotypes and rates of biogeochemical cycling. These models parameterize multiple (N = 1 to hundreds) plankton types and then simulate how these types are influenced by their environment, interact with other microbes through competition for resources and grazing, and set rates of biogeochemical cycling. Below, we describe these models, the benefits of using these models for addressing the challenge of modeling phytoplankton adaptation, and current shortcomings.

We propose expanding the current trait-based approach for studying and modeling microbial (in particular phytoplankton) plastic and evolutionary trait changes by moving beyond static pairwise trait relationships (i.e., statistical relationships between two traits; Fig. 1). This perspective outlines how integrating statistical methods and trait-based models can help us represent changes in multi-trait phenotypes in a dynamic ocean. In particular, we posit that this approach will allow for better integration between evolution experiments, field observations of multi-trait phenotypes, and numerical models and provide more robust predictions for future ecosystem states and rates of carbon cycling. Specifically, these models will generate new insight into the potential for novel phytoplankton multi-trait phenotypes to contribute to ecosystem function and nutrient cycling.

Fig. 1: Conceptual diagram of conventional approach and proposed framework.

The ideal method for robustly projecting evolutionary trait changes is to generate a trait-based model directly from the complete integrated phenotype, indicated by the blue arrow. As the complete integrated phenotype is not fully observable, the conventional approach is to empirically observe pairwise relationships for N traits and use a master trait to connect the traits in a trait-based model. In the figure, trait A would be used as the master trait. We suggest a reframed approach that defines a multi-trait landscape (trait-scape) from the same N traits and trait measurements. These reduced axes provide a correlation matrix between the traits that can be used to generate a trait-based model. Both options can use the same empirical dataset.

Current model assumptions

Marine microbes are often described using a set of functional traits, such as cell size, nutrient affinities, and metabolic rates15. These traits are, by definition, numerous and interconnected (here, we refer to this as the integrated phenotype of an organism). There is a long history of studying how these traits relate to each other16,17,18,19,20,21,22 and the physiological underpinnings of these relationships (e.g., metabolic scaling theory23). This understanding of trait relationships has led to the development of numerical trait-based models24. Such models capture the response of multi-trait phenotypes to shifts in biotic and abiotic conditions and have been used to study patterns of biodiversity, and biogeochemical and ecological dynamics. They can also provide powerful insight into how phytoplankton will respond to a changing ocean by modifying trait values25,26,27. However, there are three primary assumptions in the current formulation of trait-based models that limit the ability to use these models for studying evolutionary trait changes.

First, contemporary trait-based frameworks rely on a small number of pairwise trait relationships. This is linked historically to the approaches used by researchers to understand organismal responses to their environment15. Pairwise trait relationships have been used to determine how different traits are related to one another, to understand their potential interactions and how this influences the phenotype, and to identify whether there are trade-offs between traits where improving one trait comes at the expense of another. Modelers have adopted this framework because it provides a tractable theoretical framework for simplifying phytoplankton phenotypes, allowing for the easy incorporation of multiple phytoplankton groups into models24.

Many trait-based models use size as a master trait to which all other traits are linked through allometric relationships28,29,30. For example, a strong positive relationship between phytoplankton size and maximum nitrogen uptake rate has been observed (r² = 0.96, p << 0.01)21,22, as well as a significant negative relationship between size and nitrogen affinity19. These pairwise relationships can then be used to parameterize phytoplankton types in trait-based models where nutrient uptake and affinity can be calculated for each size group based on a simple power function (aV^b, with V denoting cell size)28,30. For other pairwise trait combinations, the relationships are not as clear, such as phosphorus half-saturation constant versus cell size19 and biomass-specific respiration rate versus cell size21. While utilizing pairwise trait relationships can successfully describe current phenotypes, if selection acts on more complex relationships, or on the trait relationship itself, the master-trait approach will be unable to represent these dynamics.
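As a minimal sketch of the master-trait parameterization described above, the following Python snippet derives two traits from cell size with a power function of the form aV^b. The function name and all coefficient values are hypothetical and chosen only to reproduce the signs of the relationships mentioned in the text (uptake rate increasing with size, affinity decreasing with size); they are not taken from the cited studies.

```python
import numpy as np

def allometric_trait(size, a, b):
    """Return a * size**b for one or more positive cell sizes."""
    size = np.asarray(size, dtype=float)
    if np.any(size <= 0):
        raise ValueError("cell size must be positive")
    return a * size ** b

# Hypothetical coefficients, for illustration only.
A_UPTAKE, B_UPTAKE = 1.0, 0.8       # positive exponent: uptake rises with size
A_AFFINITY, B_AFFINITY = 1.0, -0.3  # negative exponent: affinity falls with size

# Four hypothetical size classes spanning three orders of magnitude.
sizes = np.array([1.0, 10.0, 100.0, 1000.0])
max_uptake = allometric_trait(sizes, A_UPTAKE, B_UPTAKE)
affinity = allometric_trait(sizes, A_AFFINITY, B_AFFINITY)

# On log-log axes the relationship is a straight line with slope b.
fitted_slope = np.polyfit(np.log(sizes), np.log(max_uptake), 1)[0]
```

Because every trait in this sketch is a fixed function of size, a modeled type can only move along the single curve defined by a and b, which is the constraint the master-trait discussion above highlights.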

Second, trait-based frameworks often assume that trait relationships vary in the same way across different taxonomic groups, or between diverse ecologies for the same group (such as temperate and polar phytoplankton). Specifically, the pairwise trait relationships, such as those examples provided above and that form the basis of trait-based models, are derived from experimental studies using large datasets that include many taxa (sometimes even spanning across phyla or kingdoms)19,22. These trait relationships reflect selection that has occurred over millennia and can provide key insight into fundamental evolutionary trade-offs at these levels of taxonomic separation. However, there is increasing evidence that trends of trait variation observed between species (interspecific) deviate significantly from trends within a species (intraspecific)31,32,33,34. Thus, interspecific trait relationships do not always predict individual lineage behaviors on shorter timescales (both plastic and adaptive responses16,35). Using these relationships to predict how individual phytoplankton phenotypes will shift in response to different temporal scales of environmental change makes the implicit assumption that selection acts consistently across all taxa/species. In other words, if an environmental change favors smaller cell sizes, then in a trait-based model, a diatom will not only get smaller but will functionally look more and more like a cyanobacterium in terms of all of its traits, not just cell size.

Finally, trait correlations are static in trait-based models. Trait correlations determine the possible directions along which phenotypes can “evolve”, ultimately limiting which future phenotypes are possible10,36,37,38,39. In fact, experimental studies have shown that the relationships between traits can evolve within lineages10,34. This suggests that pairwise trait relationships that emerge when looking across functional groups (i.e., at high taxonomic levels) are potentially much more flexible at lower taxonomic levels (within species or genera). These challenges raise concerns that models based on pairwise interspecific relationships may have poor predictive power, both in capturing shorter-term changes if there are transient shifts in trait relationships and on longer (decadal to centennial) timescales. Specifically, if environmental or biological change selects for novel phenotypes, current trait-based models will underestimate shifts in ecological and biogeochemical dynamics because they do not allow correlations to evolve. Here, we propose an approach that can be used to address these three limitations and innovate trait-based models.

Quantifying multi-trait phenotypes

Improving the representation of integrated phenotype evolution in trait-based models (i.e., addressing the limitations described above) requires altering both how we collect trait data and how we analyze the data. Specifically, we need to increase the number of comprehensive datasets that accurately assess integrated phenotypes across different taxonomic scales. Such datasets are facilitated by recent technological advances that allow for increased automation in many trait measurements, smaller sample sizes (i.e., numbers of individuals sampled), and more diverse sampling locations16,40. The second challenge is in innovating how we analyze this multidimensional data in order to assess how multiple traits and trait relationships may simultaneously shift.

Embracing the complexity of an integrated phenotype is challenging. However, conventional statistical approaches (e.g., PCA, nMDS) can be used to simplify the system and determine a set of reduced axes which capture both trait values and trait interactions. Specifically, statistical methods can identify the key axes of variation that best define shifts in multi-trait phenotypes. This is a logical extension of the classic pairwise relationships that are currently used but allows for more complex relationships between multiple traits. We have recently applied this approach to a large dataset where we quantified 9 traits for 13 diatom strains. We demonstrated that principal component analysis could be used to generate a “trait-scape” where multi-trait phenotypes can be projected on a reduced set of axes16. Moreover, these trait-scapes can provide a framework for assessing shifts in the integrated phenotype16 and can also be used to understand how phytoplankton traits evolve10,41,42. These trait-scapes allowed us to capture shifts not only in trait values but also in relationships between traits to better describe how the integrated phenotype changes.
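The construction of a trait-scape from a strain-by-trait table can be sketched as follows. This is an illustrative PCA computed with a singular value decomposition in NumPy, not the analysis pipeline used in the cited work. The data are synthetic random numbers generated only to match the dimensions mentioned above (13 strains, 9 traits), and the function name `build_trait_scape` is hypothetical.

```python
import numpy as np

def build_trait_scape(traits, n_axes=2):
    """PCA on standardized traits (rows = strains, columns = traits).

    Returns (scores, loadings, explained), where scores are the strain
    coordinates on the reduced axes, loadings give the contribution of
    each trait to each axis, and explained is the variance fraction
    captured by each retained axis.
    """
    x = np.asarray(traits, dtype=float)
    std = x.std(axis=0, ddof=1)
    if np.any(std == 0):
        raise ValueError("every trait must vary across strains")
    # Standardize so traits with different units are comparable.
    z = (x - x.mean(axis=0)) / std
    u, s, vt = np.linalg.svd(z, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)
    scores = u[:, :n_axes] * s[:n_axes]
    loadings = vt[:n_axes].T
    return scores, loadings, explained[:n_axes]

# Synthetic example: 13 strains, 9 traits driven by 2 hidden axes plus noise.
rng = np.random.default_rng(0)
hidden_axes = rng.normal(size=(13, 2))
mixing = rng.normal(size=(2, 9))
traits = hidden_axes @ mixing + 0.1 * rng.normal(size=(13, 9))

scores, loadings, explained = build_trait_scape(traits, n_axes=2)
```

Each row of `scores` places one strain in the reduced space, and each row of `loadings` shows how one measured trait projects onto the reduced axes, so the same output carries both trait values and trait relationships.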

Our proposed approach has several similarities to Ecological Niche Models (ENMs), which have been used for over a century by ecologists to define ecological niches43,44. There are many different types of models that fall under the broad category of ENMs, but they all use statistical approaches to assess suitabilities and/or species distributions across landscapes in order to gain insight into ecological dynamics and determine ecological niches. Similarly, we propose to use statistical approaches to assess multidimensional data and identify predictive groupings. A key distinction is that ENMs are focused on linking species abundance data with environmental factors, mapping suitable regions for species, and identifying co-occurring groups of species. Here, we propose to use similar techniques to understand phenotypic trait groupings and the relationship between these traits within a single species. This insight would then be used to parameterize mechanistic trait-based models, which can in turn be used to prognostically simulate growth and food web dynamics and the resulting ecological co-occurrences (described below).

A key advantage to leveraging statistical approaches for the challenge of understanding integrated phenotype adaptation is that these approaches can both facilitate the interpretation of multi-trait data and provide insight into phenotypic change even in the absence of mechanistic knowledge of physiology or genetics. For example, it is not necessary to understand the underlying mechanisms that cause a pair of traits to be correlated in order to assess shifts in an integrated phenotype in a dataset and incorporate this shift into trait-based models. Thus, this approach does not require knowing the genetic basis of trait values or being able to map genotype to phenotype. While our approach is not reliant on mechanism, using statistical approaches to uncover relationships between traits can then lead to testable hypotheses as to the underlying physiological or genetic mechanisms driving these relationships. Ultimately, testing these hypotheses experimentally can result in an improved understanding of genetic and physiological mechanisms by providing observations that are consistent or inconsistent with current mechanistic knowledge.

When selecting traits to be analyzed, a key concern is that bias can be introduced by not quantifying the appropriate traits. If one had a complete mechanistic understanding of microbes, the choice of ‘key traits’ to quantify for studying a given physiological, ecological, or biogeochemical process would be simple—unfortunately, our understanding is incomplete. Care must therefore be taken to ensure that the reduced dimension trait-scape is robust to the choice of parameters quantified. Argyle et al. 16 tested this by removing traits individually from the PCA and determining how this influenced the trait-scape. A key benefit to using a reduced dimension trait-scape is that it decreases the potential for bias associated with an incomplete understanding of phenotypes11,45, i.e., associated with selecting a suboptimal set of traits. As we describe below, using the trait-scape approach also allows for the assessment of whether the phenotype has been sufficiently sampled to capture the relevant variation for the specific question at hand (i.e., whether a study has adequately captured the phenotypic changes of interest).

First, we assume that, within the complete set of possible traits describing an integrated phenotype, multiple traits (>1) will relate to different ecological and biogeochemical functions of interest. This is to say that there is no single trait that is required to assess a particular aspect of the integrated phenotype. This allows that there might be an ‘ideal’ trait for describing a particular biogeochemical function of interest but assumes that the function can also be described using another set of traits. When we tested this on the diatom trait dataset, we showed this was indeed the case16. Thus, when the goal is to assess an integrated phenotype, one is required only to select a subset of traits that are sufficient for describing the biogeochemical set of functions of interest—these traits do not necessarily need to be unique or even optimal in their ability to describe the functions. Importantly, quantifying any additional traits that are related to the set of functions already described by the trait-scape will not alter the trait-scape because it will not provide any new information45. Given that trait measurements can be both slow and costly to perform, it is valuable to have an objective indication of a minimal but informative set of measurements for a given question.

One can assess whether a sufficient number of traits have been sampled by either subsampling the traits and determining if the trait-scape shifts (such as done by Argyle et al. 16 and described above) or by analyzing the movement of phenotypes within the constructed trait-scape. If phenotypes are differentiated within a trait-scape and are constrained in how they move within the trait-scape when subjected to new environmental conditions, then the trait-scape is able to sufficiently capture the integrated phenotype. However, if the integrated phenotypes appear stable (i.e., do not move within the trait-scape) but the individual traits comprising the integrated phenotypes are not stable across environments, then this suggests that the system was under-sampled (i.e., the experiment did not sample key trait(s) involved in the environmental response). Thus, the user can design a “minimal” experiment for a given question and avoid sampling traits that do not provide additional information.
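The trait-subsampling check described above can be sketched in Python as follows. The similarity measure used here (the correlation between pairwise strain distances in the full and reduced trait-scapes) is an assumption made for this illustration, chosen because it is unaffected by axis rotation or sign flips; it is not necessarily the metric used by Argyle et al. The data are synthetic and all function names are hypothetical.

```python
import numpy as np

def pca_scores(traits, n_axes=2):
    """Strain coordinates on the first n_axes principal components."""
    x = np.asarray(traits, dtype=float)
    z = (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    return u[:, :n_axes] * s[:n_axes]

def pairwise_distances(scores):
    """Euclidean distances between all strain pairs (upper triangle)."""
    diff = scores[:, None, :] - scores[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    rows, cols = np.triu_indices(scores.shape[0], k=1)
    return dist[rows, cols]

def leave_one_trait_out(traits, n_axes=2):
    """Correlation between full and reduced trait-scape distances,
    with one value per removed trait (assumed metric, see lead-in)."""
    x = np.asarray(traits, dtype=float)
    reference = pairwise_distances(pca_scores(x, n_axes))
    similarity = []
    for j in range(x.shape[1]):
        reduced = np.delete(x, j, axis=1)  # drop trait j
        dist = pairwise_distances(pca_scores(reduced, n_axes))
        similarity.append(np.corrcoef(reference, dist)[0, 1])
    return np.array(similarity)

# Synthetic example: 13 strains, 9 traits driven by 2 hidden axes plus noise.
rng = np.random.default_rng(0)
hidden_axes = rng.normal(size=(13, 2))
mixing = rng.normal(size=(2, 9))
traits = hidden_axes @ mixing + 0.1 * rng.normal(size=(13, 9))

similarity = leave_one_trait_out(traits)
```

Values of `similarity` near 1 for every removed trait would suggest that no single measurement is driving the structure of the trait-scape, whereas a sharp drop for one trait would flag it as carrying information that the remaining traits do not.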

In summary, multivariate statistical approaches provide a framework for analyzing multi-trait data and determining ‘key traits’ to quantify. They allow us to move away from master-trait approaches and can provide valuable insight into integrated phenotype adaptation even in the absence of a complete mechanistic understanding.

Innovating biogeochemical and ecological models

We know that evolution will alter the integrated phenotype of phytoplankton and that these shifts can involve changes in the ways that traits relate to one another (i.e., shifts in pairwise trait relationships)10,34,41. Innovating marine ecosystem models to allow for the investigation of microbial responses to environmental change, in both abiotic and biotic dimensions, will thus require altering the underlying assumptions in current trait-based models. Specifically, we must address the three key limitations highlighted above by: (1) moving away from the reliance on a master trait; (2) understanding intraspecific relationships between traits; and (3) allowing the relationships between traits to shift (and continue shifting) in response to selective pressure.

We propose that the underpinnings of trait-based models should be modified to integrate information from multivariate statistical analyses of multi-trait datasets. The first step is to generate trait-scapes for a range of phytoplankton species over a range of growth conditions and, if possible, from a variety of isolates from different environments under similar growth conditions. This will provide insight into how key traits relate to each other within species, and how much variation there is in these relationships between species. Given that microbes interact with one another, and that their biogeochemical responses are strongly influenced by community composition and abundance46, a recommended future direction is for trait measurements to also consider changes in the biotic environment. We can then start to understand the first-order trait relationships driving phenotypic shifts by analyzing how phenotypes move across a trait-scape as cells acclimate to shifts in environmental conditions (abiotic and biotic) and adapt to new environments. Critically, this will allow us to model phenotypic shifts in a way that is not restricted to predetermined trait relationships. For example, for certain environmental changes, the classically used rate-affinity trade-off may not be the primary driver of integrated phenotype variation, and so a model in which phytoplankton functional groups are primarily differentiated along this axis will fail to predict shifts in the integrated phenotype and the potential emergence of novel phenotypes.

Our approach focuses on trait-scapes derived for a single species; thus, the resulting trait relationships are intraspecific. This addresses the challenge that interspecific relationships may not reflect plastic or evolutionary trait changes within a species, especially over timescales much shorter than those typically associated with trait divergence between species or functional groups. These multi-trait relationships derived from single species could be compared across species to generate generalized relationships, or group-specific relationships, to be used in models (Fig. 1). For some groups (e.g., diatoms), we currently have sufficient data to begin this analysis, while for other groups, defining the primary axes of variation (and how key traits project on these axes) will rely on the generation of additional multi-trait datasets.

Finally, by removing the assumption of universal pairwise trait relationships, our approach allows for flexible/evolving trait relationships. This allows modeled evolved phenotypes to emerge as a function of both selective pressure and the ancestral phenotype—including the underlying trait correlations. Some preliminary modeling work suggests that there are some hard constraints related to space and energy, which determine the relationships between certain traits that a cell cannot escape47. However, for many trait combinations the relationship is determined by the metabolic strategy employed by the cell to address the environmental stress it faces and thus can be altered through selection.

Here we propose embracing the full complexity of multi-trait phenotypes in order to understand the primary axes of variation and the salient features to include in simplified models, thus capturing first-order ecosystem dynamics. This is the inverse of the conventional way of developing models, which is to start with the simplest representation and then add complexity. Importantly, this alternative approach is not necessarily much more computationally expensive because of the reduced dimensionality of the trait data. Specifically, we are not necessarily advocating for incorporating additional traits into numerical models but rather replacing the existing static trait relationships with a more dynamic representation of integrated phenotypes. This may also involve altering the specific traits that are represented in models to capture orthogonal traits that better describe observed variation amongst individual phenotypes. Although the additional calculations will add some computational cost, the increase will not be dramatic because the cost primarily scales with the number of tracers (pools or state variables) being carried in the model, which would not significantly change. Thus, a model using a trait-based framework with two major axes of trait variation will have a similar computational load to a model using a single pairwise trait relationship.

Advancing our understanding of biogeochemical cycling

Ecosystems are composed of intricate webs of connected organisms which generate non-linear and often threshold-type behavior. Models provide a means for understanding these connections and identifying critical tipping points. While there are many types of models that can be used to investigate ecological dynamics at different scales, here we focus on the specific example of trait-based models, which are used to link phytoplankton growth and mortality to biogeochemical cycles. We believe these are the models which are best suited to incorporate phytoplankton phenotypic plasticity and evolution into global carbon cycle models and Earth System models. By improving the representation of phytoplankton phenotypic variability in trait-based models, we will be able to better capture different ecological states and thus improve the representation of key behaviors such as thresholds and non-linear dynamics.

All models require simplifying assumptions in order to construct tractable modeling frameworks. However, we have identified three key issues with the current assumptions used in trait-based models. We propose that a path forward leverages multivariate statistical analyses to understand the key axes of variation for phytoplankton phenotypes. Reframing trait trade-offs to shift from thinking about pairwise relationships to considering integrated phenotypes parallels the conceptual shift that has occurred in the global change community to consider multiple drivers of organismal change7,48,49,50,51,52. It also reflects the utility of response curves53 and surfaces54,55 in understanding organismal responses. Several studies have shown that assessing organismal growth responses to shifts in environmental conditions requires sampling at the right resolution. For example, data assessing growth responses can produce different results if the full response curve (e.g., spanning sub-optimal, optimal, and supra-optimal temperatures) is not adequately sampled49,56,57,58. Similarly, we argue that assessing phenotypic change only through pairwise relationships can obscure more complex phenotypic responses due to correlated trait changes, including compensatory responses.

The approach we outline provides a means for identifying targeted trait measurements that are particularly informative for understanding shifts in integrated phenotypes. This will assist in ensuring that new datasets are maximally effective in helping to define and constrain integrated phenotypes and their plasticity—a first step necessary for incorporation of these dynamics into trait-based models. While we have focused here on physiological traits, it is possible that a similar approach could be used with molecular (e.g., transcriptomic) data to simplify the complexity of gene regulation patterns into a tractable set of reduced axes42, which would allow for the integration of genetic information into large-scale models such as trait-based phytoplankton models. Finally, we anticipate that the framework will also generate new hypotheses related to trait shifts and biogeochemical sensitivity that can be further tested in the lab or field.

This proposed approach of combining a statistical interpretation of phenotypic data with trait-based models effectively merges two dichotomous approaches for predicting future ecological and biogeochemical changes. Recent years have seen large gains in the application of statistical methods and machine learning techniques to oceanographic data, including for ecological prediction13,59,60. These approaches are powerful tools for synthesizing large amounts of information and uncovering underlying structures in observations. However, using present-day correlations to predict future changes can be problematic13. We believe that leveraging statistical approaches to elucidate multi-trait phenotypic variation and incorporating those statistical relationships into mechanistic models can provide a promising path forward for improving our projections of future ecosystem states. In addition, this approach allows us to better link trait measurements, ‘omics datasets12, and numerical models. As new sets of integrated measurements are collected61, these statistical relationships can be improved and used to revise our models.