Multivariate information theory uncovers synergistic subsystems of the human cerebral cortex

Varley, Thomas F.; Pope, Maria; Faskowitz, Joshua; Sporns, Olaf

doi:10.1038/s42003-023-04843-w


Article
Open access
Published: 24 April 2023


Thomas F. Varley ORCID: orcid.org/0000-0002-3317-9882^1,2^na1,
Maria Pope^1,3^na1,
Joshua Faskowitz^2,3 &
…
Olaf Sporns^1,2,3

Communications Biology volume 6, Article number: 451 (2023)


An Author Correction to this article was published on 07 June 2023

This article has been updated

Abstract

One of the most well-established tools for modeling the brain is the functional connectivity network, which is constructed from pairs of interacting brain regions. While powerful, the network model is limited by the restriction that only pairwise dependencies are considered and potentially higher-order structures are missed. Here, we explore how multivariate information theory reveals higher-order dependencies in the human brain. We begin with a mathematical analysis of the O-information, showing analytically and numerically how it is related to previously established information theoretic measures of complexity. We then apply the O-information to brain data, showing that synergistic subsystems are widespread in the human brain. Highly synergistic subsystems typically sit between canonical functional networks, and may serve an integrative role. We then use simulated annealing to find maximally synergistic subsystems, finding that such systems typically comprise ≈10 brain regions, recruited from multiple canonical brain systems. Though ubiquitous, highly synergistic subsystems are invisible when considering pairwise functional connectivity, suggesting that higher-order dependencies form a kind of shadow structure that has been unrecognized by established network-based analyses. We assert that higher-order interactions in the brain represent an under-explored space that, accessible with tools of multivariate information theory, may offer novel scientific insights.


Introduction

Perhaps the most ubiquitous model used in complex systems is the network, which represents pairwise interactions between different elements of a system as directed or undirected graphs^1,2. While network models can be extremely powerful, they are also fundamentally limited by the rule that every interaction between elements is strictly bivariate. Hence, interactions between three or more nodes must be indirectly inferred, using methods such as motifs³, transitivity or clustering coefficients⁴, and mapping cores or mesoscale communities^5,6. Increasingly, statistical interactions involving more than two elements (termed higher-order interactions) are recognized as being a key feature of complex systems^7,8. This makes the task of recognizing and modeling higher-order structures an important, developing field. However, a lack of well-developed, formal tools, as well as the inherent computational and combinatorial difficulties associated with higher-order interactions, have limited their application. In neuroscience, higher-order interactions have been theoretically implicated as building blocks of complexity^9,10 and functional integration¹¹. Empirically, they have been found at multiple scales, including in neuronal networks^{12,13,14,15,16,17}, electrophysiological signals^18,19, and fMRI BOLD data^20,21,22, where higher-order interactions have been proposed to relate to emergent mental phenomena and consciousness²³.

Recently, Rosas and Mediano²⁴ proposed that information theory could be used to identify higher-order interactions in multivariate systems, and furthermore, that it is possible to disentangle qualitatively different kinds of interactions, characterized by pairwise redundant and synergistic modes of information sharing. Intuitively, redundant information corresponds to information that is copied over many different elements such that the observation of a single element resolves the corresponding uncertainty in all of the other elements. In contrast, synergistic information sharing occurs when uncertainty can only be resolved by considering the joint state of two or more variables. This space of redundant and synergistic interactions in the brain remains largely unexplored, as it comprises interactions that are typically inaccessible to a bivariate, functional connectivity network analysis. Synergy is of potential interest because it tracks the ability of the brain to generate novel information through the interactions of multiple brain regions (sometimes called information modification)²⁵. In studies of cortical neural networks, synergy has been associated with neural computation (the genesis of new information through a non-trivial interaction of multiple inputs)^{12,13,14,15,17,26}.

Much of the previous work on higher-order information in neuroscience has used the partial information decomposition (PID) framework^27,28, which provides a complete decomposition of the joint mutual information into atomic information components. While powerful, the PID framework has some fairly strict limitations that have hindered its adoption by the wider complex systems community. The first is that it requires partitioning a system into sources and targets, and does not allow analysis of the whole system qua itself. The second is that, due to the combinatorial explosion of information atoms, analysis of more than five or six elements is impossible. Given that even small systems can have hundreds, or even thousands of elements, this is a severe limitation. Finally, the PID is unusual in that, while it reveals the structure of multivariate information, actually calculating values from data requires an additional step: the selection of a redundancy function that quantifies some notion of redundant information. This is a surprisingly difficult task, as many redundancy functions have been proposed, and different choices can lead to radically different descriptions of the same system^29,30.

Rosas et al. introduced the O-information²⁴ as an alternative measure, which gives an overall estimate of the extent to which a system is redundancy dominated or synergy dominated, without requiring the prohibitive computational cost or ad hoc choices required by the PID. The O-information reveals the global structure of the information-sharing dependencies in a system. Negative O-information indicates the presence of predominantly synergistic interactions, while positive O-information indicates predominantly redundant interactions. Despite its strong appeal as a quantitative metric, the origins and neural manifestations of O-information have remained elusive, if not enigmatic³¹.

In this work, we apply a range of information-theoretic measures to resting state fMRI data acquired from human cerebral cortex with the aim of identifying ensembles of regions (subsystems) that express specific modes of higher-order statistical dependencies. First, we introduce the mathematical machinery required to derive the O-information, and its interpretation in the context of multivariate information sharing processes. We derive an analytic relationship between the O-information and other, more well-known, multivariate metrics such as the Tononi-Sporns-Edelman complexity¹⁰. Then we apply multivariate information metrics to brain data and uncover the presence of abundant and widely distributed subsystems expressing synergy (negative O-information) across the entire cerebral cortex. Finally, we discuss what our insights reveal about the structure and functional roles of higher-order relations in brain activity.

Results

Theory

Integration, segregation, redundancy, synergy

A fundamental idea in modern theoretical neuroscience states that the nervous system maintains a balance between integration and segregation⁹. The integration-segregation balance principle is based on the insight that the nervous system combines regional elements of functional specialization, with system-wide functional integration. Considerable empirical work has gone into the neural integration-segregation hypothesis, and the on-going balance of integrated and segregated dynamics has been found to be regulated by distinct neuromodulatory systems^32,33, and correlates with conscious awareness^{34,35,36,37,38}.

The segregation-integration spectrum is typically visualized as a one-dimensional space: on one extreme the system is totally dis-integrated and every element is behaving entirely independently of all the others. On the other extreme is the case of total integration: every element synchronizes with every other element so that the whole system is densely connected. In the middle there is a complex regime where the system combines elements of independence and integration. As it was originally formulated, integration and segregation were discussed in the contexts of networks, and higher-order interactions were inferred via partitioning the system into subsets of varying numbers of nodes⁹. These arguments pre-dated the rigorous, mathematical distinction between redundancy and synergy, introduced in the work of Williams and Beer almost two decades later²⁷. Building on these foundations, as well as the definition of O-information from Rosas et al.²⁴, we argue that the notion of integration can be expanded to include redundant integration and synergistic integration. The result is a rich space described by distinct dimensions of integration, segregation, redundancy, and synergy (although these do not form an orthogonal basis). This high-dimensional, qualitative configuration space may be viewed as an informational morphospace^39,40,41 and provides a framework for the detailed comparison of different systems.

Information theory and higher-order information-sharing

In this section, we introduce the basics of information theory necessary to understand its application to higher-order relationships. For a more thorough introduction, readers may be interested in Cover & Thomas⁴². The basic object of study in information theory is the entropy⁴³, which quantifies the uncertainty that we, as observers, have about the state of a variable X. If the states of X are drawn according to the probability distribution P(X = x) with support set $\mathcal{X}$, then the entropy of X is:

$$H(X)=-\mathop{\sum}\limits_{x\in {{{{{{{\mathcal{X}}}}}}}}}P(x){\log }_{2}P(x)$$

(1)

This classic formulation of entropy assumes that X is a discrete random variable, although for continuous data, the generalization to differential entropy is reasonably straightforward (see Sec. Gaussian Information Theory).
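As a small illustration of Eq. (1), the following sketch computes the entropy of a few toy distributions. The function name `entropy_bits` and the example distributions are illustrative choices, not part of the original analysis.

```python
import numpy as np

def entropy_bits(probabilities):
    """Shannon entropy in bits of a discrete distribution (Eq. 1)."""
    p = np.asarray(probabilities, dtype=float)
    p = p[p > 0]  # by convention, 0 * log(0) contributes nothing
    return float(-np.sum(p * np.log2(p)))

fair_coin = entropy_bits([0.5, 0.5])    # maximal uncertainty for two states
biased_coin = entropy_bits([0.9, 0.1])  # less uncertain than the fair coin
certain = entropy_bits([1.0, 0.0])      # no uncertainty at all
```

The fair coin carries one bit of uncertainty, the biased coin carries less, and a deterministic variable carries none.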

Now consider two variables X₁ and X₂: how does knowing the state of X₁ reduce our uncertainty (the entropy) about the state of X₂? The answer is given by the mutual information⁴³, which can be written in two mathematically equivalent forms:

$$I({X}_{1};{X}_{2})=H({X}_{1})+H({X}_{2})-H({X}_{1},{X}_{2})$$

(2)

$$=H({X}_{1},{X}_{2})-[H({X}_{1}| {X}_{2})+H({X}_{2}| {X}_{1})]$$

(3)

The bivariate mutual information is often applied in the study of complex systems for the inference of functional connectivity networks (e.g., refs. ^{44,45,46,47,48}), which can reveal the structure of dyadic interactions between different elements⁴⁹. While functional connectivity networks are extremely powerful, they are fundamentally limited by their pairwise structure and are insensitive to higher-order interactions between three or more variables.
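The equivalence of Eqs. (2) and (3) can be checked numerically on a toy joint distribution. This is a minimal sketch: the joint table (a binary variable copied with 90% accuracy) and the helper names are assumptions made for illustration.

```python
import numpy as np

def entropy_bits(p):
    """Shannon entropy in bits; accepts a probability array of any shape."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def mutual_information(joint):
    """Return I(X1;X2) in bits computed via Eq. (2) and via Eq. (3)."""
    joint = np.asarray(joint, dtype=float)
    h1 = entropy_bits(joint.sum(axis=1))  # H(X1)
    h2 = entropy_bits(joint.sum(axis=0))  # H(X2)
    h12 = entropy_bits(joint)             # H(X1, X2)
    via_eq2 = h1 + h2 - h12
    # Chain rule: H(X1|X2) = H(X1,X2) - H(X2), and likewise for H(X2|X1).
    via_eq3 = h12 - ((h12 - h2) + (h12 - h1))
    return via_eq2, via_eq3

# Hypothetical example: X2 is a copy of a fair bit X1 that is flipped 10% of the time.
joint = np.array([[0.45, 0.05],
                  [0.05, 0.45]])
mi_eq2, mi_eq3 = mutual_information(joint)
```

Both forms return the same value, roughly 0.53 bit for this table.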

The natural place to begin an analysis of higher-order structures in neural data, then, is by attempting to generalize the mutual information to account for more than two variables. Unfortunately, there is no single unique generalization, and at least three are known to exist: the total correlation, the dual total correlation, and the interaction/co-information (which we will not explore in detail here)⁴². The total correlation (also referred to as the integration in ref. ⁹) is formally a straightforward generalization of Eq. (2):

$${{{{{\mathrm{TC}}}}}}({{{{{{{\bf{X}}}}}}}})=\mathop{\sum }\limits_{i=1}^{N}H({X}_{i})-H({{{{{{{\bf{X}}}}}}}})$$

(4)

$$={D}_{KL}\left(P({X}_{1},\ldots ,{X}_{N})\,| | \,\mathop{\prod }\limits_{i=1}^{N}P({X}_{i})\right)$$

(5)

where X is a macro-variable comprised of an ensemble of multiple random variables: X = {X₁, X₂, …, X_N} and D_KL() is the Kullback-Leibler divergence from prior distribution Q(x) to posterior distribution P(x):

$${D}_{KL}(P| | Q)=\mathop{\sum}\limits_{x\in {{{{{{{\mathcal{X}}}}}}}}}P(x)\log \frac{P(x)}{Q(x)}$$

(6)

The total correlation is low when every variable is independent, and high when every variable is individually highly entropic but the joint-state of the whole has low entropy. This occurs when the whole system is dominated by redundant interactions: the state of a single variable discloses a large amount of information about the state of every other variable.
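The two forms of the total correlation, Eqs. (4) and (5), can be compared on small discrete systems. The sketch below assumes the joint distribution is stored as an N-dimensional probability array (one axis per variable), and the two three-bit examples are illustrative only.

```python
from functools import reduce
import numpy as np

def entropy_bits(p):
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def marginals(joint):
    """Single-variable marginals P(X_i) of an N-dimensional joint array."""
    n = joint.ndim
    return [joint.sum(axis=tuple(j for j in range(n) if j != i)) for i in range(n)]

def tc_entropy_form(joint):
    """Eq. (4): sum of marginal entropies minus the joint entropy."""
    return sum(entropy_bits(m) for m in marginals(joint)) - entropy_bits(joint)

def tc_kl_form(joint):
    """Eq. (5): KL divergence from the product of marginals to the joint."""
    product = reduce(np.multiply.outer, marginals(joint))
    mask = joint > 0
    return float(np.sum(joint[mask] * np.log2(joint[mask] / product[mask])))

independent = np.full((2, 2, 2), 1 / 8)   # three independent fair bits
copies = np.zeros((2, 2, 2))              # three perfectly synchronized bits
copies[0, 0, 0] = copies[1, 1, 1] = 0.5
```

For the independent bits both forms give zero, and for the three synchronized bits both give 2 bits, matching the description of a redundancy-dominated system.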

The second generalization of mutual information is the dual total correlation, formally a generalization of Eq. (3):

$${{{{{\mathrm{DTC}}}}}}({{{{{{{\bf{X}}}}}}}})=H({{{{{{{\bf{X}}}}}}}})-\mathop{\sum }\limits_{i=1}^{N}H({X}_{i}| {{{{{{{{\bf{X}}}}}}}}}^{-i})$$

(7)

where H(X_i∣X⁻ⁱ) refers to the residual entropy⁵⁰: the uncertainty intrinsic to the i^th element of X that is not resolved by any other variable, or collection of variables, in X. The difference between the joint entropy and the sum of the residual entropies is all the entropy that is shared between at least two elements of X (i.e., is redundantly common to two or more elements). Curiously, while total correlation monotonically increases as X transitions from randomness to synchrony, the dual total correlation is low both for totally random and totally synchronized systems, peaking when X is dominated by shared information.
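Eq. (7) can be evaluated directly for small discrete systems by using the chain rule H(X_i∣X⁻ⁱ) = H(X) − H(X⁻ⁱ). The following sketch uses three toy three-bit systems chosen for illustration (independent bits, perfect copies, and an XOR gate); none of them come from the original analysis.

```python
import numpy as np

def entropy_bits(p):
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def dual_total_correlation(joint):
    """Eq. (7): joint entropy minus the sum of residual entropies (bits)."""
    h_joint = entropy_bits(joint)
    # Residual entropy of X_i: H(X_i | X^{-i}) = H(X) - H(X^{-i}),
    # where summing over axis i marginalizes X_i out of the joint.
    residual = [h_joint - entropy_bits(joint.sum(axis=i)) for i in range(joint.ndim)]
    return h_joint - sum(residual)

independent = np.full((2, 2, 2), 1 / 8)
copies = np.zeros((2, 2, 2))
copies[0, 0, 0] = copies[1, 1, 1] = 0.5
xor = np.zeros((2, 2, 2))
for a in (0, 1):
    for b in (0, 1):
        xor[a, b, a ^ b] = 0.25  # X3 = X1 XOR X2 with fair, independent inputs

dtc_values = {name: dual_total_correlation(j)
              for name, j in [("independent", independent), ("copies", copies), ("xor", xor)]}
```

In these toy cases the dual total correlation is 0 bits for independent bits, 1 bit for perfect copies, and 2 bits for the XOR gate, whose information is shared among the variables without any single variable determining another.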

Rosas et al.²⁴ propose that the difference between TC(X) and DTC(X) (first explored by James and Crutchfield as the enigmatic information³¹) could provide a measure of the overall balance between redundancy and synergy in multivariate systems: if TC(X) > DTC(X), then the global constraints on the system dominate and force a redundant dynamic, while if TC(X) < DTC(X) the system is dominated by information that is shared but not redundant. Rosas et al. rechristen this measure the organizational information:

$$\Omega ({{{{{{{\bf{X}}}}}}}})={{{{{\mathrm{TC}}}}}}({{{{{{{\bf{X}}}}}}}})-{{{{{\mathrm{DTC}}}}}}({{{{{{{\bf{X}}}}}}}})$$

(8)

In the specific case of three variables, Ω(X₁, X₂, X₃) is equivalent to the co-information²⁴, which Williams and Beer showed is itself equivalent to the redundancy minus the synergy⁵¹. This is in keeping with the intuition that positive O-information implies a redundancy-dominated structure and a negative O-information implies a synergy-dominated structure, although the equivalence between Ω and the co-information holds only for three variables and the measures are not identical for larger sets.
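The sign convention of Eq. (8) can be illustrated with two textbook three-bit systems. This is a minimal sketch with illustrative names: three perfect copies of a fair bit stand in for a purely redundant system, and an XOR gate for a purely synergistic one.

```python
import numpy as np

def entropy_bits(p):
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def o_information(joint):
    """Eq. (8): total correlation minus dual total correlation (bits)."""
    n = joint.ndim
    h_joint = entropy_bits(joint)
    h_single = [entropy_bits(joint.sum(axis=tuple(j for j in range(n) if j != i)))
                for i in range(n)]
    h_rest = [entropy_bits(joint.sum(axis=i)) for i in range(n)]  # H(X^{-i})
    tc = sum(h_single) - h_joint
    dtc = h_joint - sum(h_joint - h for h in h_rest)
    return tc - dtc

copies = np.zeros((2, 2, 2))
copies[0, 0, 0] = copies[1, 1, 1] = 0.5
xor = np.zeros((2, 2, 2))
for a in (0, 1):
    for b in (0, 1):
        xor[a, b, a ^ b] = 0.25

omega_copies = o_information(copies)  # redundancy-dominated: positive
omega_xor = o_information(xor)        # synergy-dominated: negative
```

The copies give Ω = +1 bit and the XOR gate gives Ω = −1 bit, consistent with the redundancy-minus-synergy reading for three variables.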

While O-information has been applied in a variety of contexts (such as to questions about the aging brain²², information flow in neuronal circuits⁵², and music composition¹⁶), there remains considerable uncertainty around how synergy should be intuitively understood. To help elucidate the answer, we relate O-information to the original measure of integration/segregation balance proposed by Tononi, Sporns, and Edelman: the TSE complexity⁹ and show that a geometric interpretation of the O-information exists that brings with it a novel perspective on redundancy and synergy.

The TSE-complexity admits two formulations:

$${{{{{\mathrm{TSE}}}}}}({{{{{{{\bf{X}}}}}}}})=\mathop{\sum }\limits_{i=1}^{\lfloor N/2\rfloor }{\mathbb{E}}{[I({{{{{{{{\bf{X}}}}}}}}}^{\gamma };{{{{{{{{\bf{X}}}}}}}}}^{-\gamma })]}_{| \gamma | = i}$$

(9)

$$=\mathop{\sum }\limits_{i=1}^{N}{\left(\frac{i}{N}{{{{{\mathrm{TC}}}}}}({{{{{{{\bf{X}}}}}}}})-{\mathbb{E}}[{{{{{\mathrm{TC}}}}}}({{{{{{{{\bf{X}}}}}}}}}^{\gamma })]\right)}_{| \gamma | = i}$$

(10)

The first (Eq. (9)) defines the TSE complexity as the average mutual information between the pairs of every possible bipartition of the system X. For every integer i between 1 and ⌊N/2⌋, we compute all possible subsets of X with i elements (notated by X^γ) and compute the mutual information between that set and its complement (X^−γ). The second equation (Eq. (10)) provides an alternative interpretation: the TSE complexity quantifies the difference, at every scale, between the expected integration of the scale if the system were fully integrated, and the actual integration of that scale (calculated as the average total correlation of every subset of size i). In this interpretation, the TSE complexity is highest when the smallest scales are relatively dis-integrated, but the macro-scales are relatively more integrated. This balance of integration and segregation is emblematic of TSE complexity. For a visualization of the TSE complexity calculation as the difference between the expected and empirical values, see Fig. 1.
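The scale-by-scale reading of Eq. (10) can be sketched by brute force for a small system. The example assumes Gaussian variables, for which the total correlation of a set with correlation matrix R has the standard closed form −½ ln det R (in nats); the synthetic six-variable data and the function names are illustrative assumptions, not the data or code used in the paper.

```python
import itertools
import numpy as np

def gaussian_tc(corr, idx):
    """Total correlation (nats) of the variables in idx under a Gaussian model."""
    sub = corr[np.ix_(idx, idx)]
    return -0.5 * np.linalg.slogdet(sub)[1]

def tse_terms(corr):
    """Per-scale terms of Eq. (10): (i/N) TC(X) - E[TC(X^gamma)] for |gamma| = i."""
    n = corr.shape[0]
    tc_full = gaussian_tc(corr, list(range(n)))
    terms = []
    for i in range(1, n + 1):
        subset_tcs = [gaussian_tc(corr, list(gamma))
                      for gamma in itertools.combinations(range(n), i)]
        terms.append(i / n * tc_full - np.mean(subset_tcs))
    return np.array(terms)

# Synthetic correlated data standing in for six recorded time series.
rng = np.random.default_rng(0)
data = rng.standard_normal((2000, 6)) @ rng.standard_normal((6, 6))
corr = np.corrcoef(data, rowvar=False)

terms = tse_terms(corr)
tse = terms.sum()
```

Each entry of `terms` is the gap between the expected and the actual integration at one subset size, and the last entry (the full system) is zero by construction. The enumeration over all subsets is what makes this approach infeasible beyond small N.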

**Fig. 1: Understanding O-information in the context of the TSE complexity.**

Computing the full TSE complexity itself requires analyzing every possible subsystem (or bipartition) of X: an insurmountable task for all but the smallest networks, as the combinatorics grow super-exponentially. A useful approximation is to look only at the second-to-top layer of the full TSE complexity summation, which only requires finding the average total correlation for the N sets X⁻ⁱ (where X⁻ⁱ contains every element of X except X_i). We refer to this measure as the description complexity^10,53 of X. Formally:

$$C({{{{{{{\bf{X}}}}}}}}):= {{{{{\mathrm{TC}}}}}}({{{{{{{\bf{X}}}}}}}})-\frac{{{{{{\mathrm{TC}}}}}}({{{{{{{\bf{X}}}}}}}})}{N}-{\mathbb{E}}[{{{{{\mathrm{TC}}}}}}({{{{{{{{\bf{X}}}}}}}}}^{-i})]$$

(11)

The definition of C(X) can be understood as the successive pruning of information: the first term, TC(X), is the total integration of X. The second term, − TC(X)/N, is the expected decrease in integration associated with a single element (on average). Finally, $-{\mathbb{E}}[{{{{{\mathrm{TC}}}}}}({{{{{{{{\bf{X}}}}}}}}}^{-i})]$, is the actual decrease in integration associated with removing every element on its own. C, then, computes the difference between the expected decrease in integration associated with removing a single node and the actual decrease. C has several obvious conceptual parallels with the DTC and there is indeed an analytic relationship between DTC and C (for proof, see Supplementary Note 1):

$${{{{{\mathrm{DTC}}}}}}({{{{{{{\bf{X}}}}}}}})=N\times C({{{{{{{\bf{X}}}}}}}})$$

(12)

This result was independently derived in ref. ⁵⁴. The relationship between DTC and C allows us to rewrite the O-information purely in terms of total correlations:

$$\Omega ({{{{{{{\bf{X}}}}}}}})={{{{{\mathrm{TC}}}}}}({{{{{{{\bf{X}}}}}}}})-N\times C({{{{{{{\bf{X}}}}}}}})$$

(13)

$$=(2-N){{{{{\mathrm{TC}}}}}}({{{{{{{\bf{X}}}}}}}})+\mathop{\sum }\limits_{i=1}^{N}{{{{{\mathrm{TC}}}}}}({{{{{{{{\bf{X}}}}}}}}}^{-i})$$

(14)

This allows us to re-conceptualize redundancy- and synergy-dominance in terms of just redundancy: synergistic information is information that is redundantly present in large ensembles of elements considered jointly but not in any subset of those ensembles. This is conceptually very similar to the definition of synergy provided by the partial information decomposition²⁷, which defines synergy in terms of redundant information shared by higher-order collections of elements. We can also propose a geometric interpretation of the sign of the O-information: based on Eqs. (8) and (13), we can see that Ω(X) < 0 ⇔ TC/N < C and Ω(X) > 0 ⇔ TC/N > C. This means that a system X is synergy-dominated if the removal of a single element (on average) decreases the integration of the remaining N − 1 elements more than would be expected in the null case of a totally integrated system. The two possible cases (redundancy-dominated, with Ω > 0 and synergy-dominated, with Ω < 0) are visualized and discussed in the context of the TSE complexity in Fig. 1.
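The identities in Eqs. (12) to (14), and the sign rule that follows from them, can be checked numerically. The sketch below assumes Gaussian variables and computes every quantity from differential entropies, so that the total-correlation form is not built into the check; the random covariance matrix and the helper names are illustrative assumptions.

```python
import numpy as np

def gaussian_entropy(cov):
    """Differential entropy (nats) of a multivariate Gaussian with covariance cov."""
    n = cov.shape[0]
    return 0.5 * (n * np.log(2 * np.pi * np.e) + np.linalg.slogdet(cov)[1])

def drop(cov, i):
    """Covariance of X^{-i}: remove row and column i."""
    return np.delete(np.delete(cov, i, axis=0), i, axis=1)

def tc(cov):
    singles = sum(gaussian_entropy(cov[i:i + 1, i:i + 1]) for i in range(cov.shape[0]))
    return singles - gaussian_entropy(cov)

def dtc(cov):
    h = gaussian_entropy(cov)
    return h - sum(h - gaussian_entropy(drop(cov, i)) for i in range(cov.shape[0]))

def description_complexity(cov):
    """Eq. (11): TC(X) - TC(X)/N - E[TC(X^{-i})]."""
    n = cov.shape[0]
    mean_tc_minus = np.mean([tc(drop(cov, i)) for i in range(n)])
    return tc(cov) - tc(cov) / n - mean_tc_minus

# Arbitrary positive-definite covariance for a seven-variable toy system.
rng = np.random.default_rng(1)
a = rng.standard_normal((7, 7))
cov = a @ a.T + 7 * np.eye(7)
n = cov.shape[0]

c_value = description_complexity(cov)
omega_eq8 = tc(cov) - dtc(cov)
omega_eq14 = (2 - n) * tc(cov) + sum(tc(drop(cov, i)) for i in range(n))
```

For this toy system `dtc(cov)` matches `n * c_value`, the two expressions for Ω agree, and the sign of Ω follows the comparison between TC/N and C.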

The framing of the O-information in terms of the change to integration after removal of individual elements also has conceptual links to the so-called gradients of O-information⁵⁵. Scagliarini et al. explore how individual elements can contribute redundantly or synergistically to the O-information, defining the gradient as the difference between the O-information of an ensemble X and the O-information when single elements X_i are excluded. While a detailed analytic exploration of the link is beyond the scope of this paper, the properties of gradients yield valuable insights into the structure of higher-order dependencies in complex systems.

Another heuristic approximation of the TSE complexity is the sum of the total correlation and dual total correlation. Following the notation from Rosas et al.:

$$\Sigma ({{{{{{{\bf{X}}}}}}}})={{{{{\mathrm{TC}}}}}}({{{{{{{\bf{X}}}}}}}})+{{{{{\mathrm{DTC}}}}}}({{{{{{{\bf{X}}}}}}}})$$

(15)

James et al. previously termed this measure the exogenous information and described it as a "very mutual information": quantifying all of the shared dependencies between each single variable and every other subset of the system:

$$\Sigma ({{{{{{{\bf{X}}}}}}}})=\mathop{\sum }\limits_{i=1}^{N}I({X}_{i};{{{{{{{{\bf{X}}}}}}}}}^{-i})$$

(16)

Given the obvious similarity to Eq. (9), Rosas et al. hypothesized that Σ(X) ∝ TSE(X), which was verified²⁴ to hold in simple simulations with small N. By leveraging the Gaussian assumptions here, we can empirically estimate the correlation between TSE and exogenous information and assess how well the relationship holds as N gets large. Figure 2 confirms the strong correlations of TSE complexity with both TC + DTC and DTC alone. These correlations hold over a range of subset sizes, from three to fifteen elements.
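The equality of Eqs. (15) and (16) can be verified numerically under Gaussian assumptions. In this sketch the covariance matrix is random and the function names are illustrative; only the two formulas are taken from the text.

```python
import numpy as np

def gaussian_entropy(cov):
    """Differential entropy (nats) of a multivariate Gaussian with covariance cov."""
    n = cov.shape[0]
    return 0.5 * (n * np.log(2 * np.pi * np.e) + np.linalg.slogdet(cov)[1])

def drop(cov, i):
    return np.delete(np.delete(cov, i, axis=0), i, axis=1)

def sigma_eq15(cov):
    """Eq. (15): TC(X) + DTC(X), both built from entropies."""
    n = cov.shape[0]
    h = gaussian_entropy(cov)
    h_single = [gaussian_entropy(cov[i:i + 1, i:i + 1]) for i in range(n)]
    h_rest = [gaussian_entropy(drop(cov, i)) for i in range(n)]
    tc = sum(h_single) - h
    dtc = h - sum(h - hr for hr in h_rest)
    return tc + dtc

def sigma_eq16(cov):
    """Eq. (16): sum over i of I(X_i; X^{-i}) = H(X_i) + H(X^{-i}) - H(X)."""
    n = cov.shape[0]
    h = gaussian_entropy(cov)
    return sum(gaussian_entropy(cov[i:i + 1, i:i + 1]) + gaussian_entropy(drop(cov, i)) - h
               for i in range(n))

rng = np.random.default_rng(2)
a = rng.standard_normal((8, 8))
cov = a @ a.T + 8 * np.eye(8)
s15, s16 = sigma_eq15(cov), sigma_eq16(cov)
```

Both routes return the same value, which reflects that Σ(X) only ever needs the full system and its N leave-one-out subsystems, in contrast to the exhaustive enumeration needed for the TSE complexity.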

**Fig. 2: Approximating TSE complexity with total correlation and dual total correlation.**

fMRI results

We set out to identify subsystems (subsets of dynamically interacting elements) that express negative O-information (synergy) in the human brain. Leveraging Gaussian assumptions⁴² (see Methods), multivariate information theoretic measures can be estimated from covariance (correlation) matrices expressing empirically recorded functional connectivity (FC). We computed long-time averages of FC derived from two normative samples of human resting-state fMRI, the Human Connectome Project (main data set⁵⁶) and an open-source multimodal MRI dataset for Microstructure-Informed Connectomics (MICA-MICs; replication data set⁵⁷). For both data sets we computed a single FC matrix (HCP: 95 participants, 4 runs each; MICA-MICs: 50 participants, 1 run each). Both FC matrices covered the entire cerebral cortex parcellated into a common set of 200 nodes⁵⁸, and node time series were derived from BOLD signals after performing global signal regression, which removes signal components that are common to all nodes in the system, i.e., globally redundant (Supplementary Fig. 1)⁵⁹. For a brief discussion of global signal regression in this context, see Supplementary Fig. 6.

Computing O-Information on the full-size 200-node FC matrix results in positive quantities for both data sets (HCP: Ω = 79.16 nats; MICA: Ω = 46.69 nats), indicating that the full structure is redundancy-dominated, which might potentially obscure the presence of higher-order, synergistic correlations. We asked if smaller subsets of nodes were present within the full-size FC that generated synergy, or negative O-information. Random sampling of small subsets (between 3 and 16 nodes) indeed yields abundant subsets that express negative O-information (Fig. 3a). Their relative abundance declines rapidly with growing subset size, reflecting the increasing dominance of redundant information and exhaustive capture of unique information. While synergistic subsets account for rapidly diminishing fractions of all subsets, their total number can be non-negligible (10-node subsets: 0.41 percent and 9.23 × 10¹³, respectively). In a large random sample of 10-node subsets, the O-information is positively correlated with TSE complexity (Fig. 3b; ρ = 0.642, p = 0; HCP data). Focusing on a separate random sample of 5000 10-node subsets with negative O-information, we asked if the frequency with which pairs of nodes participate in such subsets is related to their pairwise FC. Indeed, the absolute pairwise FC is strongly negatively correlated with the frequency of participation in synergistic subsets (ρ = − 0.504, p = 0, HCP; ρ = − 0.485, p = 0, MICA, HCP; Fig. 3c). This indicates that strongly positive or negative FC between two nodes makes their joint inclusion in a synergistic subset unlikely, while node pairs with low FC magnitude could either be truly disintegrated, or participating in a highly synergistic subsystem.
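The random-sampling procedure described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: the correlation matrix is synthetic (50 nodes rather than the 200-node FC matrices), the O-information uses the Gaussian closed form expressed through log-determinants of the correlation matrix, and all function names are hypothetical. The last lines reproduce the arithmetic behind the quoted count of synergistic 10-node subsets.

```python
import math
import numpy as np

def o_information(corr, idx):
    """Gaussian O-information (nats) of the nodes in idx, from a correlation matrix."""
    sub = corr[np.ix_(idx, idx)]
    n = len(idx)
    logdet = np.linalg.slogdet(sub)[1]
    rest = sum(np.linalg.slogdet(np.delete(np.delete(sub, i, 0), i, 1))[1]
               for i in range(n))
    # Equivalent to (2 - N) TC(X) + sum_i TC(X^{-i}) with TC = -0.5 * ln det R.
    return 0.5 * (n - 2) * logdet - 0.5 * rest

def fraction_synergistic(corr, k, n_samples, rng):
    """Sample random k-node subsets; return the fraction with negative O-information."""
    n = corr.shape[0]
    values = np.array([o_information(corr, rng.choice(n, size=k, replace=False))
                       for _ in range(n_samples)])
    return float(np.mean(values < 0)), values

rng = np.random.default_rng(0)
# Synthetic stand-in for parcellated BOLD time series (not real fMRI data).
series = rng.standard_normal((1200, 50)) @ rng.standard_normal((50, 50))
corr = np.corrcoef(series, rowvar=False)
frac3, vals3 = fraction_synergistic(corr, k=3, n_samples=2000, rng=rng)

# Arithmetic behind the reported count: 0.41 percent of all 10-node subsets of 200 nodes.
n_subsets = math.comb(200, 10)
n_synergistic = 0.0041 * n_subsets
```

With 200 nodes there are about 2.2 × 10¹⁶ distinct 10-node subsets, so even a fraction of 0.41 percent corresponds to roughly 9 × 10¹³ synergistic subsets, in line with the figure quoted in the text.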

**Fig. 3: Information measures computed from randomly sampled 10-node subsets.**

Participation of nodes in randomly sampled synergistic subsets varies systematically across the cortex. Over a large random sample of 100,000 10-node subsets, all nodes participate at least once, with several nodes participating in more than 10,000 distinct synergistic subsets. Hence, the complete repertoire of co-expressed synergistic subsets covers the entire cortex, with some overlap between subsets, centered on high-participation nodes that form “focal points” or clusters (Fig. 4a). Projecting the participation of individual nodes (brain regions) onto the cortical surface shows significant consistency between HCP (Fig. 4b) and MICA data (Supplementary Fig. 4) (the two maps are correlated with ρ = 0.579, p = 2.5 × 10⁻¹⁹, between the two data sets). Functional systems⁶⁰ distribute unevenly as well, with highest frequencies of participation found in the frontoparietal (FP) system, for synergistic subsets of 10 nodes (HCP: Fig. 4c; MICA: Supplementary Fig. 2). For larger subset sizes, participation of limbic (LIM) regions dominates over FP regions.

**Fig. 4: Topography and functional specialization of randomly sampled synergistic subsets in the brain.**

Combinatorics prevent exhaustive exploration of subsets of even modest sizes, and the random sampling strategy employed so far is likely to miss subsets that express maximal synergy. To identify subsets with maximally negative O-information (maximal synergy), we used an optimization algorithm based on simulated annealing (references and details are contained in the Methods section). Multiple runs of the algorithm yielded consistent and highly similar outcomes (Supplementary Fig. 3), indicating convergence of the optimization while again highlighting the existence of a large reservoir of non-identical (degenerate) subsets, all expressing highly negative O-information. Deploying this algorithm while varying subset sizes between 3 and 30 nodes, we identified large numbers of subsets that express highly negative O-information, for subset sizes 3–24 nodes (HCP; Fig. 5a) and 3–27 nodes (MICA; Supplementary Fig. 4). No synergistic subsets are found for subset sizes greater than 27 nodes, as redundancy starts to overwhelm the unique informational contributions of individual nodes at larger subset sizes.
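A generic simulated-annealing search for low O-information subsets might look like the sketch below. It is not the authors' implementation, whose details are in the Methods: the swap move, the geometric cooling schedule, the parameter values, and the synthetic correlation matrix are all assumptions made for illustration.

```python
import numpy as np

def o_info(corr, idx):
    """Gaussian O-information (nats) of the nodes in idx, from a correlation matrix."""
    sub = corr[np.ix_(idx, idx)]
    n = len(idx)
    logdet = np.linalg.slogdet(sub)[1]
    rest = sum(np.linalg.slogdet(np.delete(np.delete(sub, i, 0), i, 1))[1]
               for i in range(n))
    return 0.5 * (n - 2) * logdet - 0.5 * rest

def anneal(corr, k, steps=2000, t0=0.1, cooling=0.998, seed=0):
    """Search for a k-node subset with minimal O-information (hypothetical settings)."""
    rng = np.random.default_rng(seed)
    n = corr.shape[0]
    current = list(rng.choice(n, size=k, replace=False))
    cur_val = o_info(corr, current)
    start_val = cur_val
    best, best_val = list(current), cur_val
    temp = t0
    for _ in range(steps):
        # Propose swapping one member of the subset for a node outside it.
        outside = np.setdiff1d(np.arange(n), current)
        candidate = list(current)
        candidate[rng.integers(k)] = rng.choice(outside)
        cand_val = o_info(corr, candidate)
        # Metropolis rule: always accept improvements, sometimes accept worse moves.
        if cand_val < cur_val or rng.random() < np.exp((cur_val - cand_val) / temp):
            current, cur_val = candidate, cand_val
            if cur_val < best_val:
                best, best_val = list(current), cur_val
        temp *= cooling  # geometric cooling
    return sorted(int(i) for i in best), best_val, start_val

rng = np.random.default_rng(42)
series = rng.standard_normal((1500, 40)) @ rng.standard_normal((40, 40))
corr = np.corrcoef(series, rowvar=False)  # synthetic stand-in for an FC matrix
subset, best_val, start_val = anneal(corr, k=8)
```

Running the search from several seeds and comparing the returned subsets is one way to probe the degeneracy described above, where many distinct subsets reach similarly negative O-information.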

**Fig. 5: O-Information, brain topography, and functional specialization of optimally synergistic subsets identified by simulated annealing.** All panels show data from the HCP sample.

To validate that our optimization algorithm was observing truly synergistic ensembles, we tested each optimized subsystem against a null (see Materials and Methods E). Since the ensemble size k is fixed by the optimization algorithm, it is possible that the apparent synergy of that ensemble is actually due to some subset of nodes within that ensemble (for example, a system of three synergistic elements and two independent elements will appear to be a synergistic system of k = 5; however, the real synergy is only in the three entangled elements). To ensure that all elements in the ensemble contributed to the synergy, we only considered a set valid if it was impossible to remove any node without the O-information increasing (i.e., the contribution of each element was synergy-dominated). We found that, for small values of k, the vast majority of optimized ensembles were valid (≈99.08% for ensembles of size four, ≈92.92% for ensembles of size six, ≈84.14% for ensembles of size eight, and ≈64.04% for the maximally synergistic ensemble size of ten). For collections much larger than ten, the proportion of valid systems decreased rapidly: for ensembles of size fifteen, only ≈0.04% were valid, and there were no valid ensembles of size greater than fifteen, despite the fact that the simulated annealing algorithm returned a large number of results with Ω < 0. This implies that, although these larger subsystems are synergy dominated, that synergy is restricted to a core set of components and not attributable to the whole.
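The validity criterion can be illustrated with the example given in the text: three synergistic elements plus two independent ones. The sketch below assumes Gaussian variables and a hand-built correlation matrix in which node 2 depends jointly on the uncorrelated nodes 0 and 1; the coupling value 0.6, the tolerance, and the function names are illustrative choices.

```python
import numpy as np

def o_info(corr, idx):
    """Gaussian O-information (nats) of the nodes in idx, from a correlation matrix."""
    sub = corr[np.ix_(idx, idx)]
    n = len(idx)
    logdet = np.linalg.slogdet(sub)[1]
    rest = sum(np.linalg.slogdet(np.delete(np.delete(sub, i, 0), i, 1))[1]
               for i in range(n))
    return 0.5 * (n - 2) * logdet - 0.5 * rest

def is_valid(corr, idx, tol=1e-9):
    """True if the subset has negative O-information and removing any single
    node increases the O-information (every node contributes to the synergy)."""
    idx = list(idx)
    full = o_info(corr, idx)
    if full >= 0:
        return False
    return all(o_info(corr, idx[:i] + idx[i + 1:]) > full + tol
               for i in range(len(idx)))

# Nodes 0 and 1 are uncorrelated, node 2 is correlated with both (a synergistic
# triplet), and nodes 3 and 4 are independent of everything else.
r = 0.6
corr = np.eye(5)
corr[0, 2] = corr[2, 0] = r
corr[1, 2] = corr[2, 1] = r

triplet_valid = is_valid(corr, [0, 1, 2])
padded_valid = is_valid(corr, [0, 1, 2, 3, 4])
padded_omega = o_info(corr, [0, 1, 2, 3, 4])
```

The five-node set has negative O-information, yet it fails the check because dropping either independent node leaves the O-information unchanged, whereas the three-node core passes.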

Minimal O-information was achieved for subsets comprising ~10 nodes for both data sets. Mapping subsets of nodes expressing near minimal O-information onto a surface plot of the cerebral cortex reveals consistent topography. Figure 5b shows the frequency with which individual cortical parcels (nodes) were identified across 5000 runs of the optimization algorithm, yielding 4021 unique solutions (HCP, Fig. 5b; 4166 unique solutions for MICA data, Supplementary Fig. 4b). Brain-wide nodal frequencies are significantly correlated across HCP and MICA data sets (Spearman’s ρ = 0.522, p = 2.2 × 10⁻¹⁵). When mapping these nodal frequencies to seven canonical resting-state functional systems⁶⁰, we find that each of these seven systems contributes, but to different extents. In HCP data, for optimally synergistic 10-node subsets, the visual, frontoparietal and default mode networks are over-represented, while only the FP system appears over-represented in the MICA data (Supplementary Fig. 5).

The nature of negative O-information (synergy) requires that individual nodes make largely unique (non-redundant) contributions to the multivariate information metric. This suggests that nodes derived from different, informationally distinct (intrinsically redundant, but extrinsically non-redundant) functional communities might be favored as constituents of synergistic subsets. To test this hypothesis, we created sets of 20,000 randomly sampled subsets that were comprised of nodes derived from between 1 and all 7 canonical functional systems (HCP, Fig. 5c; MICA, Supplementary Fig. 4c). The mean O-information, across all randomly chosen subsets, was found to be positive regardless of how many FC systems were included in the subsets. For samples derived from just 1 FC system, the O-information was most positive (i.e., subsets were most redundancy-dominated) for visual, somatomotor and attention systems, and they were least redundancy-dominated for default, frontoparietal and limbic systems. Importantly, the mean O-information decreased, and the fraction of synergistic subsets increased, as subsets were sampled from larger numbers of canonical systems. No subset derived from a single functional system was capable of expressing synergy. Subsets spanning 6 or 7 canonical systems were most likely to express synergy, as indexed by the fraction of negative O-information encountered in the sample. The finding supports the notion that dividing the brain into canonical functional systems prioritizes grouping nodes by redundant over synergistic information, hence missing a potentially important substrate for neural computation.

Discussion

In this paper, we have shown how the O-information²⁴, a measure of higher-order interactions in multivariate data, can reveal synergistic ensembles of brain regions that are invisible to bivariate functional connectivity analyses. Our primary theoretical result is to provide a geometric interpretation of the O-information. The interpretation unifies multiple disparate measures of multivariate information into a single framework, built around the Tononi-Sporns-Edelman complexity⁹. By re-writing Ω(X) and DTC(X) in terms of the total correlation between multiple subsets of X, we find that synergy occurs when removing any single element causes the system to become less integrated, and crucially, more so than would be expected if structure were uniformly distributed over X. Said differently, synergy can be intuitively understood as that integration that is present in the whole but not in smaller subsets (in this case, the N subsets created by removing each X_i). In this sense, synergy captures how the whole can be greater than the sum of its parts⁶¹. This intuition is conceptually similar to the formal definition of synergy from the partial information decomposition framework²⁷, which defines synergy as the information left over when everything accessible in simpler combinations of sources has been accounted for. The exclusive use of total correlations also allows us to consider the O-information purely in terms of Kullback-Leibler divergences from independent to joint probability distributions (Eq. (5)). This shows us that all of these measures can be understood in the context of inferences about structure (relative to a disintegrated prior). In the context of synergy, the extra information in the joint state is information about something: specifically about the relative likelihood of a configuration with respect to the maximum entropy case.

Applied to two separate fMRI brain data sets we find that synergistic subsets of brain regions are ubiquitous and abundant, spanning scales between 3 and 25 regions and extending over the entire cerebral cortex. While redundant interactions dominate functional connectivity at larger subset sizes, the application of multivariate information measures demonstrates a previously hidden repertoire of synergistic ensembles, each integrating diverse and distinct sources of information. Recent work by Luppi et al.²¹ proposed a synergistic core to the human brain where complex processing occurs. While we found that there is significant over-representation of specific regions (including portions reported by Luppi et al., such as prefrontal cortex, occipital pole, the precuneus, and cingulate regions), synergy-dominated subsystems could include any region of cortex, although some regions contribute more reliably than others. This suggests that synergy is a widespread property of multivariate information emerging from resting-state brain activity. While there is discrepancy between our results and those of Luppi et al., this difference is likely a reflection of the different analytical pipelines, rather than a true conflict. Their approach was based on decomposing the temporal mutual information, which considers dependencies between past and future, while our approach does not. Luppi et al. also considered only pairs of regions co-evolving together, while we considered larger ensembles. Our approach brings into view synergies of a much higher order than would be possible in the approach by Luppi et al. Finally, the prior analysis is based on a generalization of the partial information decomposition and requires choosing one of several redundancy functions. It is unknown whether the reported results would hold for all plausible redundant information functions or not. Consequently, different results likely reflect different kinds of synergies (temporal, pairwise vs. instantaneous, higher-order) that can co-exist in the brain.

Information theoretic measures are not the only approaches to higher-order structure in brain activity. Recently, Santoro et al.⁶² proposed a metric they term hyper-coherence, which describes higher-order co-fluctuations between sets of three and four regions. Based on the edge time series framework^63,64,65, the hyper-coherence defines higher-order activity in terms of simplicial complexes of coherent activations. When applied to brain data, Santoro et al. found that hyper-coherence was relatively lower in systems that we found to be relatively higher in synergy (frontal and default mode regions), and relatively higher in regions we found to have high average O-information (somato-motor regions). We conjecture that the hyper-coherence framework is probably preferentially sensitive to redundancies rather than synergies. This would be consistent with recent results from Varley et al., who found that pairwise co-fluctuations were positively correlated with redundant information and anti-correlated with synergistic information⁶⁶. An avenue of future research may be to apply the filtering approach used by Santoro et al. to the O-information or other multivariate information measures.

Interestingly, the randomly sampled ensembles that were most likely to be synergy dominated were those that involved nodes that spanned multiple canonical subsystems, while sets of regions all within one system were strongly dominated by redundancy (Fig. 5c). This would be consistent with the hypothesis that functional connectivity, when viewed entirely as bivariate interaction, is largely sensitive to redundant, but insensitive to synergistic, dependencies between brain regions. Consequently, the functional connectivity matrix is not a complete map of the statistical structure in a dataset, but only of dependencies characterized by redundancy. This is consistent with findings from Ince⁶⁷, Finn and Lizier⁶⁸, and Varley et al.⁶⁶, who argued that bivariate correlations are intrinsically redundancy-dominated. Higher-order synergies represent, in a sense, a kind of shadow structure and consequently are missed by network-focused approaches that omit higher-order interactions. This hypothesis finds some support in ref. ²¹, who found that the distribution of synergies was anticorrelated with the functional connectivity network structure, while the distribution of redundancies was positively correlated.

Given the novelty of tools like the O-information, the significance of these synergistic dependencies remains almost entirely unknown, although the small number of studies to date suggest intriguing patterns. One study found alterations to the redundancy/synergy bias across the human lifespan²², while other studies have suggested that loss of consciousness induced by propofol is associated with decreased synergistic dynamics²⁰. Future avenues of work include deeper analyses of how higher-order dynamics change between rest and task conditions, in cases of psychopathology or brain injury, and in non-human animals. We should note that, in the context of the O-information, synergy is not necessarily a causal measure: in related contexts, synergy has been discussed as a measure of computation in neural circuits by Sherrill et al., Newman et al., and others^{13,14,15,17,26,69}, although it remains unexplored how exactly these two approaches relate to each other. The O-information is an atemporal measure, sensitive to instantaneous, higher-order correlation structures, but with no notion of dynamics or a flow from past to future. In contrast, the work by Sherrill et al. is done in the context of information dynamics²⁵ and considers how the past informs the future. Future research may explore how a synergistic correlation structure might facilitate computations within the system over time.

In addition to the insights into synergy specifically, the results presented here also have implications for researchers interested in multivariate information theoretic analyses. For example, the TSE complexity has long been an object of theoretical interest⁵⁴, but the intractable combinatorics have limited its applicability in empirical data (although its use is not unheard of⁷⁰). The finding that the exogenous information Σ(X) ∝ TSE(X) for reasonably large N (first reported in ref. ²⁴), even more so than the original heuristic C, opens the door to applications in experimental neuroscience. The nature of this correlation merits further study as well. One outstanding question is how redundancies and synergies in the data differentially influence the relationship between Σ(X) and TSE(X). Unlike the O-information, the S-information does not obviously link to redundancies or synergies, and so how these kinds of integration impact the relationship to TSE remains unknown. Future work developing generative models with precisely controllable distributions of redundancies and synergies may shed light on this question.

In a broader scientific context, our work contributes to the increasing interest in higher-order interactions, beyond the standard, pairwise network model^8,71. The information-theoretic approach (such as the work reported here, as well as in refs. ^{16,20,21,52,69,72}) is based largely on statistical inference, while alternative frameworks based on simplicial complexes, algebraic topology, and hypergraphs have been developed largely in parallel^{7,41,73,74,75,76}. How these different mathematical frameworks relate to each other remains an open question, and the potential for a more unified approach to understanding higher-order interactions both in terms of topology and statistical inferences is an alluring promise.

The optimization of maximally synergistic subsets via simulated annealing can be thought of as an attempt to find a maximally efficient, dimensionally reduced representation of a potentially large data set: when modeling a system, it is generally desirable to capture as many statistical dependencies as possible with the fewest required degrees of freedom. By finding a representation that incorporates synergies while simultaneously pruning redundant information that would be double counted, we can attempt to build the most computationally efficient model of a system under study^77,78. While dimensionality reduction and feature selection algorithms are widespread in many computational sciences, a rigorous treatment of the ways that synergistic and redundant information can inform the analysis of brain dynamics and functional networks remains a space of active development (for an example, see refs. ^78,79).

The O-information scales far more gracefully than related measures of synergistic information (such as the partial information decomposition, which is practically impossible to apply to systems larger than 5 elements²⁸). However, the combinatorics associated with assessing every possible subsystem becomes intractable as the system size grows, an issue first noted for the TSE complexity. In standard functional and effective network research, it is common to compute all pairwise interactions (which only grows with N²), and then filter out spurious edges as needed⁷⁹. While this may be possible for very small subsystems, it is intractable for larger ones. If one can pre-select a set of elements, then the computation of O-information is trivial up to hundreds of items. However, the requirement to select subsets of interest can itself be computationally intensive and time-consuming. Consequently heuristic measures such as optimization, random sampling, or pre-filtering subsystems to exclude collections of elements will be required.

Since the O-information is a measure of relative redundancy/synergy dominance, in highly redundant data, synergistic structures may not be strong enough to dominate the signal, resulting in a positive (redundancy-dominated) O-information. By adding increasing amounts of low-frequency redundancy to the BOLD data, and re-running the optimizations, we found that the maximally synergistic subsets extracted from the uncontaminated data became impossible to retrieve (see Supplementary Fig. 6). Those synergies still existed; they were merely swamped by redundancy and made invisible. Adding global signal back in this way provides a new insight into a commonly used step in fMRI image pre-processing: global signal regression (GSR)⁸⁰. We argue that GSR can be understood as scrubbing global redundancies from the data, and in doing so may reveal previously buried synergies that would not have been accessible in the original, unprocessed data.

One limitation of this study is that it is hard to disambiguate between information that reflects computation in neural tissue and information that is attributable to the vascular physiology of the BOLD signal. Recent work by Colenbier has shown that there are synergistic interactions between the global signal, blood arrival times, and functional connectivity structure⁵⁹. Since the pairwise covariance forms the foundation of the multivariate Gaussian entropy estimator, it is likely that the same confounds influence the estimates of entropy and mutual information. Future work replicating these results using electrophysiological recordings such as M/EEG should help untangle this issue. Another limitation is that the analysis operates on static distributions: every frame is assumed to have been drawn from an unchanging multivariate Gaussian distribution, with no memory or dynamics from moment to moment. This is a standard assumption in functional connectivity analyses, although there is growing interest in the limitations this assumption produces and the need for analyses that explicitly account for dynamics⁸¹. The field of information dynamics provides a number of relevant analyses^82,83, and there is already interest in higher-order dynamics in the brain: in addition to the aforementioned work by Luppi et al., recent work by Faes et al. proposed a derivative of the O-information for rhythmic processes (the O-information rate, or OIR)⁸⁴. The OIR has been used to describe brain-heart interaction dynamics and opens up the frequency domain to higher-order, informational analysis in addition to the time domain. Similarly, an application of the O-information to the dynamic measure of transfer entropy has been proposed and applied to optimizing ensembles of maximally synergistic or redundant neurons⁵². Both of these measures could be incorporated into a pipeline like the one described here and may shed light on the similarities (and differences) between dynamic and static analyses. Despite its limitations, however, we are confident that the classic, static O-information likely contains a wealth of as-yet unexplored structure and will continue to provide insights into brain structure and function.

In this article, we demonstrate how an information-theoretic measure of multivariate interactions (the O-information or synergy) can be used to uncover higher-order interactions in human brain dynamics. We analytically show that the O-information can be related to an older measure of systemic complexity, the TSE complexity, and from this derive a novel geometric interpretation of redundancy- and synergy-dominated systems. With a combination of random sampling and optimization, we show that a large number of subsystems displaying synergistic dynamics exist in the human brain and that these systems form a highly distributed shadow structure that is entirely overlooked in standard, bivariate functional connectivity models. We conclude that the space of higher-order interactions in the human brain represents a large and under-explored area of study with a rich potential for new discoveries and experimental work.

Methods

Gaussian information theory

In this paper, we focus on higher-order information sharing in fMRI BOLD signals. Since BOLD data is continuous (rather than discrete), to quantify the entropy of a continuous signal we use a generalization of the classic, discrete Shannon entropy (Eq. (1)): the differential entropy:

$$H(X)=-{\int}_{x\in \mathcal{X}}P(x)\log P(x)\,dx$$

(17)

Computing the differential entropy from empirical data is generally difficult, as it requires estimating P(x). However, if one is willing to make assumptions of multivariate normality, closed-form estimators of the Gaussian joint entropy can be leveraged.

Prior work has established that BOLD data is well-modeled by multivariate Gaussian distributions^85,86 and that more complex and highly parameterized models provide little additional benefit⁸⁷. While information theory was originally formalized in the context of discrete random variables, in the specific case of Gaussian random variables, closed-form estimators exist for almost all the standard information measures (for an accessible review, see the supplementary material of ref. ⁸²). For a univariate, Gaussian random variable $X \sim \mathcal{N}(\mu ,\sigma )$, the entropy (given in nats) is defined as:

$${H}^{\mathcal{N}}(X)=\frac{\ln (2\pi e{\sigma }^{2})}{2}$$

(18)

For a multivariate Gaussian random variable X = {X₁, X₂, . . . X_N}, the joint entropy is given by:

$${H}^{\mathcal{N}}(\mathbf{X})=\frac{\ln [{(2\pi e)}^{N}| \Sigma | ]}{2}$$

(19)

where ∣Σ∣ refers to the determinant of the covariance matrix of X. The bivariate mutual information (nats) between X₁ and X₂ is:

$${I}^{\mathcal{N}}({X}_{1};{X}_{2})=\frac{-\ln (1-{\rho }^{2})}{2}$$

(20)

where ρ is the Pearson correlation coefficient between X₁ and X₂. Note that, since the mutual information is a function of ρ for Gaussian variables, this special case of mutual information is not generally sensitive to non-linear relationships in the data in the way that non-parametric estimators are. Finally, the Gaussian estimator for total correlation is:

$$T{C}^{\mathcal{N}}(\mathbf{X})=\frac{-\ln (| \Sigma | )}{2}$$

(21)
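As an illustrative consistency check (a worked example added here, assuming unit-variance variables so that Σ is the correlation matrix, as is the case for z-scored time series), Eq. (21) reduces to Eq. (20) for N = 2, because the determinant of a 2 × 2 correlation matrix is 1 − ρ²:

$$T{C}^{\mathcal{N}}({X}_{1},{X}_{2})=\frac{-\ln (| \Sigma | )}{2}=\frac{-\ln (1-{\rho }^{2})}{2}={I}^{\mathcal{N}}({X}_{1};{X}_{2})$$

For instance, ρ = 0.5 gives −ln(0.75)/2 ≈ 0.144 nats.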

From these, it is possible to calculate all of the measures described above (dual total correlation, description complexity, O-information, and TSE complexity) for multivariate Gaussian variables. While the assumption of linearity that comes with a parametric Gaussian model can be limiting, the standard technique for assessing functional connectivity (the Pearson correlation coefficient) makes identical assumptions, so our work is consistent with assumptions made when applying standard approaches to FC analysis.
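The closed-form expressions above translate directly into a few lines of linear algebra. The following is a minimal Python/NumPy sketch, not the MATLAB code distributed with the article; the function names are illustrative, and the definitions of the dual total correlation and of the O-information as Ω = TC − DTC are assumed to follow the standard forms of ref. ²⁴. The two toy covariance matrices are hypothetical examples, not brain data.

```python
import numpy as np

LOG_2PIE = np.log(2.0 * np.pi * np.e)


def gaussian_entropy(cov):
    """Joint differential entropy (nats) of a Gaussian with covariance `cov` (Eq. 19)."""
    cov = np.atleast_2d(np.asarray(cov, dtype=float))
    sign, logdet = np.linalg.slogdet(cov)  # log-determinant, stable for larger N
    if sign <= 0:
        raise ValueError("covariance matrix must be positive definite")
    return 0.5 * (cov.shape[0] * LOG_2PIE + logdet)


def total_correlation(cov):
    """TC(X) = sum_i H(X_i) - H(X)."""
    cov = np.asarray(cov, dtype=float)
    marginals = sum(gaussian_entropy(cov[i, i]) for i in range(cov.shape[0]))
    return marginals - gaussian_entropy(cov)


def dual_total_correlation(cov):
    """DTC(X) = H(X) - sum_i H(X_i | X_-i), using H(X_i | X_-i) = H(X) - H(X_-i)."""
    cov = np.asarray(cov, dtype=float)
    n = cov.shape[0]
    h_joint = gaussian_entropy(cov)
    conditional_sum = 0.0
    for i in range(n):
        rest = np.delete(np.arange(n), i)  # indices of all nodes except i
        conditional_sum += h_joint - gaussian_entropy(cov[np.ix_(rest, rest)])
    return h_joint - conditional_sum


def o_information(cov):
    """O-information: positive = redundancy-dominated, negative = synergy-dominated."""
    return total_correlation(cov) - dual_total_correlation(cov)


# Toy example 1: three equally correlated variables (redundancy-dominated).
redundant = np.array([[1.0, 0.5, 0.5],
                      [0.5, 1.0, 0.5],
                      [0.5, 0.5, 1.0]])

# Toy example 2: X3 = X1 + X2 + noise (variance 0.1), with X1 and X2 independent.
synergistic = np.array([[1.0, 0.0, 1.0],
                        [0.0, 1.0, 1.0],
                        [1.0, 1.0, 2.1]])

print("redundant triad:  ", o_information(redundant))    # positive
print("synergistic triad:", o_information(synergistic))  # negative
```

Using the log-determinant rather than the determinant itself is a design choice of this sketch that avoids numerical underflow when subsets grow to tens of nodes.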

Building an intuitive understanding of synergy in the context of linear systems is difficult, since a multivariate Gaussian is defined in terms of pairwise covariances. Barrett showed that higher-order synergies can exist in purely Gaussian systems and that redundancy is related to the mutual information⁸⁸, so even in linear systems, beyond-pairwise dependencies can exist. This can be partly understood by recognizing that the multivariate Gaussian is the maximum entropy distribution subject to the constraints of pairwise covariance⁴². So, while pairwise linear relationships are enough to uniquely specify the distribution, they do not rule out the possibility that beyond-pairwise relationships exist. They do, however, fix the structure of those higher-order dependencies.

Datasets

Two independent fMRI resting state data sets were employed in the empirical analyses, one derived from the Human Connectome Project (HCP data⁵⁶) and the other from a recently published open-source repository (MICA⁵⁷). The HCP data, derived from a set of 100 unrelated subjects, have been used in several previous studies (for more detailed description see ref. ⁸⁹). All participants provided informed consent, and the Washington University Institutional Review Board approved all of the study protocols and procedures. A Siemens 3T Connectom Skyra equipped with a 32-channel head coil was used to collect data. Resting-state functional MRI (rs-fMRI) data was acquired during four scans on two separate days. This was done with a gradient-echo echo-planar imaging (EPI) sequence (scan duration: 14:33 min; eyes open). Acquisition parameters of TR = 720 ms, TE = 33.1 ms, 52° flip angle, isotropic voxel resolution = 2 mm, with a multiband factor of 8 were used for data collection. A parcellation scheme covering the cerebral cortex developed in ref. ⁵⁸ was used to map functional data to 200 regions. This parcellation can also be aligned to the canonical resting state networks found in ref. ⁶⁰.

Of the 100 unrelated subjects considered in the original dataset, 95 were retained for inclusion in empirical analysis in this study. Exclusion criteria were established before the present study was conducted. They included the mean and mean absolute deviation of the relative root mean square (RMS) motion across either four resting-state MRI scans or one diffusion MRI scan, resulting in four summary motion measures. Subjects that exceeded 1.5 times the interquartile range (in the adverse direction) of the measurement distribution in two or more of these measures were excluded. Following these criteria, four subjects were excluded. Due to a software error during diffusion MRI processing, one additional subject was excluded. The remaining 95 subjects were 56% female, had a mean age of 29.29 ± 3.66, and an age range of 22 to 36.

The MICA dataset includes 50 unrelated subjects, who also provided written informed consent. The study was approved by the Ethics Committee of the Montreal Neurological Institute and Hospital. Resting state data was collected in a single scan session using a 3T Siemens Magnetom Prisma-Fit with a 64-channel head coil. Resting state scans lasted for 7 minutes during which participants were instructed to look at a fixation cross. Imaging was completed with an EPI sequence, and acquisition parameters of TR = 600 ms, TE = 48 ms, 52° flip angle, isotropic voxel resolution = 3 mm, and multiband factor 6. The parcellation used in this dataset was the same as the one used for the HCP data (described above).

Preprocessing

Minimal preprocessing of the HCP rs-fMRI data followed these steps⁹⁰: (1) distortion, susceptibility, and motion correction; (2) registration to subjects’ respective T1-weighted data; (3) bias and intensity normalization; (4) projection onto the 32k_fs_LR mesh; and (5) alignment to common space with a multimodal surface registration⁹¹. The preprocessing steps described produced an ICA+FIX time series in the CIFTI grayordinate coordinate system. Two additional preprocessing steps were performed: (6) global signal regression and (7) detrending and band-pass filtering (0.008 to 0.08 Hz)⁹². After confound regression and filtering, the first and last 50 frames of the time series were discarded, resulting in a final scan length of 13.2 min (1100 frames).

Preprocessing of the MICA dataset was performed as described in ref. ⁵⁷ for resting state data. Briefly, the data was passed through the Micapipe⁹³ processing pipeline, which includes motion and distortion correction, as well as FSL’s ICA FIX tool trained with an in-house classifier. Time series were projected to each subject’s FreeSurfer surface, where nodes were also defined. Further details about the processing pipeline can be found in ref. ⁹³. The data was global signal regressed in addition to the other preprocessing steps described in this pipeline.
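Both data sets were global signal regressed. As a simplified illustration of that step only (the actual pipelines may include further nuisance regressors and are not reproduced here), global signal regression can be sketched in Python/NumPy as an ordinary least-squares fit of the mean time series to each node. The function name and the synthetic data are hypothetical.

```python
import numpy as np


def global_signal_regression(ts):
    """Remove the global (node-averaged) signal from each nodal time series.

    ts: array of shape (time points, nodes). Returns the residual time series.
    """
    ts = np.asarray(ts, dtype=float)
    centered = ts - ts.mean(axis=0)      # demean each node (absorbs the intercept)
    g = centered.mean(axis=1)            # global signal, one value per frame
    beta = centered.T @ g / (g @ g)      # least-squares weight of g for each node
    return centered - np.outer(g, beta)  # residuals after removing the fit


# Synthetic example: six nodes sharing one strong common fluctuation.
rng = np.random.default_rng(0)
shared = rng.normal(size=(500, 1))
ts = rng.normal(size=(500, 6)) + 2.0 * shared
cleaned = global_signal_regression(ts)

off_diag = ~np.eye(6, dtype=bool)
print("mean pairwise correlation before:", np.corrcoef(ts, rowvar=False)[off_diag].mean())
print("mean pairwise correlation after: ", np.corrcoef(cleaned, rowvar=False)[off_diag].mean())
```

In this toy case the shared fluctuation dominates the pairwise correlations before regression and is absent afterwards, which is the sense in which the step removes a global, redundant component.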

For calculating the covariance matrix used in computing O-information, total correlation and dual total correlation, the functional data from all scans and all subjects were combined to create a single COV or FC matrix. Aggregation was carried out by appending the nodal time series across all subjects and runs and then calculating a single Pearson correlation for each node pair. An alternative approach (taking the mean over the single-run, single-subject COV/FC matrices) yielded virtually identical results. Following preprocessing and using the common 200-node parcellation of cerebral cortex, the mean COV/FC matrices for the HCP and MICA data sets were highly correlated (R = 0.851, p = 0).
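The two aggregation strategies described above can be sketched as follows in Python/NumPy. This is an illustration on synthetic data rather than the authors' MATLAB code; the function names are hypothetical, and z-scoring each run separately before concatenation is an assumption of this sketch.

```python
import numpy as np


def concatenated_fc(runs):
    """Z-score each run, append runs in time, and compute one correlation matrix.

    runs: iterable of arrays shaped (time points, nodes).
    """
    zscored = []
    for ts in runs:
        ts = np.asarray(ts, dtype=float)
        zscored.append((ts - ts.mean(axis=0)) / ts.std(axis=0))
    stacked = np.concatenate(zscored, axis=0)
    fc = np.corrcoef(stacked, rowvar=False)
    return 0.5 * (fc + fc.T)  # average with the transpose to enforce symmetry


def mean_fc(runs):
    """Alternative: average the single-run correlation matrices."""
    mats = [np.corrcoef(np.asarray(ts, dtype=float), rowvar=False) for ts in runs]
    return np.mean(mats, axis=0)


# Synthetic stand-in for four runs of five correlated nodes.
rng = np.random.default_rng(0)
mixing = rng.normal(size=(5, 5))
runs = [rng.normal(size=(400, 5)) @ mixing for _ in range(4)]

fc_concat = concatenated_fc(runs)
fc_mean = mean_fc(runs)
print("largest difference between the two estimates:", np.abs(fc_concat - fc_mean).max())
```

Under the assumptions of this sketch (per-run z-scoring and runs of equal length), the two estimates coincide up to floating-point error; with unequal run lengths or pooled z-scoring they would differ slightly.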

Random sampling and optimization

Subsets of regions were selected from the full-size (200 nodes/regions) FC matrices in two ways, by random sampling and by search through optimization. Random sampling is simple to implement but because of the vast repertoire of potential subsets ($\binom{N}{k}$) it cannot fully disclose the extent of variations in informational measures present in the data. Instead, search under an objective function (optimization) can guide exploration to specific sub-spaces enriched in subsets with distinct informational signatures.
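To give a sense of the size of the search space and of what random sampling involves, the following standard-library Python sketch counts the possible k-node subsets of a 200-node parcellation and draws a handful of random subsets. The function names are hypothetical, and the `measure` argument is a placeholder for any subset-level quantity (for example, the O-information of the corresponding covariance submatrix); here a trivial stand-in is used.

```python
import math
import random


def count_subsets(n_nodes, k):
    """Number of distinct k-node subsets of n_nodes regions (N choose k)."""
    return math.comb(n_nodes, k)


def sample_subsets(n_nodes, k, n_samples, measure, seed=None):
    """Draw random k-node subsets and evaluate `measure` on each one."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_samples):
        subset = sorted(rng.sample(range(n_nodes), k))  # k distinct node indices
        results.append((subset, measure(subset)))
    return results


# The number of candidate subsets grows very quickly with subset size.
for k in (3, 10, 16):
    print(f"k = {k:2d}: {count_subsets(200, k):.3e} possible subsets")

# Placeholder measure (sum of node indices), standing in for an information measure.
samples = sample_subsets(200, 3, n_samples=5, measure=lambda s: sum(s), seed=1)
for subset, value in samples:
    print(subset, value)
```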

To perform optimizations we implemented a variant of simulated annealing⁹⁴. As objectives we chose multivariate informational measures such as the O-information (OI), total correlation (TC), and dual total correlation (DTC), which could be maximized or minimized. Each run of the simulated annealing algorithm was carried out in one FC matrix and for one subset size. We carried out 5000 runs, with subset sizes ranging from 3 to 30 nodes. A random selection of nodes was chosen according to the given subset size to initiate each run. The corresponding covariance matrix was extracted from the full COV/FC and used to compute the information theoretic metric of interest. The composition of the subset was then varied and variations were selected under the objective function. Annealing operates by selecting variations stochastically, depending on a temperature parameter that determines the amount of noise permitted in the selection process. Initially, the temperature is high, resulting in the somewhat random exploration of the landscape. As the temperature is lowered, the optimization becomes more deterministic, focusing more and more on local gradient descent. For each run the algorithm proceeded for a maximum of 10,000 steps. At each step, a new set of nodes was generated by randomly replacing nodes, with the number determined by a normal distribution (frequencies of 1, 2, and 3 element flips were 0.68, 0.27 and 0.04, respectively). A new covariance matrix was computed for the new set of nodes and the objective function was calculated for that set. The new set was retained if its cost was lower than that of the current set or if a random number drawn from the uniform distribution between 0 and 1 was less than $\exp (-({C}_{n}-C)/{T}_{c})$, where C_n is the cost of the new set of nodes, C is the cost of the current set of nodes and T_c is the current temperature. At each step, the current temperature decays to a fraction of the initial temperature, as a function of the number of steps completed:

$${T}_{c}(h)={T}_{0}\times {({T}_{\mathrm{exp}})}^{h}$$

(22)

where T_c is the current temperature, T₀ is the initial temperature (set to T₀ = 1), T_exp governs the steepness of the temperature gradient, and h is the current iteration step. By decreasing the temperature at every step, the algorithm becomes progressively more deterministic.
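The procedure described above can be sketched in standard-library Python as follows. This is a hedged illustration, not the authors' MATLAB implementation: the function name is hypothetical, the value T_exp = 0.999 is an assumed placeholder (the article does not state the value used), and keeping track of the best subset visited is a convenience added here. The `cost` argument stands for any objective to be minimized, such as the O-information of the covariance submatrix indexed by the subset; to maximize a measure, its negative can be passed instead.

```python
import math
import random


def anneal_subset(cost, n_nodes, k, t0=1.0, t_exp=0.999, max_steps=10_000, seed=None):
    """Search for a k-node subset of range(n_nodes) with low `cost`."""
    rng = random.Random(seed)
    current = set(rng.sample(range(n_nodes), k))  # random initial subset
    c = cost(sorted(current))
    best, best_cost = set(current), c
    for h in range(1, max_steps + 1):
        t_c = t0 * t_exp ** h  # Eq. (22): exponentially decaying temperature
        # Number of nodes to swap; weights follow the reported flip frequencies.
        n_flip = rng.choices([1, 2, 3], weights=[0.68, 0.27, 0.04])[0]
        n_flip = min(n_flip, k, n_nodes - k)
        removed = rng.sample(sorted(current), n_flip)
        added = rng.sample(sorted(set(range(n_nodes)) - current), n_flip)
        candidate = (current - set(removed)) | set(added)
        c_n = cost(sorted(candidate))
        # Accept improvements always, and worse sets with probability exp(-(C_n - C)/T_c).
        if c_n < c or rng.random() < math.exp(-(c_n - c) / t_c):
            current, c = candidate, c_n
            if c < best_cost:
                best, best_cost = set(current), c
    return sorted(best), best_cost


# Toy objective with a known minimum: the sum of node indices is smallest for {0, 1, 2}.
toy_cost = lambda nodes: float(sum(nodes))
subset, value = anneal_subset(toy_cost, n_nodes=20, k=3, seed=0)
print(subset, value)
```

Passing the objective as a function keeps the search loop independent of the particular information measure, so the same sketch could in principle serve for TC, DTC, or the O-information.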

Null model

Given some set X of k nodes with Ω(X) < 0, it is possible that not every X_i ∈ X actually contributes to the synergy (for example, if there are some X_i that are independent from every other node). However, this set may still be found as a solution for an optimally synergistic subset of k nodes by the simulated annealing algorithm. To ensure that every node in a given subset contributed to its synergy, each node was removed from the subset in turn by setting the Pearson correlation of that node with all other nodes to zero. After removal of a node, the O-information was recalculated. If removal of any node decreased the O-information of the subset, then the subset was considered reducible to the k − 1 subset, and was not included in further analyses.
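The reducibility check can be sketched as follows in Python/NumPy, implementing the criterion as stated above (a subset is flagged when decoupling any single node lowers its O-information). The function names, the numerical tolerance, and the toy matrices are assumptions of this illustration; the O-information is computed from log-determinants using the Gaussian entropy formulas of the section "Gaussian information theory".

```python
import numpy as np


def o_information(corr):
    """Gaussian O-information (nats) from a covariance or correlation matrix."""
    corr = np.asarray(corr, dtype=float)
    n = corr.shape[0]
    logdet = lambda m: np.linalg.slogdet(m)[1]
    idx = np.arange(n)
    total = (n - 2) * logdet(corr)
    for i in range(n):
        rest = idx[idx != i]
        total += np.log(corr[i, i]) - logdet(corr[np.ix_(rest, rest)])
    return 0.5 * total


def is_reducible(corr, tol=1e-12):
    """True if decoupling any single node decreases the subset's O-information."""
    corr = np.asarray(corr, dtype=float)
    omega_full = o_information(corr)
    for i in range(corr.shape[0]):
        decoupled = corr.copy()
        decoupled[i, :] = 0.0          # zero the node's correlations with all others
        decoupled[:, i] = 0.0
        decoupled[i, i] = corr[i, i]   # keep its own variance
        if o_information(decoupled) < omega_full - tol:
            return True
    return False


def to_corr(cov):
    """Convert a covariance matrix to a correlation matrix."""
    d = np.sqrt(np.diag(cov))
    return cov / np.outer(d, d)


# Toy triad: X3 = X1 + X2 + noise. Every node is needed for the synergy.
triad = to_corr(np.array([[1.0, 0.0, 1.0],
                          [0.0, 1.0, 1.0],
                          [1.0, 1.0, 2.1]]))

# Same triad plus X4 = X3 + noise, a redundant copy that masks the synergy.
quad = to_corr(np.array([[1.0, 0.0, 1.0, 1.0],
                         [0.0, 1.0, 1.0, 1.0],
                         [1.0, 1.0, 2.1, 2.1],
                         [1.0, 1.0, 2.1, 2.2]]))

print("triad reducible:", is_reducible(triad))
print("quad reducible: ", is_reducible(quad))
```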

Statistics and reproducibility

All statistics were computed using MATLAB 2020 and MATLAB 2021. The code for reproducing results is provided in Supplementary Software 1. Covariance matrices were computed from z-scored BOLD time series and symmetrized by averaging each matrix with its own transpose. Information-theoretic estimators were computed using the formulae given in the section “Gaussian information theory” (all code provided).

Random sampling of ensembles was done for ensembles of size 3–16, with 100,000 samples drawn for each size. Annealing was done using the provided code, with 5000 replications for each ensemble size. All correlations were computed using Spearman’s ρ.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability

All the data used here are available from the Human Connectome Project⁵⁶ (http://www.humanconnectomeproject.org/) and the Microstructure-Informed Connectomics Project⁵⁷ (https://osf.io/j532r/). Data for reproducing Figs. 3 and 5a, c are included in Supplementary Data 1 and Supplementary Data 2, respectively.

Code availability

MATLAB code for computing the TC, DTC, O-information, and S-information, as well as the simulated annealing, is attached to this manuscript as Supplementary Software 1.

Change history

07 June 2023
A Correction to this paper has been published: https://doi.org/10.1038/s42003-023-04984-y

References

Barabási, A. L., Pósfai, M. Network Science (Cambridge University Press, 2016).
Menczer, F., Fortunato, S. & Davis, C. A. A First Course in Network Science (Cambridge University Press, 2020).
Sporns, O. & Kötter, R. Motifs in brain networks. PLOS Biol. 2, e369 (2004).
Watts, D. J. & Strogatz, S. H. Collective dynamics of ‘small-world’ networks. Nature 393, 440–442 (1998).
Fortunato, S. Community detection in graphs. Phys. Rep. 486, 75–174 (2010).
Betzel, R. F. Community detection in network neuroscience. https://arxiv.org/abs/2011.06723 (2020).
Battiston, F. et al. Networks beyond pairwise interactions: structure and dynamics. Phys. Rep. 874, 1–92 (2020).
Battiston, F. et al. The physics of higher-order interactions in complex systems. Nat. Phys. 17, 1093–1098 (2021).
Tononi, G., Sporns, O. & Edelman, G. M. A measure for brain complexity: relating functional segregation and integration in the nervous system. Proc. Natl Acad. Sci. USA 91, 5033–5037 (1994).
Tononi, G., Edelman, G. M. & Sporns, O. Complexity and coherency: integrating information in the brain. Trends Cogn. Sci. 2, 474–484 (1998).
Tononi, G. & Edelman, G. M. Schizophrenia and the mechanisms of conscious integration. Brain Res. Rev. 31, 391–400 (2000).
Timme, N. M. et al. High-degree neurons feed cortical computations. PLOS Comput. Biol. 12, e1004858 (2016).
Faber, S. P., Timme, N. M., Beggs, J. M. & Newman, E. L. Computation is concentrated in rich clubs of local cortical networks. Netw. Neurosci. 3, 1–21 (2018).
Sherrill, S. P., Timme, N. M., Beggs, J. M. & Newman, E. L. Partial information decomposition reveals that synergistic neural integration is greater downstream of recurrent information flow in organotypic cortical cultures. PLOS Comput. Biol. 17, e1009196 (2021).
Sherrill, S. P., Timme, N. M., Beggs, J. M. & Newman, E. L. Correlated activity favors synergistic processing in local cortical networks in vitro at synaptically relevant timescales. Netw. Neurosci. 4, 678–697 (2020).
Scagliarini, T., Marinazzo, D., Guo, Y., Stramaglia, S. & Rosas, F. E. Quantifying high-order interdependencies on individual patterns via the local O-information: theory and applications to music analysis. Phys. Rev. Res. 4, 013184 (2022).
Varley, T. F., Sporns, O., Schaffelhofer, S., Scherberger, H. & Dann, B. Information-processing dynamics in neural networks of macaque cerebral cortex reflect cognitive state and behavior. Proc. Natl Acad. Sci. USA 120, e2207677120 (2023).
Rosas, F. E. et al. Reconciling emergences: an information-theoretic approach to identify causal emergence in multivariate data. PLOS Comput. Biol. 16, e1008289 (2020).
Varley, T., Sporns, O., Puce, A. & Beggs, J. Differential effects of propofol and ketamine on critical brain dynamics. PLOS Comput. Biol. 16, e1008418 (2020).
Luppi, A. I. et al. A synergistic workspace for human consciousness revealed by integrated information decomposition. https://doi.org/10.1101/2020.11.25.398081 (2020).
Luppi, A. I. et al. A synergistic core for human brain evolution and cognition. Nat. Neurosci. 25, 771–782 (2022).
Gatica, M. et al. High-order interdependencies in the aging brain. Brain Connect. 1, 734–744 (2021).
Luppi, A. I. et al. What it is like to be a bit: an integrated information decomposition account of emergent mental phenomena. Neurosci. Conscious. 2021, niab027 (2021).
Rosas, F., Mediano, P. A. M., Gastpar, M. & Jensen, H. J. Quantifying high-order interdependencies via multivariate extensions of the mutual information. Phys. Rev. E 100, 032305 (2019).
Lizier, J. T., Flecker, B. & Williams, P. L. Towards a synergy-based approach to measuring information modification. https://arxiv.org/abs/1303.3440 (2013).
Newman, E. L., Varley, T. F., Parakkattu, V. K., Sherrill, S. P. & Beggs, J. M. Revealing the dynamics of neural information processing with multivariate information decomposition. Entropy 24, 930 (2022).
Williams, P. L. & Beer, R. D. Nonnegative decomposition of multivariate information. https://arxiv.org/abs/1004.2515 (2010).
Gutknecht, A. J., Wibral, M. & Makkeh, A. Bits and pieces: understanding information decomposition from part-whole relationships and formal logic. Proc. R Soc. A Math. Phys. Eng. Sci. 477, 20210110 (2021).
Kolchinsky, A. A novel approach to the partial information decomposition. Entropy 24, 403 (2022).
Kay, J. W., Schulz, J. M. & Phillips, W. A. A comparison of partial information decompositions using data from real and simulated layer 5b pyramidal cells. Entropy 24, 1021 (2022).
James, R. G., Ellison, C. J. & Crutchfield, J. P. Anatomy of a bit: information in a time series observation. Chaos: Interdiscip. J. Nonlinear Sci. 21, 037109 (2011).
Deco, G., Tononi, G., Boly, M. & Kringelbach, M. L. Rethinking segregation and integration: contributions of whole-brain modelling. Nat. Rev. Neurosci. 16, 430–439 (2015).
Shine, J. M. Neuromodulatory influences on integration and segregation in the brain. Trends Cogn. Sci. 23, 572–583 (2019).
Massimini, M. et al. Breakdown of cortical effective connectivity during sleep. Science (N. Y., NY) 309, 2228–2232 (2005).
Casali, A. G. et al. A theoretically based index of consciousness independent of sensory processing and behavior. Sci. Transl. Med. 5, 198ra105–198ra105 (2013).
Sarasso, S. et al. Consciousness and complexity: a consilience of evidence. Neurosci. Conscious. https://doi.org/10.1093/nc/niab023 (2021).
Luppi, A. I. et al. Consciousness-specific dynamic interactions of brain integration and functional diversity. Nat. Commun. 10, 1–12 (2019).
Luppi, A. I. et al. LSD alters dynamic integration and segregation in the human brain. NeuroImage 227, 117653 (2021).
McGhee, G. R. Theoretical morphology: the concept and its applications. Short. Courses Paleontol. 4, 87–102 (1991).
Avena-Koenigsberger, A., Goñi, J., Solé, R. & Sporns, O. Network morphospace. J. R. Soc. Interface 12, 20140881 (2015).
Varley, T. F., Denny, V., Sporns, O. & Patania, A. Topological analysis of differential effects of ketamine and propofol anaesthesia on brain dynamics. R. Soc. Open Sci. 8, 201971 (2021).
Cover, T. M. & Thomas, J. A. Elements of Information Theory (Wiley, 2012).
Shannon, C. E. A mathematical theory of communication. Bell Syst. Tech. J. 27, 379–423 (1948).
Friston, K. J. Functional and effective connectivity in neuroimaging: a synthesis. Hum. Brain Mapp. 2, 56–78 (1994).
van Diessen, E. et al. Opportunities and methodological challenges in EEG and MEG resting state functional brain network research. Clin. Neurophysiol. 126, 1468–1481 (2015).
Ursino, M., Ricci, G. & Magosso, E. Transfer entropy as a measure of brain connectivity: a critical analysis with the help of neural mass models. Front. Comput. Neurosci. 14, 45 (2020).
Barnett, L., Muthukumaraswamy, S. D., Carhart-Harris, R. L. & Seth, A. K. Decreased directed functional connectivity in the psychedelic state. NeuroImage 209, 116462 (2020).
Fornito, A., Zalesky, A. & Bullmore, E. Fundamentals of Brain Network Analysis (Academic Press, 2016).
Sporns, O. Networks of the Brain (The MIT Press, 2010).
Abdallah, S. A. & Plumbley, M. D. A measure of statistical complexity based on predictive information with application to finite spin systems. Phys. Lett. A 376, 275–281 (2012).
Williams, P. L. & Beer, R. D. Generalized measures of information transfer. https://arxiv.org/abs/1102.1507 (2011).
Stramaglia, S., Scagliarini, T., Daniels, B. C. & Marinazzo, D. Quantifying dynamical high-order interdependencies from the O-information: an application to neural spiking dynamics. Front. Physiol. 11, 595736 (2021).
Sporns, O., Tononi, G. & Edelman, G. M. Theoretical neuroanatomy and the connectivity of the cerebral cortex. Behav. Brain Res. 135, 69–74 (2002).
Ay, N., Olbrich, E., Bertschinger, N. & Jost, J. A unifying framework for complexity measures of finite systems. ECCS’06: Proceedings of the European Conference on Complex Systems 2006 (2006).
Scagliarini, T. et al. Gradients of O-information: low-order descriptors of high-order dependencies. http://arxiv.org/abs/2207.03581 (2022).
Van Essen, D. C. et al. The WU-Minn human connectome project: an overview. NeuroImage 80, 62–79 (2013).
Royer, J. et al. An open MRI dataset for multiscale neuroscience. Sci. Data 9, 569 (2021).
Schaefer, A. et al. Local-global parcellation of the human cerebral cortex from intrinsic functional connectivity MRI. Cereb. Cortex 28, 3095–3114 (2018).
Colenbier, N. et al. Disambiguating the role of blood flow and global signal with partial information decomposition. NeuroImage 213, 116699 (2020).
Yeo, B. T. et al. The organization of the human cerebral cortex estimated by intrinsic functional connectivity. J. Neurophysiol. 106, 1125–1165 (2011).
Griffith, V. & Harel, J. Irreducibility is minimum synergy among parts. https://arxiv.org/abs/1311.7442 (2013).
Santoro, A., Battiston, F., Petri, G. & Amico, E. Higher-order organization of multivariate time series. Nat. Phys. 19, 221–229 (2023).
Zamani Esfahlani, F. et al. High-amplitude cofluctuations in cortical activity drive functional connectivity. Proc. Natl Acad. Sci. USA 117, 28393–28401 (2020).
Faskowitz, J., Esfahlani, F. Z., Jo, Y., Sporns, O. & Betzel, R. F. Edge-centric functional network representations of human cerebral cortex reveal overlapping system-level architecture. Nat. Neurosci. 23, 1644–1654 (2020).
Betzel, R. F., Cutts, S. A., Greenwell, S., Faskowitz, J. & Sporns, O. Individualized event structure drives individual differences in whole-brain functional connectivity. NeuroImage 252, 118993 (2022).
Varley, T. F., Pope, M., Puxeddu, M. G., Faskowitz, J. & Sporns, O. Partial entropy decomposition reveals higher-order structures in human brain activity. http://arxiv.org/abs/2301.05307 (2023).
Ince, R. A. A. The partial entropy decomposition: decomposing multivariate entropy and mutual information via pointwise common surprisal. https://arxiv.org/abs/1702.01591 (2017).
Finn, C. & Lizier, J. T. Generalised measures of multivariate information content. Entropy 22, 216 (2020).
Varley, T. F. Decomposing past and future: integrated information decomposition based on shared probability mass exclusions. https://arxiv.org/abs/2202.12992 (2022).
Timme, N. M. et al. Criticality maximizes complexity in neural tissue. Front. Physiol. 7, 425 (2016).
Rosas, F. E. et al. Disentangling high-order mechanisms and high-order behaviours in complex systems. Nat. Phys. 18, 476–477 (2022).
Varley, T. F. & Kaminski, P. Untangling synergistic effects of intersecting social identities with partial information decomposition. Entropy 24, 1387 (2022).
Sizemore, A. E., Phillips-Cremins, J., Ghrist, R. & Bassett, D. S. The importance of the whole: topological data analysis for the network neuroscientist. Netw. Neurosci. 3, 656–673 (2019).
Saggar, M. et al. Towards a new approach to reveal dynamical organization of the brain using topological data analysis. Nat. Commun. 9, 1399 (2018).
Billings, J., Saggar, M., Hlinka, J., Keilholz, S. & Petri, G. Simplicial and topological descriptions of human brain dynamics. Netw. Neurosci. 5, 549–568 (2021).
Stolz, B. J., Emerson, T., Nahkuri, S., Porter, M. A. & Harrington, H. A. Topological data analysis of task-based fMRI data from experiments on schizophrenia. J. Phys. Complex. 2, 035006 (2021).
Varley, T. F. & Hoel, E. Emergence as the conversion of information: a unifying theory. Philos. Trans. R. Soc. A: Math., Phys. Eng. Sci. 380, 20210150 (2022).
Wollstadt, P., Schmitt, S. & Wibral, M. A rigorous information-theoretic definition of redundancy and relevancy in feature selection based on partial information decomposition. https://arxiv.org/abs/2105.04187 (2021).
Novelli, L., Wollstadt, P., Mediano, P., Wibral, M. & Lizier, J. T. Large-scale directed network inference with multivariate transfer entropy and hierarchical statistical testing. Netw. Neurosci. 3, 827–847 (2019).
Liu, T. T., Nalci, A. & Falahpour, M. The global signal in fMRI: nuisance or information? NeuroImage 150, 213–229 (2017).
Novelli, L. & Razi, A. A mathematical perspective on edge-centric functional connectivity. http://arxiv.org/abs/2106.10631 (2021).
Lizier, J. T. JIDT: an information-theoretic toolkit for studying the dynamics of complex systems. https://arxiv.org/pdf/1408.3270.pdf (2014).
Bossomaier, T., Barnett, L., Harré, M. & Lizier, J. T. An Introduction to Transfer Entropy: Information Flow in Complex Systems (Springer, 2016).
Faes, L. et al. A new framework for the time- and frequency-domain assessment of high-order interactions in networks of random processes. IEEE Trans. Signal. Process. 70 (IEEE, 2022).
Hlinka, J., Paluš, M., Vejmelka, M., Mantini, D. & Corbetta, M. Functional connectivity in resting-state fMRI: is linear correlation sufficient? NeuroImage 54, 2218–2225 (2011).
Liégeois, R., Yeo, B. T. T. & Van De Ville, D. Interpreting null models of resting-state functional MRI dynamics: not throwing the model out with the hypothesis. NeuroImage 243, 118518 (2021).
Schulz, M. A. et al. Different scaling of linear models and deep learning in UKBiobank brain images versus machine-learning datasets. Nat. Commun. 11, 4238 (2020).
Barrett, A. B. Exploration of synergistic and redundant information sharing in static and dynamical Gaussian systems. Phys. Rev. E 91, 052802 (2015).
Sporns, O., Faskowitz, J., Teixeira, A. S., Cutts, S. A. & Betzel, R. F. Dynamic expression of brain functional systems disclosed by fine-scale analysis of edge time series. Netw. Neurosci. 5, 405–433 (2021).
Glasser, M. F. et al. The minimal preprocessing pipelines for the Human Connectome Project. NeuroImage 80, 105–124 (2013).
Robinson, E. C. et al. MSM: a new flexible framework for Multimodal Surface Matching. NeuroImage 100, 414–426 (2014).
Parkes, L., Fulcher, B., Yücel, M. & Fornito, A. An evaluation of the efficacy, reliability, and sensitivity of motion correction strategies for resting-state functional MRI. NeuroImage 171, 415–436 (2018).
Cruces, R. R. et al. Micapipe: a pipeline for multimodal neuroimaging and connectome analysis. NeuroImage 263, 119612 (2022).
Metropolis, N., Rosenbluth, A. W., Rosenbluth, M. N., Teller, A. H. & Teller, E. Equation of state calculations by fast computing machines. J. Chem. Phys. 21, 1087–1092 (1953).

Acknowledgements

T.F.V. and M.P. are supported by the NSF-NRT grant 1735095, Interdisciplinary Training in Complex Networks and Systems. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Author information

These authors contributed equally: Thomas F. Varley, Maria Pope.

Authors and Affiliations

School of Informatics, Computing & Engineering, Indiana University, Bloomington, IN, 47405, USA
Thomas F. Varley, Maria Pope & Olaf Sporns
Department of Psychological & Brain Sciences, Indiana University, Bloomington, IN, 47405, USA
Thomas F. Varley, Joshua Faskowitz & Olaf Sporns
Program in Neuroscience, Indiana University, Bloomington, IN, 47405, USA
Maria Pope, Joshua Faskowitz & Olaf Sporns

Authors

Thomas F. Varley
Maria Pope
Joshua Faskowitz
Olaf Sporns

Contributions

T.F.V., M.E.P., and O.S. conceived of the project. T.F.V. and O.S. performed the formal mathematical analysis. M.E.P. and O.S. analyzed the data. J.F. preprocessed and collated the fMRI data. T.F.V. and M.E.P. wrote the initial manuscript. O.S. and J.F. provided editorial feedback. O.S. supervised the project.

Corresponding author

Correspondence to Thomas F. Varley.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Communications Biology thanks the anonymous reviewers for their contribution to the peer review of this work. Primary handling editors: Enzo Tagliazucchi and George Inglis. Peer reviewer reports are available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Peer Review File

Supplementary Information

Description of Additional Supplementary Files

Supplementary Software 1

Supplementary Data 1

Supplementary Data 2

Reporting Summary

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Varley, T.F., Pope, M., Faskowitz, J. et al. Multivariate information theory uncovers synergistic subsystems of the human cerebral cortex. Commun Biol 6, 451 (2023). https://doi.org/10.1038/s42003-023-04843-w

Received: 17 October 2022
Accepted: 14 April 2023
Published: 24 April 2023
DOI: https://doi.org/10.1038/s42003-023-04843-w
