  Article

Semi-orthogonal subspaces for value mediate a binding and generalization trade-off


When choosing between options, we must associate their values with the actions needed to select them. We hypothesize that the brain solves this binding problem through neural population subspaces. Here, in macaques performing a choice task, we show that neural populations in five reward-sensitive regions encode the values of offers presented on the left and right in distinct subspaces. This encoding is sufficient to bind offer values to their locations while preserving abstract value information. After offer presentation, all areas encode the value of the first and second offers in orthogonal subspaces; this orthogonalization also affords binding. Our binding-by-subspace hypothesis makes two new predictions confirmed by the data. First, behavioral errors should correlate with spatial, but not temporal, neural misbinding. Second, behavioral errors should increase when offers have low or high values, compared to medium values, even when controlling for value difference. Together, these results support the idea that the brain uses semi-orthogonal subspaces to bind features.

Fig. 1: Task outline, behavioral model and brain areas.
Fig. 2: Diverse value–response functions produce semi-orthogonal subspaces for value.
Fig. 3: Subspace correlation mediates a trade-off between the reliability of binding and generalization.
Fig. 4: The theory predicts misbinding and generalization rates for each region.
Fig. 5: The representational geometry of the offer sequence.
Fig. 6: The representation of past and current offers predicts key elements of animal behavior.

Data availability

The raw data analyzed in this work are fully available on Figshare via (ref. 94).

Code availability

The code underlying this work relies on the Python scientific computing environment, including: python (3.8.2), numpy (1.23.5), scipy (1.10.1), sklearn (1.3.0), rsatoolbox and matplotlib (3.7.2). The custom code written to generate the figures and analyze the data is available at The version of the code used to generate the figures is available on Zenodo via (ref. 95).

Acknowledgements


We thank B. Vinje, who led an excellent summer journal club on the binding problem at UC Berkeley in 2001. We thank S. Fusi for useful discussions of previous versions of this paper. We thank M. Wang, T. Cash-Padgett, M. Mancarella, C. Strait, T. Blanchard and B. Sleezer for assistance with data collection. We also thank L. Mickiewicz, A. Ong and A. Silcott for administrative support. This research was supported by NIDA R01 DA038615 (to B.Y.H.) and MH124687 (to B.Y.H.). W.J.J. was supported by NSF 1707398, Simons Foundation 542983SPI, Gatsby Charitable Foundation GAT3708, NIMH R01 MH129031 and the Kavli Foundation. We acknowledge computing resources from Columbia University’s Shared Research Computing Facility project, which is supported by NIH Research Facility Improvement Grant 1G20RR030893-01, and associated funds from the New York State Empire State Development, Division of Science Technology and Innovation (NYSTAR) Contract C090171, both awarded 15 April 2010.

Author information

Authors and Affiliations



Contributions

W.J.J., J.M.F., S.B.M.Y., R.B.E. and B.Y.H. conceived of the project. S.B.M.Y. and B.Y.H. designed the experiments. S.B.M.Y. and R.B.E. collected the data. W.J.J. and J.M.F. designed and performed the analyses. W.J.J. developed the theoretical approach. W.J.J. and J.M.F. created the figures. W.J.J., J.M.F. and B.Y.H. wrote and edited the paper.

Corresponding author

Correspondence to W. Jeffrey Johnston.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Neuroscience thanks the anonymous reviewers for their contribution to the peer review of this work.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Behavioral consistency across subjects and regions.

A. The time-averaged proportion of optimal choices made by each subject (columns) during experiments in each region (rows). The proportion is smoothed with a 50-trial-wide boxcar filter. The black traces (n = 8 across all regions and animals) are sessions where performance dropped below a threshold of 60% optimal choices for at least one time bin. B. The subspace correlation analysis performed for the subset of sessions where performance did not drop below the 60% threshold. The results are similar to those for the full set of sessions.
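The session-screening step described in this caption can be sketched as follows. This is a minimal illustration, not the authors' code: the function names `boxcar_smooth` and `session_passes` are hypothetical, and we assume that each trial is scored as 1 (optimal choice) or 0, and that only windows fully covered by trials count as time bins.

```python
import numpy as np

def boxcar_smooth(optimal, width=50):
    """Running proportion of optimal choices in a `width`-trial boxcar window."""
    optimal = np.asarray(optimal, dtype=float)
    kernel = np.ones(width) / width
    # mode="valid" keeps only windows that are fully covered by trials (assumption)
    return np.convolve(optimal, kernel, mode="valid")

def session_passes(optimal, width=50, threshold=0.6):
    """True if the smoothed performance never drops below `threshold`."""
    return bool(np.all(boxcar_smooth(optimal, width) >= threshold))

# Toy sessions: one consistently optimal, one that collapses halfway through
steady = np.ones(200)
collapsing = np.concatenate([np.ones(100), np.zeros(100)])
print(session_passes(steady), session_passes(collapsing))
```

A session is flagged as soon as a single window falls below the threshold, which matches the "for at least one time bin" criterion in the caption.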

Extended Data Fig. 2 Example neurons, model comparison, and subspace correlations for the linear models with a nonlinear value representation.

A. The firing rates of example neurons from each region during the offer window, shown for high and low value offers presented on the left or right side (100 ms boxcar filter, shaded area is SEM). B. The value-response function for each neuron in A. The value-response function fit by the linear regression model with a B-spline value encoding and an interaction term is overlaid (dashed lines). The B-spline representation uses 4 knots and has degree 2. C. A simplex showing the weight given to each of the noise-only, linear, and interaction regression models by the Bayesian model stacking analysis. The points corresponding to the example neurons shown in A and B have dark outlines here. Both the linear and interaction categories include both linear and spline value representation models. D. Schematic of three different representational geometries that would lead to different subspace correlation results. (top) Two perfectly aligned value vectors v_l and v_r in population space (left) would produce a subspace correlation close to 1 (right). (middle) Two partially aligned value vectors v_l and v_r would produce a subspace correlation between 0 and 1 (note there is an additional possibility: partially aligned but negatively correlated subspaces; not schematized). (bottom) Two unaligned value vectors v_l and v_r would produce a subspace correlation close to 0. E. Alignment indices for all regions for the offer presentation window (top) and the delay period (bottom). The upper gray point is the alignment index expected if the left and right value representations were aligned and corrupted only by noise; the lower gray point is the noise floor. In these models, pgACC, VS, and the combined population are semi-orthogonal, while PCC, OFC, and vmPFC are indistinguishable from orthogonal.
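The geometry schematized in panel D can be illustrated with a small simulation. The sketch below is not the estimator used in the paper: it assumes a purely linear value code, estimates each value vector as the per-neuron regression slope of firing rate on offer value, and uses the cosine between the two vectors as a simple stand-in for subspace correlation. The names `value_vector` and `cosine_alignment` and all simulation parameters are hypothetical.

```python
import numpy as np

def value_vector(rates, values):
    """Per-neuron least-squares slope of firing rate on offer value.

    rates: (n_trials, n_neurons) array; values: (n_trials,) array.
    """
    values = np.asarray(values, dtype=float)
    rates = np.asarray(rates, dtype=float)
    v = values - values.mean()
    r = rates - rates.mean(axis=0)
    return v @ r / (v @ v)

def cosine_alignment(v_l, v_r):
    """Cosine of the angle between two value vectors in population space."""
    return float(v_l @ v_r / (np.linalg.norm(v_l) * np.linalg.norm(v_r)))

# Simulated population whose left and right value vectors are partially aligned
rng = np.random.default_rng(0)
n_trials, n_neurons, rho = 500, 200, 0.5
w_l = rng.normal(size=n_neurons)
w_r = rho * w_l + np.sqrt(1 - rho**2) * rng.normal(size=n_neurons)
values_l = rng.uniform(0, 1, n_trials)
values_r = rng.uniform(0, 1, n_trials)
noise = lambda: rng.normal(scale=0.5, size=(n_trials, n_neurons))
rates_l = np.outer(values_l, w_l) + noise()
rates_r = np.outer(values_r, w_r) + noise()

v_l_hat = value_vector(rates_l, values_l)
v_r_hat = value_vector(rates_r, values_r)
print(cosine_alignment(v_l_hat, v_r_hat))
```

Noise in the estimated vectors pulls this raw cosine toward zero even for perfectly aligned codes, which is why a reference such as the noise-corrupted aligned case in panel E is needed to interpret the measured value.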

Extended Data Fig. 3 Variations on the subspace correlation analysis.

A. Subspace correlation for the different time points of the experiment (that is, offer 1 and offer 2) as well as a combined set of trials from both offers, but given the same total number of trials as either offer individually. B. Subspace correlation for the best-fitting models given the model comparison analysis. OFC, vmPFC, VS, and the combined population (“all”) have semi-orthogonal representations, while PCC has orthogonal representations and pgACC has parallel representations in the first time period, but orthogonal representations in the second time period. C. Subspace correlation for separate monkeys and regions. D. Subspace correlation for the set of sessions included in the simultaneous population error analysis in Fig. 5. E. The average subspace correlation for the full set of trials shown as a timecourse analysis; the shaded area represents the 95% confidence interval.

Extended Data Fig. 4 Accurate recovery of linear and nonlinear distances from simulated data.

The true (dashed line) and estimated (solid line with error bars) linear (left) and nonlinear (right) distances from simulated data. The error bars represent 95% intervals around the estimated values from n = 100 different synthetic datasets with matched statistics. Our decomposition accurately recovers the linear and nonlinear distances, as the true value is always within error bars of the estimated value and the estimate is typically unbiased.

Extended Data Fig. 5 Distance estimated from subsampled neural populations.

The distances are estimated from subsampled populations of 80 neurons. Otherwise, this plot is the same as Fig. 4c.

Extended Data Fig. 6 Estimated distances for individual subjects.

This figure is analogous to Fig. 4c, but where the data for each region is separated into the constituent subjects.

Extended Data Fig. 7 Value decoding, value generalization, and predicted value generalization of the code within each recorded region.

(Top) Pseudopopulation value decoding performance (circles), generalization performance (squares, trained on offers from one side, tested on offers from the other side), and predicted generalization performance (stars) shown for each region and the neural population combined across regions (“all”), shown for the left and right value comparison. The violin plot shows the values produced from two hundred bootstrap resamples of the trials. (Bottom) The same as (top) except shown for the offer 1 and offer 2 comparison.
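The distinction between decoding and generalization performance in this caption can be sketched with a linear decoder. This is an illustration under stated assumptions, not the decoder used in the paper: we assume an ordinary least-squares readout, score it with R², and simulate a linear value code. The helper names (`fit_decoder`, `decode`, `r_squared`) and all simulation parameters are hypothetical.

```python
import numpy as np

def fit_decoder(rates, values):
    """Least-squares linear readout of value from population activity."""
    X = np.column_stack([rates, np.ones(len(rates))])  # append intercept column
    coef, *_ = np.linalg.lstsq(X, values, rcond=None)
    return coef

def decode(coef, rates):
    """Apply a fitted readout to new population activity."""
    return np.column_stack([rates, np.ones(len(rates))]) @ coef

def r_squared(y, y_hat):
    """Fraction of variance in y explained by y_hat."""
    return 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)

# Simulated population with partially aligned left and right value codes
rng = np.random.default_rng(1)
n_trials, n_neurons, rho = 400, 50, 0.5
w_l = rng.normal(size=n_neurons)
w_r = rho * w_l + np.sqrt(1 - rho**2) * rng.normal(size=n_neurons)
val_l, val_r = rng.uniform(0, 1, n_trials), rng.uniform(0, 1, n_trials)
rates_l = np.outer(val_l, w_l) + rng.normal(size=(n_trials, n_neurons))
rates_r = np.outer(val_r, w_r) + rng.normal(size=(n_trials, n_neurons))

half = n_trials // 2
coef = fit_decoder(rates_l[:half], val_l[:half])  # train on left offers only
decoding = r_squared(val_l[half:], decode(coef, rates_l[half:]))  # held-out left
generalization = r_squared(val_r, decode(coef, rates_r))  # right offers
print(decoding, generalization)
```

In this toy setting, lowering `rho` makes the two value vectors more orthogonal, which degrades the cross-side readout while leaving within-side decoding unaffected.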

Extended Data Fig. 8 Dependence of the behavioral decoding result (Fig. 6f) on the required number of neurons in the simultaneously recorded population.

The gray crosses indicate significance under a one-sided t-test. The individual circles show sessions, and the error bar shows the average across cross-validation runs of those sessions. The gray stars indicate significance at the p < 0.05 level.

Extended Data Fig. 9 The main results replicate when safe trials are included.

This figure replicates Fig. 4 from the main text but includes the safe trials in the dataset. The results are qualitatively similar in both conditions. A. Offer value decoding and generalization across left and right offers. B. Offer value decoding and generalization across offer 1 and offer 2. C. The linear and nonlinear distances estimated for both left and right offers (left) and offer 1 and 2 (right). D. The predicted binding error rate for each region in both the spatial (left) and temporal (right) configurations as a function of subspace correlation. E. The same as D, but showing the predicted and empirical generalization error rates. F. The position of each region on the binding and generalization error rate plane.

Extended Data Fig. 10 The available neurons for a particular required number of trials.

As more trials for each condition are required, fewer neurons are available for inclusion in the pseudopopulation. The default throughout the paper is 160, shown in grey.
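The trade-off in this caption amounts to a simple count. The sketch below assumes that a neuron is eligible for the pseudopopulation only if every condition has at least the required number of trials; the function name `neurons_available` and the example counts are hypothetical.

```python
import numpy as np

def neurons_available(trial_counts, required):
    """Count neurons with at least `required` trials in every condition.

    trial_counts: (n_neurons, n_conditions) array of per-condition trial counts.
    """
    trial_counts = np.asarray(trial_counts)
    # A neuron is limited by its least-sampled condition (assumption)
    return int(np.sum(trial_counts.min(axis=1) >= required))

# Three hypothetical neurons, two conditions each
counts = np.array([[200, 180], [150, 300], [160, 160]])
for required in (100, 160, 250):
    print(required, neurons_available(counts, required))
```

Raising the requirement can only shrink the eligible set, so the curve of available neurons against required trials is non-increasing.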

Supplementary information

Supplementary Information

Supplementary Figs. 1–3, Table 1 and Discussion.

Reporting Summary

