Brief history of common era temperature reconstructions in IPCC Reports

More than two decades ago, the Intergovernmental Panel on Climate Change (IPCC) featured a single millennium-long reconstruction of Northern Hemisphere (NH) mean annual temperatures in its 2001 Working Group I Summary for Policymakers (WGI SPM) associated with their Third Assessment Report (Fig. 1a). This reconstruction1 – colloquially known as the Hockey Stick – subsequently became one of the most iconic illustrations in climate change research2,3, and sparked a lively debate within and beyond the paleoclimate research community regarding the reconstruction’s methodology and accuracy, as well as the proxy interpretations that supported its development4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19.

Fig. 1: Common Era temperature reconstructions featured in IPCC reports since 2001.

a IPCC 2001: reconstruction of NH annual mean temperatures back to 1000 CE (purple; ref. 1) shown together with the annually-resolved instrumental NH land and marine temperatures beginning in 1860 CE (red) as featured in the 2001 SPM. b IPCC 2007: twelve reconstructions of NH annual and warm season temperatures since 700 CE (blue; ref. 1 in purple) shown together with the instrumental NH annual mean land and marine temperatures beginning in 1856 CE (red). c IPCC 2013: fifteen reconstructions of NH annual and warm season temperatures since 1 CE (blue) shown together with the instrumental annual mean land temperatures of the NH beginning in 1881 CE (red). Note that Figure 5.7 in the 2013 IPCC WGI report also provided separate presentations of SH and global reconstructions. d IPCC 2021: median reconstruction of global annual mean temperatures since 1 CE (blue) shown together with instrumental global annual mean land and marine temperatures beginning in 1850 CE (red). All curves, except for the instrumental data in (a) and borehole temperatures in (b) and (c), were smoothed using 30-year low-pass filters. Shaded regions in (a) and (d) are the respective 95% and 90% confidence intervals for the individual mean reconstructions in each panel. The reference period used to calculate temperature anomalies in °C varies among the reports, from 1961–1990 CE used in IPCC 2001 and 2007, to 1881–1980 in IPCC 2013, and 1850–1900 in IPCC 2021.

Although the 2001 WGI SPM featured the Hockey Stick exclusively3, other NH temperature reconstructions were also available at the time. Elsewhere within the 2001 WGI report was an additional graph (2001 WGI Figure 2.21) that compared the ref. 1 reconstruction against two other temperature reconstructions20,21. A 500-year-long reconstruction of NH temperature trends derived from temperature measurements in terrestrial boreholes22 was also included in the report in another stand-alone figure (2001 WGI Figure 2.19). Important differences existed among these reconstructions that emphasized active areas of research, uncertainty, and debate within the paleoclimatic research community. Even in 2001, representing the state of last-millennium paleoclimate temperature reconstructions with a single timeseries in the SPM was therefore not necessary and may have inadvertently given the impression that there were few uncertainties associated with the science of reconstructing large-scale temperatures. It should also be noted that the 2001 report did not have a separate chapter on paleoclimate, although it used paleoclimate data throughout the report to provide historical and dynamical context for recent observations and future predictions23.

Featuring the Hockey Stick in the 2001 WGI SPM subsequently led to various points of criticism that became expanded areas of investigation in the years that followed. Researchers began to analyze the effects of (i) tree-ring detrending, which can affect the amplitude of past temperature variability8,24, (ii) incomplete spatial coverage due to the limited number of proxy records in the tropics and Southern Hemisphere (SH4,25,26), (iii) a focus on annual instead of summer temperatures, despite the majority of proxies being primarily sensitive to warm season conditions10,27, and (iv) methodological choices that impact the spectral character of the resulting reconstruction12,13,14,15,19,28,29,30. These discussions spurred efforts to develop new large-scale reconstructions for different domains of the NH extra-tropics8,24,31,32,33,34. Many of these reconstructions also targeted summer land-only temperatures, rather than annual land and marine temperatures, acknowledging the fact that the majority of high-resolution CE proxy data came from high-latitude terrestrial environments24,35,36,37. Several attempts used tree-ring chronologies exclusively as predictors, because these are annually resolved and precisely dated38. The tree-ring measurements were detrended (sensu refs. 39,40) to preserve low-frequency variance, although uncertainties remain in this proxy’s ability to preserve multi-centennial to millennial-scale temperature variability8,41,42. Several of the new temperature reconstructions extended further back in time, due to the continuous development of new proxy records and the improvement of existing ones43. In summary, these advancements resulted in a more deliberate representation of large-scale temperatures, while still demonstrating the most recent decades to be likely unprecedented within the context of the past one to two millennia3,44,45.

The investigations summarized above were part of a concerted research focus in the early 21st century that yielded rapid and widespread development of new temperature reconstructions for the CE, allowing better characterizations of their limitations. The fruits of these labors were apparent in the 2007 and 2013 IPCC WGI reports46,47, both of which adopted different approaches to presenting the expanded number of large-scale reconstructions and the associated uncertainties around estimates of past temperature variability. Specific paleoclimate chapters in these reports included summary figures of the contemporary reconstruction ensembles for NH annual and warm-season temperatures extending back to 700 CE and 1 CE, respectively (Fig. 1b, c; figures TS.20 and 5.7 in the 2007 and 2013 IPCC WGI reports, respectively). The 2007 report still included the Hockey Stick reconstruction, which exhibited the smallest pre-instrumental temperature variability when compared to the other reconstructions (Fig. 1b). This direct comparison emphasized some of the outstanding uncertainties that were not articulated in the 2001 IPCC report. Other reconstructions (all shown in Fig. 1b without uncertainty estimates) revealed a pre-instrumental temperature range >1 °C for the NH extra-tropics, including substantially colder conditions during the Little Ice Age (LIA)48 between the 14th and 19th centuries (a feature already characterized back to the 16th century by the terrestrial borehole reconstructions in the 2001 IPCC report), preceded by comparatively warmer conditions during the Medieval Warm Period (MWP) centered around 1000 CE. This general picture did not change with the 2013 IPCC report but was given further context by highlighting overall colder conditions prior to the MWP, albeit with a larger range of diverging estimates among the single reconstructions during the first millennium CE. We note, however, that the 2007 and 2013 reports did not include uncertainty estimates for each of the individual reconstructions, which would have been the most comprehensive representation of the uncertainties across the available reconstruction ensembles.

The evolution of how large-scale temperature reconstructions have been characterized in previous IPCC reports is important for understanding the limitations of the most recent IPCC report49. In an approach that parallels the 2001 SPM, the 2021 WGI report returned to a single representation of purportedly annual temperature variability, this time a global estimate over the entire Common Era (Fig. 1d) derived from a single study50 that uses a subset of the 692 proxy records compiled by the wider paleoclimate community51. Similar to the case in 2001, this was not a necessary choice, given that several reconstructions of global temperature fields over the CE were available52,53, in addition to earlier index reconstructions of global mean CE temperatures by refs. 18,54. In contrast to the 2001 report, however, the use of a single mean global temperature reconstruction was the only representation of large-scale CE temperature history provided in the entire WGI report, thus overlooking the wealth of other hemisphere-scale temperature reconstructions that incorporated two decades of knowledge about the strengths and limitations of CE proxy data45. Moreover, this single global reconstruction was used throughout the 2021 WGI report, versions of which appear in the SPM (Figure SPM.1), the Technical Summary (Box TS.2, Fig. 2; Box TS.1, Fig. 1) and in Chapters 1-3 of the WGI report (Figures 1.25, 1.26, 2.11, and 3.2).

Fig. 2: Two different summaries of the PAGES2k Consortium (2019) reconstructions of global temperature variability over the Common Era.

a The summary as presented in the 2021 IPCC report in which the entire 7000-member ensemble from PAGES2k Consortium (2019) is used to determine both the median reconstruction and the 90% confidence interval (shown in light blue shading). b Same as in the top panel but derived for each 1000-member reconstruction ensemble individually. All curves were smoothed using 20-year low-pass filters to match the presentation in the IPCC 2021 report (in all cases the ensembles were filtered before determining the confidence intervals from the filtered reconstructions). Note that Figure SPM.1 appears to be low-pass filtered with a period cutoff of 20 years, consistent with its caption description, but WGI Figure 3.2 appears to have a cutoff period that is much longer than the stated 20 years (other instances of the reconstruction in the IPCC report do not indicate the cutoff frequency of the low-pass filtering, but most appear similar to the 20-year low-pass result in Figure SPM.1). Anomalies in the top panel are shown relative to the mean of the annual median reconstruction over the period 1850–1900, again consistent with the 2021 IPCC report. Anomalies in the bottom panel are all relative to the same reference mean in the top panel, thus preserving the relative differences between the reconstructions as presented in PAGES2k Consortium (2019).
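The caption above specifies that the ensemble members were filtered first and the confidence intervals were then taken from the filtered series. The order matters, because smoothing each member removes high-frequency scatter before the percentiles are computed. The following is a minimal sketch of that ordering using synthetic white noise. The running mean is only a stand-in for the low-pass filter (the exact filter design is not given here), and the function names, the member count, and the series length are all illustrative assumptions.

```python
import random
import statistics

def running_mean(series, window=20):
    """Centered running mean over roughly `window` years (shorter at the ends).
    A crude stand-in for a low-pass filter, not the filter used in the figures."""
    half = window // 2
    smoothed = []
    for i in range(len(series)):
        lo, hi = max(0, i - half), min(len(series), i + half + 1)
        smoothed.append(statistics.fmean(series[lo:hi]))
    return smoothed

def percentile_band(members, lower=5, upper=95):
    """Per-year (lower, upper) percentiles across ensemble members."""
    band = []
    for year_values in zip(*members):
        cuts = statistics.quantiles(year_values, n=100, method="inclusive")
        band.append((cuts[lower - 1], cuts[upper - 1]))
    return band

def mean_width(band):
    """Average width of a band given as (low, high) pairs."""
    return statistics.fmean([hi - lo for lo, hi in band])

# Synthetic ensemble (assumption): 100 members of 150 "years" of white noise.
rng = random.Random(0)
members = [[rng.gauss(0.0, 1.0) for _ in range(150)] for _ in range(100)]

# Filter each member first, then take percentiles (the order stated in the caption).
band_filtered_first = percentile_band([running_mean(m) for m in members])
# For contrast: percentiles of the unfiltered annual values.
band_raw = percentile_band(members)

print("mean 90% width, filtered first:", round(mean_width(band_filtered_first), 2))
print("mean 90% width, unfiltered:    ", round(mean_width(band_raw), 2))
```

With real reconstructions the contrast would depend on how much of each member's variance sits at high frequencies, so the numbers printed here carry no meaning beyond illustrating the order of operations.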

Visual summaries of the current scientific consensus in IPCC reports are powerful and impactful. IPCC figures are used widely by scientists, policy makers, and climate communicators, and are often the go-to source for the state-of-the-art representation of our scientific understanding in many subdisciplines of climate research. It is therefore worth noting that the visual impression of the reconstructions shown in the 2007 and 2013 IPCC reports, themselves an important evolution from the 2001 representation, vividly demonstrates a very different sense of the uncertainties in our understanding of large-scale CE temperature variability than the 2021 representation. The latter conveys a sense of agreement within the field of study around a singular estimate over the entire duration of the CE, which is also accompanied by an unrealistically narrow and near-constant uncertainty band back in time (the shading in Fig. 1d). We argue below that the 2021 visual representation of the state of the science pertaining to large-scale CE temperature reconstructions was insufficient, misleading, and cut against efforts in previous reports to represent the range of estimates and uncertainties associated with these reconstructions. Importantly, we use this discussion to consider what might have been done differently and provide guidance for how the impending Assessment Report 7 (AR7) can better represent both agreement and dissent in this area of paleoclimate research.

IPCC representation of the PAGES2k consortium (2019) reconstructions

It is important to begin by noting that while the reconstructions from PAGES2k Consortium (2019)50 (P2k19 hereinafter) reflect a substantial effort to synthesize large amounts of proxy data and apply a variety of statistical techniques, their representation in the 2021 IPCC WGI report is a simplification of the range of uncertainties in the original paper. P2k19 produced eight global temperature reconstructions of tropical-year (April-March) temperatures derived from the same network of proxies and eight different reconstruction methods, seven of which were used in the 2021 IPCC WGI report (BHM, DA, CPS, OIE, PAI, PCR, and M08; see P2k19 and Table 3.SM.1 in the supplementary material of the 2021 IPCC WGI report for definitions and listing). Each of these reconstructions comprised a 1000-member reconstruction ensemble, representing multiple methodological choices for deriving ensemble uncertainties. There is no a priori reason to favor one reconstruction method over another and this large ensemble provides a useful assessment of the methodological influences associated with the production of large-scale temperature reconstructions.

Given the large ensemble of reconstructions in P2k19, it is reasonable to ask how one should summarize an ensemble of 7000 estimates of global temperatures over the CE derived from seven different methods using the same network of proxy records comprising tree rings, ice cores, corals, and lake and marine sediments as predictors (Table 1). The manner in which the question is addressed has implications for how the information is communicated and digested. The IPCC chose to show the median of all 7000 reconstructions and represented the uncertainties around the median using the 90% confidence interval (CI) derived from the full 7000-member ensemble (this range was estimated as the ±5% values of the 7000-member ensemble in each year). The result is shown in Figs. 1d and 2a and characterized by largely stable temperatures during the first millennium CE, followed by a prolonged cooling trend throughout much of the second millennium and unprecedented warming since the mid to late 20th century, the latter feature being a consistent characteristic of other large-scale temperature reconstructions2,3,44,45,55. Notably, the uncertainty range is also largely constant back in time, except for a small asymmetric increase in the colder range of uncertainties during the LIA. This narrow and constant uncertainty band is atypical and unexpected for large-scale reconstruction uncertainties, because the integrated proxy data decrease in number, spatial coverage, and quality (e.g. age uncertainty) back through time. Under these conditions, the uncertainty range should increase substantially in the first millennium (see dashed curves in Fig. 1a as an example).
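The pooled summary described above can be sketched in a few lines. This is an illustration under stated assumptions, not the IPCC or P2k19 code: it reads the "±5% values" as the 5th and 95th percentiles of the pooled members in each year, it uses small synthetic ensembles in place of the 7000 members, and the names `pooled_summary` and `reference_mean` are hypothetical.

```python
import random
import statistics

def pooled_summary(ensembles):
    """Pool the members of every method, then return the per-year
    median, 5th percentile, and 95th percentile of the pooled set."""
    pooled = [member for method in ensembles.values() for member in method]
    median, low, high = [], [], []
    for year_values in zip(*pooled):
        cuts = statistics.quantiles(year_values, n=20, method="inclusive")
        median.append(statistics.median(year_values))
        low.append(cuts[0])    # 5th percentile
        high.append(cuts[-1])  # 95th percentile
    return median, low, high

def reference_mean(years, series, start=1850, end=1900):
    """Mean of `series` over the reference period (inclusive)."""
    return statistics.fmean([v for y, v in zip(years, series) if start <= y <= end])

# Synthetic stand-in (assumption): three "methods", 50 members each, 1801-2000.
rng = random.Random(1)
years = list(range(1801, 2001))
ensembles = {
    name: [[offset + rng.gauss(0.0, 0.2) for _ in years] for _ in range(50)]
    for name, offset in {"A": 0.0, "B": 0.1, "C": -0.3}.items()
}

median, low, high = pooled_summary(ensembles)
ref = reference_mean(years, median)
anomalies = [v - ref for v in median]  # anomalies relative to 1850-1900
print("first year: median %.2f, 90%% range %.2f to %.2f" % (median[0], low[0], high[0]))
```

Because every member is weighted equally regardless of which method produced it, a pooled band of this kind summarizes the combined distribution and does not show how far apart the individual methods sit.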

Table 1 Reconstructions of large-scale temperature variability

An alternative presentation of the P2k19 results is provided in Fig. 2b. In this summary, each 1000-member reconstruction ensemble, which is individually associated with a specific method and uncertainty estimate, is summarized by its median and individual 90% CI (again estimated as the ±5% values of the reconstruction ensemble in each year, this time for the 1000-member ensembles associated with each method). The representation of the reconstructions in Fig. 2b gives a different visual impression of the differences characterized by the P2k19 results than the version in Fig. 2a, with the version in Fig. 2b being more akin to the manner in which the NH temperature reconstructions were represented by the IPCC in 2007 and 2013 (Fig. 1b-c). The ensemble of reconstructions in Fig. 2b gives comparable estimates of LIA temperatures, but one reconstruction (BHM) yields a strong cold departure during this interval. Agreement among reconstructions reduces over the first millennium of the CE, characteristic of the expected larger uncertainties due to fewer proxies during that period26. The collective CIs are also wider, driven primarily by the broad and evenly distributed CI associated with the DA reconstruction. There is nevertheless a sense of CI overlap and various levels of agreement based on the darkness of the CI shading, which represents the amount of overlap between different estimates.
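A per-method summary of this kind, together with the overlap that the shading darkness conveys, can be sketched as follows. The toy ensembles, the method labels "A" and "B", and the helper names are assumptions for illustration only, and the 90% CI is again read as the 5th to 95th percentile range.

```python
import statistics

def method_band(members):
    """Per-year (5th percentile, median, 95th percentile) for one method's ensemble."""
    band = []
    for year_values in zip(*members):
        cuts = statistics.quantiles(year_values, n=20, method="inclusive")
        band.append((cuts[0], statistics.median(year_values), cuts[-1]))
    return band

def overlap_count(bands, year_index, value):
    """Number of methods whose 90% band contains `value` in a given year.
    Higher counts correspond to darker shading where bands are overplotted."""
    return sum(
        1 for band in bands.values()
        if band[year_index][0] <= value <= band[year_index][2]
    )

# Toy ensembles (assumption): 11 members each, constant over 3 "years".
ensembles = {
    "A": [[i / 10.0] * 3 for i in range(11)],        # members span 0.0 to 1.0
    "B": [[0.5 + i / 10.0] * 3 for i in range(11)],  # members span 0.5 to 1.5
}
bands = {name: method_band(members) for name, members in ensembles.items()}

for name, band in bands.items():
    low, med, high = band[0]
    print(f"{name}: median {med:.2f}, 90% range {low:.2f} to {high:.2f}")
print("methods covering 0.7 in year 0:", overlap_count(bands, 0, 0.7))
```

Keeping one band per method preserves the between-method spread that a single pooled band folds into one interval.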

P2k19 is a single study but consists of an ensemble of global reconstructions based on the same proxy network and different reconstruction methodologies. A single median reconstruction summarizing the entire P2k19 ensemble does not adequately represent the methodological uncertainties that are evident in Fig. 2b and in the original P2k19 publication. Moreover, the single CI estimate from the entire ensemble yields a narrower and more even uncertainty band, again giving an impression of smaller uncertainties collectively and back in time. As we argue below, this representation and exclusive focus on the global P2k19 reconstructions overlooked any of the context or advances provided by more than two decades of work on hemispheric-scale reconstructions, which approached and interrogated challenges to both methodology and paleoclimate data. The AR6 is therefore missing important information on how best to interpret and contextualize our understanding of the climate of the CE.

Large-scale temperature reconstructions through a wider lens

In addition to the P2k19 reconstructions that targeted global land and marine temperatures, several large-scale temperature reconstructions, published since the 2013 IPCC report, were not included in AR6: six reconstructions targeting NH extra-tropical mean land temperatures (ref. 56, hereafter Sch15; ref. 57, Sto15; ref. 58, Wil16; ref. 59, Xin16; ref. 60, Gui17; ref. 61, Bün20)56,57,58,59,60,61 and one reconstruction targeting SH land and marine temperatures (ref. 26, hereafter Neu14)26 (Table 1). Three of these reconstructions extend back to 1 CE, including Xin16, who report a deterioration of calibration fidelity before 850 CE. The seasonality of reconstructed temperatures also varies among the estimates. Five of the six NH reconstructions represent warm season temperatures (either JJA or MJJA); the exception is the Xin16 reconstruction, which targets annual temperatures. The SH Neu14 and global P2k19 reconstructions both target annual temperatures, although the target season differs marginally from May-April to April-March, respectively. The seasonality of the climate signal in the global P2k19 reconstructions is additionally complicated because the SH tree-ring predictors are weighted towards austral summers (DJF), while the NH tree-ring predictors are weighted towards boreal summers (JJA). P2k19 and Neu14 are also the only reconstructions that potentially include substantial numbers of non-tree-ring records, typically characterized by reduced dating accuracy, dating biases (e.g. ice cores contain these biases because of mistakenly fixing chronologies to historical volcanic eruptions)62, and overall lower age resolution through processes such as diffusion, melting, or bioturbation. It is nevertheless difficult to determine from the results presented in each paper the degree to which these records were incorporated into the reconstructions that they produced.

We plot all of the large-scale temperature reconstructions discussed above in Fig. 3 to provide a perspective on CE climate variability as it could have been represented in the 2021 WGI report (this is a format similar to the NH-SH-Global presentation in Figure 5.7 in the 2013 WGI report). Because of the different target domains (NH, SH, global) and target seasons (warm season to annual), comparisons of the reconstructions in Fig. 3 should be interpreted cautiously (as they were in the 2013 WGI report) because the temperature trends and variability are impacted by the choice of target domain and season63,64. General features of the reconstructions are nevertheless consistent and instructive. Figure 3a is characterized by a distinct progression from the MWP to LIA to recent warming across the different reconstructions and regions, and there tends to be more consistency in the estimates of mean temperature and variability through the LIA to the present, a result perhaps expected due to the increased number of available proxy records during that time. Differences in Fig. 3a tend to increase during the MWP and even more so during the first millennium, when the NH reconstructions and global P2k19 ensemble in Fig. 3c all diverge the most. The nature of the temperature progression during the first millennium is also different in the global P2k19 estimate in Fig. 3c and the few NH estimates in Fig. 3a that span the entire CE. All the P2k19 estimates suggest a relatively stable mean temperature during the first millennium, although they diverge in terms of the absolute value of the mean and their estimated range of uncertainties. This stability contrasts with the Xin16 and Bün20 NH reconstructions in Fig. 3a that indicate substantially colder conditions centered over the 6th century (suggested by the authors to be the Late Antique Little Ice Age)65, prior to the warmer conditions during the MWP. Colder conditions preceding the MWP have already been demonstrated in several of the longer reconstructions included in the 2013 IPCC report34,37,54,66, although these reconstructions were also spatially restricted to the NH.

Fig. 3: Reconstructions of large-scale temperature variability over the last 2000 years that were published since AR5 of the IPCC.

Reconstructions variably target seasonal to annual mean temperatures in the (a) Northern Hemisphere (Sch15, Sto15, Wil16, Xin16, Gui17, Bün20), and annual temperatures for the (b) Southern Hemisphere (Neu14) and (c) globally (P2k19; as shown in Fig. 2) over varying periods of the Common Era (see Table 1 for details). All reconstructions were smoothed using a 20-year low-pass filter and temperatures are shown as anomalies from their 1850–1900 means. Hemispheric and global means of land and ocean temperatures derived from HadCRUT5 instrumental analysis1 are also shown in each respective panel from 1850–2020 (red). Instrumental temperatures were also referenced to zero mean in the 1850–1900 interval and filtered with a 20-year low-pass filter. These instrumental representations are all consistent with the 2021 IPCC report.

Despite some of the large-scale differences evident in Fig. 3, the reconstructions display positive and statistically significant correlations over most periods of the past two millennia. Figure 4a presents the correlations between the eight reconstructions shown in Fig. 3, except only using the median of the P2k19 ensemble as employed by the IPCC and shown in Fig. 2a. These correlations are perhaps not surprising given that all reconstructions use many of the same proxy records as predictors, particularly prior to 1400 CE when relatively fewer proxies are available67. The inter-series correlations decline back in time, driven partly by reduced replication. For example, there is an 80% and 95% reduction in sample depth between 1800 CE and the first years of the Sch15 and Sto15 reconstructions, respectively (Table 1). The only reconstruction that is free of these predictor losses is Bün20, which included only seven tree-ring chronologies that covered the entire reconstruction period. Each of these seven predictors is nevertheless characterized by an equally dramatic loss in the number of single trees included in their chronology construction61,68, a feature that is characteristic of all large-scale temperature reconstructions60,69,70 and difficult to represent in conventional uncertainty estimates. For the P2k19 reconstructions, this translates into a ~ 95% reduction of predictors compared to 1800 CE, considering the dropout of shorter proxy records and replication changes inherent to the tree-ring chronologies. This substantial reduction in the numbers of predictors back in time effectively translates into a reduced understanding of first-millennium temperature variability at large spatial scales that is not represented by the single median time series and the narrow error ranges represented in the AR6 report.

Fig. 4: Reconstruction covariance.

a Inter-series correlations of the eight reconstructions shown in Fig. 3 calculated over 50-year intervals lagged by 25 years, including ±2SE bars. The one-tailed 95% significance threshold for 50-year windows is a correlation of approximately 0.24, which is represented by the solid pink line. b Comparison of the normalized P2k19 ensemble median and Bün20 reconstructions from 450 to 650 CE, and the Wil16 and P2k19 reconstructions from 1420 to 1620 and 1800 to 2000 CE. Dashed lines indicate the timing of major volcanic eruptions.
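The windowed correlations and the significance threshold quoted in the caption can be reproduced approximately with the sketch below. Several choices here are assumptions and are not taken from the figure: the inter-series value is computed as the mean of all pairwise Pearson correlations, the ±2SE bar is taken across those pairs, and the threshold uses the Fisher z approximation for independent samples (it ignores autocorrelation). The synthetic series and function names are illustrative.

```python
import itertools
import math
import random
import statistics

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def windowed_interseries_r(series, window=50, step=25):
    """Mean pairwise correlation and 2 SE in windows of `window` years,
    with successive windows shifted by `step` years."""
    results = []
    for start in range(0, len(series[0]) - window + 1, step):
        rs = [
            pearson(a[start:start + window], b[start:start + window])
            for a, b in itertools.combinations(series, 2)
        ]
        two_se = 2 * statistics.stdev(rs) / math.sqrt(len(rs)) if len(rs) > 1 else 0.0
        results.append((start, statistics.fmean(rs), two_se))
    return results

def critical_r(n, alpha=0.05):
    """One-tailed critical correlation for n independent samples (Fisher z approximation)."""
    z = statistics.NormalDist().inv_cdf(1 - alpha)
    return math.tanh(z / math.sqrt(n - 3))

# Synthetic example (assumption): three series sharing a common signal plus noise.
rng = random.Random(2)
signal = [rng.gauss(0.0, 1.0) for _ in range(200)]
series = [[s + rng.gauss(0.0, 0.5) for s in signal] for _ in range(3)]

for start, mean_r, two_se in windowed_interseries_r(series):
    print(f"window starting at index {start}: r = {mean_r:.2f} +/- {two_se:.2f}")
print("critical r for 50-year windows:", round(critical_r(50), 3))
```

For a 50-year window this approximation gives a threshold close to the value of approximately 0.24 stated in the caption. Smoothed or strongly autocorrelated series have fewer effective degrees of freedom, so their threshold would be higher.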

The common inter-annual to multi-decadal signals between the global P2k19 and NH extra-tropical reconstructions become further visible after normalizing the reconstructions and using Wil16 and Bün20 as comparison examples (Fig. 4b). We show three 200-year segments characterized by large volcanic eruptions (red dashed lines in Fig. 4b) and, in the case of 1800-2000 CE, the period of anthropogenic greenhouse gas forcing. The reconstruction correlations during these segments range from r450-650 = 0.60 to r1800-2000 = 0.85. Despite these high correlations, interpreting common signals is difficult. For example, tree rings, essentially a summer temperature proxy, are the dominant annually-resolved proxy and the reuse of the same tree-ring chronologies in all of these reconstructions makes it unlikely that any substantial cold season information is recorded in the coherent inter-annual variations. The P2k19 global April-March mean temperature reconstructions also include 40 predictors from tropical corals, but these proxy records only reach back to the 17th century, and many are much shorter. This likely imparts a weighting towards the annual season for the most recent centuries, and as these shorter records drop out, it will result in a spatial shift in proxy weighting from the tropics towards higher latitudes as well as a seasonal shift towards warm season temperatures retained in the remaining annually-resolved tree-ring data45. Despite these details, the common signals across the global P2k19 ensemble median and the various NH reconstructions as represented by the correlation results are a stark reminder that current global reconstructions are biased by NH sampling and therefore are impacted by the same challenges of previous NH work. The isolated presentation of a global temperature reconstruction in the 2021 IPCC WGI report thus removed important context from the critical discussion of interpreting and improving large-scale temperature reconstructions.

While common signals exist across the various reconstructions, they differ substantially in their magnitudes of decadal and lower-frequency variance, a result that is tied in part to the different target domains (global versus NH extra-tropics) and hemispheres, with the SH temperature variability being reduced due to the dominance of the oceans. These differences nevertheless cannot be exclusively explained as differences in target domains. We compare the standard deviations (SD) of the annually-resolved instrumental and reconstructed timeseries without any prior smoothing in Fig. 5. The SD of the NH land-only summer instrumental temperatures exceeds the SD of the global land and marine annual instrumental temperatures by 0.05 °C. The SH land and marine observations show an even smaller SD than the global data. These differences are also represented in the reconstructions (even though the precise instrumental target might differ; see Table 1), as the ensemble median of P2k19 global and Neu14 SH reconstructions have the smallest SD during the 1878-2000 CE instrumental period (blue bars in Fig. 5). Sto15, Wil16, Gui17 and Bün20 match the larger NH instrumental temperature SD, whereas Sch15 underestimates the target variance, likely due to post-1960 divergence71, which is characteristic of the maximum latewood density data exclusively used in the study (Sch15 was originally calibrated over 1901 to 1976 CE only).

Fig. 5: Standard deviations in observed temperature data and Common Era temperature reconstructions.

Estimates for the observed and reconstructed temperatures are determined over the 1878-2000 CE (blue), 1001-1877 CE (orange) and 1-1000 CE (gray) periods. Instrumental records shown on the left side include mean annual temperatures averaged over 90°S-90°N land and marine areas (global), mean annual temperatures averaged over 0°−90°S land and marine areas (SH), and mean summer (JJA) temperatures averaged over 30°−90°N land-only areas (NH).

The differences between the SD of the reconstructions during the instrumental period become substantially larger during the pre-instrumental 1001-1877 and 1-1000 CE periods. Whereas the SD of the NH reconstructions either remains relatively stable (Sch15, Wil16) or increases (Sto15, Gui17, Bün20), the SD of the global ensemble median of P2k19 and SH Neu14 reconstructions decreases during the second millennium CE. During the first millennium CE, the SD difference between the annually-resolved P2k19 and Bün20 reconstructions equals 0.34 °C, largely exceeding the 0.05 °C SD difference between the global and NH instrumental targets. The important aspect of these comparisons is that the relative variance losses are different across the reconstructions, regardless of the fact that the variances of the target domains differ. These relative variance losses are potentially due to many factors, including reconstruction methodologies and employed proxy networks, the dependencies of which are only evident when analyzed across an ensemble of reconstructions as shown in Fig. 5.
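The period-wise comparison of standard deviations can be sketched as below, using the three periods given for Fig. 5 and unsmoothed annual values. The use of the sample standard deviation, the synthetic input series, and the function name `sd_by_period` are assumptions for illustration.

```python
import statistics

# Periods as listed for Fig. 5 (years CE, inclusive).
PERIODS = {
    "1-1000": (1, 1000),
    "1001-1877": (1001, 1877),
    "1878-2000": (1878, 2000),
}

def sd_by_period(years, values, periods=PERIODS):
    """Sample standard deviation of unsmoothed annual values in each period.
    Returns None for a period the series does not cover with at least two values."""
    result = {}
    for label, (start, end) in periods.items():
        subset = [v for y, v in zip(years, values) if start <= y <= end]
        result[label] = statistics.stdev(subset) if len(subset) > 1 else None
    return result

# Synthetic series (assumption): alternating values whose amplitude shrinks back in time.
years = list(range(1, 2001))
def amplitude(year):
    return 0.1 if year <= 1000 else 0.2 if year <= 1877 else 0.3
values = [amplitude(y) * (1 if y % 2 else -1) for y in years]

sds = sd_by_period(years, values)
for label, sd in sds.items():
    print(f"{label} CE: SD = {sd:.3f}")
print("SD change, first millennium vs. instrumental period:",
      round(sds["1-1000"] - sds["1878-2000"], 3))
```

Applying the same function to each reconstruction and to its instrumental target would allow the variance change of a reconstruction to be compared against the variance difference between targets.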

Implications of variance losses back in time have been addressed in ref. 72, which pointed to the impact of dating uncertainties, biases, and inclusion of lower resolution proxies in P2k19 on the temperature variability in the first millennium and the limited ability to realistically depict post-volcanic temperature reductions56,62,73 compared to the second millennium CE74,75,76,77,78. Variance reductions in mean estimates are not necessarily problematic and can indeed be imposed by methodological formulations, but require careful and appropriate treatment in reconstruction uncertainty estimates79. This is not the case in the 2021 SPM figure, however, in which the uncertainty estimates represent the range among different methods and do not account for temporally changing losses in reconstruction skill and available proxies back in time.

While interpretations of the similarities and differences across the various domains and reconstructions, as shown in Fig. 5, remain the subject of important and interesting research, diagnosing the differences is not the focus of our commentary herein. Our primary concern is that substantial uncertainty exists. The consequence is that there are notable differences in the representation of large-scale estimates of CE temperature variability, as shown in Figs. 2 and 3, that were overlooked and poorly communicated by the 2021 IPCC WGI report. Both the different summary of the global P2k19 ensemble provided in Figs. 2b and 3c, and the inclusion of the additionally available NH and SH temperature reconstruction estimates in Fig. 3, imply substantial uncertainties in large-scale temperature reconstructions that better summarize the existing challenges associated with the science.

Conclusions and future priorities

We propose that a visualization of the contemporary research, as in Fig. 3, offers a more accurate depiction of the uncertainty and temporal evolution of CE temperature variability compared to any single reconstruction. A general feature of Fig. 3 is that long-term trends during the second millennium CE are more coherent and robust, but major discrepancies still exist during the first millennium CE. These uncertainties in the first millennium are the product of severe reductions in the availability of high-resolution proxy records, which affects all large-scale temperature reconstructions. The SH also remains grossly under-sampled. It is therefore premature, and possibly incorrect, to conclude that the first millennium was free of centennial-scale temperature trends and that the decadal variations were systematically smaller than during subsequent centuries, as detailed in the 2021 SPM.

Regarding global temperature reconstructions specifically, we also highlight the following limitations that must continue to be contextualized in consensus reports on CE temperature reconstructions: (i) warm season biases due to the dominance of tree-ring records during the CE, (ii) spatial biases in proxy sampling, with a persistent lack of high-resolution proxy records from the tropics and SH, which are needed for accurately representing lower-latitude and SH temperatures over the past 2000 years, (iii) the likely loss of variability when including time-uncertain and smoothed proxies in a large-scale reconstruction, (iv) the potential limited ability of conventional tree-ring records to capture millennial-scale trends in climate, and (v) the need to more accurately estimate reconstruction uncertainties that reflect changes in replication and statistical model fidelity of the underlying proxy network back in time (a constant uncertainty range back in time is unlikely to accurately represent the increasing uncertainties that exist). With any set of methods, however, their outcome is ultimately dependent on the data that they incorporate and the assumptions that underpin the statistical model. A major initiative to produce new high-resolution proxy records that span the entire CE is therefore necessary if we are to fundamentally improve our understanding of pre-instrumental temperature variations at policy-relevant timescales. It otherwise remains uncertain how warm and cold first millennium CE temperatures actually were and what caused these earlier changes at hemispheric to global scales, with implications for our understanding of the true range of externally and internally forced variability.

The above uncertainties remain relevant, universal, and critical in the context of producing large-scale reconstructions of climate over the CE. These issues were overlooked by the rendition of global temperature reconstructions drafted by the IPCC in 2021 and they were also not addressed in the accompanying report discussion. We nevertheless are sympathetic to the daunting challenge that confronts IPCC authors who must summarize vast amounts of research in reports that are highly constrained in length and content. While some of the criticisms that we raise in this perspective could have been addressed with simple choices associated with the visual representation of data, other elements would have been more challenging with chapter length restrictions. Our final argument is therefore focused on another parallel between the 2001 and 2021 IPCC WGI reports, which was the absence of a paleoclimate chapter in both. While paleoclimate research was represented in various chapters of these reports, it was not a subject that was given a specific chapter, in contrast to the stand-alone paleoclimate chapters in the 2007 and 2013 reports. Given the importance of paleoclimate topics in general, and the historical focus of the IPCC on multiple paleoclimate subjects specifically (like temperature reconstructions of the CE), the IPCC should return to the practice of drafting a separate paleoclimate chapter in AR7, which would allow for a more cohesive and comprehensive focus on topics like large-scale temperature reconstructions.