Introduction

Accurate and trustworthy data are critical for understanding, mitigating, and preventing complex problems. During disease outbreaks, precise mortality data are essential for facilitating optimal resource allocation, conducting retrospective evaluations of disease mitigation measures, and effectively planning for, and potentially preventing, future epidemics and other public health emergencies1,2. However, there are widespread concerns about the lack of reliable mortality estimates during large-scale disease outbreaks, and these concerns were prevalent during the COVID-19 pandemic3,4,5,6. Skepticism about the validity of official COVID-19 mortality data arises from a lack of rigorous testing, absence of medical certification, deaths occurring outside formal healthcare systems, and other indirect pandemic-related deaths that occurred due to delays or lack of access to healthcare, reduced hospital capacity, increased risk of other diseases, and post-COVID-19 complications3,4,7,8. In response, national and international health authorities, epidemiologists, and journalists across the world have estimated the number of additional deaths during the COVID-19 pandemic relative to those expected based on trends from pre-pandemic times, also known as excess mortality4,5,7,9,10,11,12,13,14. Excess mortality estimates are conventionally computed using a range of statistical and epidemiological models4,5,7,8,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25. However, these models rely on plentiful, high-quality, and timely pre-pandemic mortality data. Such data are particularly difficult to obtain in resource-constrained settings where robust and resilient data infrastructures are limited2,3,26,27,28. 
In such circumstances, researchers have estimated COVID-19-related excess mortality using innovative alternatives such as post-mortem PCR tests, large household surveys, satellite imagery of cemeteries, obituary notifications, funeral data, cremation counts, burial numbers, death insurance claims, verbal autopsies, and investigative journalism10,29,30,31,32,33,34,35,36. Although useful, many of these alternative methods remain resource-intensive, and thus prompt a need for frugal fact-finding approaches in addition to traditional data-collection techniques.

To that end, the wisdom of crowds approach may be useful in estimating excess mortality. The wisdom of crowds refers to the counterintuitive accuracy of aggregated cognitive estimates. Cognitive estimation is the human ability to provide reasonable answers to questions for which specific answers are not readily available37,38,39,126. Many everyday human behaviors depend on successful cognitive estimation (e.g., planning out how many clothes to pack for a trip). Such everyday cognitive estimation scenarios tap into a range of psychological processes such as reasoning, working memory, cognitive flexibility, mental imagery, and problem-solving40,41,42,43,44,45. Cognitive estimation has therefore been applied across clinical populations to assess patients with brain lesions and psychiatric conditions46,47,48,49,50,51,52,53,54,55,56. Such neuropsychological studies have made progress in describing the neural underpinnings of cognitive estimation. Parallel research efforts have investigated cognitive estimation in the context of education and neurocognitive developmental disorders57,58,59,60,61. Such research has shown how the accuracy of cognitive estimation is sensitive to experiential factors including socioeconomic status, reading habits, quality of education, and media exposure62. Even though individual human judgment and decision-making are often biased and susceptible to influence from a range of cognitive, emotional, socioeconomic, and political factors27,63,64,65,66, a growing body of research points to the wisdom of crowds. This frugal method has been widely used across multiple domains including neuropsychology, business, finance, economics, election polling, public policy, and global geopolitics37,39,59,67,68,69,70,71,72,73,74,75,76,77. 
Specifically, in the context of epidemiology, public health surveillance, and the COVID-19 pandemic, this method has been used to predict future outbreaks, vaccination uptake, disease caseload, infection hotspots, and overall disease severity72,78,79,80,81,82,83,84,85,86. Despite the potential of cognitive estimation, this method has not been widely tested for estimating pandemic-associated excess mortality, a gap this study aims to fill.

In this study, we investigated whether COVID-19-related excess mortality estimates using multiple methods were similar to each other. We focused on three conventional statistical and epidemiological models: (a) a simple averaging technique18,87, (b) the Farrington surveillance algorithm20, and (c) an overdispersed Poisson model88, along with two novel frugal methods to estimate COVID-19-related excess mortality: (d) analyzing media reports about discrepancies between official mortality data and death compensation claims, and (e) the wisdom of crowds public surveying. Similar estimates obtained from different methods would establish the use of frugal methods in estimating pandemic-related excess mortality and other unknown public health-related statistics, especially in resource-constrained settings.

In our case study, we focused on the Pune Municipal Corporation region (henceforth simply referred to as ‘Pune’) in the state of Maharashtra, India. Pune is the eighth-largest metropolitan area of India with a population of 5 million people89. Of these, around 40% (~2 million) reside in urban slums90, with an additional floating population of migrants from surrounding rural areas. Between 1st January 2020 and 31st December 2021, Pune officially reported more than half a million COVID-19 cases and 9,093 COVID-19 deaths over two successive waves (Fig. 1, Fig. S1)91. Despite a large number of officially reported cases and deaths, Pune is one of the few large Indian cities for which pandemic-associated excess mortality has not been determined. We therefore estimated COVID-19-related excess deaths in Pune. Our results point to an estimated 14,770 excess deaths [95% CI: 9,820–22,790] in Pune from March 2020 to December 2021, of which 9,093 were officially counted as COVID-19 deaths. This translates to an estimated undercount factor (the ratio of estimated excess deaths to officially reported COVID-19 deaths) of 1.6 [95% CI: 1.1–2.5] for Pune from March 2020 to December 2021. In other words, we estimate that Pune experienced 60% more COVID-19-related deaths than officially reported. We found that excess death estimates from diverse methods, both conventional and frugal, were within the margins of error of each other. Thus, our study provides evidence about excess death estimates from diverse approaches and demonstrates the utility of non-traditional frugal methods such as the analysis of media-reported death compensation claims and the wisdom of crowds in estimating excess mortality. Our study also reinforces the potential of collective cognitive estimation as an untapped theoretical avenue for computational social science, neuroscience, cognitive and behavioral science, and other life sciences. 
Finally, our study highlights the practical importance of the wisdom of crowds and other frugal estimation methods in generating equitable solutions to credible fact-finding, especially in resource-constrained settings where robust data infrastructures are unaffordable.

Figure 1

Officially reported COVID-19 incidence and deaths in Pune. Top: Daily COVID-19 incidence in Pune from March 2020 through December 2021. Bottom: Monthly COVID-19 deaths reported in Pune from March 2020 through December 2021.

Methods

We adopted a multi-method approach to estimate COVID-19-related excess deaths in Pune from March 2020 to December 2021 by combining estimates from three methods: (a) statistical and epidemiological modeling with pre-pandemic mortality data, (b) analyzing media reports about discrepancies between official mortality data and death compensation claims, and (c) wisdom of crowds public surveying. Within statistical and epidemiological methods, we used three models: (a) a simple averaging technique18,87, (b) the Farrington surveillance algorithm20, and (c) an overdispersed Poisson model88. Multi-method approaches help mitigate the flaws and biases inherent to any particular method. Piecing together data from different sources improves our understanding of the pandemic10,16,92,93. Different methods often reflect different approaches to answering the same question, and thus may produce conflicting estimates. Rather than identifying any single “best” method, multi-method approaches combine diverse sources to produce a collective estimate that is typically more accurate than estimates from individual models. Combining estimates minimizes the pitfalls of relying on any particular individual model, and it can offset statistical bias, potentially canceling out overestimation and underestimation94,95,96. More broadly, multi-method approaches reflect an epistemic commitment to diverse viewpoints97. They highlight how the voice of diverse stakeholders may be critical to establishing the ground truth98. This is especially relevant in the context of COVID-19 where considerable debate exists about officially reported mortality figures3,12,19,99,100,101,102,103,119. Next, we briefly describe various methods used in this study to estimate COVID-19-related excess deaths in Pune. All methods were carried out in accordance with relevant guidelines and regulations. 
All experimental protocols were approved by Carnegie Mellon University’s Institutional Review Board (Registration No.: IRB00000352).

Statistical modeling with pre-pandemic all-cause mortality data

To estimate COVID-19-related excess mortality, researchers conventionally use various statistical modeling techniques ranging from simple averaging and linear regression to more sophisticated methods such as Monte-Carlo simulations, Poisson models, and other machine learning models4,10,11,14,87,104. Other researchers have estimated excess deaths by extrapolating from more traditional epidemiological measures such as serosurveillance data, infection fatality rate, the overall population’s susceptibility to the virus, the protection offered by vaccination, and the chances of reinfection10,16,87,106. Most statistical and epidemiological models computed excess deaths by estimating the number of expected deaths based on pre-pandemic trends5. However, different models varied widely in their assumptions and choice of relevant real-world parameters. Some researchers used simple averaging techniques to establish a baseline of expected all-cause mortality18,87. Although useful, such simple approaches lack flexibility and robustness because they ignore real-world factors including seasonality, population growth, and contemporary trends of mortality. Epidemiologists addressed these limitations using more sophisticated methods such as widely adopted Poisson and quasi-Poisson models that include parameters such as population growth, seasonality, and recent temporal trends of mortality4,7,9,20,24,105. Such models trace their roots to the “classical” Farrington surveillance algorithm that has been extensively used across diverse public health settings over the past three decades20,106,107,108. This approach remains a reference point for many of the improved and extended Poisson-related models that have since been developed109,110.

Our three statistical models used a dataset about monthly all-cause mortality in Pune from January 2014 through December 2021 (Fig. 2). This dataset was provided to the Jnana Prabodhini Foundation by the Pune Knowledge Cluster, a national-level Science and Innovation Cluster set up by the Office of the Principal Scientific Advisor, Government of India91. A formal memorandum of understanding (MoU) of institutional collaboration was signed between the Jnana Prabodhini Foundation and the Pune Knowledge Cluster to ensure responsible data-sharing and upholding privacy standards. The Pune Knowledge Cluster ultimately obtained this dataset from the Pune Municipal Corporation Health Office’s death certificate registration data. Besides estimating excess deaths (Eq. 1), we also computed the undercount factor, the ratio of excess deaths to officially reported COVID-19 death figures (Eq. 2)17. Next, we describe the three statistical models we used in this study.

$$excess\; deaths = total \; reported\; deaths - expected\; deaths$$
(1)
$$undercount\; factor = \frac{excess \;deaths }{{reported \;\text{COVID-19} \;deaths}}$$
(2)
Figure 2

Officially reported monthly all-cause deaths in Pune from January 2014 through December 2021.
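
As a worked illustration of Eqs. 1 and 2, the short sketch below plugs in the totals reported in the Results for the overdispersed Poisson model (74,289 total reported deaths, 59,110 expected deaths, and 9,093 officially reported COVID-19 deaths). The function names are illustrative only and are not taken from the study's own code.

```python
def excess_deaths(total_reported: float, expected: float) -> float:
    """Eq. 1: total reported all-cause deaths minus expected deaths."""
    return total_reported - expected


def undercount_factor(excess: float, reported_covid: float) -> float:
    """Eq. 2: excess deaths divided by officially reported COVID-19 deaths."""
    return excess / reported_covid


# Totals quoted in the Results for the overdispersed Poisson model.
total_reported = 74_289
expected = 59_110
reported_covid = 9_093

excess = excess_deaths(total_reported, expected)
factor = undercount_factor(excess, reported_covid)
print(excess, round(factor, 1))  # 15179 1.7
```

Rounded to the nearest 10, the excess matches the 15,180 reported for that model, and the ratio matches its undercount factor of 1.7.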

Simple average model

We used a simple nonparametric model18,87 to compute COVID-19-related excess deaths (Eq. 3). The expected deaths for each month from March 2020 to December 2021 were calculated as the mean number of total deaths recorded during that month for the previous six years. We also calculated the associated 95% prediction intervals [μ ± Zσ] where μ is the mean expected estimate and σ is the standard deviation around the predicted estimate. We set Z = 1.96, the 97.5th percentile of a standard normal distribution. Negative values, where observed counts were below the expected thresholds, were set to zero. This method assumes that the number of deaths is effectively constant over time and that the underlying data are independent and identically distributed (i.i.d.). See Supporting Information for further methodological details and an evaluation of model assumptions.

$$expected\; deaths = \frac{1}{6}(M_{2014} + M_{2015} + M_{2016} + M_{2017} + M_{2018} + M_{2019} )$$
(3)

where M_i is the number of deaths in month M of year i
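
The simple average model can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the study's implementation: the monthly counts are hypothetical, the helper names are invented for this example, and the use of the sample standard deviation for σ is an assumption (the text does not say which estimator was used).

```python
import statistics

Z = 1.96  # 97.5th percentile of the standard normal distribution


def simple_average_baseline(history):
    """Return (mean, lower, upper) for one calendar month.

    `history` holds that month's all-cause deaths in each baseline year.
    """
    mu = statistics.mean(history)
    sigma = statistics.stdev(history)  # sample SD (assumption)
    return mu, mu - Z * sigma, mu + Z * sigma


def monthly_excess(observed, expected):
    """Excess deaths for one month, with negative values set to zero."""
    return max(0.0, observed - expected)


# Hypothetical counts for one calendar month, 2014-2019 (not the Pune data).
march_history = [2400, 2500, 2450, 2600, 2550, 2500]
mu, lower, upper = simple_average_baseline(march_history)

# Hypothetical observed count for the same month during the pandemic.
observed = 3100
print(mu, monthly_excess(observed, mu))  # 2500 600
```

Repeating this for every month from March 2020 to December 2021 and summing the monthly values would give a period total. Note that the upper bound of expected deaths corresponds to the lower bound of excess deaths, and vice versa.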

Farrington surveillance algorithm

We implemented the Farrington surveillance algorithm20, a quasi-Poisson regression model that accounts for seasonality (Eq. 4), to compute the expected deaths for each month from March 2020 to December 2021. This model was implemented using the surveillance package in the R programming language105,111. As is standard practice, the lower bound for the margin of error of the Farrington surveillance algorithm was computed using a one-sided 95% prediction interval. The upper bound was computed using average expected deaths. Negative values, where observed counts were below the expected thresholds, were set to zero9. This method assumes that the number of deaths is effectively constant over time. See Supporting Information for further methodological details.

$$expected\; deaths = e^{{\left( {\alpha + \beta \cdot M} \right)}}$$
(4)

where α and β account for seasonal variation in deaths, and M is measured in months.

Overdispersed Poisson model

We implemented an overdispersed Poisson model that accounts for population growth in addition to seasonal variation in deaths (Eq. 5)88 to compute the expected deaths for each month from March 2020 to December 2021. This model was implemented using the excessmort package in the R programming language104. We obtained estimates about Pune’s monthly population from the World Population Review89. We report the associated 95% prediction intervals [μ ± Zσ] where μ is the mean expected estimate and σ is the standard deviation around the predicted estimate. We set Z = 1.96, the 97.5th percentile of a standard normal distribution. Negative values, where observed counts were below the expected thresholds, were set to zero. See Supporting Information for further methodological details and an evaluation of model assumptions.

$$expected \;deaths = P_{M} \cdot e^{{(\alpha_{M} + s_{M} )}}$$
(5)

where P_M is the population in month M, α_M is a gradual trend accounting for increasing life expectancy, and s_M is a seasonal trend accounting for seasonal variation in deaths.

Analyzing media reports about discrepancies between official mortality data and death compensation claims

Governmental bodies across the world including India’s National Disaster Management Authority have implemented ex gratia monetary compensation policies targeted at households that lost family members to COVID-19101,112. Such policies often employ liberal definitions of COVID-19 mortality, thus counting some of the COVID-19 deaths that may have been missed for various reasons3,4,7,8, such as deaths that had occurred within a month of suffering from COVID-19 as well as the deaths of patients who did not possess positive RT-PCR (reverse transcription-polymerase chain reaction) tests, but nevertheless displayed other indicators of likely COVID-19 infection including positive antibody tests and HRCT (high-resolution computed tomography) chest scans113. We analyzed reports from the Times of India113, one of India’s most-circulated daily newspapers, about the number of COVID-19 death compensation claims filed by households that lost family members to COVID-19. We treated this number as the estimated COVID-19-related excess deaths (Eq. 6). We then computed the undercount factor as the ratio between the number of registered COVID-19 death compensation claims and the number of officially reported COVID-19 deaths (Eq. 7). Unlike statistical modeling, our analysis of death compensation claims only provides a point estimate of excess deaths. However, to heuristically estimate the margin of error associated with our point estimate, we further computed undercount factors for other cities in Maharashtra. Together, these cities constitute a fifth of Maharashtra state’s population and almost half of Maharashtra’s urban population. We calculated the standard error for the undercount factors, thus generating a range of plausible undercount factors for cities in Maharashtra [se = σ/√n where σ is the standard deviation across these cities and n is the number of cities]. 
This standard error was used to compute a 95% confidence interval for Pune [μ ± Z*se] where μ is the estimated undercount factor for Pune. We set Z = 1.96, the 97.5th percentile of a standard normal distribution. The lower and upper bounds of this confidence interval were multiplied by the number of reported COVID-19 deaths to compute plausible lower and upper estimates of excess COVID-19-related deaths in Pune. See Supporting Information for alternative heuristics of computing plausible lower and upper estimates of the undercount factor for Pune.

$$excess\; deaths = reported \;\text{COVID-19} \;death \;compensation \;claims$$
(6)
$$undercount \; factor = \frac{ reported \;\text{COVID-19}\; death \;compensation \;claims}{{reported\; \text{COVID-19} \;deaths}}$$
(7)
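
The heuristic interval described above can be sketched as follows. This is a minimal illustration, assuming the interval is centred on Pune's own undercount factor while the standard error comes from the spread across comparison cities. The city factors below are hypothetical placeholders (not the Table 2 values), the helper name is invented, and the use of the sample standard deviation is an assumption. Whether Pune itself is counted among the cities is not specified in the text; this sketch leaves it out.

```python
import math
import statistics

Z = 1.96  # 97.5th percentile of the standard normal distribution


def heuristic_interval(point_factor, other_factors, reported_deaths):
    """95% interval for excess deaths, centred on the city's own factor.

    The standard error comes from the spread of undercount factors
    across the comparison cities: se = sigma / sqrt(n).
    """
    sigma = statistics.stdev(other_factors)  # sample SD (assumption)
    se = sigma / math.sqrt(len(other_factors))
    lower = (point_factor - Z * se) * reported_deaths
    upper = (point_factor + Z * se) * reported_deaths
    return se, lower, upper


# Pune figures quoted in the Results.
claims, reported = 13_000, 9_093
pune_factor = claims / reported  # Eq. 7

# Hypothetical undercount factors for comparison cities (not Table 2 values).
city_factors = [1.1, 1.9, 0.9, 2.4, 1.3]

se, lower, upper = heuristic_interval(pune_factor, city_factors, reported)
print(round(pune_factor, 2), round(lower), round(upper))
```

Because the placeholder city factors are invented, the printed bounds will not reproduce the interval reported in the Results; only the point estimate (about 1.43) is grounded in the quoted figures.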

Wisdom of crowds public surveying

We conducted an online wisdom of crowds survey in Pune to obtain COVID-19-related excess death estimates. Ethics approval for this survey was obtained from Carnegie Mellon University’s Institutional Review Board (Registration No.: IRB00000352). Only adults participated in the survey and completed a digital consent form before proceeding to the survey questionnaire. Thus, we confirm that informed consent was obtained from all participants. We did not collect identifying or potentially identifying information about survey respondents. We deployed the survey from 8 January 2022 to 8 February 2022. Participants responded to the survey hosted on the SurveyMonkey platform (now Momentive) in either Marathi or English. We employed a sample-of-convenience snowball-sampling method and promoted the survey via social media platforms such as WhatsApp and Facebook. In total, 280 adult residents of Pune participated in a COVID-19-related Knowledge, Attitudes, and Practices (KAP) survey (Table S2)27. Survey respondents were asked COVID-19-related questions including: “As of January 1, 2022, there have been 9,117 COVID-19 deaths in Pune during the pandemic. This data is from official government figures released by Pune Municipal Corporation (PMC). What do you think is the true number of COVID-19 deaths in Pune (as of January 1, 2022)? Please choose a number between 0 and 90,000.” The average cognitive estimate obtained from public surveying, that is, the collective guess about the “true number of COVID-19 deaths”, was considered to be the number of excess COVID-19-related deaths (Eq. 8). We computed the undercount factor as the ratio between the collective cognitive estimate of the speculated true number of COVID-19 deaths and the number of officially reported COVID-19 deaths (Eq. 9). We calculated the standard error [se = σ/√n] and used it to compute the 95% confidence interval [μ ± Z*se] where we set Z = 1.96, the 97.5th percentile of a standard normal distribution.

$$excess \;deaths = collective \;cognitive \;estimate \;of\; \text{COVID-19}\; deaths$$
(8)
$$undercount \;factor = \frac{ collective\; cognitive \;estimate\; of\; \text{COVID-19} \;deaths}{{reported \;\text{COVID-19} \;deaths}}$$
(9)
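
A minimal sketch of the crowd aggregation follows: the collective estimate is the mean response, with a 95% confidence interval built from the standard error of that mean. The responses below are hypothetical values inside the survey's 0 to 90,000 range, not the actual survey data, and the helper name is invented for this example.

```python
import math
import statistics

Z = 1.96  # 97.5th percentile of the standard normal distribution
OFFICIAL_DEATHS = 9_117  # figure shown to respondents in the survey question


def crowd_estimate(responses):
    """Mean response with a 95% confidence interval (mu +/- Z * se)."""
    mu = statistics.mean(responses)
    se = statistics.stdev(responses) / math.sqrt(len(responses))
    return mu, mu - Z * se, mu + Z * se


# Hypothetical responses (the real survey had N = 280).
responses = [5_000, 9_117, 12_000, 15_000, 20_000, 25_000, 30_000, 45_000]

mu, lower, upper = crowd_estimate(responses)

# Share of respondents who think the official count is an overestimate.
share_below = sum(r < OFFICIAL_DEATHS for r in responses) / len(responses)
print(round(mu), round(lower), round(upper), share_below)
```

Individual answers can fall well below or above the official count; the method relies on the aggregate rather than on any single response.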

Aggregate estimate

We combined the five COVID-19-related excess death estimates and undercount factors obtained from different methods: (a) the simple averaging technique, (b) the Farrington surveillance algorithm, (c) the overdispersed Poisson model, (d) analyzing media-reported death compensation claims, and (e) the wisdom of crowds public surveying. We used a simple bootstrap to generate a plausible range of excess deaths and undercount factors for Pune. We first randomly sampled from the distributions generated by each of the five different methods. For all methods except the wisdom of crowds, we conducted sampling assuming a normal distribution. For the wisdom of crowds, we made no such assumption and sampled from the raw survey data. We conducted 10,000 iterations of such random sampling with replacement and used the resulting 10,000 means to compute a 95% confidence interval. See Supporting Information for further methodological details.
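
One possible reading of this aggregation procedure is sketched below; the exact procedure is described in the Supporting Information and may differ. The sketch assumes that each iteration draws one value per method and averages the five draws, that each normal distribution's standard deviation can be approximated from the width of the interval quoted in the Results (a crude assumption, especially for the one-sided Farrington interval), and that the final interval is a percentile interval. The survey responses are hypothetical, so the output will not reproduce the reported aggregate.

```python
import random
import statistics

Z = 1.96
N_ITER = 10_000


def sd_from_interval(lower, upper):
    """Assumption: treat the reported interval as symmetric, mu +/- Z * sd."""
    return (upper - lower) / (2 * Z)


# (point estimate, lower, upper) for excess deaths, as quoted in the Results.
normal_methods = {
    "simple average": (20_490, 10_050, 33_050),
    "Farrington": (9_200, 9_200, 19_900),
    "overdispersed Poisson": (15_180, 5_990, 29_090),
    "compensation claims": (13_000, 6_910, 19_100),
}

# Hypothetical raw survey responses (the real survey had N = 280).
survey = [5_000, 9_117, 12_000, 15_000, 20_000, 25_000, 30_000, 45_000]


def bootstrap_aggregate(seed=0):
    """Return (mean of iteration means, 2.5th percentile, 97.5th percentile)."""
    rng = random.Random(seed)
    means = []
    for _ in range(N_ITER):
        draws = [rng.gauss(mu, sd_from_interval(lo, hi))
                 for mu, lo, hi in normal_methods.values()]
        draws.append(rng.choice(survey))  # resample a raw survey response
        means.append(statistics.mean(draws))
    means.sort()
    lower = means[N_ITER * 25 // 1000]
    upper = means[N_ITER * 975 // 1000 - 1]
    return statistics.mean(means), lower, upper


estimate, lower, upper = bootstrap_aggregate()
print(round(estimate), round(lower), round(upper))
```

Sampling the survey from raw responses rather than from a fitted normal keeps any skew in the crowd's answers, which is the distinction the text draws between the survey and the other four methods.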

Results

We used a multi-method approach to compute COVID-19-related excess death estimates in Pune from March 2020 to December 2021 compared to the 74,289 total reported deaths during this time. We also computed the undercount factor in this period, that is, the ratio of estimated excess deaths to the 9,093 officially reported COVID-19 deaths. Table 1 and Fig. 3 present a summary of excess death estimates and undercount factors estimated from all different methods in this study. All estimated expected deaths and excess deaths have been rounded to the nearest 10 to avoid a false sense of precision.

Table 1 Estimated expected deaths, excess deaths, and undercount factors during the COVID-19 pandemic in Pune.
Figure 3

Undercount factor computed from COVID-19-related excess deaths in Pune. The margin of error is the 95% PI for the statistical models: simple average, Farrington surveillance algorithm (one-sided), and overdispersed Poisson model. It is the 95% CI for the analysis of death compensation claims from media reports, the wisdom of crowds public surveying, and the aggregate estimate. An undercount factor of 1 represents an ideal scenario where all estimated excess deaths can be attributed to officially reported COVID-19 mortality.

First, we used three types of statistical models. Based on the pre-pandemic trends, the simple average model estimated 53,790 expected deaths (95% PI: 41,230–64,230). Therefore, the estimated COVID-19-related excess deaths were 20,490 (95% PI: 10,050–33,050) (Fig. 4A). Compared to the estimated excess deaths, the 9,093 officially reported COVID-19 deaths were an undercount by a factor of 2.3 (95% PI: 1.1–3.6). However, the simple averaging model did not incorporate seasonal variation in deaths. Accounting for seasonal variation, the Farrington surveillance algorithm estimated 65,090 expected deaths (one-sided 95% PI: 54,390–65,090). Therefore, this method revealed 9,200 estimated excess deaths (one-sided 95% PI: 9,200–19,900) with an undercount factor of 1.01 (one-sided 95% PI: 1.01–2.2) (Fig. 4B). In addition to seasonal variation, the overdispersed Poisson model accounted for population growth and estimated 59,110 expected deaths (95% PI: 45,200–68,300), implying 15,180 estimated excess deaths (95% PI: 5,990–29,090) with an undercount factor of 1.7 (95% PI: 0.7–3.2) (Fig. 4C).

Figure 4

Results from three statistical models: A) the simple average model, B) the Farrington surveillance algorithm, and C) the overdispersed Poisson model. The dotted lines show the expected deaths (estimated from the statistical models) in Pune, the green lines show the officially reported all-cause deaths in Pune, and the gray bands show the 95% PI (one-sided for the Farrington surveillance algorithm).

Second, we analyzed media reports about discrepancies between official mortality data and the number of COVID-19 death compensation claims filed by the public. As of January 2022, residents of Pune had filed around 13,000 death compensation claims113, which served as the estimated COVID-19-related excess deaths in Pune based on media reports. Compared to the officially reported mortality, this figure implied an undercount factor of 1.4. Using the same media reports113, we additionally computed excess deaths and undercount factors for other major cities in Maharashtra. Table 2 presents a summary of death compensation claims filed in different major cities in Maharashtra and the resultant undercount factors of COVID-19-related excess deaths. Finally, we used the undercount factors from cities in Maharashtra to compute a 95% confidence interval for Pune. Our analysis of media reports about discrepancies between official mortality data and the number of COVID-19 death compensation claims filed by the public points to an estimated 13,000 excess deaths [95% CI: 6,910–19,100] in Pune from March 2020 to January 2022 (Table 1), implying an undercount factor of 1.4 [95% CI: 0.8–2.1].

Table 2 Discrepancies between filed death compensation claims and officially reported deaths and estimates of undercount factors during the COVID-19 pandemic in major cities in Maharashtra as of January 2022.

Third, we conducted a wisdom of crowds survey to obtain cognitive estimates about pandemic-associated excess mortality. Cognitive estimates for excess deaths were diverse, with a sixth of survey respondents believing the official COVID-19 numbers were in fact an overestimate (Fig. 5). However, the crowd estimated that the true number of COVID-19 deaths in Pune was 18,900 [95% CI: 16,930–20,880], which served as the estimated COVID-19-related excess deaths. In other words, the crowd estimated an undercount factor of 2.1 [95% CI: 1.9–2.3].

Figure 5

Results from the wisdom of crowds public survey. N = 280.

Finally, we used a simple bootstrap to combine estimates from different methods and computed an aggregate estimate of COVID-19-related excess deaths in Pune (Fig. S11). In aggregate, our results estimate 14,770 excess deaths [95% CI: 9,820–22,790] in Pune from March 2020 to December 2021, translating to an undercount factor of 1.6 [95% CI: 1.1–2.5].

Discussion

In our case study, we computed COVID-19-related excess death estimates for Pune. To our knowledge, this is the first such effort; therefore, our results provide new information that can inform the public health policy of Pune. Using multiple methods, we estimated 14,770 excess deaths [95% CI: 9,820–22,790] in Pune from March 2020 to December 2021, of which 9,093 were officially counted as COVID-19 deaths. We further calculated the undercount factor, a metric that allowed for easy comparison of the differential impact of the pandemic across diverse geographical regions and socioeconomic groups2,13,21,113. We estimated an undercount factor of 1.6 [95% CI: 1.1–2.5] for Pune from March 2020 to December 2021. Thus, we estimated excess COVID-19-related deaths were about 60% more than officially recorded. An undercount factor of 1 implies that all the estimated excess deaths can be attributed to officially reported COVID-19 mortality. This represents an ideal scenario where public health infrastructures are robust and resilient enough to maintain complete and high-quality data, even during acute crisis events such as pandemics. However, this ideal scenario was rarely achieved globally and across major Indian cities, where the estimated undercount factors were around three (Table S1)10,14,15,22,156. Even some of the world’s best healthcare systems saw undercount factors around 1.5 (Fig. S2 in Supporting Information)8,113. Based on our results, Pune’s performance in this regard seems comparable to some of the leading healthcare systems across the world, with its public health data recording infrastructure proving to be fairly robust and resilient during the COVID-19 pandemic115,116.

In addition to providing novel public health information about Pune, our main goal was to investigate whether diverse methods of estimating pandemic-related excess deaths provided us with accurate and overlapping statistical estimates. We computed COVID-19-related excess deaths and undercount factors from five different methods: (a) the simple averaging technique, (b) the Farrington surveillance algorithm, (c) the overdispersed Poisson model, (d) analyzing media-reported discrepancies between official mortality data and death compensation claims, and (e) the wisdom of crowds public surveying. Despite their limitations, diverse methods—both conventional and frugal—produced excess deaths estimates and undercount factors that were within the margins of error of each other. Results from all models except from the Farrington surveillance algorithm point towards a similar conclusion about the COVID-19-related undercount factor for Pune. These findings can inform Pune's public health policy—for future pandemics or health crises, decision-makers could assume a worst-case scenario and prepare for up to 2.5 times (upper limit of the 95% confidence interval associated with our aggregate estimate) the reported number of pandemic-caused deaths. Our results reinforce the strength of using multi-method approaches to triangulate the true extent of the impact of the COVID-19 pandemic. By combining conventional and novel frugal methods of estimating pandemic-associated excess mortality in a multi-method approach, we minimized the pitfalls of relying on any particular individual method86,95,96,97,98,117,118. Our findings can have important implications, especially in resource-constrained settings, where robust and resilient data infrastructures tend to be lacking or limited, and in contexts where considerable debate exists about the underlying ground truth1,2,3,12,19,26,27,28,99,100,101,102,103,119. 
Particularly with the COVID-19 pandemic, there are widespread concerns about the accuracy of officially reported COVID-19-related deaths3,4,5,6,7,8. Our study adds to a growing body of COVID-19-related excess mortality literature that emerged in response to such skepticism about the accuracy of officially reported pandemic casualties3,4,5,7,9,10,11,12,13,14,15,22. Future research efforts could focus on other untapped frugal alternatives such as analyzing discrepancies between COVID-19 cremation counts and officially reported COVID-19 mortality data157,158. Our preliminary results from this method for Pune suggest consilience with the other methods we employed in our study (Table S3). However, these preliminary results are based on a temporally restricted dataset about COVID-19 cremation counts, and a more complete dataset is needed to ascertain the robustness of this method.

Within our multi-method approach, we employed three conventional statistical and epidemiological models that have been previously widely used to compute COVID-19-related excess mortality. These methods are often considered the gold standard of excess mortality estimation because of their interpretability and inclusion of multiple epidemiologically relevant real-world factors including seasonality, population growth, and contemporary trends of mortality4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25. Therefore, our results from these methods represent important benchmarks to examine the effectiveness of the novel frugal methods we used. However, these conventional statistical and epidemiological models rely on high-quality all-cause pre-pandemic data that is only accessible in robust and transparent public health data recording systems. The performance of these models suffers in the absence of such data. One limitation of our study was the low granularity of our dataset; it included only monthly—not weekly or daily—data. Future research efforts can address this limitation by using high-granularity datasets. Additionally, although Pune is estimated to have high pre-pandemic death registration coverage18,120, our study did not account for fluctuations in death registration coverage during the COVID-19 pandemic. Future work should use indirect proxy estimates of fluctuations in death registration coverage that can be computed from relevant public health and demographic data such as birth registration coverage, the incidence of traffic accidents, and surveillance of other infectious diseases such as AIDS and tuberculosis (Table S4)18,121,122,123,124.

Two of the statistical models we used, (a) the simple averaging technique and (b) the Farrington surveillance algorithm, did not incorporate underlying population data and can therefore be readily deployed when these data are non-existent or difficult to obtain due to monetary, bureaucratic, and time constraints. An additional strength of the simple averaging technique is its ease of implementation. This method does not require computer programming knowledge, thus increasing its potential for widespread applicability in low-resource and data-scarce settings. Both the simple averaging technique and the Farrington surveillance algorithm assumed that the pre-pandemic number of deaths was effectively constant over time. We assessed this assumption for both models (see Supporting Information). Even though there was a slight yet significant increase in mortality over time (Fig. S4), both models showed relatively robust performance despite this violated assumption (Fig. S5, Fig. S6, Fig. S7, and Fig. S8). Robust model performance depended upon the amount of underlying data used: both models required monthly data across at least four years. The overdispersed Poisson model incorporated underlying population data to account for fluctuations in mortality rates over time and thus did not assume that the pre-pandemic number of deaths was effectively constant over time. It also accounted for sustained indirect effects that both the simple averaging technique and the Farrington surveillance algorithm lacked the power to detect88, thereby offering more flexibility and robustness compared to these two models. Finally, the overdispersed Poisson nature of this model allowed it to capture more variance than a standard Poisson model, in which the variance equals the mean, would predict. This makes it well suited to our dataset of monthly reported all-cause mortality, in which the variance far exceeds the mean (mean = 2,687; variance = 418,337).
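As a rough illustration of two points in this paragraph, the following Python sketch computes the variance-to-mean ratio from the summary statistics reported above and applies one common form of a simple averaging baseline (the mean of the same calendar month across pre-pandemic years). The function names and the monthly counts are hypothetical and are not taken from the study data, and the exact averaging procedure used in our analysis may differ from this sketch.

```python
from statistics import mean

# Summary statistics reported in the text for monthly all-cause deaths.
REPORTED_MEAN = 2687
REPORTED_VARIANCE = 418337


def dispersion_ratio(mean_count, variance):
    """Variance-to-mean ratio; a standard Poisson model implies a value near 1."""
    return variance / mean_count


def simple_average_excess(baseline_by_year, observed):
    """Observed deaths minus the mean of the same month across baseline years.

    Assumption: the baseline is a plain per-month average with no trend
    or population adjustment (illustrative only).
    """
    months = range(len(observed))
    expected = [mean(year[m] for year in baseline_by_year) for m in months]
    return [obs - exp for obs, exp in zip(observed, expected)]


# Hypothetical counts for three months over four baseline years (not study data).
baseline = [
    [2500, 2400, 2600],
    [2550, 2450, 2650],
    [2600, 2500, 2700],
    [2650, 2550, 2750],
]
pandemic = [3000, 4200, 5100]  # hypothetical pandemic-period counts

ratio = dispersion_ratio(REPORTED_MEAN, REPORTED_VARIANCE)
excess = simple_average_excess(baseline, pandemic)

print(f"variance/mean = {ratio:.1f}")
print("monthly excess:", excess)
```

A ratio far above 1, as here, is the usual signal that a plain Poisson model would understate uncertainty and that an overdispersed alternative is worth considering.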

In addition to using statistical and epidemiological models, we also analyzed media reports about discrepancies between official mortality data and death compensation claims. To our knowledge, our study is the first effort to use this frugal method to estimate pandemic-associated excess deaths. This analysis was possible only because of the availability of data on death compensation claims filed by the public under India’s ex gratia monetary compensation policy, which employed a liberal interpretation of pandemic-associated mortality101,113,159,160. However, this policy may have led to somewhat inaccurate estimates of excess mortality due to the submission of fraudulent documents or the double counting of deaths in neighboring jurisdictions125,159,160. Nonetheless, this frugal method remains an important component of multi-method approaches to estimating excess COVID-19-related deaths, given the checks and balances implemented by the government to ensure accurate relief disbursement113. Future research should use disaggregated and officially verified ex gratia death compensation data to compute more precise estimates of pandemic-associated excess mortality.

Finally, we examined the effectiveness of another frugal method, the wisdom of crowds approach, to estimate COVID-19-related excess mortality. Although this approach has been widely used across multiple real-world domains, including during the COVID-19 pandemic37,38,86,126,127, to our knowledge, it has not yet been used to estimate COVID-19-related excess mortality. Therefore, our study provides a novel confirmation of the potential of the wisdom of crowds approach as a complementary tool of frugal fact-finding. However, the results from our wisdom of crowds public survey should be interpreted with caution, because collective cognitive estimates may be biased, sometimes resulting in herding, mob mentality, informational echo chambers, and widespread proliferation of unscientific opinions128,129,130,131,132,133,134,135,136,137. Nonetheless, these limitations can be overcome by integrating findings from research on judgment and decision-making, behavioral economics, and cognitive science, which highlight how domain-general psychophysical representations and Bayesian mechanisms may account for many of the systematic mistakes observed in cognitive estimation across many real-world contexts137,138,139,140,141,142,143,144,145,146,147,148,149,151. Accounting for such general psychophysical factors and other cognitive biases can greatly improve the accuracy, robustness, and effectiveness of the wisdom of crowds approach150. For example, in our study, we were able to partially mitigate the biases introduced by social and peer influence127,128,130,151 by conducting an online, anonymous public survey.
In addition to being drawn from a non-WEIRD (Western, Educated, Industrialized, Rich, and Democratic) population152, our survey sample of adult residents from Pune was diverse in terms of gender, age, native language, occupation, socioeconomic status, and COVID-19 infection history (Table S2). These study participants also displayed heterogeneous COVID-19-related beliefs and behaviors. Thus, the diversity, decentralization, and independence of opinions126 in our sample may have mitigated some of the inaccuracies stemming from demand characteristics and response biases. In future work, we plan to explore how diverse COVID-19-related psychological perceptions influence cognitive estimates about COVID-19-related deaths, thus adding to a rapidly growing literature about cognitive estimation and the wisdom of crowds.
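To illustrate why the choice of aggregation rule matters for crowd estimates of large quantities, the sketch below compares three standard summaries of a set of individual guesses. The responses are hypothetical and are not drawn from our survey, and the sketch does not represent the aggregation procedure used in our analysis; it only shows how a single extreme response can pull the arithmetic mean away from the bulk of the estimates, while the median and geometric mean are less affected.

```python
from statistics import geometric_mean, mean, median

# Hypothetical survey responses (estimated number of deaths); not study data.
# The last value mimics a single extreme overestimate.
estimates = [8000, 12000, 15000, 20000, 25000, 30000, 500000]

crowd_mean = mean(estimates)            # sensitive to extreme responses
crowd_median = median(estimates)        # middle response, robust to outliers
crowd_geo = geometric_mean(estimates)   # averages on a logarithmic scale

print(f"arithmetic mean: {crowd_mean:.0f}")
print(f"median:          {crowd_median:.0f}")
print(f"geometric mean:  {crowd_geo:.0f}")
```

Averaging on a logarithmic scale is one simple way to reflect the idea that people may represent large magnitudes in a compressed, roughly logarithmic fashion; whether it is appropriate for a given survey is an empirical question.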

Our findings confirm that, as in most other places, officially reported COVID-19 mortality in Pune was an underestimate. These findings highlight the limitations of public health infrastructures in capturing plentiful, high-quality, and timely data during unpredictable black swan events such as the COVID-19 pandemic153. To address these limitations, strong health data systems are needed to inform healthcare utilization planning, resource allocation, and policymaking to ensure healthy living and promote well-being for all (UN Sustainable Development Goal 3)154. Robust data systems also permit retrospective evaluations of pandemic mitigation measures including vaccinations and public lockdowns156. To prepare for future pandemics, resilient public health systems require both sustained, long-term material investments in vital infrastructure and medical equipment and the availability of abundant, credible, open-source, high-quality data (UN Sustainable Development Goal 17.19)154. Therefore, governments, think tanks, research universities, non-profits, industry actors, the media, and other relevant stakeholders share a responsibility to build and maintain robust data collection and storage infrastructures. This will support wider aims of sensitive societal governance, public accountability, and memorialization of one of the largest public health crises the world has collectively faced in over a century1,2.