Standards are required to ensure that population-scale testing for infectious agents is accurate and reliable. The COVID-19 pandemic has illustrated the importance of well-characterized reference materials that ensure a test is fit-for-purpose, of proficiency testing schemes that evaluate laboratory performance, and of information standards for clear communication of test results. Standards are a simple, inexpensive and proven means of assuring reliable testing at the massive scale and diversity needed during any pandemic.

At the beginning of the COVID-19 outbreak, the Coronavirus Standards Working Group (CSWG) — a network of academic, government and industry scientists — was established to advocate for the use of standards in SARS-CoV-2 testing. Here we describe the CSWG’s collective experiences, outcomes and recommendations since early 2020. We propose that the development and dissemination of standards is a cost-effective strategy that can broadly improve SARS-CoV-2 testing worldwide and should be prioritized as a key early step in the public health response to future pandemics.

Reference materials for SARS-CoV-2 testing

Testing measures the presence of an analyte in a sample, such as the SARS-CoV-2 RNA genome within a respiratory sample. All measurement results have uncertainty, and well-characterized reference materials can place results on a common scale and evaluate that uncertainty. This process is essential both to assess whether tests are fit-for-purpose and to interpret test results across studies (Fig. 1). Reference materials fall into four broad, overlapping categories: primary, secondary, natural and synthetic.

Fig. 1: Standards needed for the SARS-CoV-2 testing process.

a–c, Schematic diagram illustrating the steps in the molecular (a), antigen (b) and serological (c) testing process, which can be classified into pre-analytical (yellow), analytical (green) and post-analytical (blue) stages. Additional test development (yellow) is performed prior to manufacture. The lower panel describes the range of standards available for validating test development, processes and results.

‘Primary reference materials’ are assigned qualitative and quantitative properties by a recognized authority without reference to other materials. For example, the World Health Organization (WHO; Geneva) Expert Committee on Biological Standardization established its primary International Standards for SARS-CoV-2 testing in December 2020, and these reference materials define the international units1,2. These International Standards can then be used to calibrate ‘secondary reference materials’, which are used more routinely during test and laboratory validation. ‘Natural reference materials’ are derived from natural sources, such as clinical patient samples, that have been well characterized using different methods. Although natural materials match the complexity and challenges of a patient sample, they are often finite in quantity and difficult to manufacture reliably at scale3. Natural reference materials should ideally match the viral and antibody titers of their final use cases, which can vary over the course of infection, among individuals and across target populations4. For example, SARS-CoV-2 tests that were originally validated using samples from hospitalized patients with high viral titers performed markedly worse when used to screen individuals with mild or no symptoms, for whom few reference materials were initially available5.

During the initial stages of the pandemic, laboratories faced a major challenge in sourcing natural reference materials to evaluate testing, as patient samples were also needed for competing research and therapeutic purposes (Fig. 2). Although some large laboratories could leverage established clinical collaborations to source reference patient materials, many other laboratories had difficulty sourcing the reference patient materials needed to verify tests6. The coordinated and equitable dissemination of reference patient materials during the early stages of the pandemic would not only have accelerated the deployment of tests but also have provided an early opportunity to harmonize test performance among different laboratories.

Fig. 2: Schematic diagram illustrates key milestones in the development of testing and standards during the COVID-19 pandemic.

Understanding these milestones can assist in preparing for current and future public health emergencies.

The rapid development of ‘synthetic reference materials’ can provide an interim solution in the absence of reference patient materials. Synthetic reference materials, including engineered viruses, bacteriophage, synthetic SARS-CoV-2 complementary DNA and RNA genomes and fragments, and recombinant proteins, were rapidly developed following the publication of the SARS-CoV-2 reference genome, enabling the analytical validation of tests in some countries even before their first reported cases of SARS-CoV-2 infection7. However, synthetic materials may be poor surrogates for assessing pre-analytical variables, such as sampling and transport conditions, that can markedly affect test performance8; in addition, synthetic materials may need to be spiked into buffer or negative specimens to contrive a specimen-like matrix.
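As a worked illustration of contriving a sample, the arithmetic of spiking a synthetic RNA control into negative matrix follows C1V1 = C2V2. The Python sketch below assumes a single-step dilution; the stock concentration, target levels and volumes are hypothetical, and `spike_plan` is an illustrative helper rather than part of any published protocol.

```python
# Sketch: plan a single-step dilution of a synthetic RNA control into negative
# matrix (e.g. pooled negative specimen or transport medium).
# Stock concentration, targets and volume are hypothetical example values.


def spike_plan(stock_copies_per_ul, targets_copies_per_ml, final_volume_ml):
    """Return (target copies/mL, stock uL, matrix uL) for each target level.

    Applies C1 * V1 = C2 * V2 with all volumes expressed in microlitres.
    """
    final_volume_ul = final_volume_ml * 1000.0
    plan = []
    for target in targets_copies_per_ml:
        target_per_ul = target / 1000.0  # copies/mL -> copies/uL
        if target_per_ul > stock_copies_per_ul:
            raise ValueError(f"target {target} copies/mL exceeds the stock")
        stock_ul = target_per_ul * final_volume_ul / stock_copies_per_ul
        plan.append((target, stock_ul, final_volume_ul - stock_ul))
    return plan


# Example: a 1e4 copies/uL stock diluted to 1e4 and 1e3 copies/mL in 1 mL.
for target, stock_ul, matrix_ul in spike_plan(1e4, [1e4, 1e3], 1.0):
    print(f"{target:>8.0f} copies/mL: {stock_ul:.2f} uL stock + {matrix_ul:.2f} uL matrix")
```

The second level needs only 0.1 µL of stock, which is too small to pipette accurately; in practice an intermediate working dilution would be prepared first, and the same calculation applied to each step.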

During the early stages of the pandemic, owing to concerns about access to and scaling of testing (Fig. 2), regulatory agencies such as the US Food and Drug Administration (FDA) and the European Medicines Agency (EMA) initially permitted the use of synthetic reference materials to validate tests; however, this was found to result in uneven test performance. Today, these agencies require natural reference materials for validation and authorization or clearance9. For further analytical validation, the FDA also developed a panel of cultured virus reference materials, which it shared with developers who had submitted tests for emergency use authorization (EUA) so that they could assess the limit of detection of their tests10.
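Limit-of-detection studies of the kind supported by such panels typically test replicates at a series of concentrations. The sketch below assumes a common acceptance rule, that the LoD is the lowest concentration at which at least 95% of replicates are detected (for example, 19 of 20); actual replicate numbers and criteria are set by each validation protocol, and the data are hypothetical.

```python
from typing import Dict, Optional, Tuple

# Sketch only: the 95% hit-rate rule and the replicate counts below are
# assumptions for illustration, not a specific regulator's protocol.


def limit_of_detection(results: Dict[float, Tuple[int, int]],
                       min_hit_rate: float = 0.95) -> Optional[float]:
    """Return the lowest concentration meeting min_hit_rate, or None.

    `results` maps concentration -> (replicates detected, replicates tested).
    Concentrations are scanned from highest to lowest and the scan stops at
    the first failing level, so a stray pass below a failure is ignored.
    """
    lod = None
    for conc in sorted(results, reverse=True):
        detected, tested = results[conc]
        if tested == 0 or detected / tested < min_hit_rate:
            break
        lod = conc
    return lod


# Hypothetical dilution series of a reference material (copies/mL).
series = {2000: (20, 20), 1000: (20, 20), 500: (19, 20), 250: (14, 20)}
print(limit_of_detection(series))  # 500
```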

Molecular testing for SARS-CoV-2 genes

Various molecular tests are used to detect the SARS-CoV-2 RNA genome, using methods such as reverse transcription–quantitative PCR (RT-qPCR), digital PCR, loop-mediated isothermal amplification, next-generation sequencing and other nucleic acid amplification methods11. Molecular tests typically use respiratory-tract samples, which are particularly susceptible to pre-analytical variables.

Numerous synthetic and natural reference materials have been developed to verify molecular testing protocols. Because molecular tests are highly sensitive, laboratories must include negative controls to detect contamination by synthetic SARS-CoV-2 reagents or by material carried over from previous tests, either of which can cause false-positive clinical results12. Early in the pandemic, the US Centers for Disease Control and Prevention (CDC) distributed RT-qPCR test kits that were contaminated with target templates; new kits had to be developed and issued, causing delays at a critical stage of the pandemic.
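The role of negative controls can be made concrete as a run-acceptance check. In this sketch the rules (any amplification in a negative control invalidates the run; the positive control must amplify below an assumed Ct cut-off of 35) and the `ControlWell` and `check_run` names are hypothetical; laboratories would apply the criteria in their own validated protocol.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical run-acceptance rules for illustration; laboratories should
# apply the control criteria defined in their own validated protocol.


@dataclass
class ControlWell:
    name: str
    kind: str            # "negative" or "positive"
    ct: Optional[float]  # None means no amplification was detected


def check_run(controls: List[ControlWell],
              max_positive_ct: float = 35.0) -> Tuple[bool, List[str]]:
    """Return (run is valid, list of problems found)."""
    problems = []
    if not any(w.kind == "negative" for w in controls):
        problems.append("no negative control on the plate")
    for well in controls:
        if well.kind == "negative" and well.ct is not None:
            # Any amplification in a negative control suggests contamination,
            # e.g. by synthetic template or amplicon carried over from earlier runs.
            problems.append(f"{well.name}: negative control amplified (Ct {well.ct})")
        elif well.kind == "positive" and (well.ct is None or well.ct > max_positive_ct):
            problems.append(f"{well.name}: positive control failed")
    return (not problems, problems)


ok, issues = check_run([ControlWell("NTC-1", "negative", None),
                        ControlWell("PC-1", "positive", 27.9)])
print(ok, issues)  # True []
```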

Although molecular tests can potentially measure viral abundance, their quantitative performance, including the limit of quantification, linearity and uncertainty, must first be evaluated using reference materials. The RT-qPCR cycle threshold (Ct) value can be used to estimate viral abundance, but Ct values for the same sample can vary markedly between instruments and laboratories; harmonization to shared-in-common reference materials is therefore needed for quantitative comparisons of RT-qPCR results1. In the absence of such harmonization, viral load cannot be used for clinical stratification and care.
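To illustrate why Ct values need calibration, the sketch below fits a standard curve (Ct against log10 copies) to a dilution series of a reference material, derives the amplification efficiency and converts a sample Ct to copies. It assumes the reference material and samples were run on the same instrument and protocol; the Ct values are hypothetical. The slope and intercept of this curve differ between instruments and laboratories, which is why a shared reference material is needed to put their results on one scale.

```python
import math
from typing import List, Tuple


def fit_standard_curve(copies: List[float], cts: List[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of Ct = slope * log10(copies) + intercept."""
    xs = [math.log10(c) for c in copies]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(cts) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, cts))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def amplification_efficiency(slope: float) -> float:
    """E = 10^(-1/slope) - 1; a slope of about -3.32 corresponds to ~100%."""
    return 10 ** (-1.0 / slope) - 1.0


def ct_to_copies(ct: float, slope: float, intercept: float) -> float:
    """Invert the curve: copies = 10^((Ct - intercept) / slope)."""
    return 10 ** ((ct - intercept) / slope)


# Hypothetical tenfold dilution series of a reference material (copies/reaction).
standards = [1e2, 1e3, 1e4, 1e5, 1e6]
observed_ct = [33.4, 30.1, 26.7, 23.4, 20.1]
slope, intercept = fit_standard_curve(standards, observed_ct)
print(f"slope {slope:.2f}, efficiency {amplification_efficiency(slope):.1%}")  # slope -3.33, efficiency 99.7%
print(f"sample at Ct 28.0 = {ct_to_copies(28.0, slope, intercept):.0f} copies/reaction")
```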

Reference materials must also be regularly updated to reflect the diversity of SARS-CoV-2 variants circulating within a population. New genetic variants can interfere with RT-qPCR primer or probe binding and cause false-negative test results. Although multiplex testing for several gene targets can mitigate the impact of a single variant, probes and primers require ongoing verification to ensure the continued validity of a molecular test, and the FDA routinely monitors the predicted impact of variants on the performance of EUA-authorized tests13.
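Variant monitoring of this kind often starts with an in silico comparison of assay oligonucleotides against circulating genome sequences. The deliberately simplified sketch below assumes each binding site has already been extracted in the same orientation as the oligo and without insertions or deletions; the sequences are made up, and the 5-nucleotide 3'-end window is an illustrative threshold. The 3'-end rule reflects primer extension; probe mismatches are usually judged differently.

```python
from typing import List

# Simplified in silico check. Assumes each oligo's binding site has already been
# extracted from the variant genome, in the same orientation as the oligo and
# without insertions/deletions. Sequences here are made-up placeholders, not
# real assay oligonucleotides.


def mismatch_positions(oligo: str, site: str) -> List[int]:
    """0-based positions at which the oligo and its binding site differ."""
    if len(oligo) != len(site):
        raise ValueError("oligo and site must be aligned and of equal length")
    return [i for i, (a, b) in enumerate(zip(oligo.upper(), site.upper())) if a != b]


def assess_oligo(oligo: str, site: str, three_prime_window: int = 5) -> str:
    """Rough risk label; mismatches near the 3' end tend to impair extension most."""
    mism = mismatch_positions(oligo, site)
    if not mism:
        return "no mismatch"
    if any(pos >= len(oligo) - three_prime_window for pos in mism):
        return "high risk: mismatch near 3' end"
    return f"review: {len(mism)} mismatch(es) away from 3' end"


primer = "ACCTGATCGGAATCGTTAGC"  # hypothetical 20-nt primer (5'->3')
for label, site in [("reference", "ACCTGATCGGAATCGTTAGC"),
                    ("variant A", "ACCTGATCGGAATCGTTAGT"),
                    ("variant B", "ACGTGATCGGAATCGTTAGC")]:
    print(label, "->", assess_oligo(primer, site))
```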

Antigen testing for SARS-CoV-2 proteins

Antigen tests employ lateral-flow or enzyme-linked immunosorbent assays to detect viral proteins directly. They are typically less sensitive than molecular tests and detect SARS-CoV-2 across a narrower window of the infection course (although repeated serial testing may mitigate this lower sensitivity)14. However, antigen tests are inexpensive to manufacture, can be used outside of a clinical laboratory and return results rapidly. These widely deployed tests are undoubtedly useful, but because their results are reported irregularly, they have created a blind spot in public health knowledge of disease prevalence.
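The effect of serial testing can be estimated with a simple probability model. If each test independently detects an infection with probability p, the chance that at least one of n tests is positive is 1 - (1 - p)^n. Independence is an optimistic assumption, because repeat tests on the same person are correlated, so this model likely overstates the real gain; the 60% single-test sensitivity below is hypothetical.

```python
def cumulative_sensitivity(per_test_sensitivity: float, n_tests: int) -> float:
    """P(at least one positive result in n tests), ASSUMING independent tests.

    Independence is optimistic: repeat tests on the same person are correlated
    (e.g. a low viral load lowers every result), so real gains are smaller.
    """
    if not 0.0 <= per_test_sensitivity <= 1.0 or n_tests < 1:
        raise ValueError("sensitivity must be in [0, 1] and n_tests >= 1")
    return 1.0 - (1.0 - per_test_sensitivity) ** n_tests


# Hypothetical single-test sensitivity of 60%.
for n in (1, 2, 3):
    print(n, round(cumulative_sensitivity(0.60, n), 3))
```

With these inputs the loop prints 0.6, 0.84 and 0.936 for one, two and three tests.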

Many laboratories reported that the performance of antigen tests differed markedly from the manufacturers’ declarations, and independent validation with reference materials was needed to confirm test performance15. Antigen tests can be evaluated using inactivated viruses, but their performance is more typically measured by comparison with results from previously authorized RT-qPCR tests9. However, relying on positive agreement with a comparator test can be problematic, as it can propagate inaccuracies, differences or limitations present in the benchmark RT-qPCR method.
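Agreement with a comparator is usually summarized as positive and negative percent agreement (PPA and NPA) from a 2 × 2 table, with confidence intervals. The sketch below uses the Wilson score interval; the counts are hypothetical. PPA equals sensitivity only if the comparator itself is perfect, which is the limitation noted in the paragraph above.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 -> ~95%)."""
    if n == 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half = z / denom * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2))
    return centre - half, centre + half


def agreement(both_pos: int, index_pos_only: int, comparator_pos_only: int, both_neg: int):
    """PPA and NPA of an index test relative to a comparator (e.g. RT-qPCR)."""
    ppa = both_pos / (both_pos + comparator_pos_only)
    npa = both_neg / (both_neg + index_pos_only)
    return {
        "PPA": (ppa, wilson_interval(both_pos, both_pos + comparator_pos_only)),
        "NPA": (npa, wilson_interval(both_neg, both_neg + index_pos_only)),
    }


# Hypothetical 2x2 table: antigen test vs RT-qPCR comparator.
for name, (value, (lo, hi)) in agreement(90, 3, 10, 397).items():
    print(f"{name}: {value:.1%} (95% CI {lo:.1%}-{hi:.1%})")
```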

Antigen tests have been promoted as a viable method to realize population-scale testing; however, this proposal remains controversial14. An antigen test widely used in a pilot program to evaluate whether population-scale testing could curb rates of infection in Liverpool, UK, was criticized for poor sensitivity and found to miss almost half of the individuals who otherwise tested positive using RT-qPCR16. The field performance of this antigen test was markedly lower than the manufacturer’s declaration. This illustrates not only the need for independent test validation with appropriate reference materials to understand limitations, but also the challenge of evaluating the performance of tests undertaken by untrained personnel outside of laboratories, where additional and diverse variables can impact performance.

Serology testing for a COVID-19 immune response

Serology tests detect antibodies in an individual’s blood that are reactive to SARS-CoV-2 proteins, and can characterize the avidity, duration and composition of the different reactive antibody isotypes elicited by previous infection or vaccination. However, serology assays must be calibrated against reference materials before antibody measurements can be compared between individuals, over time and in response to different variants.

Reference materials for COVID-19 serology tests are typically derived from convalescent patient serum. The WHO International Standard, prepared and supplied by the UK National Institute for Biological Standards and Control (NIBSC), comprises a pool of convalescent plasma from recovered COVID-19 patients, with plasma from healthy donors collected before the pandemic serving as a negative control2. The WHO assigned an arbitrary unitage to this reference material to establish international units for neutralizing antibody assays (international units (IU) ml–1) and binding assays (binding antibody units (BAU) ml–1). Traceability to these international units can standardize the quantitative serology measurements needed to compare clinical trial outcomes for vaccines, define consensus antibody titer thresholds and measure correlates of protection against COVID-19 (ref. 17).
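Calibration against the International Standard can be sketched as a single conversion factor: the standard's assigned unitage divided by the value the assay reports for it, applied to each sample result. This assumes the assay responds proportionally to the standard and to patient samples (that is, the standard is commutable in that assay), which must be verified; `ASSIGNED_BAU_PER_ML` is a placeholder for the value stated in the standard's instructions for use, and the replicate measurements are hypothetical.

```python
from statistics import mean
from typing import Sequence

# ASSUMPTIONS: ASSIGNED_BAU_PER_ML is a placeholder for the value stated in
# the standard's instructions for use, and the assay is assumed to respond
# linearly and proportionally (commutably) to the standard and to samples.
ASSIGNED_BAU_PER_ML = 1000.0


def conversion_factor(measured_au_per_ml: Sequence[float],
                      assigned_bau_per_ml: float = ASSIGNED_BAU_PER_ML) -> float:
    """Factor converting assay-specific units (AU/mL) to BAU/mL."""
    return assigned_bau_per_ml / mean(measured_au_per_ml)


def to_bau(sample_au_per_ml: float, factor: float) -> float:
    """Express an assay result in binding antibody units per mL."""
    return sample_au_per_ml * factor


# Hypothetical: three replicate measurements of the standard in this assay.
factor = conversion_factor([800.0, 850.0, 750.0])
print(factor, to_bau(40.0, factor))  # 1.25 50.0
```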

Standardization of serological testing also underpins reproducible research in epidemiology and the development of vaccines and therapeutics. Large-scale studies have used serology testing to understand the transmission of SARS-CoV-2 through populations, as well as the impact of vaccination on this transmission. Without standardization of these serology tests, comparisons among datasets and populations can be inaccurate, limiting the value of these data for future research. Standardization of key research methods, such as the cell-based assays used to measure neutralizing antibodies, would improve the interpretation, reproducibility and utility of research results.

Proficiency testing schemes harmonize testing across laboratories

Proficiency studies (or external quality assessment studies) distribute shared samples to participating laboratories for testing, and the results provide a basis for evaluating and comparing laboratories. Proficiency testing evaluates the performance of SARS-CoV-2 testing in individual laboratories and across a population of laboratories.
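A proficiency round is often scored with z-scores, z = (x - X)/σ, where x is a laboratory's result, X the assigned value and σ the standard deviation for proficiency assessment. In this sketch X is the participants' median and σ a scaled median absolute deviation, one of several robust choices; the interpretation bands (|z| ≤ 2 satisfactory, 2 < |z| < 3 questionable, |z| ≥ 3 unsatisfactory) follow common convention, for example in ISO 13528, and the results are hypothetical log10 copies/mL.

```python
import statistics
from typing import Dict, Tuple

# Sketch of proficiency-testing scores. For simplicity the assigned value is the
# participants' median and the spread is a scaled median absolute deviation
# (MADe = 1.4826 * MAD); real schemes may use other robust estimators or
# reference-laboratory values. Results are hypothetical log10 copies/mL.


def z_scores(results: Dict[str, float]) -> Dict[str, Tuple[float, str]]:
    values = list(results.values())
    assigned = statistics.median(values)
    sigma = 1.4826 * statistics.median([abs(v - assigned) for v in values])
    if sigma == 0:
        raise ValueError("no spread among participants; supply sigma externally")
    scored = {}
    for lab, value in results.items():
        z = (value - assigned) / sigma
        if abs(z) <= 2:
            verdict = "satisfactory"
        elif abs(z) < 3:
            verdict = "questionable"
        else:
            verdict = "unsatisfactory"
        scored[lab] = (z, verdict)
    return scored


round_1 = {"lab A": 5.02, "lab B": 5.10, "lab C": 4.91, "lab D": 5.00, "lab E": 6.40}
for lab, (z, verdict) in z_scores(round_1).items():
    print(f"{lab}: z = {z:+.1f} ({verdict})")
```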

Given the scale and diversity of COVID-19 testing, proficiency-testing schemes are needed to evaluate the field performance of tests following initial validation for regulatory approval and to harmonize results among laboratories with different capabilities. Numerous proficiency studies were launched at early stages of the pandemic for both genome detection and serology testing, with results shared widely among laboratories and published in the scientific literature18. This was critical because many laboratories were newly established or repurposed for COVID-19 testing with little previous clinical experience, and many SARS-CoV-2 tests were given accelerated regulatory approval (for example, EUAs) with little demonstrated performance in real-world settings.

The urgency and uncertainty of the pandemic required academic, government and commercial organizations to assume new roles, cooperate and pool resources to build testing capacity. This coordination has been considered key to the success of pandemic responses6. Government and regulatory organizations were able to leverage reference laboratories to develop standards, evaluate tests and disseminate best practices. Organizations such as the Foundation for Innovative New Diagnostics (FIND)19 independently evaluated test performance with interim reference materials to inform testing best practices. However, the adoption of international standards by regional organizations is key to establishing proficiency tests that promote traceability of secondary reference materials and widespread harmonization of testing.

The importance of information standards

Information standards are templates and guidelines for representing test performance, describing processes and methods, and reporting results. They ensure that consistent, transparent and harmonized terminology is used, so that results are communicated clearly and interpreted consistently20. For example, the Minimum Information for Publication of Quantitative Real-Time PCR Experiments (MIQE) guidelines are an information standard applicable to reporting SARS-CoV-2 RT-qPCR assays; they provide a checklist for the disclosure of all reagents, sequences and methods necessary for other laboratories to reproduce methods and results21.

Information standards are also needed to ensure that next-generation sequencing datasets, which can be large and complex, are standardized to enable subsequent querying and analysis. More than 13 million SARS-CoV-2 genome sequences have been submitted to databases such as the Global Initiative on Sharing All Influenza Data (GISAID)22. The assignment of consistent metadata to these genome sequences facilitates data integration, accessibility and the reuse of data for analysis in future studies17. The nomenclature used to describe SARS-CoV-2 variants has also been standardized, for example through the WHO's Greek-letter labels, to consolidate naming schemas and avoid the stigma of naming variants according to their geographic origin.
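Consistent metadata can be enforced by validating each record before submission. The sketch below checks for required fields and an ISO 8601 collection date; the field names are a hypothetical minimal schema for illustration, not the schema of GISAID or any other database.

```python
from datetime import date
from typing import Any, Dict, List

# Hypothetical minimal metadata schema for illustration only; it is NOT the
# schema of GISAID or any other database.
REQUIRED_FIELDS = ("sample_id", "collection_date", "country", "host",
                   "sequencing_technology", "assembly_method")


def validate_metadata(record: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the record passes."""
    errors = [f"missing field: {f}" for f in REQUIRED_FIELDS if not record.get(f)]
    raw = record.get("collection_date")
    if isinstance(raw, str) and raw:
        try:
            collected = date.fromisoformat(raw)  # parses ISO dates such as 2021-03-15
        except ValueError:
            errors.append(f"collection_date not ISO 8601 (YYYY-MM-DD): {raw!r}")
        else:
            if collected > date.today():
                errors.append("collection_date is in the future")
    elif raw:
        errors.append("collection_date must be a string")
    return errors


record = {"sample_id": "S-0001", "collection_date": "15/03/2021", "country": "UK",
          "sequencing_technology": "Illumina", "assembly_method": "reference-based"}
print(validate_metadata(record))
```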

Information standards are also needed to provide unambiguous descriptions of test performance that can be compared among laboratories and independently verified20. Manufacturers of SARS-CoV-2 tests often used secondary reference materials that lacked traceability to a primary reference, making their performance claims difficult to verify independently. Many metrics, such as sensitivity and specificity, are not fixed properties of a test and must be interpreted in the context of the clinical samples or secondary reference materials used23. As a result, manufacturers’ declarations of test performance have often diverged markedly from independent real-world evaluations.
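The dependence of sensitivity on the evaluation samples can be shown with a weighted average: overall sensitivity is the sum, over viral-load strata, of each stratum's share of true positives multiplied by the test's sensitivity in that stratum. All numbers below are hypothetical; they illustrate how the same test can appear much more sensitive on high-titer hospital samples than in community screening.

```python
from typing import Dict

# Hypothetical per-stratum sensitivities of ONE test, grouped by viral load.
SENSITIVITY_BY_LOAD = {"high": 0.95, "medium": 0.70, "low": 0.30}


def expected_sensitivity(mix: Dict[str, float],
                         by_load: Dict[str, float] = SENSITIVITY_BY_LOAD) -> float:
    """Sum over strata of (share of true positives) * (stratum sensitivity)."""
    total = sum(mix.values())
    return sum(by_load[stratum] * share / total for stratum, share in mix.items())


# Hypothetical viral-load mixes of true positives in two settings.
hospital = {"high": 0.7, "medium": 0.2, "low": 0.1}
screening = {"high": 0.2, "medium": 0.3, "low": 0.5}
print(f"hospital: {expected_sensitivity(hospital):.1%}")    # 83.5%
print(f"screening: {expected_sensitivity(screening):.1%}")  # 55.0%
```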

Global surveillance of SARS-CoV-2 variants

Novel SARS-CoV-2 variants can impact the fitness of the virus, allowing the virus to spread more easily, cause more severe disease or escape the body’s natural or vaccine-induced immune response24. Genomic surveillance has proven useful in monitoring the emergence and circulation of variants of concern and informing public health response. Genomic surveillance will likely become an established feature of global testing, with a requirement to monitor novel, seasonal or resistant strains.

Accordingly, there is a pressing need for reference materials and bioinformatic standards to ensure the quality and comparability of results across surveillance laboratories. This includes the collection of reference materials for different variant strains in biorepositories25. The analysis and transformation of genomic data into actionable information is complex and must be harmonized to ensure interoperability and best practice across the surveillance network17. Stable versioning, data freezes and workflow management tools can standardize bioinformatic protocols, data outputs and reference files. Reference genomic datasets can also be used in bioinformatic proficiency schemes to test the ability of genomic surveillance networks to detect novel variants.
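A bioinformatic proficiency round built on a reference dataset can score each pipeline's variant calls against a known truth set. The sketch below represents variants as (position, reference base, alternate base) tuples and reports precision, recall and F1; the positions are illustrative only, and a real scheme would also need to handle indels, ambiguous bases and coverage thresholds.

```python
from typing import Dict, Set, Tuple

Variant = Tuple[int, str, str]  # (genome position, reference base, alternate base)


def score_calls(called: Set[Variant], truth: Set[Variant]) -> Dict[str, float]:
    """Precision, recall and F1 of a pipeline's calls against a truth set."""
    tp = len(called & truth)  # calls that match the truth set
    fp = len(called - truth)  # calls absent from the truth set
    fn = len(truth - called)  # truth variants the pipeline missed
    precision = tp / (tp + fp) if called else float("nan")
    recall = tp / (tp + fn) if truth else float("nan")
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else float("nan")
    return {"TP": tp, "FP": fp, "FN": fn,
            "precision": precision, "recall": recall, "F1": f1}


# Illustrative truth set and one participant's calls.
truth = {(241, "C", "T"), (3037, "C", "T"), (23403, "A", "G")}
called = {(3037, "C", "T"), (23403, "A", "G"), (28881, "G", "A")}
print(score_calls(called, truth))
```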

In addition to developing reference materials and leading harmonization efforts, global organizations such as the WHO and public health non-profits such as FIND also provided expertise and guidance to countries that may lack their own established regulatory or standards organizations. Global dissemination of reference materials is needed to support the implementation of testing in low- and middle-income countries, where there is often greater dependence on point-of-care tests performed under heterogeneous conditions. Plasmid repositories can provide the materials needed for decentralized production of synthetic ‘open source’ secondary reference materials, distributed under open-source terms, empowering regional centers to develop their own secondary reference materials to validate local testing workflows26. Global imbalances in vaccination and testing have contributed to global disparities in the impact of COVID-19, and standards have a key role in mitigating these imbalances and ensuring that reliable testing is available worldwide.

Conclusions

The standardization of SARS-CoV-2 testing remains an ongoing priority as epidemic prevention and control become routine. SARS-CoV-2 testing will likely remain a feature of public health, where it will be used to inform updates to vaccine formulations, to respond to seasonal outbreaks, to support vulnerable populations and for international travel. This global testing should be calibrated and benchmarked with standards to harmonize performance and results, and those standards must be maintained as novel SARS-CoV-2 variants emerge.

An independent review of the processes by which the FDA authorized tests during the pandemic recommended establishing a framework for validating test performance in preparation for a public health emergency6. A key finding of the review was a “limited understanding in the test developer community on how to appropriately validate a diagnostic test”, and it recommended “developing a framework for how to conduct validation of diagnostic tests for emerging pathogens in the setting of a declared public health emergency”. This framework should include an independent capability to develop and deploy reference materials, including the clinical samples needed for test validation, and to improve the development and deployment of traceable reference materials. During the pandemic, the CSWG contributed many of these capabilities and recommends that this expertise and experience be institutionalized as part of pandemic preparedness (see Supplementary Note 2).

Numerous pathogens of concern with epidemic potential and few effective countermeasures have been identified27. The pandemic risks posed by increasing urbanization, global travel and connectivity, and laboratory research have further heightened these concerns28. Surveillance testing is central to preparedness plans that aim to contain pathogens of concern, and must be supported by standards that ensure reliable, consistent and trustworthy detection29,30. A standing group is needed to advocate for standards in pandemic preparedness plans (Supplementary Note 2). Most of the recommendations for SARS-CoV-2 standards are generalizable and could similarly benefit testing for other pathogens, such as influenza, which is currently monitored for seasonal variants. These proposals could also be extended to standards for viral outbreaks in agricultural and livestock populations, which act as reservoirs for SARS-CoV-2 and other viruses that undergo zoonotic transfer.

The WHO recently declared the monkeypox outbreak a Public Health Emergency of International Concern. This outbreak has standards needs similar to those of SARS-CoV-2. It should not escape notice that this known pathogen has become a public health emergency despite the availability of diagnostic and therapeutic approaches. This demonstrates that having biomedical infrastructure in place, while necessary, is insufficient to stem an emergency in the absence of a robust public health response. Establishing a standing body to ensure an agile and accurate response to standards needs is only the beginning of the journey toward a more systematic consideration of the response to emerging infectious disease.

The public is the ultimate beneficiary of better standards. Standards ensure that patients will receive consistent and reliable results that inform their treatment, regardless of how and where they are tested. Standards can provide a fair evaluation of test performance, which may otherwise be considered secondary to cost and convenience when selecting a test. During the pandemic, numerous governments awarded contracts to the manufacturers of tests that were subsequently shown to perform poorly when independently validated. Previous research has demonstrated that standards are a cost-effective solution to improve testing and health outcomes that ultimately benefit the broader economy.

The pandemic has focused media, government and community attention on the importance of testing. However, while extensive resources have been invested in new testing methods, relatively few resources have been invested in the development of standards, despite their proven effectiveness. There is an opportunity to ensure that the new widespread appreciation of testing in public health is accompanied by a matching appreciation of standards. Accordingly, we call for renewed consideration of and investment in standards, commensurate with their strategic and far-reaching benefits. Standards are a simple and proven method to assure a robust and effective testing enterprise at the massive scale and diversity that we have witnessed during the COVID-19 pandemic.