Introduction

The COVID-19 pandemic showed that, based on previous research efforts, we understood many aspects of coronavirus biology and pathogenesis, but also that there was much we did not know. In 2019, the worldwide number of coronavirus investigators was small: it had increased after the severe acute respiratory syndrome coronavirus (SARS-CoV) outbreak in 2003 but declined thereafter. After the pandemic began, an influx of scientists with diverse expertise into the field contributed to an increased understanding of coronavirus replication, epidemiology, SARS-CoV-2 pathogenesis and immune responses in humans. It also contributed to the development and characterization of experimentally infected animal models for COVID-19, and to SARS-CoV-2 vaccine and antiviral drug development. Here, as investigators who have studied coronaviruses for decades, we outline some of the outstanding research questions that we think need to be addressed.

SARS-CoV-2 emergence

Where did SARS-CoV-2 originate and how did it evolve to infect humans? The emergence of SARS-CoV-2 remains controversial and has been, and is being, investigated by many national and international organizations, including the World Health Organization (WHO). It is almost certain that the virus originated in bats and crossed species into humans, either directly or indirectly via intermediary hosts. Debate remains on whether the virus first infected humans from a zoonotic source or from a research laboratory. Whatever the answer to this question, it is clear to us that, to be prepared for the next pandemic, we need to further delineate the panoply of coronaviruses present in bats and possible intermediary hosts1. We need to better understand coronavirus circulation in hotspots, such as parts of China and Southeast Asia, where humans, bats and wildlife gathered for food or medicinal purposes are in close proximity. These investigations should include virological and serological surveillance for evidence of coronavirus infection in humans who are in close contact with bats or the game animal trade, whether or not they have respiratory disease. A related question, discussed below, is why coronaviruses are especially good at jumping species, to humans and other animals.

Zoonotic risk

Once coronaviruses in animal reservoirs are identified, can their risk of spillover into humans be better assessed? Surveillance of bat reservoirs of sarbecoviruses (Sarbecovirus is the subgenus to which SARS-CoV-2 belongs) had previously found evidence of viruses capable of infecting human cells using the angiotensin-converting enzyme 2 (ACE2) receptor (reminiscent of SARS-CoV)2. Serological evidence of viral spillover to humans was reported before the emergence of SARS-CoV-2 (ref.3). Arguably, these signals together should have triggered more urgent development of countermeasures. Human organoid cultures and ex vivo cultures of human respiratory tissue provide physiologically relevant systems that may enable more systematic risk assessment of animal coronaviruses in the future, analogous to the ongoing risk assessments of animal influenza viruses4.

SARS-CoV-2 transmissibility

What explains the high transmissibility of SARS-CoV-2 compared with SARS-CoV or Middle East respiratory syndrome coronavirus (MERS-CoV)? A critical factor leading to the COVID-19 pandemic was the ability of SARS-CoV-2 to grow to high titres in the upper respiratory tract and therefore to transmit readily to other humans. By contrast, titres of SARS-CoV and MERS-CoV in the upper respiratory tract peak later after infection5, which is consistent with the success of public health infection-prevention measures in interrupting their transmission. A second, related question is why SARS-CoV and a common cold coronavirus, HCoV-NL63, which both use the same receptor as SARS-CoV-2 (ACE2)6, have such different patterns of infection within the human respiratory tract. HCoV-NL63 rarely infects the lower respiratory tract, whereas SARS-CoV preferentially causes pneumonia. These different patterns most likely reflect differences in cell entry, including co-receptor usage, host protease usage or fusogenicity of the spike protein, but there are other possibilities. Understanding these differences will help to predict which coronaviruses are likely to be transmissible and to identify additional targets for therapeutic interventions. Further elucidation of the factors that contribute to virus spread will require additional experimental animal models of coronavirus transmission.

The SARS-CoV-2 outbreak also highlighted the lack of evidence-based data on the transmission of coronaviruses, or indeed of respiratory viruses in general, and on which non-pharmaceutical countermeasures (for example, social distancing and masks (surgical versus N99/FFP3 masks)) are effective. The outbreak demonstrated that the only effective control options available in the first months of the pandemic were non-pharmaceutical, yet our understanding of the efficacy of specific measures remains limited.

Coronavirus genome complexity

Why do coronavirus genomes encode so many more proteins than other RNA viruses? Coronavirus genomes are larger than those of any other RNA virus, apart from those of related members of the order Nidovirales. The genomes are so large that they require proofreading activity to avoid error catastrophe7. A large genome size may contribute to enhanced cross-species transmission, but, at present, this notion is speculative. In any case, an important goal is to understand the functions of the many non-structural proteins involved in virus replication. Development of a cell-free or entirely in vitro replication system would facilitate detailed probing of the role of individual proteins in replication and transmission. Efforts to develop such cell-free systems began 40 years ago, but only in the past few years, with the advent of cryo-electron microscopy and new biochemical approaches, has progress been made. These efforts are expected to complement studies in intact cells, which use high-resolution microscopy and related techniques to analyse macromolecular interactions and function at the subcellular level.

Related to the previous question, why do coronaviruses encode so many proteins with apparent immunoevasive functions? Coronaviruses encode a variable number of accessory proteins, the genes of which are interspersed among the structural protein genes located at the 3′ end of the genome. For example, SARS-CoV-2 encodes at least six such proteins, and several other putative open reading frames in the genome are hypothesized to be expressed and to have immunoevasive properties8. Confusingly, these genes are often deleted in viruses isolated from infected animals, without apparent loss of virulence. This was shown most clearly for MERS-CoV, in which diverse deletions and insertions in accessory genes were detected in some isolates obtained in Africa from camels, the primary host of the virus9. These genetic changes may have unpredictable consequences for virus transmissibility or pathogenesis; deletion of these genes occasionally even leads to increased virulence10. The variable and sometimes unexpectedly high numbers of these proteins suggest that they have redundant and, perhaps, additional functions. Such redundancy could contribute to cross-species transmission. The genetic instability of MERS-CoV in camels in Africa therefore needs to be monitored, and evidence of human spillover needs to be continually assessed.

Predictive evolution

Can coronavirus evolution in infected humans or other animal hosts be predicted? Coronaviruses readily mutate and recombine as they adapt to a new host. This is well illustrated by the COVID-19 pandemic, in which ancestral strains of SARS-CoV-2 initially mutated to better infect humans and later evolved to evade the human immune response, generating a series of variants of concern. Several studies have modelled SARS-CoV-2 evolution, but so far it has not been possible to predict how the virus will evolve. Such predictive modelling is recognized to be difficult, but it would be very useful, in the present pandemic as well as in future coronavirus outbreaks or pandemics, for vaccine development, for anticipating clinical disease and pathogenesis, and for risk assessment of animal viruses with zoonotic potential.