Targeted accurate RNA consensus sequencing (tARC-seq) reveals mechanisms of replication error affecting SARS-CoV-2 divergence

Abstract

RNA viruses, like SARS-CoV-2, depend on their RNA-dependent RNA polymerases (RdRp) for replication, a process that is error prone. Monitoring replication errors is crucial for understanding the virus’s evolution. Current methods lack the precision to detect rare de novo RNA mutations, particularly in low-input samples such as those from patients. Here we introduce a targeted accurate RNA consensus sequencing method (tARC-seq) to accurately determine the frequency and types of mutations in SARS-CoV-2, both in cell culture and in clinical samples. Our findings show an average of 2.68 × 10⁻⁵ de novo errors per cycle with a C > T bias that cannot be solely attributed to APOBEC editing. We identified hotspots and cold spots throughout the genome, correlating with high or low GC content, and pinpointed transcription regulatory sites as regions more susceptible to errors. tARC-seq captured template switching events including insertions, deletions and complex mutations. These insights shed light on how genetic diversity is generated in SARS-CoV-2 and on its evolutionary dynamics.

Fig. 1: SARS-CoV-2 library preparation for tARC-seq.
Fig. 2: RNA variant frequencies and mutational spectra in SARS-CoV-2.
Fig. 3: APOBEC editing does not account for the majority of C > T mutations observed by tARC-seq.
Fig. 4: RdRp template switching at sites of sequence complementarity models rare events in SARS-CoV-2.
Fig. 5: RdRp template switching contributes to genomic change during the COVID-19 pandemic.

Data availability

Sequencing data are available through the Sequence Read Archive under BioProject PRJNA824595. Source data are provided as supplementary files.

Code availability

Python and R code is available on GitHub (https://github.com/chermanlab/SarsCov2-ArcSeq).

References

  1. Snijder, E. J., Decroly, E. & Ziebuhr, J. in Advances in Virus Research Vol. 96 (ed. Ziebuhr, J.) 59–126 (Academic Press, 2016).

  2. Bradley, C. C., Gordon, A. J. E., Halliday, J. A. & Herman, C. Transcription fidelity: new paradigms in epigenetic inheritance, genome instability and disease. DNA Repair 81, 102652 (2019).

  3. Drake, J. W. Rates of spontaneous mutation among RNA viruses. Proc. Natl Acad. Sci. USA 90, 4171–4175 (1993).

  4. Sanjuán, R., Nebot, M. R., Chirico, N., Mansky, L. M. & Belshaw, R. Viral mutation rates. J. Virol. 84, 9733–9748 (2010).

  5. Acevedo, A., Brodsky, L. & Andino, R. Mutational and fitness landscapes of an RNA virus revealed through population sequencing. Nature 505, 686–690 (2014).

  6. Smith, E. C., Sexton, N. R. & Denison, M. R. Thinking outside the triangle: replication fidelity of the largest RNA viruses. Annu. Rev. Virol. 1, 111–132 (2014).

  7. Eckerle, L. D. et al. Infidelity of SARS-CoV Nsp14-exonuclease mutant virus replication is revealed by complete genome sequencing. PLoS Pathog. 6, e1000896 (2010).

  8. Koyama, T., Platt, D. & Parida, L. Variant analysis of SARS-CoV-2 genomes. Bull. World Health Organ. 98, 495–504 (2020).

  9. Wang, S. et al. Molecular evolutionary characteristics of SARS‐CoV‐2 emerging in the United States. J. Med. Virol. 94, 310–317 (2022).

  10. Tay, J. H., Porter, A. F., Wirth, W. & Duchene, S. The emergence of SARS-CoV-2 variants of concern is driven by acceleration of the substitution rate. Mol. Biol. Evol. 39, msac013 (2022).

  11. Di Giorgio, S., Martignano, F., Torcia, M. G., Mattiuz, G. & Conticello, S. G. Evidence for host-dependent RNA editing in the transcriptome of SARS-CoV-2. Sci. Adv. 6, eabb5813 (2020).

  12. Reid-Bayliss, K. S. & Loeb, L. A. Accurate RNA consensus sequencing for high-fidelity detection of transcriptional mutagenesis-induced epimutations. Proc. Natl Acad. Sci. USA 114, 9415–9420 (2017).

  13. Acevedo, A. & Andino, R. Library preparation for highly accurate population sequencing of RNA viruses. Nat. Protoc. 9, 1760–1769 (2014).

  14. Traverse, C. C. & Ochman, H. Conserved rates and patterns of transcription errors across bacterial growth states and lifestyles. Proc. Natl Acad. Sci. USA 113, 3311–3316 (2016).

  15. Li, W. & Lynch, M. Universally high transcript error rates in bacteria. Elife 9, e54898 (2020).

  16. Traverse, C. C. & Ochman, H. A genome-wide assay specifies only GreA as a transcription fidelity factor in Escherichia coli. G3 8, 2257–2264 (2018).

  17. Eyre-Walker, A. & Keightley, P. D. The distribution of fitness effects of new mutations. Nat. Rev. Genet. 8, 610–618 (2007).

  18. Sanjuán, R., Moya, A. & Elena, S. F. The distribution of fitness effects caused by single-nucleotide substitutions in an RNA virus. Proc. Natl Acad. Sci. USA 101, 8396–8401 (2004).

  19. Wang, C. et al. Identification of evolutionarily stable functional and immunogenic sites across the SARS-CoV-2 proteome and greater coronavirus family. Bioinformatics 37, 4033–4040 (2021).

  20. Hou, Y. J. et al. SARS-CoV-2 D614G variant exhibits efficient replication ex vivo and transmission in vivo. Science 370, 1464–1468 (2020).

  21. Fitzsimmons, W. J. et al. A speed–fidelity trade-off determines the mutation rate and virulence of an RNA virus. PLoS Biol. 16, e2006459 (2018).

  22. Aksamentov, I., Roemer, C., Hodcroft, E. B. & Neher, R. A. Nextclade: clade assignment, mutation calling and quality control for viral genomes. J. Open Source Softw. 6, 3773 (2021).

  23. Chung, C. et al. Evolutionary conservation of the fidelity of transcription. Nat. Commun. 14, 1547 (2023).

  24. Wei, L. Retrospect of the two-year debate: what fuels the evolution of SARS-CoV-2: RNA editing or replication error? Curr. Microbiol. 80, 151 (2023).

  25. Nakata, Y. et al. Cellular APOBEC3A deaminase drives mutations in the SARS-CoV-2 genome. Nucleic Acids Res. 51, 783–795 (2023).

  26. Kim, K. et al. The roles of APOBEC-mediated RNA editing in SARS-CoV-2 mutations, replication and fitness. Sci. Rep. 12, 14972 (2022).

  27. Kim, D. et al. The architecture of SARS-CoV-2 transcriptome. Cell 181, 914–921.e10 (2020).

  28. Alonso, S., Izeta, A., Sola, I. & Enjuanes, L. Transcription regulatory sequences and mRNA expression levels in the coronavirus transmissible gastroenteritis virus. J. Virol. 76, 1293–1308 (2002).

  29. Garushyants, S. K., Rogozin, I. B. & Koonin, E. V. Template switching and duplications in SARS-CoV-2 genomes give rise to insertion variants that merit monitoring. Commun. Biol. 4, 1343 (2021).

  30. Abraham, M. & Hazkani-Covo, E. Protein innovation through template switching in the Saccharomyces cerevisiae lineage. Sci. Rep. 11, 22558 (2021).

  31. Wu, H. et al. Nucleocapsid mutations R203K/G204R increase the infectivity, fitness, and virulence of SARS-CoV-2. Cell Host Microbe 29, 1788–1801.e6 (2021).

  32. Bar-On, Y. M., Flamholz, A., Phillips, R. & Milo, R. SARS-CoV-2 (COVID-19) by the numbers. Elife 9, e57309 (2020).

  33. Moeller, N. H. et al. Structure and dynamics of SARS-CoV-2 proofreading exoribonuclease ExoN. Proc. Natl Acad. Sci. USA 119, e2106379119 (2022).

  34. Baddock, H. T. et al. Characterization of the SARS-CoV-2 ExoN (nsp14ExoN–nsp10) complex: implications for its role in viral genome stability and inhibitor identification. Nucleic Acids Res. 50, 1484–1500 (2022).

  35. Ogando, N. S. et al. The enzymatic activity of the nsp14 exoribonuclease is critical for replication of MERS-CoV and SARS-CoV-2. J. Virol. 94, e01246-20 (2020).

  36. Hastings, P. J., Ira, G. & Lupski, J. R. A microhomology-mediated break-induced replication model for the origin of human copy number variation. PLoS Genet. 5, e1000327 (2009).

  37. Xiao, Y. et al. RNA recombination enhances adaptability and is required for virus spread and virulence. Cell Host Microbe 19, 493–503 (2016).

  38. Gribble, J. et al. The coronavirus proofreading exoribonuclease mediates extensive viral recombination. PLoS Pathog. 17, e1009226 (2021).

  39. Lee, H., Popodi, E., Tang, H. & Foster, P. L. Rate and molecular spectrum of spontaneous mutations in the bacterium Escherichia coli as determined by whole-genome sequencing. Proc. Natl Acad. Sci. USA 109, E2774–E2783 (2012).

  40. Kunkel, T. A. The mutational specificity of DNA polymerases-alpha and -gamma during in vitro DNA synthesis. J. Biol. Chem. 260, 12866–12874 (1985).

  41. Imashimizu, M., Oshima, T., Lubkowska, L. & Kashlev, M. Direct assessment of transcription fidelity by high-resolution RNA sequencing. Nucleic Acids Res. 41, 9090–9104 (2013).

  42. Hestand, M. S., Van Houdt, J., Cristofoli, F. & Vermeesch, J. R. Polymerase specific error rates and profiles identified by single molecule sequencing. Mutat. Res. 784–785, 39–45 (2016).

  43. Gout, J.-F. et al. The landscape of transcription errors in eukaryotic cells. Sci. Adv. 3, e1701484 (2017).

  44. Harcourt, J. et al. Severe acute respiratory syndrome coronavirus 2 from patient with coronavirus disease, United States. Emerg. Infect. Dis. 26, 1266–1273 (2020).

  45. Rio, D. C., Ares, M., Hannon, G. J. & Nilsen, T. W. Purification of RNA using TRIzol (TRI reagent). Cold Spring Harb. Protoc. 2010, pdb.prot5439 (2010).

  46. Avadhanula, V. et al. Viral load of SARS-CoV-2 in adults during the first and second wave of COVID-19 pandemic in Houston, TX: the potential of the super-spreader. J. Infect. Dis. https://doi.org/10.1093/infdis/jiab097 (2021).

  47. Stead, M. B. et al. RNAsnap™: a rapid, quantitative and inexpensive, method for isolating total RNA from bacteria. Nucleic Acids Res. 40, e156 (2012).

  48. Zhao, W.-M. et al. The 2019 novel coronavirus resource. Yi Chuan 42, 212–221 (2020).

  49. Wang, M., Zhao, Y. & Zhang, B. Efficient test and visualization of multi-set intersections. Sci. Rep. 5, 16923 (2015).

  50. Hadfield, J. et al. Nextstrain: real-time tracking of pathogen evolution. Bioinformatics 34, 4121–4123 (2018).

  51. Mautner, L. et al. Replication kinetics and infectivity of SARS-CoV-2 variants of concern in common cell culture models. Virol. J. 19, 76 (2022).

Acknowledgements

We thank S. Rosenberg, M. Estes, C. Gross, P. Hotez, J. Halliday and H. Dierick for critical reading. The study was supported by NIH grants R01GM088653 (C.H.), 3R01AG061105-03S1 (O.L.), 1R21CA259780 (S.R.K.) and 1R21HG011229 (S.R.K.); and by NSF grant DBI-2032904 (O.L.). The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript.

Author information

Contributions

C.C.B., A.J.E.G. and C.H. conceived the study. C.C.B. developed the sequencing methodology, conducted the experiments, served as project administrator and wrote the original manuscript draft. A.X.W., A.J.E.G., C.W. and C.H. wrote the final manuscript draft. C.C.B., C.W., S.R.K. and B.F.K. contributed to software analysis and validation. C.C.B., C.W., A.J.E.G., P.N.L., A.X.W. and C.A.S. performed data analysis and investigation. C.C.B., S.E.R., M.B.C., V.A., A.X.W. and P.A.P. supplied project resources. C.C.B., C.W., A.J.E.G., P.N.L. and C.A.S. contributed to data visualization. C.C.B., C.W., O.L., A.J.E.G., C.H., S.E.R., A.X.W., M.B.C. and S.R.K. reviewed and edited the manuscript. C.H. supervised the project.

Corresponding author

Correspondence to Christophe Herman.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Nature Microbiology thanks the anonymous reviewers for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Replication cycle of SARS-CoV-2 virus.

a, As a positive-strand RNA virus, SARS-CoV-2 encodes an RNA polymerase (RNAP; blue) that is responsible for both replication and gene expression. After entering the cell, the virus releases its (+) strand RNA into the host cell’s cytoplasm. Using its own polymerase (RdRp), the virus copies its RNA into a (−) strand and then back into (+) strands, producing new viral RNA genomes for progeny virus particles. RNAP errors (red) can occur at any step of replication, generating genetic diversity in SARS-CoV-2 and fuelling the evolution of novel strains. b, Plaque-forming units over time of WT, Alpha and Omicron SARS-CoV-2 grown in Vero cells (n = 3 independent experiments, mean ± SD), consistent with previous data [51].

Extended Data Fig. 2 Hybrid capture of specific E. coli mRNAs for tARC-seq validation.

a, Hybrid capture in tARC-seq produces a >30-fold enrichment in post-consensus nucleotides across a panel of twelve E. coli genes. PCR duplicates account for most of the pre-consensus nucleotides sequenced, so fold enrichment drops during consensus calling, when duplicates of the same parent RNA fragment are collapsed into a single read. This drop between pre- and post-consensus reads is more pronounced for low-expression genes such as marR. Fold enrichment was calculated from the cumulative, normalized sequencing depth across each gene in tARC-seq samples versus matched bulk ARC-seq controls. b,c, Each biological replicate represents one WT E. coli sample sequenced two ways to generate paired data. Purified mRNA was either fragmented on its own and prepared for ARC-seq (control), or used as a carrier for SARS-CoV-2 fragmentation and tARC-seq (carrier). Libraries were sequenced separately and aligned to the E. coli reference for variant calling. Mutation frequencies were comparable between carrier (6.4 × 10⁻⁵) and control (7.5 × 10⁻⁵) samples (b) and reproduced the known variant frequency for WT E. coli (8.2 × 10⁻⁵) [14]. The distribution of mutations across base substitution types was also comparable between carrier and control (c) (n = 2 biologically independent replicates across all panels).
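The two ratios behind panels a and b can be written out directly. This is a minimal sketch, assuming that each library's cumulative per-gene depth is normalized by that library's total post-consensus nucleotides, and that mutation frequency is the number of variant nucleotides divided by the total nucleotides sequenced; the study's exact normalization is not stated here, and the function names and toy numbers are hypothetical.

```python
def normalized_depth(depths, total_nucleotides):
    """Cumulative depth over a gene's positions, scaled by library size (assumed normalization)."""
    return sum(depths) / total_nucleotides


def fold_enrichment(targeted_depths, targeted_total, bulk_depths, bulk_total):
    """Enrichment of one gene in a targeted (tARC-seq) library relative to a
    matched bulk (ARC-seq) library."""
    return normalized_depth(targeted_depths, targeted_total) / normalized_depth(
        bulk_depths, bulk_total
    )


def mutation_frequency(variant_count, nucleotides_sequenced):
    """Variants per nucleotide sequenced."""
    return variant_count / nucleotides_sequenced


# Hypothetical toy inputs, not study data
print(fold_enrichment([100, 120, 80], 1_000_000, [5, 4, 6], 1_500_000))
print(mutation_frequency(64, 1_000_000))
```

With these toy inputs the enrichment is 30-fold: the targeted library devotes 3 × 10⁻⁴ of its nucleotides to the gene versus 1 × 10⁻⁵ in the bulk control.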

Extended Data Fig. 3 Empirical validation of tARC-seq data analysis parameters.

In contrast to de novo variants, clonal and subclonal variants are not independent events and should be filtered out during analysis. a, To determine an appropriate cutoff, the cumulative base substitution frequency of all variants was plotted as a function of each variant’s clonality. A cutoff of 0.05 (that is, ≤5% allele fraction) retained most variants on the curve while excluding clonal outliers. b, The overall variant frequency (left y-axis, grey bars) in WT SARS-CoV-2 and the consensus read depth (right y-axis, purple line) are plotted over a series of minimum cDNA consensus sequence (cDNAcs) family sizes (minmem2). Minmem2 is the minimum number of PCR copies required to form a cDNA consensus sequence during consensus calling. A family size of 1 is equivalent to traditional RNA-seq without error correction, whereas a family size of 3 reduces the frequency of technical artifacts to <10⁻⁹ [13].
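The two filters described above (a minimum family size for consensus calling and an allele-fraction cutoff for clonality) can be illustrated as follows. This is a minimal sketch, assuming reads have already been grouped by a shared molecular tag and aligned to equal length; the input format, the agreement threshold of 0.7 and the function names are hypothetical and are not taken from the authors' repository.

```python
from collections import Counter, defaultdict


def consensus_sequences(reads, min_family_size=3, min_agreement=0.7):
    """Collapse PCR duplicates into one consensus sequence per molecular tag.

    reads: iterable of (tag, sequence) pairs. Sequences sharing a tag are
    assumed to be pre-aligned and of equal length (hypothetical input format).
    Families smaller than min_family_size (the minmem2 analogue) are dropped;
    positions where no base reaches min_agreement are masked as 'N'.
    """
    families = defaultdict(list)
    for tag, seq in reads:
        families[tag].append(seq)

    consensus = {}
    for tag, seqs in families.items():
        if len(seqs) < min_family_size:
            continue  # too few copies to separate RNA variants from PCR/sequencing errors
        bases = []
        for column in zip(*seqs):  # one tuple of bases per aligned position
            base, count = Counter(column).most_common(1)[0]
            bases.append(base if count / len(seqs) >= min_agreement else "N")
        consensus[tag] = "".join(bases)
    return consensus


def is_de_novo(alt_count, depth, max_allele_fraction=0.05):
    """Keep a variant only if its allele fraction is at or below the cutoff;
    higher fractions are treated as clonal/subclonal and excluded."""
    return alt_count / depth <= max_allele_fraction


reads = [
    ("tagA", "ACGT"), ("tagA", "ACGT"), ("tagA", "ACGT"), ("tagA", "ACTT"),
    ("tagB", "ACGT"),  # singleton family: removed at min_family_size=3
]
print(consensus_sequences(reads))  # {'tagA': 'ACGT'}
print(is_de_novo(3, 100), is_de_novo(40, 100))  # True False
```

Raising `min_family_size` trades read depth for accuracy, which is the trade-off plotted in panel b; the agreement threshold is an arbitrary illustration.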

Extended Data Fig. 4 De novo mutation frequencies in SARS-CoV-2 vary by feature independent of variant effect.

a, Synonymous (grey), nonsynonymous (blue) and nonsense (red) mutation frequencies from a single biological replicate of WT SARS-CoV-2 are mapped across nsp12, which encodes the viral RdRp. b, Substitution frequency is analyzed by Evolutionary Action (EA) score for the S gene and nsp12 [19] (see also: http://cov.lichtargelab.org/). Higher EA scores correspond to residues with greater impact on evolutionary divergence, and variants at these positions are predicted to be more deleterious. The lack of relationship between variant frequency and EA score (regression slope m ≈ 0) further suggests a limited role for selection in tARC-seq data. c, The variant frequency (VF) is computed per position (VF = count / depth) and plotted along the genome. Positions were filtered for depth ≥5,000×. A two-sided Fisher’s exact test with Benjamini–Hochberg correction was performed at each position across the genome to identify cold spots for RNA variants. Cold spot positions with significantly decreased VFs relative to the genome-wide average are indicated with black dots (P < 0.05). One representative sample of WT virus is shown. The broken black line represents the genome-wide average VF. d, C > T mutations observed across the WT1-Vero genome are plotted by 5′ base: T (red dot) or G (blue dot). Grey lines indicate every C > T mutation observed.
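The position-wise cold-spot test in panel c can be sketched with the Python standard library alone. This is a minimal sketch, assuming each position's variant and non-variant counts are compared with the pooled counts of all other positions that pass the depth filter; the exact table layout used in the study is not specified here, and the function names are hypothetical. For real data, SciPy's `scipy.stats.fisher_exact` and an established Benjamini–Hochberg implementation would be the safer choice.

```python
import math


def _log_comb(n, k):
    """Log of the binomial coefficient C(n, k)."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test P for the 2 x 2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same margins
    that is no more likely than the observed table.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    log_denom = _log_comb(n, col1)

    def log_p(x):
        return _log_comb(row1, x) + _log_comb(n - row1, col1 - x) - log_denom

    observed = log_p(a)
    total = 0.0
    for x in range(max(0, col1 - (n - row1)), min(row1, col1) + 1):
        lp = log_p(x)
        if lp <= observed + 1e-7:  # tolerance for floating-point ties
            total += math.exp(lp)
    return min(1.0, total)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted P values (step-up false discovery rate)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest P value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def cold_spots(counts, depths, min_depth=5000, alpha=0.05):
    """Indices of positions whose VF is significantly below the rest of the genome."""
    kept = [i for i, dep in enumerate(depths) if dep >= min_depth]
    if not kept:
        return []
    total_var = sum(counts[i] for i in kept)
    total_dep = sum(depths[i] for i in kept)
    genome_vf = total_var / total_dep
    pvals = []
    for i in kept:
        var, dep = counts[i], depths[i]
        rest_var, rest_dep = total_var - var, total_dep - dep  # all other kept positions
        pvals.append(fisher_exact_two_sided(var, dep - var, rest_var, rest_dep - rest_var))
    adjusted = benjamini_hochberg(pvals)
    return [i for i, q in zip(kept, adjusted)
            if q < alpha and counts[i] / depths[i] < genome_vf]
```

With toy counts `[0, 50, 50, 50, 50]` at a uniform depth of 5,000, only position 0 falls significantly below the pooled frequency and is reported as a cold spot.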

Extended Data Fig. 5 Effect of lineage on variant frequency using a Negative binomial regression model.

To handle the high dimensionality of the genome-wide data, a resampling regression approach was used to estimate strain and site effects. Negative binomial regression analyses modelling the effects of explanatory variables on variant count were performed iteratively on random subsets of sites (n = 2 biologically independent replicates across all lineages). The estimated effect size for each site is the mean of the predicted effect sizes across all models in the resampling approach that include that site. a-c, Strain effects are predicted across N = 1,000 iterations for the 1,000 most significant sites. The distributions of estimated strain effects show that Alpha has a significantly higher variant frequency than WT (P = 0.003) (b), whereas Omicron trends toward fewer variants than WT without reaching significance (P = 0.07) (c). Red dashed lines indicate 95% confidence intervals: for Alpha, the interval excludes zero, indicating significance; for Omicron, zero lies within the interval but close to its boundary. Significance levels are calculated under the null hypothesis β_strain = 0 (two-sided) using the empirical P-value of the estimated effect sizes. d,e, Site effects estimated using all S = 12,093 genomic positions in the resampling regression analysis show systematic patterns when grouped by features of the variant. Panel d shows differences in estimated site effects by mutation type, broken down by reference base and alternative allele combinations. Panel e shows differences in site effects across genomic features. f, Estimated site effects with P < 0.05 after Bonferroni correction for multiple testing are plotted along the genome. In all box plots, the center line indicates the median; the lower and upper bounds of the box are the first and third quartiles (25th and 75th percentiles), respectively; the lower and upper whiskers extend to the most extreme values within 1.5 × IQR of the first and third quartiles, respectively; and outliers beyond the whiskers are plotted as points.
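The summary steps in this caption (averaging site effects over the models that include each site, a 95% interval for the strain effect, and an empirical two-sided P value under β_strain = 0) can be illustrated independently of the regression fits. This is a minimal sketch, assuming the negative binomial models have already been fitted and their coefficients collected; the percentile interval and the "twice the smaller tail" P-value rule are common conventions assumed here, not necessarily the authors' exact procedure, and the input format is hypothetical.

```python
from collections import defaultdict


def mean_site_effects(model_results):
    """Average each site's effect over the resampled models that included it.

    model_results: iterable of dicts mapping site -> estimated effect,
    one dict per fitted model (hypothetical input format).
    """
    sums, counts = defaultdict(float), defaultdict(int)
    for effects in model_results:
        for site, beta in effects.items():
            sums[site] += beta
            counts[site] += 1
    return {site: sums[site] / counts[site] for site in sums}


def percentile(values, q):
    """Linear-interpolation percentile of values, with q in [0, 1]."""
    s = sorted(values)
    pos = q * (len(s) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    frac = pos - lo
    return s[lo] * (1 - frac) + s[hi] * frac


def percentile_interval(estimates, level=0.95):
    """Central percentile interval of resampled effect estimates."""
    tail = (1 - level) / 2
    return percentile(estimates, tail), percentile(estimates, 1 - tail)


def empirical_p_two_sided(estimates):
    """Two-sided empirical P for H0: beta = 0 (twice the smaller tail fraction)."""
    n = len(estimates)
    below = sum(1 for b in estimates if b <= 0) / n
    above = sum(1 for b in estimates if b >= 0) / n
    return min(1.0, 2 * min(below, above))
```

An interval that excludes zero and a small empirical P value go together under this convention, which matches how the Alpha and Omicron panels are read.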

Extended Data Fig. 6 Frequency and spectrum of RNA variants in two clinical samples.

Omicron data are plotted by sample source and clade. Both clinical samples came from female patients aged 25–30 years who were fully vaccinated and had no comorbidities; in both cases, symptoms were reported as ‘mild/moderate’. Lineage-to-clade assignments: BA.5.6 → 22B; BA.2.12.1 → 22C; B.1.1.529 → 21K (n = 1, mean ± Wilson score 95% confidence interval). a,b, Genome-wide mutation frequencies (a) and mutation frequencies by base substitution type (b) in the two clinical samples.
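For single samples (n = 1), the Wilson score interval treats a mutation frequency as a binomial proportion. This is a minimal sketch, assuming k is the number of variant nucleotides and n the number of nucleotides sequenced, with z = 1.96 for 95% coverage; unlike the simple Wald interval, the Wilson interval always stays within [0, 1], which matters when k is small relative to n.

```python
import math


def wilson_interval(k, n, z=1.96):
    """Wilson score interval for the binomial proportion k/n."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = k / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return center - half, center + half


print(wilson_interval(5, 10))  # approximately (0.237, 0.763)
```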

Extended Data Fig. 7 TRS activity fuels RNA variants.

Recombination at transcription regulatory sequences (TRS) drives sgRNA synthesis and is central to SARS-CoV-2 gene expression [27]. a, Canonical junctions associated with sgRNA synthesis are shown (black lines). Chimeric reads are detected by mapping with a spliced aligner (STAR). Each arc represents a chimeric alignment whose left and right x-intercepts correspond to the junction coordinates; line shading reflects frequency. Data are from one biological replicate of WT SARS-CoV-2. b-d, RNA errors are increased at TRS sites and flanking regions compared with genome-wide rates in WT (b), Alpha (c) and Omicron (d) SARS-CoV-2 (n = 2 biologically independent samples). Each TRS region (n = 10) is small (<115 nt) and consists of one canonical TRS site plus 100 flanking nucleotides. Error bars for WT TRS sites appear large because of high-frequency outliers. e, Indels are mapped by size (y-axis) and count (x-axis) across the SARS-CoV-2 genome for one representative replicate of WT virus. f, Promiscuous RdRp activity fuels a diverse repertoire of indels detectable by tARC-seq at a single locus, visible as a vertical smear.
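The comparison in panels b-d amounts to pooling errors and depth inside short TRS windows and comparing the result with the genome as a whole. This is a minimal sketch, assuming 0-based half-open TRS coordinates and that the 100 flanking nucleotides are split evenly (50 on each side); the caption does not specify the split, and the coordinates below are hypothetical, not real SARS-CoV-2 TRS positions.

```python
def trs_windows(trs_sites, flank_total=100):
    """Half-open (start, end) windows: each TRS site plus flank_total flanking
    nucleotides, assumed to be split evenly between the two sides."""
    half = flank_total // 2
    return [(max(0, start - half), end + half) for start, end in trs_sites]


def region_vs_genome(errors, depths, windows):
    """Pooled error frequency inside the windows, and genome-wide.

    errors, depths: per-position lists (0-based) of error counts and depths.
    A set is used so that overlapping windows are not double-counted.
    """
    inside = set()
    for start, end in windows:
        inside.update(range(start, min(end, len(depths))))
    in_err = sum(errors[i] for i in inside)
    in_dep = sum(depths[i] for i in inside)
    return in_err / in_dep, sum(errors) / sum(depths)


# Hypothetical 300-nt toy genome with 5 errors at one position
errors = [0] * 300
errors[150] = 5
depths = [1000] * 300
windows = trs_windows([(148, 154)])
print(windows)  # [(98, 204)]
print(region_vs_genome(errors, depths, windows))
```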

Extended Data Fig. 8 Template switching is a driving mechanism for deletions.

a, To test whether template switching contributes to deletions, we determined the number of SIDs expected by chance. Each deletion event of size ≥2 bp in the WT1 sample was assigned a random genomic position in SARS-CoV-2 while its size was preserved. The occurrence of SIDs and the maximum length of complementarity (nt) were then determined. This process was repeated 1,000 times to obtain a null distribution for each complementarity size. Observed SID occurrences are indicated by red lines. b, SIDs observed in more than one lineage (in at least one of the two biologically independent replicates) have significantly lower GC content than those observed in at most one lineage (two-sided t-test). c, SIDs observed in more than one lineage have significantly longer complementarity than those observed in at most one lineage (two-sided t-test). For b and c, n = 97,252 SIDs present in 0 lineages, n = 5,529 SIDs present in 1 lineage and n = 1,470 SIDs present in >1 lineage. Box plots show the minimum, Q1 (25th percentile), Q2 (median), Q3 (75th percentile) and maximum, excluding outliers (IQR = Q3 − Q1; outliers fall outside Q1 − 1.5 × IQR and Q3 + 1.5 × IQR).
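The null model in panel a (random repositioning of each deletion with its size preserved, repeated 1,000 times) can be sketched as a simple permutation test. This is a minimal sketch under stated assumptions: complementarity is approximated here as the length of the direct repeat shared by the sequence just before the deletion and the end of the deleted segment, the genome is a plain string in 0-based coordinates, and the function names are hypothetical; the study's criteria for calling SIDs are not reproduced.

```python
import random


def junction_homology(genome, start, size, max_k=20):
    """Longest k (<= max_k) with genome[start-k:start] == genome[end-k:end],
    where the deletion removes genome[start:end] and end = start + size.

    Such a repeat would let the nascent 3' end, copied from just before the
    deletion, pair again just before the landing site (an assumed proxy for
    micro-complementarity, not the study's exact definition).
    """
    end = start + size
    k = 0
    while k < max_k and start - k - 1 >= 0 and genome[start - k - 1] == genome[end - k - 1]:
        k += 1
    return k


def permutation_null(genome, deletion_sizes, min_k, n_iter=1000, seed=0):
    """Per-iteration counts of randomly repositioned deletions (sizes preserved)
    with at least min_k nt of junction homology."""
    rng = random.Random(seed)
    null = []
    for _ in range(n_iter):
        hits = 0
        for size in deletion_sizes:
            start = rng.randrange(1, len(genome) - size + 1)  # deletion stays inside the genome
            if junction_homology(genome, start, size) >= min_k:
                hits += 1
        null.append(hits)
    return null


def empirical_p_greater(observed, null):
    """One-sided empirical P that a null count is at least the observed count."""
    return (1 + sum(1 for x in null if x >= observed)) / (1 + len(null))
```

Comparing the observed count of deletions with at least `min_k` nt of homology against `permutation_null` for each `min_k` mirrors the per-complementarity-size null distributions in panel a.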

Extended Data Fig. 9 RdRp template switching contributes to genomic change during the COVID-19 pandemic.

VOC-specific multiple nucleotide alterations can be modelled as single RdRp template switching events based on 3′ micro-complementarity that facilitates RdRp misalignment and realignment. a, Phylogenetic tree based on sequence alterations observed in VOCs arising from the 20B clade; not drawn to scale. Colours indicate VOC-specific nucleotide alterations; other coloured dots are rare variants that arose from the sequences shown. b-d, Top panels, proposed template switching events that explain multiple nucleotide alterations in the Lambda (b), Alpha (c) and Gamma (d) lineages. Bottom panels, phylogenetic trees that establish the single origin (red arrows) of the coordinated multiple nucleotide alterations in each lineage. Phylogenetic trees were constructed in Nextstrain v2.35.0 from genomes sequenced between December 2019 and March 2022.

Supplementary information

Supplementary Information

Supplementary Tables 1, 2, 7, 10–12 and 14.

Reporting Summary

Supplementary Table 1

Supplementary Tables 3–6, 8, 9, 13, viral titres (ED Fig. 2b), tARC-seq oligomer sequences.

Supplementary Data 2

Fig. 2 plotted data.

Supplementary Data 3

ED Fig. 2 plotted data.

Supplementary Data 4

ED Fig. 7 plotted data.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Bradley, C.C., Wang, C., Gordon, A.J.E. et al. Targeted accurate RNA consensus sequencing (tARC-seq) reveals mechanisms of replication error affecting SARS-CoV-2 divergence. Nat Microbiol (2024). https://doi.org/10.1038/s41564-024-01655-4

  • DOI: https://doi.org/10.1038/s41564-024-01655-4
