Main

What minimal features are required for a synthetic system to encapsulate its own genome and to evolve biological functionality similar to viruses? In the nearly 40 years since the first high-resolution structure of an icosahedral virus11, the structures and functions of a wide array of viral capsids have been characterized. This has inspired efforts to reengineer naturally occurring protein containers12 and to design new polypeptides13 to package biological molecules. For example, the naturally occurring, non-viral protein container lumazine synthase was evolved in E. coli to sequester a toxic protein14. However, there have been no reports of non-viral containers that can encapsulate their own genomes and evolve in complex biochemical environments outside of cells.

We recently reported the design, with atomic-level accuracy, of two-component, 120-subunit icosahedral protein assemblies with internal volumes large enough to package biological macromolecules5. These highly stable and engineerable assemblies5,6 in principle could be redesigned to package their own genomes: bicistronic mRNAs encoding the two protein subunits. We investigated this possibility by modifying two assemblies with accessible protein termini and no large pores, I53-47 and I53-505, either by introducing positively charged residues on their interior surfaces (I53-47-v1 and I53-50-v1; Fig. 1a and Extended Data Table 1a) or by genetically fusing the Tat RNA-binding peptide from bovine immunodeficiency virus15 to the interior-facing C terminus of one subunit (I53-50-Btat and I53-47-Btat). After expression and intracellular assembly in E. coli (Fig. 1b), intact protein assemblies were purified from cell lysates using immobilized metal affinity chromatography (IMAC) and size exclusion chromatography (SEC). The assemblies eluted as a single peak at the same retention volume as the original design5 (Extended Data Fig. 1), and intact particles were observed by negative-stain transmission electron microscopy (Fig. 1c, Extended Data Fig. 1a). After purification, the assemblies were incubated with RNase A for 10 min at 20 °C to degrade any RNA not protected inside the synthetic capsid-like proteins. Nucleic acid and protein co-migrated on native agarose gels (Fig. 1d, e, Extended Fig. 1b, c), suggesting that the protected nucleic acid was encapsulated in the protein assembly. Nucleic acid extraction followed by reverse transcription quantitative PCR (RT–qPCR) and Sanger sequencing confirmed that full-length RNA genomes were packaged and protected from RNase by I53-50-v1 and I53-50-Btat but not by the original I53-50 design (Fig. 1f); all versions of I53-47 could package their genomes (Extended Data Fig. 1d). In all cases, RT–PCR products were only obtained upon addition of reverse transcriptase, indicating that the protected nucleic acids were RNA and not DNA. We refer to these designed RNA–protein complexes as synthetic nucleocapsids.

Figure 1: Biochemical characterization of synthetic nucleocapsids.
figure 1

a, Design model of I53-50-v1. Trimeric subunits are coloured green and pentameric subunits are coloured cyan. Mutations with respect to the original I53-50 protein assembly7 are coloured blue (increases in positive charge and/or decreases in negative charge (for example, E→N, N→K, E→K)), orange (no change in charge (for example, E→D, N→T, K→R)), or red (decreases in positive charge and/or increases in negative charge (for example, N→E, K→N, K→E)). b, Synthetic nucleocapsids encapsulate their own mRNA genomes while assembling into icosahedral capsids inside E. coli cells. c, Negative-stain electron micrographs of I53-50-v1 (positively charged interior) and I53-50-Btat (RNA binding Tat peptide from bovine immunodeficiency virus). Micrographs shown are representative of the entire sample tested on between one and three different grids, each at a different concentration. d, e, Synthetic nucleocapsids were purified, treated with RNase A, and electrophoresed on non-denaturing 1% agarose gels then stained with Coomassie (protein, d) and SYBR gold (nucleic acid, e). Nucleic acids co-migrated with capsid proteins for I53-50-v1 and I53-50-Btat, but not for the original I53-50. f, Full-length synthetic nucleocapsid genomes were recovered from each sample by RT–PCR. The left-most lane is NEB 1-kb DNA ladder. Plus and minus symbols indicate PCR performed on templates prepared with and without reverse transcriptase (RT), respectively, confirming that I53-50-v1 and I53-50-Btat package their own full-length RNA genomes. This procedure is part of our standard quality control for synthetic nucleocapsids and has been performed reproducibly more than 10 times.

PowerPoint slide

To investigate whether synthetic nucleocapsids can evolve, we generated combinatorial libraries of variants and selected for improved genome packaging and fitness against nuclease challenge. Nine positions on the interior surfaces of I53-50-v1 and I53-50-Btat were mutated to positive, negative, or uncharged polar amino acids (Supplementary Table 1) to produce variants with a wide range of interior charge distributions. We performed three rounds of selection comprising expression, purification, RNase challenge, RNA recovery, and re-cloning (Fig. 2a). The RNA recovered from the selected population after each round was reverse-transcribed and sequenced on an Illumina MiSeq. The net interior charge of the evolved population converged to narrow distributions around 388 ± 87 (mean ± s.d. of the population) in the absence of Btat and 662 ± 91 (480 of which are from 60 copies of Btat) in the presence of Btat (Fig. 2b). A total of 1,170 different variants exhibited higher enrichment than I53-50-v1 (Fig. 2c); there are evidently many solutions to the genome-packaging problem. The presence or absence of the positively charged Btat peptide influenced the identities of beneficial mutations: all except two of the lysine residues were beneficial in the absence of Btat (Fig. 2d), whereas most lysine residues were disfavoured in the presence of Btat (Fig. 2e). We combined the substitutions from one of the most highly enriched variants from the library lacking Btat (Fig. 2c; trimeric subunit: K178N, K183N, E189K; pentameric subunit: K123N, H125K) with the most enriched substitution from a separate library of mutants in the trimer-pentamer interface (pentameric subunit: E24F; Supplementary Fig. 2 and Supplementary Table 1a) to produce I53-50-v2, which exhibited improved genome packaging efficiency as assessed by RT–qPCR (Extended Data Fig. 2a). The net interior charge did not change between I53-50-v1 and I53-50-v2—the improved genome packaging and protection results from reconfiguration of the position of the charges (Fig. 2f). I53-50-v2 outperformed the best variants from the I53-50-Btat library (Extended Data Fig. 2a), so we focused on I53-50-v2 for subsequent evolution experiments.

Figure 2: Evolution of RNA packaging.
figure 2

a, A library of plasmids encoding synthetic nucleocapsid variants is transformed into E. coli. Each cell in the population produces a unique synthetic nucleocapsid variant. Nucleocapsids are purified en masse from cell lysates and challenged (for example, RNase, heat, blood and mouse circulation). The capsid-protected mRNA is then recovered and amplified using RT–qPCR, re-cloned into a plasmid library, and transformed into E. coli for another generation. bf, Combinatorial libraries targeting nine residues on the interior surface of I53-50 (Supplementary Table 1) were used to investigate how the interior surface charge affects RNA packaging in the presence or absence of a positively charged RNA binding peptide (Btat). Three rounds of evolution were performed with two independent biological replicates. b, The evolved populations converged towards narrow distributions of interior net charge: Btat library from 215 ± 114 (mean ± s.d.) to 388 ± 87, Btat+ library from 733 ± 119 to 662 ± 91. The net interior charge of each variant was calculated from its sequence by summing the positive and negative residues on the interior surface. Black lines are without Btat and grey lines are with Btat; dashed lines are naive populations and solid lines are round three selected populations. These results represent the combined population distribution of two independent evolutionary trajectories. c, Rank order list of variants observed in both biological replicates; 1,170 unique variants outperformed I53-50-v1. I53-50-v2 was created based on the second most highly enriched variant from the Btat- library. d, e, Heat map of log enrichments for each mutation explored in the combinatorial surface charge optimization library. All except two of the lysine residues were beneficial in the absence of the positively charged Btat, whereas most lysine residues were disfavoured in the presence of Btat. Purple or orange indicate mutations that were depleted or enriched in the selected population, respectively. Blue squares and black dots indicate the I53-50-v1 starting sequence and the I53-50-v2 selected sequence, respectively. f, Design model of I53-50-v2. Although the net interior surface charge did not change from I53-50-v1 to I53-50-v2, the spatial configuration of charged residues impacted genome packaging efficiency (see Fig. 4a). Colouring is as described in Fig. 1a.

PowerPoint slide

The ability to evolve the nucleocapsids enabled comprehensive mapping of how each residue affects the fitness of the 2.5-megadalton complex comprising 22,920 amino acids and 1,370 RNA bases. We produced a deep mutational scanning library16,17 of I53-50-v2 with every residue in each protein subunit substituted with each of the 20 amino acids, and performed two consecutive rounds of selection with two biological replicates. Selection in the first round was performed at room temperature with 10 μg ml−1 RNase A for 10 min to deplete non-assembling variants from the population, and selection in the second round was at 37 °C for 1 h with either 10 μg ml−1 RNase A or heparinized mouse whole blood. Each biological replicate of the naive, round 1 and round 2 populations was sequenced on an Illumina MiSeq, and enrichment values were calculated from the fraction of the population corresponding to each variant before and after selection (Fig. 3a, b; 7,156 out of the possible 7,240 single mutants were observed with at least 10 counts in the pre-selection population). The enrichments of individual mutations were correlated between the RNase A and mouse whole blood selections (Fig. 3c), suggesting that similar mechanisms underlie the increased genome protection in both cases.

Figure 3: Synthetic nucleocapsid fitness landscape.
figure 3

Deep mutational scanning of I53-50-v2 enables comprehensive mapping of how each residue affects nucleocapsid fitness. a, Average log enrichments for all 20 amino acids at each position in the 2.5-megadalton capsid revealed that many native lysine and arginine residues (red circles) favour being mutated to other amino acids. b, Heat map of log enrichments for all amino acids at the positions exhibiting the highest and lowest conservation. Purple or orange indicate mutations that were depleted or enriched in the selected population, respectively. Blue squares and black dots indicate the I53-50-v2 starting sequence and the I53-50-v3 selected sequence, respectively. c, Average log enrichment was highly correlated between the RNase A (10 μg ml−1 RNase A, 37 °C, 1 h) and heparinized mouse blood (37 °C, 1 h) selections, indicating that the beneficial mutations shared a common mechanism for improving nucleocapsid stability (RNase A versus blood: Pearson correlation r = 0.79; one biological replicate of each selection condition for the trimeric subunit and two independent biological replicates for the pentameric subunit). d, The core and interface residues of the capsid pentameric and trimeric subunits are more highly conserved than the surface residues. The colour spectrum in d and g represents the average log enrichment of all 20 amino acids at the indicated position and is rescaled relative to that in c for clarity (purple is conserved and orange is highly mutated; see Methods). e, I53-50 design model with pentameric subunit (cyan), trimeric subunit (green) and pore indicated. f, g, Surface electrostatics (−, red; +, blue) (f) and sequence conservation (g) show that lysine and arginine residues are highly depleted in the capsid pore during evolution (in particular, trimeric subunit residues K8, K9, K11, K61, K145 and K152). By contrast, the negatively charged E4 is highly conserved. These data suggest that positively charged residues in the capsid pore are deleterious for RNA packaging and protection.

PowerPoint slide

Evaluating the enrichment values in the context of the I53-50 design model (Fig. 3d–g) provides insight into the features important for genome encapsulation and protection. I53-50 is composed of 20 trimers and 12 pentamers; the hydrophobic protein cores, intra-oligomer interfaces, and designed inter-oligomer interface were conserved (Fig. 3d, Supplementary Fig. 3), probably because proteins bearing mutations that disrupt the assembly fail to protect their genomes and are removed from the population. Strong selective pressure also operated on the electrostatics of the surface lining the pore between trimeric subunits of I53-50-v2; all highly depleted residues were lysines or arginines, whereas the nearby glutamate (residue E4) was highly conserved (Fig. 3f, g). Lysine removal around the pore also occurred in the earlier transition from I53-50-v1 to I53-50-v2 (K179N in the trimer and K124N in the pentamer; Fig. 2d, Supplementary Fig. 4). Positively charged residues near the pores may compromise genome protection either by promoting protrusion of the encapsulated RNA from the interior of the icosahedral assembly, thereby rendering it susceptible to RNases, or by destabilizing the assembly through electrostatic repulsion between trimeric subunits. To test whether several of the most enriched mutations could be combined to produce a synthetic nucleocapsid with superior fitness, a combinatorial library was constructed containing charged and uncharged polar residues at positions where positively charged residues were deleterious in the deep mutational scanning data (trimeric subunit: K2, K8, K9, K11 and K61). After selection in 10 μg ml−1 RNase A at 37 °C for 1 h, the six most enriched variants were tested individually to evaluate their improvements over I53-50-v2 (Extended Data Fig. 2b, c). The one best protected under these conditions was designated I53-50-v3 (trimeric subunit: K2T, K9R, K11T and K61D). The failure of an assembly-defective variant to protect its genome (I53-50-v3-KO; trimeric subunit: V29R, pentameric subunit: A38R; Supplementary Fig. 5) confirmed that encapsulation was required for RNA protection.

We next investigated whether synthetic nucleocapsids can evolve with part of their lifecycle inside an animal. As long circulation times are desirable for in vivo applications such as drug delivery, we decided to focus on this property. We hypothesized that the histidine tag might mediate undesired interactions in vivo, so we created cleavable versions that were used for all subsequent experiments (see Methods). We produced two populations of synthetic nucleocapsids, one displaying hydrophilic 60-residue polypeptides of varying compositions intended to mimic viral glycosylation or PEGylation18 (Supplementary Table 2) and another with 14 exterior surface positions combinatorially mutated to polar charged and uncharged amino acids (D, E, N, Q, K and R; Supplementary Table 1). We administered each population of nucleocapsids to mice by retro-orbital injection (n = 5 mice for hydrophilic peptides and 6 for surface positions), and evaluated the survival of each member of the population in vivo by blood draws from the tail vein at successive time points. From both libraries, several distinct sequences markedly improved circulation times. An optimal amino acid composition emerged in the hydrophilic peptide library (Extended Data Fig. 3a–c). Arbitrary polypeptides with similar amino acid composition (for example, 4.5 repeats of PETSPASTEPEGS or 4 repeats of PESTGAPGETSPEGS) increased the circulation time, whereas other polypeptides composed of different amino acids (for example, 12 repeats of ESESG) did not (Extended Data Fig. 3d, e). From the exterior surface library (Extended Data Fig. 4a, b), we isolated several variants exhibiting markedly enhanced circulation time compared to I53-50-v3 (Extended Data Fig. 4c, d) and found that most contained the substitution E67K in the pentameric subunit. We generated I53-50-v4 by incorporating E67K along with a set of other consensus mutations that were enriched in the selected population of synthetic nucleocapsids and may also contribute to increased expression and stability (Extended Data Table 1a; as the hydrophilic polypeptides reduced nucleocapsid yield, they were not included).

We next sought to determine what fraction of the I53-50-v4 synthetic nucleocapsids are filled, and with which RNAs. Negative-stain electron microscopy analysis of 15,119 particles showed that most I53-50-v4 nucleocapsids are more electron-dense—probably due to encapsulated nucleic acid—than the unfilled I53-50-v0 assemblies (Extended Data Fig. 5). Quantification of bulk RNA and protein indicated that there is approximately one nucleocapsid genome equivalent (1,433 nucleotides) of total RNA encapsulated per 6.6 I53-50-v1 capsids and 4.8 I53-50-v4 capsids (Extended Data Table 1b). RNA sequencing (RNA-seq) of I53-50-v4 capsids showed that approximately 74% of this RNA was derived from the nucleocapsid genome, suggesting one genome equivalent of nucleocapsid mRNA per 6.5 capsids (Fig. 4e, f). Independent quantification by RT–qPCR indicated one full-length genome per 11 capsids. The difference between these two estimates probably reflects the inclusion of genome fragments by RNA-seq (Extended Data Fig. 6). Although capsid genomes are modestly enriched and ribosomal RNA is depleted in nucleocapsids relative to cells (Fig. 4e, f), I53-50-v4 does not exhibit increased specificity for its genome relative to I53-50-v1 (Extended Data Fig. 7a). Instead, packaging correlates strongly with expression level (Extended Data Fig. 7b), accounting for the encapsulation of a modest amount of host cell RNA. The ability to package arbitrary RNA sequences combined with the ability to assemble in vitro from purified subunits5 could make synthetic nucleocapsids the basis for a highly flexible platform for RNA delivery.

Figure 4: Increased fitness of evolved synthetic nucleocapsids.
figure 4

Evolution markedly increases the property under selection without compromising previously evolved properties. ac, Time courses of full-length RNA genomes per 1,000 capsids isolated after challenge with 10 μg ml−1 RNase A at 37 °C (n = 3 independent reactions) (a), heparinized mouse whole blood at 37 °C (n = 3 independent reactions) (b), and in vivo circulation of live mice (n = 5 biologically independent animals) (c). Error bars denote s.e.m. d, Summary of improved nucleocapsid properties, including total packaged RNA (10 μg ml−1 RNase A for 10 min at 20 °C to degrade non-encapsulated RNA, n = 3 independent reactions). The coloured arrows in ac indicate the 6-h time point represented in the summary plot. Five synthetic nucleocapsids were tested: I53-50-v0 (original assembly that did not package its full length mRNA), I53-50-v1 (design with positive interior surface for packaging RNA), I53-50-v2 (evolution-optimized interior surface), I53-50-v3 (evolution-optimized residues lining the capsid pore), and I53-50-v4 (evolution-optimized exterior surface for increased circulation in living mice). Evolution resulted in efficient genome encapsulation for I53-50-v2 and its derivatives (approximately 1 RNA genome per 14 icosahedral capsids for I53-50-v2), protection from blood for I53-50-v3 and I53-50-v4 (82% and 71% protection, respectively), and increased circulation half-life for I53-50-v4 (~4.5 h serum half-life). Full-length RNA genomes were quantified by RT–qPCR, capsid proteins were quantified by Qubit, and genomes per capsid were calculated based on these values by dividing the number of genomes by the number of capsids. e, Nucleocapsid genomes are enriched and ribosomal RNA is depleted in nucleocapsids. f, Top 13 RNA transcripts encapsulated in I53-50-v4. Nucleocapsid genomes account for more than 74% of the packaged transcripts. g, h, The relative biodistribution of intact I53-50-v3 (g) and I53-50-v4 (h) nucleocapsids was evaluated by RT–qPCR of their full-length genomes recovered from mouse organs collected 5 min or 4 h after retro-orbital injection (n = 3 biologically independent animals at each time point for each nucleocapsid, I53-50-v3 and I53-50-v4). Red (5 min) and blue (240 min) bars represent the mean of three biologically independent animals, error bars denote the s.e.m., and thick black bars represent the detection limit of the assay. No obvious tissue tropism was observed for either nucleocapsid. At 4 hours after the injection, I53-50-v3 had largely disappeared, whereas I53-50-v4 remained predominantly in the blood with lower levels in the other tissues.

PowerPoint slide

Source data

Like modern viruses, our evolved synthetic nucleocapsids exhibit genome packaging, nuclease protection, and sustained circulation in vivo. Each evolutionary step (Extended Data Table 1a and Extended Data Fig. 8) improved the particular property under selection without compromising gains from previous steps (Fig. 4). Negative-stain electron micrographs of I53-50-v1, I53-50-v2, I53-50-v3 and I53-50-v4 showed that the functional improvements introduced by evolution did not compromise the designed icosahedral architecture (Supplementary Fig. 6), and dynamic light scattering indicated uniform populations of nucleocapsids around the expected size (radius = 13.5 nm; Extended Data Fig. 9). The I53-50-v1 design provided a starting point for evolution, inefficiently packaging its own full-length genome. Evolving the interior surface produced I53-50-v2, which packages roughly 1 RNA genome for every 14 capsids, rivalling the best recombinant adeno-associated viruses (rAAVs)8,9 (Fig. 4d). Subsequently, evolving the capsid pore for improved stability resulted in I53-50-v3, which protects 44% of its RNA when challenged by RNase A (10 μg ml−1, 37 °C, 6 h) and 82% of its RNA when challenged by mouse whole blood (37 °C, 6 h), whereas I53-50-v2 only protects 1.0% and 1.2%, respectively (Fig. 4a, b). Evolving the exterior surface of the capsid in circulation in live mice produced I53-50-v4, with a more than 54-fold increase in the circulation half-life—from less than 5 min to roughly 4.5 h— relative to I53-50-v3 (Fig. 4c). To further characterize the difference in behaviour between these two nucleocapsids, we determined the relative biodistribution of intact nucleocapsids by RT–qPCR of full-length genomes at both 5 min and 4 h. As expected, no obvious tissue tropism was observed for either nucleocapsid. There is no substantial intact I53-50-v3 remaining in any organs by 4 h after injection, consistent with the rapid elimination of I53-50-v3 compared to I53-50-v4 (Fig. 4g, h).

This work demonstrates that by acquiring positive charge on its interior, an otherwise inert self-assembling protein nanomaterial can package its own RNA genome and evolve under selective pressure. Starting from this blank slate, evolution uncovered several simple mechanisms to improve complex properties such as genome packaging, nuclease resistance and in vivo circulation time. This suggests paths by which viruses could have arisen from protein assemblies that adopted simple mechanisms to package their own genetic information. Modern viruses are much more complex, having evolved under selective pressure to minimize genome size and to optimize multiple capsid functions required for a complete viral life cycle. However, this makes it difficult to change one property (for example, alter tropism or remove epitopes for pre-existing antibodies19,20) without compromising other functions. By contrast, the simplicity of our synthetic nucleocapsids should allow them to be further engineered more freely. Combining the evolvability of viruses with the accuracy and control of computational protein design, synthetic nucleocapsids can be custom-designed and then evolved to optimize function in complex biochemical environments.

Methods

No statistical methods were used to predetermine sample size.

Solutions and buffers

Lysogeny broth (LB): autoclave 10 g tryptone, 5 g yeast extract, 5 g NaCl, 1 l dH2O. LB agar plates: autoclave LB with 15 g l−1 bacto agar. Terrific broth (TB): autoclave 12 g tryptone, 24 g yeast extract, 4 ml glycerol, 950 ml dH2O separately from KPO4 salts (23.14 g KH2PO4, 125.31 g K2HPO4, 1 l dH2O); mix 950 ml broth with 50 ml KPO4 salts at room temperature. Antibiotics: kanamycin (50 μg ml−1 final). Inducers: β-d-1-thiogalactopyranoside (IPTG, 500 μM final). Tris-buffered saline with imidazole (TBSI): 250 mM NaCl, 20 mM imidazole, 25 mM Tris-HCl, pH 8.0. Lysis buffer: TBSI supplemented with 1 mg ml−1 lysozyme (Sigma, L6876, from chicken egg), 1 mg ml−1 DNase I (Sigma, DN25, from bovine pancreas), and 1 mM phenylmethanesulfonyl fluoride (PMSF). Elution buffer: 250 mM NaCl, 500 mM imidazole, 25 mM Tris-HCl, pH 8.0. Phosphate-buffered saline (PBS): 150 mM NaCl, 20 mM NaPO4, pH 7.4. 20× lithium borate buffer (use at 1×): 1 l dH2O, 8.3 g lithium hydroxide monohydrate, 36 g boric acid. Tris-glycine buffer: 25 mM Tris-HCl, 192 mM glycine, 0.1% SDS, pH 8.3.

DNA cloning by PCR mutagenesis and isothermal assembly

Synthetic genes encoding I53-50 and I53-475 were amplified using Kapa High Fidelity Polymerase according to manufacturer’s protocols with primers incorporating the desired mutations or the Btat peptide. The resulting amplicons were isothermally assembled21 with PCR-amplified or restriction-digested (NdeI and XhoI) pET29b fragments and transformed into chemically competent E. coli XL1-Blue cells. Monoclonal colonies were verified by Sanger sequencing. Plasmid DNA was purified using a Qiagen miniprep kit and transformed into chemically competent E. coli BL21(DE3)* cells for protein expression.

Kunkel mutagenesis

Kunkel mutagenesis was performed as previously described22. In brief, E. coli CJ236 was transformed with the desired pET vector and then infected with bacteriophage M13K07. Single-stranded DNA (ssDNA) was purified from PEG/NaCl-precipitated bacteriophage using a Qiaprep M13 kit. Oligonucleotides were phosphorylated for 1 h with T4 polynucleotide kinase (NEB, M0201) and annealed to purified ssDNA plasmids. For routine cloning, annealing was performed using a temperature ramp from 95 °C to 25 °C over 30 min. For library generation, annealing mixtures were denatured at 95 °C for 2 min, followed by annealing for 5 min at either 55 °C (220 bp agilent oligonucleotides) or 50 °C (all other oligonucleotides). Oligonucleotides were extended using T7 DNA polymerase (NEB) for 1 h at 20 °C and transformed into E. coli as described for either routine cloning or library generation.

Transformation of DNA libraries

Plasmid DNA libraries generated as described above by isothermal assembly or kunkel mutagenesis were purified by SPRI purification23 and electrotransformed into E. coli DH10B (Invitrogen 18290-015) to produce libraries with at least 10× coverage. Transformed libraries were grown as lawns on LB agar plates containing 50 μg ml−1 kanamycin. Additionally, a tenfold dilution series of the transformed library was spotted onto an additional plate to assess library size. After 12–18 h of growth, the resulting lawn of cells was scraped from the plate into 1 ml of LB and pelleted at 16,000 g for 30 s. Plasmid DNA was purified directly from this cell pellet using a Qiagen miniprep kit and electrotransformed into E. coli BL21(DE3)* with a minimum of 10× coverage of the library. The resulting bacterial lawns were then lifted from plates in 1mL TB and inoculated directly into expression cultures.

Deep mutational scanning library design, amplification, and purification

For the deep mutational scanning library, the DNA sequence encoding the two components of I53-50-v2 was divided into 7 windows of up to 159 bp. For each window, a pool of oligonucleotides was synthesized to mutate every residue of I53-50-v2 in the specified window (Agilent SurePrint Oligonucleotide Library Synthesis, OLS). Each oligonucleotide encoded a single amino acid change using the most common codon in E. coli for that amino acid. To disambiguate bona fide mutations from sequencing and reverse transcription errors, mutagenic oligonucleotides included silent mutations on either side of the mutagenized position. Each of the 7 oligonucleotide pools was amplified from the OLS pool using primers that annealed to constant regions flanking the mutagenic sequences. Reaction progress was monitored by SYBR green fluorescence on a Bio-Rad CFX96 to prevent over-amplification. The resulting amplicons were then PAGE-purified and subjected to an additional round of amplification and SPRI purification. A final PCR reaction was set up with only the reverse primer to perform linear amplification of the desired primer sequence (50 cycles of temperature cycling were performed to generate a DNA sample highly enriched for the reverse strand). This sample was then purified using a Qiagen QIAquick PCR Purification Kit. The resulting pool of single stranded oligonucleotides was then used in a kunkel reaction as described above for library generation.

Hydrophilic polypeptide library design, amplification, and purification

The hydrophilic polypeptide library was generated by alternating sets of hydrophilic amino acids (DE, ST, QN, GE, EK, ES, EQ, EP or PAS) with a guest residue (A, S, T, E, D, Q, N, K, R, P, G, L or I) introduced between every 1, 2 or 5 occurrences to generate a final peptide of 59 amino acids in length. An additional 21 peptides were generated by splitting known hydrophilic peptides24,25 into 59 amino acid chunks or concatenating one of their primary repeating units. All polypeptide sequences were reverse translated to DNA using codon frequencies found in E. coli K1226, and flanking sequences were added for amplification. These oligonucleotide sequences were synthesized using Agilent OLS technology. After amplification, flanking regions were removed using the AgeI and HindIII restriction enzymes, and the insert was cloned onto the C terminus of the I53-50-v3 pentamer subunit by ligation (T4 ligase, NEB M0202, final concentration: 40 U μl−1, 1× T4 ligase buffer with 1 mM ATP). The resulting DNA was SPRI purified and transformed as described above.

Protein expression and purification

E. coli BL21(DE3)* expression cultures were grown to an optical density (600 nm) of 0.6 in 500 ml TB supplemented with 50 μg ml−1 kanamycin at 37 °C with shaking at 225 r.p.m. Expression was induced by the addition of IPTG (500 μM final). Expression proceeded for 4 h at 37 °C with shaking at 225 r.p.m. Cultures were harvested by centrifugation at 5,000g for 10 min and stored at −80 °C.

Cell pellets were resuspended in TBSI and lysed by sonication or homogenization using a Fastprep96 with lysing matrix B. Lysate was clarified by centrifugation at 24,000g for 30 min and passed through 2 ml of nickel-nitrilotriacetic acid agarose (Ni-NTA) (Qiagen, 30250), washed 3 times with 10 ml TBSI, and eluted in 3 ml of elution buffer, of which only the second and third milliltre were kept. EDTA was immediately added to 5 mM final concentration to prevent Ni-mediated aggregation.

For in vitro evolution (Figs 1, 2, 3, Extended Data Figs 1, 2, 3d, 4c and Supplementary Figs 2–8) and all experiments involving hydrophilic polypeptides, synthetic nucleocapsids were prepared with a C-terminal histidine tag on the pentameric subunit. For these constructs, purification proceeded immediately from Ni-NTA elution to size exclusion chromatography (SEC) using a Superose 6 Increase column (GE Healthcare, 29-0915-96) in TBSI. Nucleocapsids with hydrophilic polypeptides intended for mouse experiments were subjected to endotoxin removal as described below prior to SEC.

For all in vivo evolution experiments (Fig. 4, Extended Data Fig. 4a, d and Supplementary Figs 8, 9), synthetic nucleocapsids were prepared with a N-terminal, thrombin cleavable histidine tag on the pentameric subunit to allow scarless removal. This was done to allow removal of the affinity tag for in vivo use and to prevent the divalent cation-dependent aggregation observed in the C-terminal histidine-tagged constructs. After elution from Ni-NTA, these samples were dialysed into PBS and treated with thrombin at a final concentration of 0.00264 U μl−1 for 90 min at 20 °C to remove the histidine tag. Thrombin was inactivated by addition of PMSF (1 mM final concentration), and nucleocapsids were purified by SEC using a Superose 6 Increase column in PBS.

Endotoxin was removed from all samples intended for animal studies. Endotoxin removal was performed after thrombin cleavage by addition of Triton X-114 (1% final concentration vol/vol) followed by incubation at 4 °C for 5 min, incubation at 37 °C for 5 min, and centrifugation at 24,000 g at 37 °C for 2 min. The supernatant was then removed, incubated 4 °C for 5 min, incubated at 37 °C for 5 min, and centrifuged at 24,000g at 37 °C for 2 min to ensure optimal endotoxin removal before continuing with SEC purification in PBS.

Gel electrophoresis

Native agarose gels: agarose gels were prepared using 1% ultrapure agarose (Invitrogen) in lithium borate buffer. For synthetic nucleocapsid samples, 20 μl purified synthetic nucleocapsids were treated with 10 μg ml−1 RNase A (20 °C for 10 min), mixed with 4 μl 6× loading dye (NEB B7025S, no SDS), and electrophoresed at 100 V for 45 min. Gels were then stained with SYBR gold (Thermo Fischer Scientific, S11494) for RNA followed by Gelcode (Thermo Fischer Scientific, 24590) for protein.

DNA gels: 1% agarose gels were prepared containing SYBR Safe (Invitrogen) according to the manufacturer’s protocols.

Protein SDS–PAGE: SDS–PAGE was performed using 4–20% polyacrylamide gels (Bio-Rad) in Tris-glycine buffer.

RNA purification and reverse transcription

RNA was purified using TRIzol (Thermo Fischer Scientific, 15596018) and a Qiagen RNeasy kit (Qiagen, 74106) according to the manufacturers’ instructions. In brief, 100 μl synthetic nucleocapsid samples were mixed vigorously with 500 μl TRIzol. 100 μl chloroform was added and mixed vigorously, and then the solution was centrifuged for 10 min at 24,000g. Then, 150 μl of the aqueous phase was mixed with 150 μl of 100% ethanol, transferred to a RNeasy spin column for purification according to the manufacturer’s instructions, and eluted in 50 μl nuclease-free dH2O. For samples intended for absolute quantification (including RNA standards), yeast tRNA was added to 100 ng μl−1 final concentration to ensure consistent sample complexity.

Reverse transcription was carried out using Thermoscript reverse transcriptase for 1 h at 53 °C according to the manufacturer’s instructions, with the only modifications being that a gene-specific primer (skpp_reverse) was used. Thus, a 10 μl reaction contained: 1 μl dNTPs (10 mM each), 1 μl DTT (100 μM), 1 μl Thermoscript reverse transcriptase, 2 μl cDNA synthesis buffer, 1 μl RNase-Out, 1 μl skpp_reverse (10 μM), 2 μl purified RNA template, and 1 μl nuclease-free dH2O. Controls lacking reverse transcriptase were set up identically except with the substitution of nuclease-free dH2O in place of Thermoscript reverse transcriptase.

Quantitative PCR

Quantitative PCR was performed in a 10 μl reaction using a Kapa High Fidelity PCR kit (Kapa Biosystems, KK2502) according to the manufacturer’s instructions with the addition of SYBR green at 1× concentration and 0.5 μM forward and reverse primers (skpp_fwd and skpp_Offset_Rev) for quantification of nucleocapsid RNA. Thermocycling and Cq calculations were performed on a Bio-Rad CFX96 with the following protocol: 5 min at 95 °C, then 40 cycles of: 98 °C for 20 s, 64 °C for 15 s, 72 °C for 90 s.

Allele-specific qPCR was performed using Kapa 2G Fast polymerase readymix along with 1× SYBR green, 3 μl of 100× diluted cDNA template, and 0.5 μM each of the forward and reverse allele-specific primers for each construct. Thermocycling and Cq calculations were performed on a Bio-Rad CFX96 with the following protocol: 5 min at 95 °C, then 40 cycles of: 95 °C for 15 s, 58 °C for 15 s and 72 °C for 90 s.

Absolute quantification of full-length RNA per protein capsid was calculated from Cq values using a linear fit (log([RNA]) = m × (Cq) + b, where m is the slope and b is the y-intercept of a standard curve that consists of in vitro transcribed nucleocapsid RNA). In vitro transcription was performed using a NEB HiScribe T7 high yield RNA synthesis kit (NEB, E2040S) according to the manufacturer’s protocols. Excess DNA was degraded using RNase-free DNase I (NEB, M0303), and RNA was purified using Agencourt RNAClean XP (Beckman Coulter, A63987) according to manufacturer protocols. The concentration of this standard was measured using a Qubit RNA HS Assay Kit (Life Technologies, Q32852), and a tenfold dilution series was prepared in nuclease-free dH2O supplemented with 100 ng μl−1 yeast tRNA. The dilution series samples were then processed in parallel with the synthetic nucleocapsid samples using the RNA purification and reverse transcription protocol above, and run on the same qPCR plate as the samples to be quantified.

In the pooled samples used to compare the fitness of I53-50-v1, I35-50-v2, I53-50-v3 and I53-50-v4, the total amount of full-length nucleocapsid genome was quantified by qPCR performed with skpp_fwd and skpp_rev using the Kapa High Fideltiy PCR kit as described above. Subsequently, the relative fraction of RNA corresponding to each version was determined by allele specific qPCR as described above using allele-specific primers (Supplementary Table 6) unique to each version. Absolute quantification was with respect to a standard curve for each version prepared as described above. The fractional RNA content from each version was then multiplied by total amount of full-length genomes.

In vitro synthetic nucleocapsid selection conditions

The total amount of RNA packaged in nucleocapsids was evaluated by treating 100 μl synthetic nucleocapsids with 10 μg ml−1 RNase A at 20 °C for 10 min (‘total RNA’) so as to degrade non-encapsulated RNA. Reaction buffer was PBS for N-terminal histidine-tagged constructs or TBSI for C-terminal histidine-tagged constructs. More stringent RNase protection assays were performed with 10 μg ml−1 RNase A at 37 °C for the specified duration (‘RNase’). Protection from blood was assessed by diluting synthetic nucleocapsids 1:10 in heparinized mouse whole blood (collected from the vena cava of mice sacrificed using a lethal dose of avertin and stabilized in 6 U ml−1 heparin) and incubating at 37 °C for the specified duration (‘blood’). Samples were then centrifuged at 24,000g for 2 min before adding the supernatant to TRIzol. RNA was purified as described in the RNA Purification and RT–qPCR sections. All reactions were quenched by adding the sample directly to 500 μl TRIzol.

General information about mouse work

6–8-week-old BALB/c mice were selected randomly and retro-orbitally injected with 150 μl of synthetic nucleocapsids. All mice were female to minimize any unknown variability in tissue distribution bias attributed to animal sex. No blinding was performed. The Institutional Animal Care and Use Committee (IACUC) at the University of Washington authorized all animal work in accordance with ethical animal use and regulations.

In vivo synthetic nucleocapsid selection conditions

Synthetic nucleocapsid libraries containing either hydrophilic polypeptides (104 μg ml−1, 3.7 × 1012 injected particles per mouse) or exterior surface mutations (570 μg ml−1, 2.0 × 1013 injected particles per mouse) were created and selected for circulation time in live mice. Five mice per library underwent retro-orbital injections and tail lancet blood draws at 5, 10, 15 and 30 min, with a final euthanization and blood draw at 60 min. Following Illumina MiSeq sequencing of the selected nucleocapsid libraries, the circulation times of several selected variants (I53-50-v1, I53-50-v2, I53-50-v3, 10 hydrophilic polypeptide variants, and 4 surface mutation variants were pooled to 570 μg ml−1 total protein) were compared in 5 mice with tail lancet blood draws at 5, 15, 30, 60 and 120 min, submental collection10 at 4 h, and final euthanization and blood draw at 6 h. I53-50-v4 was created based on the consensus sequence of the most common residues in the library after in vivo selection.

Synthetic nucleocapsid characterization for Fig. 4a–d

I53-50-v1, I53-50-v2, I53-50-v3 and I53-50-v4 were expressed in E. coli BL21(DE3)*, harvested by cell lysis, purified by IMAC, dialysed into PBS, cleaved by thrombin, subjected to endotoxin removal, and purified by SEC as described above. The protein concentrations for each sample were determined using a Qubit Protein Assay kit (Thermo Fisher Scientific, Q33211), and samples were mixed to give a final concentration of 170 μg ml−1 nucleocapsid protein for each version (680 μg ml−1 total). This pool was split into four different samples that were each subjected to the total RNA, RNase, blood and in vivo selection conditions described above (n = 3 independent replicates for each in vitro selection condition). For in vivo selections, 150 μl of the pool was injected retro-orbitally (2.4 × 1013 particles per mouse), and tail lancet draws were performed at 5 min, 1 h, 3 h and 6 h, submental collection10 at 10 h, and final euthanization and blood draw at 24 h. While I53-50-v0 was not included in this nucleocapsid pool, we independently showed that it does not package detectable RNA.

Synthetic nucleocapsid biodistribution

I53-50-v3 and I53-50-v4 were injected into 6 mice each. Animals were then euthanized after either 5 min or 4 h (3 animals per nucleocapsid version at each time point). After blood was collected, animals were exsanguinated by transcardial perfusion with saline. Half of each bisected organ and 20 μl of whole blood were collected into tubes containing 500 μl TRIzol and homogenized. RNA was purified, total tissue RNA was measured by either absorbance at A260nm (organs) or Qubit RNA HS Assay Kit (blood, owing to its lower total RNA), and full-length nucleocapsid genomes were quantitated by RT–qPCR as described above.

Negative-stain electron microscopy specimen preparation, data collection and data processing

6 μl of purified protein (I53-50-v0, I53-50-v1, I53-50-v2, I53-50-v3, I53-50-v4, I53-50-Btat, I53-47-v0, I53-47-v1 and I53-47-Btat) at 0.04–0.30 mg ml−1 were applied to glow discharged, carbon-coated 300-mesh copper grids (Ted Pella), washed with Milli-Q water and stained with 0.75% uranyl formate as described previously27. Screening and sample optimization was performed on a 100 kV Morgagni M268 transmission electron microscope (FEI) equipped with an Orius charge-coupled device (CCD) camera (Gatan). Data were collected with Leginon automatic data-collection software28 on a 120 kV Tecnai G2 Spirit transmission electron microscope (FEI) using a defocus of 1 μm with a total exposure of 30 e Å−2. All final images were recorded using an Ultrascan 4000 4 k × 4 k CCD camera (Gatan) at 52,000 × magnification at the specimen level. For data collection used in two-dimensional class averaging, the dose of the electron beam was 80 e Å−2, and micrographs were collected with a defocus range between 1.0 and 2.0 μm. Coordinates for unique particles (7,979 for I53-50-v0 and 7,130 for I53-50-v4) were obtained for averaging using EMAN229.

Illumina sequencing sample preparation for evolution experiments

Evolution experiments were analysed by performing targeted RNA-seq on full-length nucleocapsid genomes surviving the specified selection condition (RT–qPCR using skpp_reverse as the reverse transcription primer and using skpp_fwd and skpp_Offset_Rev as the qPCR primers). The starting populations and selected populations were evaluated by sequencing nucleocapsid genomes extracted from producer cells or nucleocapsids, respectively. After SPRI purification, two sequential qPCR reactions were performed using Kapa HiFi polymerase to add sequencing adapters and barcodes, respectively. qPCR reactions were monitored by SYBR green fluorescence and terminated before completion so as to prevent over-amplification. The resulting amplicons were purified using SPRI purification or a Qiagen QIAquick Gel Extraction Kit. The resulting amplicons were then denatured and loaded into a Miseq 600 cycle v3 (Illumina) kit and sequenced on an Illumina MiSeq according to the manufacturer’s instructions.

Illumina sequencing sample preparation for comprehensive RNA-seq

The composition of encapsulated RNA was evaluated by performing comprehensive RNA-seq on total RNA from producer cells (representing expression levels) and nucleocapsids (representing encapsulated RNA). RNA was extracted using TRIzol and purified using a Direct-zol RNA MiniPrep Plus kit (Zymo Research, R2072) with on-column DNase digestion. The purified RNA was quantitated using a Qubit RNA HS Assay Kit, and 100 ng of RNA was used to prepare each RNA-seq library with a NEBNext Ultra RNA Library Prep Kit for Illumina (NEB, E7530S). Each library was PCR amplified using Kapa HiFi polymerase to add sequencing barcodes before being pooled for sequencing. The resulting libraries were then denatured and loaded into an Illumina NextSeq 500/550 High Output Kit v2 (75 cycles) and sequenced on an Illumina NextSeq according to the manufacturer’s instructions.

Sequencing analysis for evolution experiments

Raw sequencing reads were converted to fastq format and parsed into separate files for each sequencing barcode using the Generate Fastq workflow on the Illumina MiSeq. Forward and reverse reads were combined using the read_fuser script from the enrich package30.

For all libraries, enrichment values were calculated as the change in fraction of the library corresponding to each linked sequence (rank order of variants) or unlinked substitutions (heat maps) that were observed at least 10 times in the naive library. The base 10 logarithm of each value was then taken in order to give enrichment values that more symmetrically span enrichment and depletion.

For the charge optimization library, the total interior charge of each variant was calculated by summing the number of lysine and arginine residues, and subtracting the number of aspartate and glutamate residues determined to be on the interior surface of the capsid by visual inspection of the design model. For the deep mutational scanning library, substitutions were only counted if they contained the expected silent mutation barcodes as described in the oligonucleotide design section. This greatly reduces the effect of both RT–PCR errors and sequencing errors because instead of a minimum of one error allowing a miscalled amino acid mutation, a minimum of three errors are required for a mutation to be miscalled.

Heat maps were generated using a custom MatPlotLib31 script by mapping the calculated log enrichment values onto a LinearSegmentedColormap (purple, white, orange; rgb = (0.75,0,0.75), (1,1,1), (1.0,0.5,0)) using the ‘pcolormesh’ function. The minimum and maximum values of the colour-mesh were set as shown in each figure to fully utilize the dynamic range of the colour-map. A pymol session coloured by the average log enrichment of all 20 amino acids at each position was created by substituting average log enrichment values for B-factors in the pdb file and running the command: ‘spectrum b, purple_white_white_orange, minimum = −1.5, maximum = 0.6’. Note that this is rescaled relative to the colouring of individual residues because the averages span a smaller range than the individual values, and thus a different colour range is needed to clearly differentiate values.

Sequencing analysis for comprehensive RNA-seq

RNA-seq data were converted from bcl format to fastQ format using the Illumina bcl2fastq script. HISAT232 converted fastQ to sam, and SAMtools33 converted sam files to sorted bam files. StringTie34 was used to calculate gene expression as transcripts per kilobase million (TPM).

Dynamic light scattering

Dynamic light scattering was performed on a DynaPro NanoStar (Wyatt) DLS setup. I53-50-v0, I53-50-v1, and I53-50-v4 were evaluated with 0.2 mg ml−1 of nucleocapsid protein in PBS at 25 °C. Data analysis was performed using DYNAMICS v7 (Wyatt) with regularization fits.

Code availability

Custom scripts for Illumina sequencing analysis are available on github (https://github.com/mlajoie/Synthetic_Nucleocapsid).

Data availability

Nucleocapsid genome files (GenBank format) and design models (pdb format) are available on github (https://github.com/mlajoie/Synthetic_Nucleocapsid). All raw sequencing data from Figs 2, 3, 4, Extended Data Figs 2, 3, 4, and Supplementary Figs 3, 7–9 are available at the NCBI Sequence Read Archive under BioProject accession PRJNA417493. Source data for Figs 1d–f, 4a–d, Extended Data Figs 1b–d, 3d, e, 4c, d and Supplementary Fig. 8 are provided with the paper. All other raw data not included in the manuscript are available from the corresponding author upon request.