Main

Pentatricopeptide repeat (PPR) proteins function in multiple aspects of organelle RNA metabolism, such as RNA splicing, editing, degradation and translation1,2,3,4,5. In plants, mutations in PPR genes may cause embryonic lethality12,13,14, and a number of PPR proteins act as restorers of fertility to overcome cytoplasmic male sterility15,16,17,18,19. In humans, mutations in the mitochondrial PPR protein LRPPRC are associated with the French-Canadian-type Leigh syndrome, which is characterized by a deficiency in Complex IV20,21.

PPR proteins contain 2–30 tandem repeats, each typically comprising 35 amino acids that are organized into a hairpin of α-helices1,6,22,23. PPRs are divided into two classes: the P-class, whose members only comprise the 35-amino-acid repeats; and the PLS-class, which has repeats of 31–36 amino acids and extra domains at the carboxyl terminus3,12. Computational and biochemical analyses suggest that PPR proteins may recognize RNA in a modular fashion, but different from that of the RNA-binding PUF domain6,24. The putative RNA recognition code by PPR proteins derived from bioinformatic and biochemical analyses awaits structural corroboration2,6,7,8.

To elucidate the mechanism of specific RNA recognition by PPR proteins, we sought to determine the crystal structure of well-characterized PPR proteins in complex with their target RNAs. The recombinant protein of maize chloroplast PPR10, which belongs to the P-class, specifically binds to the 17-nucleotide (nt) (ATPH) and 18-nt (PSAJ) RNA oligonucleotides (Extended Data Fig. 1a)10. We launched a systematic effort to determine the structures of PPR10 in both RNA-free and RNA-bound states.

The crystal structure of the RNA-free PPR10 fragment (residues 61–786) containing quadruple Cys mutations (C256S/C279S/C430S/C449S) was determined at 2.85 Å resolution. PPR10 forms a right-handed two-turn superhelical assembly, with 19 PPR motifs (residues 107–771) capped by three short α-helices at the amino-terminal domain (NTD) and a single α-helix at the C terminus (Fig. 1a). Capping motifs are known to contribute to ligand specificity for repeat proteins such as TPR (tetratricopeptide repeat)25 and TALE (transcription activator-like effector)26,27. The function of the extra motifs in PPR10 remains to be determined.

Figure 1: Crystal structure of RNA-free PPR10.

a, Overall structure of RNA-free PPR10. The fragment (residues 61–786, C256S/C279S/C430S/C449S) comprises 19 repeats capped by a small NTD (light purple) and a C-terminal helix (yellow). The two helices within each repeat, designated helix a and helix b, are coloured green and blue, respectively. b, Structural superimposition of the 19 repeats of PPR10. c, Overall structure of the PPR10 dimer. Two molecules from adjacent asymmetric units form an intertwined antiparallel dimer. All structure figures were prepared with PyMol30.

The 35 amino acids in each PPR motif form a hairpin of α-helices, each containing four helical turns, followed by a five-residue loop (Fig. 1b). The two helices, designated helix a and helix b, are connected by a short turn of two amino acids. Helices a and b of each repeat constitute the inner and outer layers of the superhelical assembly, respectively (Fig. 1a). In the crystals, there is one molecule of PPR10 in each asymmetric unit, yet two symmetry-related molecules are intertwined in an antiparallel fashion. The N terminus of one molecule is in close contact with the C terminus of the other, yielding an overall appearance of an ellipsoid with a polar axis of approximately 140 Å and an equatorial diameter of 70 Å (Fig. 1c).

On the basis of the PPR10 structure, we defined the starting amino acid of helix a as the first residue in a PPR motif (Fig. 1b and Extended Data Fig. 1b). This definition results in a one-residue shift either forwards6,12 or backwards7,28 within each repeat compared to the previously described boundary of a PPR motif (Extended Data Fig. 1c). With the new boundary assignment of a PPR motif, the residues that were predicted to determine RNA binding specificity are all included in one structurally intact motif. We hope that this structure-based demarcation of the PPR motif will simplify future descriptions of PPR proteins.

After numerous unsuccessful crystallization trials for PPR10–ATPH complexes, we finally determined the structure of PPR10 (residues 69–786, C256S/C279S/C430S/C449S) in the presence of 18-nt PSAJ RNA (5′-GUAUUCUUUAAUUAUUUC-3′) at 2.45 Å resolution (Extended Data Table 1). In the crystals, there is one antiparallel PPR10 dimer in each asymmetric unit. Analysis by sedimentation equilibrium analytical ultracentrifugation (SE-AUC) of PSAJ-bound PPR10 (residues 37–786, C256S/C279S/C430S/C449S) supports its dimeric existence at micromolar concentration in solution (Extended Data Fig. 2). The two PPR10 protomers can be superimposed with a root-mean-squared deviation of 1.3 Å over 629 Cα atoms (Extended Data Fig. 3). The overall appearance of the dimer has changed to a hollow cylindrical tube (Fig. 2a), and the N- and C-terminal portions of the PPR10 protomer are compressed towards the centre, resulting in a reduction of 20 Å in axial length (Fig. 2b).

Figure 2: Structure of PPR10 bound to an 18-nt PSAJ RNA element.

a, The PPR10 dimer (residues 69–786, C256S/C279S/C430S/C449S) forms a cylindrical tube in the presence of PSAJ. The two protomers are coloured light purple and grey with their NTDs coloured blue and cyan. b, The PPR10 protomer undergoes pronounced conformational changes upon binding to PSAJ. The structure of RNA-free PPR10 is coloured magenta with the NTD coloured lilac. c, Electron densities found in the cavities on both ends of the PPR10 dimer. The ‘omit’ electron density, with a close-up view in the inset, is contoured at 3σ. d, Overall structure of the PPR10–PSAJ complex. The two ssRNA molecules are coloured yellow and orange.

Following assignment of most amino acids of PPR10 into the electron density map, strong electron densities indicative of RNA bases became clearly visible in the cavities on both ends of the cylindrical tube (Fig. 2c). Assignment of 18 and 14 nucleotides of the two bound RNA elements was validated by the anomalous signals of bromine (Br), which were collected for crystals of PPR10 bound to Br-labelled RNA oligonucleotides (Extended Data Fig. 4 and Extended Data Table 2). The 5′ and 3′ portions of the ssRNA are specifically recognized by the N-terminal repeats of one protomer and C-terminal repeats of the other. By contrast, the middle portion of the ssRNA, comprising nucleotides U5 to A10, remains largely uncoordinated by PPR10 (Fig. 2d and Extended Data Fig. 5a, b).

PPR10 has 19 repeats and the bound PSAJ RNA contains 18 nucleotides. Consistent with a bioinformatic prediction6, specific recognition of the PSAJ RNA begins with repeat 3 (Fig. 3a). Each of the first four nucleotides on the 5′ end, 5′-GUAU-3′, is recognized by one PPR10 repeat. Such recognition exhibits a modular pattern involving residues that were predicted through biochemical and bioinformatic analyses2,6,7. Each RNA base is surrounded by four residues: the 2nd residues from two adjacent repeats, and the 5th and 35th residues from the corresponding repeat. In addition to base recognition, the backbone phosphate or ribose groups of the bound PSAJ RNA are also coordinated by charged or polar amino acids from PPR10 (Extended Data Fig. 5c).

Figure 3: Base-specific recognition of ssRNA by PPR10 repeats.

a, The four nucleotides at the 5′ end of the PSAJ RNA segment are specifically recognized in a modular fashion. Inset: each of the four RNA bases at the 5′ end is sandwiched by two residues at the 2nd positions of adjacent repeats. b, Specific recognition of the bases G, U and A by PPR10 repeats. The side chain of the 5th residue in each repeat, which makes a direct hydrogen bond to the base, is highlighted in cyan. The hydrogen bonds are represented by red dotted lines.

A polar amino acid located at the 5th position in each repeat appears to be the most important determinant for RNA base specificity. Thr 178, Asn 213, Ser 249 and Asn 284 in repeats 3–6 recognize the bases G1, U2, A3 and U4, respectively, through direct hydrogen bonds (Fig. 3b). The importance of the 5th residue in RNA recognition is supported by mutational analysis. Mutating any of the 5th residues in repeats 4 (N213A), 5 (S249L) or 6 (N284A) completely abolished RNA binding. By contrast, substitution of the 5th residues of repeats 7, 8, 10, 11 or 13, which are not involved in RNA binding in the structure, had little or no effect on PSAJ binding (Extended Data Fig. 6).

Buttressing the hydrogen bonds, five residues at the 2nd position of PPR repeats 3–7 sandwich the four bases mainly through van der Waals interactions (Fig. 3a). For example, G1 is surrounded by Arg 175/Val 210 of repeats 3 and 4. Similarly, U2, A3 and U4 are sandwiched by Val 210/Phe 246, Phe 246/Val 281 and Val 281/Val 316, respectively (Fig. 3). The 35th residue is located in the vicinity of the base. It is possible that water molecules, although invisible in the structure, may mediate hydrogen bonds between the polar residues and the bases. Importantly, Asp 244 and Asp 314, the 35th residues in repeats 4 and 6, are respectively hydrogen bonded to Asn 213 and Asn 284, the 5th residues in the corresponding repeats, and may help to stabilize their conformation for base recognition (Fig. 3b and Extended Data Fig. 6d).
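The sandwiching pattern described above can be written down as a small lookup. The following Python sketch is illustrative only: the residue labels and base assignments are taken from the text, whereas the dictionary and function names are hypothetical. It pairs each base with the 2nd residue of its own repeat and that of the following repeat.

```python
# 2nd-position residues of repeats 3-7, as listed in the text.
SECOND_RESIDUES = {3: "Arg175", 4: "Val210", 5: "Phe246", 6: "Val281", 7: "Val316"}

# Bases at the 5' end of PSAJ and the repeat whose 5th residue contacts them.
BASE_BY_REPEAT = {3: "G1", 4: "U2", 5: "A3", 6: "U4"}


def sandwich_pairs(second_residues, base_by_repeat):
    """Pair each base with the 2nd residues of its repeat and the next repeat."""
    pairs = {}
    for repeat, base in sorted(base_by_repeat.items()):
        pairs[base] = (second_residues[repeat], second_residues[repeat + 1])
    return pairs


PAIRS = sandwich_pairs(SECOND_RESIDUES, BASE_BY_REPEAT)
for base, (own, following) in PAIRS.items():
    print(f"{base}: {own} / {following}")
```

Because each base needs the 2nd residue of the following repeat, four bases require five 2nd-position residues, which is why repeats 3–7 are listed for nucleotides G1 to U4.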

Recognition of the 3′ end of the PSAJ RNA by the C-terminal repeats in the other PPR10 protomer appears to be less modular except for U15 and U16, which are coordinated by repeats 16 and 17 following the described recognition pattern for U2 and U4 (Fig. 4a). The bases A11 and C18 are coordinated in a non-modular fashion. The adenine base of A11 donates a hydrogen bond to Asp 630, the 35th residue of repeat 15, whereas the cytosine base of C18 makes a hydrogen bond to Ser 714 on the last helical turn of helix 18a (Fig. 4b). A number of direct and water-mediated hydrogen bonds are found between the backbone phosphate/ribose groups of U15-U16-U17-C18 and the polar residues on repeats 15–19 of PPR10 (Fig. 4c).

Figure 4: Coordination of the 3′-end segment of the PSAJ RNA by PPR10.

a, Recognition of the bases U15 and U16 by PPR10 follows the code discussed in Fig. 3. b, The bases A11 and C18 are hydrogen bonded to polar residues on PPR10 in a non-modular fashion. c, The backbone of the 3′-end segment of PSAJ is coordinated through direct or water-mediated hydrogen bonds. d, Summary of specific recognition of PSAJ RNA by PPR10. Left, the residues at the 2nd, 5th and 35th positions of PPR10 motifs and the corresponding sequences of the target RNA elements. The structurally corroborated recognition codes are shaded yellow. The difference in RNA sequences of PSAJ and ATPH is shaded grey. Right, the structure of repeats 3–19 of PPR10 with their 2nd (yellow), 5th (cyan) and 35th (wheat) residues shown in spheres, and the six recognized RNA bases shown in orange.

In the structure of PSAJ-bound PPR10, only six out of 18 nucleotides in the PSAJ RNA element strictly follow the modular pattern (Fig. 4d). Two bases, A11 and C18, are bound by PPR10 in a non-modular fashion, and the other 10 bases are not coordinated at all. Notably, only 17 repeats in each PPR10 protomer are available for the binding of 18 nucleotides. It remains to be seen whether the 17-nt ATPH binds to repeats 3–19 of PPR10 in a completely modular fashion. Binding of ATPH leads to dissociation of the PPR10 dimer (Extended Data Fig. 2)6,10. Interestingly, analysis by SE-AUC suggests that, although RNA-free PPR10 forms a stable dimer, PSAJ binding weakens PPR10 dimer formation (Extended Data Fig. 2). It remains to be investigated whether PSAJ-bound PPR10 is a dimer under physiological conditions, in which the protein concentration can be very low. Nevertheless, the crystal structure of PPR10 bound to the 18-nt PSAJ RNA element reveals the molecular basis for specific recognition between a PPR protein and its target RNA sequence.
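The nucleotide tally in this paragraph can be reproduced from the PSAJ sequence with a short Python sketch. The sequence and the position sets come from the text; the function name and category labels are hypothetical, and "uncoordinated" here refers to the bases only, not to the backbone contacts described for the 3′ end.

```python
PSAJ = "GUAUUCUUUAAUUAUUUC"  # 18-nt PSAJ RNA, 5' to 3'

# 1-based positions taken from the text.
MODULAR = {1, 2, 3, 4, 15, 16}   # G1, U2, A3, U4, U15, U16
NON_MODULAR = {11, 18}           # A11, C18


def classify(seq, modular, non_modular):
    """Group nucleotide labels (base + 1-based position) by recognition mode."""
    groups = {"modular": [], "non-modular": [], "uncoordinated": []}
    for pos, base in enumerate(seq, start=1):
        label = f"{base}{pos}"
        if pos in modular:
            groups["modular"].append(label)
        elif pos in non_modular:
            groups["non-modular"].append(label)
        else:
            groups["uncoordinated"].append(label)
    return groups


GROUPS = classify(PSAJ, MODULAR, NON_MODULAR)
for name, labels in GROUPS.items():
    print(f"{name} ({len(labels)}): {' '.join(labels)}")
```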

Recognition of the six RNA nucleotides 5′-G1-U2-A3-U4-3′ and 5′-U15-U16-3′ by repeats 3–6 and repeats 16 and 17, respectively, largely supports the predicted code for base discrimination in ssRNA, where the bases G, U and A are specifically recognized by the 5th residues of a PPR motif: Thr, Asn and Ser, respectively6,7. The 2nd and 35th residues, sitting in the vicinity of the bases, contribute to RNA binding (Fig. 4d and Extended Data Fig. 6c, d). The prediction that base C is recognized by an Asn at the 5th position can also be conveniently rationalized based on our crystal structure (Extended Data Fig. 7). One unexpected feature is that the 2nd residues from two consecutive repeats sandwich one base; therefore the identity of the 2nd residue on the next repeat must be considered for RNA binding. Base sandwiching by hydrophobic residues or Arg is also observed in the recognition of ssRNA by PUF proteins24,29, although PPR and PUF proteins exhibit distinct RNA binding modes (Extended Data Fig. 8).
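As a rough illustration of the 5th-residue code summarized above, the following Python sketch checks whether a series of 5th residues is compatible with an RNA sequence. It is a deliberate simplification: it ignores the contributions of the 2nd and 35th residues, treats the Asn–C pairing as a prediction rather than an observed contact, and uses hypothetical names throughout.

```python
# 5th-residue code as stated in the text. Asn-U is observed in the structure;
# Asn-C is a prediction that the structure can rationalize.
FIFTH_RESIDUE_CODE = {
    "Thr": {"G"},
    "Asn": {"U", "C"},
    "Ser": {"A"},
}


def matches_code(fifth_residues, rna):
    """Return True if every base is allowed by the 5th residue of its repeat."""
    if len(fifth_residues) != len(rna):
        return False
    return all(
        base in FIFTH_RESIDUE_CODE.get(residue, set())
        for residue, base in zip(fifth_residues, rna)
    )


# 5th residues of repeats 3-6 (Thr 178, Asn 213, Ser 249, Asn 284).
REPEATS_3_TO_6 = ["Thr", "Asn", "Ser", "Asn"]
print(matches_code(REPEATS_3_TO_6, "GUAU"))  # 5' end of PSAJ
```

Residues absent from the table match no base in this sketch, which reflects the limits of the structurally corroborated code rather than a claim that such residues cannot bind RNA.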

Further biochemical, computational, structural and in vivo characterizations are required to completely rationalize the codes for specific RNA recognition by PPRs and to engineer PPR proteins for targeted RNA manipulations. The structures reported here provide unprecedented insights into the recognition mechanism of RNA elements by PPR proteins and serve as an important foundation for understanding the function and mechanism of numerous PPR proteins in RNA metabolism, and for the potentially customized design of specific-RNA-binding PPR proteins.

Methods Summary

The codon-optimized complementary DNA of full-length PPR10 (Gene ID: 100302579) from Zea mays was subcloned into pET15b vector (Novagen). Overexpression of PPR10 protein was induced in Escherichia coli BL21(DE3). To crystallize PPR10, we mounted a systematic protein engineering effort including a series of protein truncations and mutations of Cys residues. There are 18 Cys residues within the repeat region. We generated 18 mutants, each consisting of a single Cys to Ser mutation and tested their binding with the 17-nt ATPH element. For those that completely retained binding affinity, we further grouped them to double, triple and quadruple mutations. Finally, the PPR10 mutant containing C256S/C279S/C430S/C449S showed the same binding affinity as wild type and exhibited excellent protein behaviour. All the PPR10 proteins used in the manuscript contain the quadruple Cys mutations. The RNA-free PPR10 fragment (residues 61–786, C256S/C279S/C430S/C449S) was eventually crystallized in the space group P21212. The structure was determined by selenium-based single-wavelength anomalous diffraction and refined to 2.85 Å resolution (Extended Data Table 1). In the effort to crystallize PPR10 in complex with its target RNA, despite numerous trials, most PPR10–ATPH complexes defied crystallization; for those that crystallized, X-ray diffraction was consistently poor. We applied the same strategy to complexes between PPR10 and PSAJ. After screening more than 100,000 conditions, we were able to crystallize the complex between PPR10 (residues 69–786, C256S/C279S/C430S/C449S) and the 18-nt PSAJ RNA (5′-GUAUUCUUUAAUUAUUUC-3′) in the space group P43. These crystals diffract X-rays beyond 2.5 Å. The structure was determined by molecular replacement using successive segments of the RNA-free PPR10 structure, but not the entire molecule. We were able to assign all 18 nucleotides of one bound PSAJ RNA element, but only 14 of the other. 
For details of electrophoretic mobility shift assay and SE-AUC experiments, please refer to Methods.

Online Methods

Protein preparation

The codon-optimized complementary DNA of full-length PPR10 (Gene ID: 100302579) from Zea mays was subcloned into pET15b vector (Novagen). Overexpression of PPR10 protein was induced in E. coli BL21(DE3) with 0.2 mM isopropyl-β-d-thiogalactoside at an OD600 nm of 1.2. After growing for 16 h at 16 °C, the cells were collected and homogenized in a buffer containing 25 mM Tris-HCl, pH 8.0, and 150 mM NaCl. After sonication and centrifugation, the supernatant was applied to Ni2+ affinity resin (Ni-NTA, Qiagen) and further fractionated by ion-exchange chromatography (Source 15Q, GE Healthcare). The PPR10 mutants were generated using two-step PCR and subcloned, overexpressed and purified in the same way as the wild-type protein.

A systematic protein engineering effort was mounted for crystallization of RNA-free and -bound PPR10. A series of protein truncations were tested without giving rise to crystals. There are 18 Cys residues within the repeat region. It is well known that the presence of surface Cys residues, which are subject to oxidation, may lead to protein heterogeneity and impede crystallization. We therefore generated 18 mutants, each consisting of a single Cys to Ser mutation and tested their binding with the 17-nt ATPH element. For those that completely retained binding affinity, we further grouped them to double, triple and quadruple mutations. Finally, the PPR10 mutant containing C256S/C279S/C430S/C449S showed the same binding affinity as wild type and exhibited excellent protein behaviour. For consistency, all the PPR10 proteins used in the manuscript contain the quadruple Cys mutations.

For the crystallization trials of RNA-free PPR10 (residues 61–786, C256S/C279S/C430S/C449S), the protein was concentrated and applied to gel filtration chromatography (Superdex-200 10/30, GE Healthcare) in the buffer containing 25 mM Tris-HCl, pH 8.0, 150 mM NaCl and 10 mM dithiothreitol (DTT). Selenomethionine (Se-Met)-derived protein was purified similarly.

To obtain the crystals of protein–RNA complex, PPR10 (residues 69–786, C256S/C279S/C430S/C449S) was purified through Ni2+ affinity resin (Ni-NTA, Qiagen), followed by heparin affinity column (HiPrep Heparin FF 16/10, GE Healthcare). The protein was then applied to gel filtration chromatography (Superdex-200 10/30, GE Healthcare). The buffer for gel filtration contained 25 mM Tris-HCl, pH 8.0, 50 mM NaCl, 5 mM MgCl2 and 10 mM DTT. The peak fractions were incubated with target RNA oligonucleotides with a molar ratio of approximately 1:1.5 at 4 °C for about 40 min before crystallization trials.

Crystallization

Both RNA-free and RNA-bound PPR10 proteins were crystallized by hanging-drop vapour-diffusion method at 18 °C. PPR10 (residues 61–786, C256S/C279S/C430S/C449S), at a concentration of approximately 6.0 mg ml−1, was mixed with an equal volume of reservoir solution containing 1.8–2.1 M sodium formate, and 0.1 M Bis-Tris propane, pH 6.5. Plate-shaped crystals appeared overnight and grew to full size within 1–2 weeks. Se-Met-labelled protein was crystallized similarly.

To obtain crystals of protein–RNA complex, various combinations of protein boundaries and RNA oligonucleotides (Takara) were examined. Because the first visible residue in the structure of RNA-free PPR10 starts at position 69, we invested more effort into this construct. Finally, the protein (residues 69–786, C256S/C279S/C430S/C449S) and 18-nt RNA from the PSAJ–RPL33 intergenic region with the sequence 5′-GUAUUCUUUAAUUAUUUC-3′ (designated PSAJ RNA) gave rise to crystals in the reservoir solution containing 8–10% (w/v) polyethylene glycol 3350, 8% Tacsimate, pH 6.0 (Hampton Research), and 0.1 M MES, pH 5.5.

Data collection and structural determination

All data sets were collected at SSRF beamline BL17U or SPring-8 beamline BL41XU and processed with the HKL2000 packages31. Further processing was carried out with programs from the CCP4 suite32. Data collection and structure refinement statistics are summarized in Extended Data Tables 1 and 2.

The RNA-free PPR10 structure was solved by single-wavelength anomalous diffraction (SAD) of Se-Met using the program ShelxC/D/E33. A crude helical model was then manually built in the program Coot34. Using this partial model as input, the identified Se atom positions were refined and phases were recalculated using the SAD experimental phasing module of the program Phaser35. With the improved map, the molecular boundary was unambiguously defined and one molecule was found in an asymmetric unit. The crude model was further rebuilt with Coot and refined with Phenix36. The sequence docking was aided by the anomalous map of selenium.

Data sets collected from five crystals of the PPR10–RNA complex were merged to improve data completeness and quality. The structure of the PPR10–RNA complex was solved by molecular replacement with the newly solved RNA-free structure as the search model using the program Phaser35. To find the right solution, the structure of the RNA-free PPR10 protomer was divided into three consecutive segments. The assignment of RNA sequence was aided by the anomalous signal of bromine obtained for crystals of PPR10 in complex with Br-labelled RNA oligonucleotides, where U4/U7/U15, U5/U7/U15 or U12 were substituted by 5-bromouracil (Extended Data Table 2). The structure was refined iteratively with Coot and Phenix (Extended Data Table 1).

Electrophoretic mobility shift assay (EMSA)

The ssRNA oligonucleotides were radiolabelled at the 5′ end with [γ-32P] ATP (PerkinElmer) catalysed by T4 polynucleotide kinase (Takara). The sequences of ssRNA oligonucleotides used in EMSA are: PSAJ, 5′-GUAUUCUUUAAUUAUUUC-3′; and ATPH, 5′-GUAUCCUUAACCAUUUC-3′.
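For orientation, the two probe sequences can be compared with a few lines of Python. This sketch is not part of the experimental procedure; it only reports the lengths of the two oligonucleotides and the stretches they share at the 5′ and 3′ ends. The helper names are hypothetical.

```python
PSAJ = "GUAUUCUUUAAUUAUUUC"
ATPH = "GUAUCCUUAACCAUUUC"


def common_prefix(a, b):
    """Longest stretch shared by a and b starting from the 5' end."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return a[:n]


def common_suffix(a, b):
    """Longest stretch shared by a and b ending at the 3' end."""
    return common_prefix(a[::-1], b[::-1])[::-1]


PREFIX = common_prefix(PSAJ, ATPH)
SUFFIX = common_suffix(PSAJ, ATPH)
print(f"PSAJ: {len(PSAJ)} nt, ATPH: {len(ATPH)} nt")
print(f"shared 5' end: {PREFIX}, shared 3' end: {SUFFIX}")
```

The shared 5′ stretch, GUAU, corresponds to the four nucleotides that repeats 3–6 recognize in the PSAJ-bound structure.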

For EMSA, PPR10 (residues 37–786, C256S/C279S/C430S/C449S) and the other variants consisting of the indicated point mutations were incubated with approximately 40 pM 32P-labelled probe in the final binding reactions containing 40 mM Tris-HCl, pH 7.5, 100 mM NaCl, 4 mM DTT, 0.1 mg ml−1 BSA, 5 μg ml−1 heparin and 10% glycerol at room temperature (22 °C) for 20 min. Reactions were then resolved on 6% native acrylamide gels (37.5:1 for acrylamide:bisacrylamide) in 0.5× Tris-glycine buffer under an electric field of 15 V cm−1 for 40 min. Vacuum-dried gels were visualized on a phosphor screen (Amersham Biosciences) with a Typhoon Trio Imager (Amersham Biosciences).

SE-AUC

The oligomeric states of PPR10 (residues 37–786, C256S/C279S/C430S/C449S) with or without target RNA oligonucleotides in solution were investigated by AUC experiments. SE-AUC experiments were performed in a Beckman Coulter XL-I analytical ultracentrifuge using six-channel centrepieces. RNA-free PPR10, PSAJ-bound PPR10 and ATPH-bound PPR10 were in solutions containing 25 mM Tris-HCl, pH 8.0, 150 mM NaCl and 2 mM DTT. The sequences of RNA oligonucleotides were identical to those used in EMSA. Data were collected by interference detection at 4 °C for all three protein concentrations (4 μM, 6 μM and 8 μM) at different rotor speeds (6,000, 8,500 and 12,000 r.p.m.). The buffer composition (density and viscosity) and protein partial specific volume (V-bar) were obtained using the SEDNTERP program (available through the Boston Biomedical Research Institute). The SE-AUC data were globally analysed using the Sedfit and Sedphat programs37 and were fitted to a monomer–dimer equilibrium model to determine the dissociation constants (Kd) for the homodimers.
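The monomer–dimer equilibrium model can be illustrated with a short Python sketch. It does not reproduce the global fitting performed in Sedfit and Sedphat; it only solves the mass-action relation Kd = [M]²/[D] together with the mass balance [P]total = [M] + 2[D] to show how the dimer population depends on loading concentration. The Kd value used is a hypothetical placeholder, not a measured result, and the function names are illustrative.

```python
import math


def monomer_dimer(total_uM, kd_uM):
    """Return (monomer, dimer) concentrations in uM for 2M <-> D.

    total_uM counts protein in monomer units: total = [M] + 2[D].
    kd_uM is the dissociation constant Kd = [M]^2 / [D].
    """
    # Positive root of 2[M]^2/Kd + [M] - total = 0.
    monomer = kd_uM / 4.0 * (math.sqrt(1.0 + 8.0 * total_uM / kd_uM) - 1.0)
    dimer = monomer ** 2 / kd_uM
    return monomer, dimer


def dimer_fraction(total_uM, kd_uM):
    """Fraction of protein chains that are part of a dimer."""
    _, dimer = monomer_dimer(total_uM, kd_uM)
    return 2.0 * dimer / total_uM


KD_ASSUMED_UM = 1.0  # hypothetical Kd, for illustration only

# Loading concentrations used in the SE-AUC experiments.
for total in (4.0, 6.0, 8.0):
    frac = dimer_fraction(total, KD_ASSUMED_UM)
    print(f"{total:.0f} uM total: {frac:.2f} of chains in dimer")
```

For any fixed Kd the dimer fraction rises with total protein concentration, which is why the oligomeric state observed at micromolar concentrations need not hold at much lower concentrations.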