Chemical Structure of RNA

Citation: Clancy, S. (2008) Chemical structure of RNA. Nature Education 7(1):60

The more researchers examine RNA, the more surprises they continue to uncover. What have we learned about RNA structure and function so far?

In a schematic illustration, a four base-pair-long region of double-stranded, double helical DNA is shown at the left, and a region of single-stranded RNA is shown at the right. In DNA, the two sugar-phosphate backbones are represented as two parallel, grey ribbons. Arrows at the end of the grey ribbons show how one strand is oriented in an antiparallel manner relative to the other strand. In RNA, only a single sugar-phosphate backbone is shown; it is represented as a single vertical, grey ribbon. A downward-pointing arrow at the end of the ribbon shows the spatial orientation of the single strand. In both the DNA and the RNA molecules, the strands are transparent, revealing the structure and position of individual atoms, and covalent and hydrogen bonds within the sugar-phosphate backbone and between nitrogenous base pairs (in the case of DNA), respectively.

Figure 1

With the discovery of the molecular structure of the DNA double helix in 1953, researchers turned to the structure of ribonucleic acid (RNA) as the next critical puzzle to be solved on the road to understanding the molecular basis of life. Indeed, RNA may be the only molecule to have inspired the formation of a club, known as the RNA Tie Club, whose members included Nobel Laureates James Watson and Francis Crick, the discoverers of DNA structure, as well as Sydney Brenner, who was awarded the Nobel Prize in 2002 for his work involving gene regulation in the model organism Caenorhabditis elegans. The members of this club, each nicknamed for a particular amino acid, exchanged letters in which they presented various unpublished ideas in an attempt to understand the structure of RNA and how this molecule participates in the building of proteins. During the following 50 years, many questions were answered and many surprises were uncovered.

Early Discoveries of RNA Structure

Panel A of a two-panel schematic illustration shows a four-nucleotide-long region of single-stranded RNA. The RNA’s single sugar-phosphate backbone is represented as a single vertical, grey ribbon. The ribbon is transparent, revealing the structure and position of individual atoms. Panel B shows the primary and secondary structures of RNA. The primary structure is depicted as a horizontal blue rectangle containing a row of 46 white uppercase letters. Each letter represents one of four nitrogenous bases: A (adenine), U (uracil), G (guanine), or C (cytosine). The left side of the rectangle is labeled as the five prime end, and the right side of the rectangle is labeled as the three prime end. The secondary structure of the RNA molecule is shown in an illustration below the primary structure. The horizontal rectangle has been folded in on itself to form two looped regions: one loop is larger than the other. A text box states that folding occurs due to hydrogen bonding between complementary bases on the same strand.

Figure 2

Today, researchers know that cells contain a variety of forms of RNA—including messenger RNA (mRNA), transfer RNA (tRNA), and ribosomal RNA (rRNA)—and each form is involved in different functions and activities. Messenger RNA is essentially a copy of a section of DNA and serves as a template for the manufacture of one or more proteins. Transfer RNA binds to both mRNA and amino acids (the building blocks of proteins) and brings the correct amino acids into the growing polypeptide chain during protein formation, based on the nucleotide sequence of the mRNA. The process by which proteins are built is called translation. Translation occurs on ribosomes, which are cellular organelles composed of protein and rRNA.

Although there are multiple types of RNA molecules, the basic structure of all RNA is similar. Each kind of RNA is a polymeric molecule made by stringing together individual ribonucleotides, always by adding the 5'-phosphate group of one nucleotide onto the 3'-hydroxyl group of the previous nucleotide. Like DNA, each RNA strand has the same basic structure, composed of nitrogenous bases covalently bound to a sugar-phosphate backbone (Figure 1). However, unlike DNA, RNA is usually a single-stranded molecule. Also, the sugar in RNA is ribose instead of deoxyribose (ribose contains one more hydroxyl group on the second carbon), which accounts for the molecule's name. RNA contains four nitrogenous bases: adenine, cytosine, uracil, and guanine. Uracil is a pyrimidine that is structurally similar to thymine, the pyrimidine that takes its place in DNA. Like thymine, uracil can base-pair with adenine (Figure 2).

Figure 3

Although RNA is a single-stranded molecule, researchers soon discovered that it can form double-stranded structures, which are important to its function. In 1956, Alexander Rich—an X-ray crystallographer and member of the RNA Tie Club—and David Davies, both working at the National Institutes of Health, discovered that single strands of RNA can "hybridize," sticking together to form a double-stranded molecule (Rich & Davies, 1956). Later, in 1960, the discovery that an RNA molecule and a DNA molecule could form a hybrid double helix was the first experimental demonstration of a way in which information could be transferred from DNA to RNA (Rich, 1960).

Single-stranded RNA can also form many secondary structures in which a single RNA molecule folds over and forms hairpin loops, stabilized by intramolecular hydrogen bonds between complementary bases. Such base-pairing of RNA is critical for many RNA functions, such as the ability of tRNA to bind to the correct sequence of mRNA during translation (Figure 3).
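
The idea of intramolecular pairing can be made concrete with a minimal Python sketch that asks whether the two ends of one RNA strand could pair to form a hairpin stem. The function names and the example sequence are hypothetical, and only the A-U and G-C pairs are considered; real folding also depends on loop size, non-standard pairs, and thermodynamics.

```python
# Watson-Crick partners in RNA (other pairings are deliberately ignored here).
PAIRS = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Return the sequence that would pair with `rna` in antiparallel orientation."""
    return "".join(PAIRS[base] for base in reversed(rna))


def can_form_stem(rna: str, stem_len: int) -> bool:
    """True if the first and last `stem_len` bases of `rna` are complementary."""
    if stem_len <= 0 or 2 * stem_len > len(rna):
        return False
    five_prime_arm = rna[:stem_len]
    three_prime_arm = rna[-stem_len:]
    return three_prime_arm == reverse_complement(five_prime_arm)


# Hypothetical example: a 4-base stem closing a 4-base loop.
hairpin = "GGCA" + "UUCG" + "UGCC"
print(can_form_stem(hairpin, 4))  # True
```

The comparison uses the reverse complement because the two arms of a stem run antiparallel to each other, just as the two strands of a double helix do.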

Robert Holley, a chemist at Cornell University, was the first researcher to work out the structure of tRNA (Holley et al., 1965). This molecule turned out to be the elusive structure that Francis Crick proposed in his so-called "adapter hypothesis" of 1955—a structure that carried amino acids and arranged them in a certain order that corresponded to the sequence in the nucleic acid strand. In 1968, Holley was awarded the Nobel Prize in Physiology or Medicine together with Gobind Khorana, at the University of Wisconsin, and Marshall Nirenberg, at the National Institutes of Health. Nirenberg and Khorana devised the key experiments to decipher the genetic code—in other words, which sequences of three nucleotides (codons) in an mRNA molecule would code for which amino acids.
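
What it means to read an mRNA in codons can be sketched in a few lines of Python. The table below holds only a small subset of the standard genetic code, and the function name and example sequence are hypothetical; the sketch also assumes that translation begins at the first AUG, which is a simplification.

```python
# Partial codon table (standard genetic code); only a handful of the 64 codons.
CODON_TABLE = {
    "AUG": "Met",  # also the usual start codon
    "UUU": "Phe", "UUC": "Phe",
    "GGU": "Gly", "GGC": "Gly", "GGA": "Gly", "GGG": "Gly",
    "AAA": "Lys", "AAG": "Lys",
    "UAA": "STOP", "UAG": "STOP", "UGA": "STOP",
}


def translate(mrna: str) -> list[str]:
    """Read `mrna` three bases at a time from the first AUG until a stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return []
    peptide = []
    for i in range(start, len(mrna) - 2, 3):
        # "???" marks codons that this partial table does not cover.
        residue = CODON_TABLE.get(mrna[i:i + 3], "???")
        if residue == "STOP":
            break
        peptide.append(residue)
    return peptide


print(translate("GGAUGUUUGGCAAAUAAGG"))  # ['Met', 'Phe', 'Gly', 'Lys']
```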

mRNA and Splicing

A schematic diagram shows the transcription and translation processes in two basic steps. First, DNA is transcribed into RNA, and then the RNA is translated into a protein. DNA is represented at the top of the diagram by a grey rectangle. An arrow points from the grey rectangle to a purple rectangle, representing RNA, at the center of the diagram. A second arrow leads to the final step: translation, during which the RNA is used as a template to join amino acids to form a polypeptide chain. The polypeptide chain is depicted as a chain of dark pink circles. A curved arrow emanating from the rectangle representing RNA terminates at the same rectangle: a textbox beside the arrow explains that some viruses copy RNA directly from RNA.

Figure 4

Several forms of RNA play pivotal roles in gene expression—the process responsible for manifesting the instructions stored in the sequence of DNA nucleotides in either RNA or protein molecules that carry out the cell's activities (Figures 4 & 5). Messenger RNA (mRNA) is particularly important in this process. mRNA is primarily composed of coding sequences; that is, it carries the genetic information for the amino acid sequence of a protein to the ribosome, where that particular protein is synthesized. In addition, each mRNA molecule also contains noncoding, or untranslated, sequences that may carry instructions for how the mRNA is handled by the cell (Figure 6). For example, the untranslated region at the 5' end of the mRNA molecules found in bacteria and other prokaryotes contains what is called a Shine-Dalgarno sequence, which aids in the binding of the mRNA to ribosomes.

The location and function of eight classes of RNA are shown in this four-column table. The RNA classes are listed in eight rows in the first column, the types of cells the RNA classes appear in are listed in the second column, the location where the RNA executes its function in eukaryotic cells is listed in the third column, and the function of the RNA is listed in the fourth column.

Figure 5

In contrast, the mRNA of eukaryotic organisms is prepared for translation through more complex mechanisms. For one, the addition of a guanine nucleotide with a methyl (CH₃) group to the 5' end of the mRNA, called the 5' cap, increases the stability of the mRNA and assists in the binding of the mRNA to the ribosome for translation. Meanwhile, another untranslated region is added to the 3' end of the mRNA, thereby further affecting the stability of the molecule. In this case, a "tail" consisting of anywhere from 50 to 250 adenine nucleotides is added to the 3' end. This poly(A) tail can increase the stability of many mRNA molecules, depending on the proteins that attach to it. The greater the stability, and the longer an mRNA molecule exists in a cell, the more protein can be made from that molecule.

In eukaryotes (and to a lesser extent, prokaryotes), when RNA is first transcribed from DNA, it may contain additional noncoding sequences that are interspersed within the coding sequence. This immature RNA molecule is referred to as precursor mRNA (pre-mRNA) or heterogeneous nuclear RNA (hnRNA). The intervening noncoding sequences are called introns, and the segments of coding material are known as exons. The introns are then removed by a process known as RNA splicing to produce the mature mRNA molecule (Figure 7). An organelle called the spliceosome, composed of protein and small nuclear RNAs (snRNAs), is responsible for recognizing and removing the introns from pre-mRNA.
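
At the level of sequence, splicing amounts to cutting out the introns and joining the remaining exons in their original order. The Python sketch below shows this under stated assumptions: the function name, the coordinate convention (0-based, end-exclusive), and the example transcript are all hypothetical, and the intron positions are simply given rather than recognized as the spliceosome would.

```python
def splice(pre_mrna: str, introns: list[tuple[int, int]]) -> str:
    """Remove introns given as (start, end) pairs: 0-based, end-exclusive."""
    exons = []
    position = 0
    for start, end in sorted(introns):
        exons.append(pre_mrna[position:start])  # exon preceding this intron
        position = end                          # resume after the intron
    exons.append(pre_mrna[position:])           # final exon
    return "".join(exons)


# Hypothetical transcript: exons in upper case, introns in lower case for readability.
pre = "AUGGCC" + "guaaguacuaacag" + "UUUGGA" + "guccuuacag" + "UAA"
mature = splice(pre, [(6, 20), (26, 36)])
print(mature)  # AUGGCCUUUGGAUAA
```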

A schematic illustration shows an mRNA molecule that contains a protein-coding region flanked by two untranslated, or non-coding, regions. The mRNA is represented as a thin horizontal rectangle, and the protein-coding and non-coding sequences are represented by different colored rectangular regions along the mRNA. The five-prime untranslated region is represented as a mostly-grey rectangular region at the left end of the mRNA strand. One-quarter of the five-prime untranslated region is shaded blue, and labeled as the Shine-Dalgarno sequence (in prokaryotes only). The start codon is represented as a green, square-shaped region near the right end of the five-prime untranslated region. The protein-coding region, represented as an orange rectangular region, is adjacent to the start codon. To the right of the protein-coding region, a red square represents the stop codon. The three-prime untranslated region is to the right of the stop codon, and is represented as a grey rectangular region.

Figure 6

The surprising discovery of RNA splicing caused a paradigm shift in genetics. Much early work indicated that mRNA and the genes in DNA were colinear; that is, they were thought to match up, base for base, with the exception of the 3' poly(A) tail. In the late 1970s, however, seminal studies of gene expression in cells infected with an adenovirus demonstrated that the RNA transcripts produced by viral infection contained sequences that were not next to one another in the viral genome. Further study revealed that these mRNAs were produced after material had been removed or spliced out of a larger primary transcript (Berget et al., 1977; Evans et al., 1977). Since that time, introns have been found to occur in many eukaryotic cellular genes and some prokaryotic genes.

Probably the most thoroughly studied class of introns consists of those found in protein-coding genes. The 5' end of these introns almost always begins with the dinucleotide GU, and the 3' end typically contains AG. Changing one of these nucleotides precludes splicing. Another important sequence occurs at the branch point, anywhere from 18 to 40 nucleotides upstream from the 3' end of an intron. This sequence always contains an adenine, but it is otherwise loosely conserved. A typical sequence at a branch point is YNYYRAY, where Y indicates a pyrimidine, N denotes any nucleotide, R any purine, and A is for adenine (Figure 8) (Pierce, 2000; Patel & Steitz, 2003).
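
The sequence rules just described translate naturally into a pattern search. The following Python sketch is illustrative only: the function name and test sequences are hypothetical, and "distance upstream from the 3' end" is interpreted here as the number of nucleotides that follow the branch-point adenine, which is an assumption about how the distance is counted.

```python
import re

# Shorthand from the text: Y = pyrimidine (C or U), R = purine (A or G), N = any base.
# A lookahead is used so that overlapping candidate motifs are all examined.
BRANCH_POINT = re.compile(r"(?=[CU][ACGU][CU][CU][AG]A[CU])")


def looks_like_intron(intron: str) -> bool:
    """Check the GU...AG boundary rule and look for a YNYYRAY motif whose
    adenine lies 18 to 40 nucleotides upstream of the intron's 3' end."""
    if not (intron.startswith("GU") and intron.endswith("AG")):
        return False
    for match in BRANCH_POINT.finditer(intron):
        adenine_index = match.start() + 5              # the A in YNYYRAY
        distance_to_3_end = len(intron) - adenine_index - 1
        if 18 <= distance_to_3_end <= 40:
            return True
    return False


# Hypothetical intron with the motif UACUAAC placed 21 nucleotides from the 3' end.
candidate = "GUAAGU" + "C" * 10 + "UACUAAC" + "U" * 18 + "AG"
print(looks_like_intron(candidate))  # True
```

A sequence that passes this check is only consistent with the consensus; the check says nothing about whether it is actually spliced in a cell.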

Many eukaryotic genes can be spliced in a number of different ways by choosing between different potential 5′ and 3′ splice junctions, thereby creating different combinations of exons and introns in the final mRNAs. This mix-and-match process allows the creation of several different proteins from a single gene sequence. The first example of such "alternative splicing" (Figure 9) was discovered in the adenovirus in 1977 (Berget et al., 1977). The first example in cellular genes was reported in 1980 in the IgM gene, which encodes an immunoglobulin, one of several proteins created by immune cells to fight infection by foreign organisms and particles (Early et al., 1980).

The Dscam gene of Drosophila, which encodes proteins involved in guiding embryonic nerves to their target destinations during formation of the fly's nervous system, exhibits an especially impressive number of alternative splicing patterns. Dozens of different forms of Dscam mRNAs and corresponding proteins have been identified, while analysis of the gene's sequence reveals a staggering 38,000 potential additional mRNAs, based on the large number of introns found. The ability to produce so many different proteins from a single gene may be necessary for forming as complex a structure as the nervous system (Schmucker et al., 2000). In general, the existence of multiple mRNA transcripts from single genes may account for the complexity of some organisms, such as humans, even though these organisms have relatively few genes (in the case of humans, approximately 25,000).
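
The figure of roughly 38,000 follows from simple multiplication when one variant is chosen independently from each cluster of alternative exons. The per-cluster counts in the Python sketch below are not given in the text above; they are the values usually cited from Schmucker et al. (2000) and should be read as an assumption of this example.

```python
from math import prod

# Numbers of mutually exclusive alternative exons per cluster, as commonly
# reported for Drosophila Dscam (assumed here, not stated in the article).
variants_per_cluster = {"exon 4": 12, "exon 6": 48, "exon 9": 33, "exon 17": 2}

# One variant is chosen from each cluster, so the choices multiply.
total_isoforms = prod(variants_per_cluster.values())
print(total_isoforms)  # 38016
```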

A schematic illustration shows the removal of introns during the transcription of two genes: ovalbumin and cytochrome b. The genes are each depicted as a region of DNA, represented as a horizontal rectangle. Introns, or non-coding regions, along the DNA molecules are shaded grey; exons, or coding sequences, are represented as blue rectangular regions. The horizontal rectangle representing the ovalbumin gene is predominantly grey, but contains eight shaded blue regions (exons). The exons are labeled one through eight, from left to right, along the gene. The rectangle representing the cytochrome b gene is also predominantly grey; it contains five exons. After transcription, introns are removed from the immature mRNA product during a process called RNA splicing. The mature ovalbumin mRNA molecule is depicted as a blue horizontal rectangle, flanked on either side by a region of grey. The blue rectangle is a composite of the eight blue exons labeled along the original DNA molecule. Likewise, the mature cytochrome b mRNA molecule is depicted as a blue horizontal rectangle: a composite of the five blue exons in the original DNA molecule.

Figure 7: Introns are removed during RNA splicing.

Non-coding sequences, or introns, are removed during RNA splicing to produce a mature mRNA transcript composed of exons (coding sequences).

tRNA and rRNA: Their Role in Translation

A schematic illustration shows a precursor mRNA molecule that contains an intron flanked by two exons. The pre-mRNA is represented as a thin horizontal rectangle, and the intron and two exons are represented by different colored rectangular regions along the pre-mRNA. The intron is represented as a grey rectangular region at the center of the pre-mRNA transcript. A row of seven horizontal, capital letters is labeled across the center of the intron, representing a nucleotide sequence at the intron branching site. From left to right, the letters are YNYYRAY. A red rectangular region to the left of the intron is exon number one; a red rectangular region to the right of the intron is exon number two. An arrow indicates the five-prime splice site is located between the intron and exon one. A second arrow indicates the three-prime splice site is located between the intron and exon two.

Figure 8

Two additional categories of RNA play a critical role in the translation process: tRNA and rRNA. Ribosomal RNA (rRNA) molecules were initially characterized by how rapidly they would "sink" in a centrifuge tube—in other words, they were described by their sedimentation velocity as measured in Svedberg (S) units. Prokaryotic organisms contain one type of rRNA gene that encodes three distinct RNA species: the 23S, 5S, and 16S rRNAs. In comparison, eukaryotic cells contain two types of rRNA genes that give rise to four rRNA species: the 28S, 5.8S, 5S, and 18S rRNAs. Both the eukaryotic and prokaryotic genomes contain multiple copies of these rRNA genes to be able to manufacture the large number of ribosomes required by a cell. Mature rRNAs are produced by cleavage and modification of initial transcripts (Pierce, 2000).

Figure 9

Transfer RNA (tRNA) molecules serve as molecular adaptors that bind to mRNA on one end and carry amino acids into position on the other. Most types of cells possess approximately 30 to 40 different tRNAs, with more than one tRNA corresponding to each amino acid. tRNAs fold into a cloverleaf structure held together by the pairing of complementary nucleotides. Structural studies using X-ray crystallography have demonstrated that the cloverleaf is further folded into an L shape (Figure 10). A loop at one end of the folded structure base-pairs with three nucleotides on the mRNA that are collectively called a codon; the complementary three nucleotides on the tRNA are called the anticodon.

A schematic diagram shows three different structural models of a tRNA molecule. In the three-dimensional, space-filling molecular model on the left, red, blue, and grey spheres represent individual atoms. A three-dimensional ribbon model in the middle emphasizes the hydrogen bonds that occur between paired bases. An anticodon region is shaded in light blue, and an amino acid attachment site is shaded in green. A flattened cloverleaf model at right shows a two-dimensional perspective. The tRNA molecule looks like an elongated cylinder folded into a flat T-shape. Three nucleotides at the bottom of the T-shape represent the anticodon sequence. Nucleotides folded into a loop on the left-hand arm of the T form the DHU arm. Nucleotides folded into a loop on the right-hand arm of the T form the TψC arm. An extra arm is depicted as a bulge below the TψC arm. An acceptor stem, represented as the upwards-facing stem, and top of the T-shape, is composed of seven nucleotide base-pairs, organized in seven rows from the stem’s base. One RNA strand is composed of an additional three nucleotides, which, for lack of complementary nucleotides on the shorter, opposite strand, do not form base-pairs. The nucleotides, from bottom to top, are cytosine (C), cytosine (C), and adenine (A). The CCA nucleotide sequence on the acceptor stem is the amino acid attachment site.

Figure 10

Although the pairing between codon and anticodon takes place over three nucleotides, strict complementary base-pairing is only necessary between the first two nucleotides. The third position is referred to as the "wobble" position (Figure 11), and the rules for base-pairing are less stringent at this position. Because of this flexibility, the 30 to 40 tRNAs present in a cell can "read" all 61 codons in mRNA.
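
The relaxed rule at the third position can be expressed as a small lookup in Python. In this sketch the anticodon is written 3' to 5' so that it lines up base for base with the codon, as drawn in Figure 11; that orientation, the function name, and the decision to ignore modified bases are all assumptions of the example. The pairing of G with C or U comes from Figure 11, while the pairing of U with A or G is a further standard wobble rule that the text does not mention.

```python
# Strict Watson-Crick partners for an anticodon base.
STRICT = {"A": {"U"}, "U": {"A"}, "G": {"C"}, "C": {"G"}}

# Relaxed partners at the wobble position (third base of the codon).
WOBBLE = {"A": {"U"}, "U": {"A", "G"}, "G": {"C", "U"}, "C": {"G"}}


def anticodon_reads(anticodon_3to5: str, codon_5to3: str) -> bool:
    """True if the anticodon (written 3'->5', aligned with the codon as in
    Figure 11) can pair with the codon when wobble is allowed."""
    if len(anticodon_3to5) != 3 or len(codon_5to3) != 3:
        return False
    first_two_ok = all(
        codon_5to3[i] in STRICT[anticodon_3to5[i]] for i in range(2)
    )
    third_ok = codon_5to3[2] in WOBBLE[anticodon_3to5[2]]
    return first_two_ok and third_ok


for codon in ("UCC", "UCU", "UCA"):
    print(codon, anticodon_reads("AGG", codon))
```

With the anticodon AGG from Figure 11, the codons UCC and UCU are both accepted and UCA is rejected, which shows how a single tRNA can serve more than one codon.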

The opposite end of the folded structure, which is the 3' end of the tRNA, binds to its corresponding amino acid at an attachment site that is also three nucleotides long, invariably CCA. Enzymes called aminoacyl-tRNA synthetases attach the correct amino acid to each tRNA, based on the three-dimensional structure of the tRNA molecule.

More and More RNAs

Finally, there are still more forms of RNA beyond mRNA, rRNA, and tRNA. For instance, short RNAs are not only part of organelles like ribosomes and spliceosomes, but also of some enzymes. For example, the enzyme telomerase, which adds nucleotides to the ends of chromosomes, is composed of a 451-nucleotide RNA and several proteins. Juli Feigon at the University of California, Los Angeles, together with postdoctoral scholar Carla Theimer and graduate student Craig Blois, first solved the structure of an essential piece of this RNA by nuclear magnetic resonance spectroscopy (Theimer et al., 2005). They revealed a unique RNA structure with extensive RNA folding, which is necessary for telomerase activity.

Other classes of RNA species include microRNAs, small interfering RNAs, and sRNAs. None of these is translated into protein, yet all perform important functions in the cell. The discovery of these RNAs has been one of the most exciting advances in recent years, and there is currently considerable interest in using these molecules as possible therapies. As far as their structure is concerned, however, these RNAs all share the same basic single-stranded chemical structure with, in some cases, higher-order structures obtained through complementary base-pair folding.

From the RNA Tie Club to today, the more scientists have studied RNA, the more surprises they have uncovered. New functions for RNA, new modifications to RNA, and other surprises undoubtedly await discovery in the years to come.

A schematic shows two tRNA molecules bound to complementary sequences on a strand of mRNA. The sugar-phosphate backbone of the mRNA is depicted as a horizontal grey rectangle. Nitrogenous bases are attached to the sugar-phosphate backbone and are represented as blue, orange, yellow, or green vertical rectangles. Two red tRNA molecules, each with an anticodon of three nucleotides, are attached to a complementary codon sequence on the mRNA strand. The tRNA molecules each look like a thin red tube looped into a T-shape. Three nucleotides within the tRNA sequence are shown at the bottom of the T-shape. These nucleotides represent the anticodon sequence. The anticodon sequence, from left to right, of both tRNA molecules is AGG. A textbox explains that pairing at the third codon position is relaxed: the nucleotide G on the tRNA anticodon can pair with the nucleotides C or U on the mRNA codon. The tRNA molecule at left is bound to the mRNA codon UCC; the tRNA molecule at right is bound to the mRNA codon UCU. Thus, the G in one tRNA molecule's anticodon is bound to C, while the G in the other tRNA molecule’s anticodon is bound to U.

Figure 11: The "wobble" position.

Base-pairing rules between the tRNA anticodon and the mRNA codon are less stringent at the third nucleotide position. This base-pairing flexibility is also called "wobble."

References and Recommended Reading

Berget, S. M., Moore, C., & Sharp, P. A. Spliced segments at the 5' terminus of adenovirus 2 late mRNA. Proceedings of the National Academy of Sciences 74, 3171–3175 (1977)

Early, P., et al. Two mRNAs can be produced from a single immunoglobulin μ gene by alternative RNA processing pathways. Cell 20, 313–319 (1980)

Evans, R. M., et al. The initiation sites for RNA transcription in Ad2 DNA. Cell 12, 733–739 (1977)

Holley, R. W., et al. Structure of a ribonucleic acid. Science 147, 1462–1465 (1965) doi:10.1126/science.147.3664.1462

Patel, A. A., & Steitz, J. A. Splicing double: Insights from the second spliceosome. Nature Reviews Molecular Cell Biology 4, 960–970 (2003) doi:10.1038/nrm1259

Pierce, B. A. Genetics: A Conceptual Approach, 2nd ed. (New York, Freeman, 2000)

Rich, A. A hybrid helix containing both deoxyribose and ribose polynucleotides and its relation to the transfer of information between the nucleic acids. Proceedings of the National Academy of Sciences 46, 1044–1053 (1960)

Rich, A., & Davies, D. R. A new two-stranded helical structure: Polyadenylic acid and polyuridylic acid. Journal of the American Chemical Society 78, 3548–3549 (1956)

Schmucker, D., et al. Drosophila Dscam is an axon guidance receptor exhibiting extraordinary molecular diversity. Cell 101, 671–684 (2000)

Theimer, C. A., Blois, C. A., & Feigon, J. Structure of the human telomerase RNA pseudoknot reveals conserved tertiary interactions essential for function. Molecular Cell 17, 671–682 (2005)