Introduction

RNA is a typically single-stranded molecule composed of four canonical nucleotides: adenosine (A), guanosine (G), cytidine (C) and uridine (U). As an important and evolutionarily conserved macromolecule, RNA not only participates in the flow of genetic information but also regulates gene expression. Beyond the primary sequence, chemical modifications add to the complexity of RNA and have emerged as a new layer of gene expression regulation. Since the first chemical modification was characterized 60 years ago, more than 170 RNA modifications have been identified1. Most of these modifications have been found in abundant non-coding RNAs, including ribosomal RNA (rRNA), transfer RNA (tRNA) and small nuclear RNA (snRNA).

The development of detection technologies has advanced the investigation of the functional roles of RNA modifications2,3,4,5,6,7. To date, more than ten chemical modifications have been mapped transcriptome-wide, including N6-methyladenosine (m6A), N6,2′-O-dimethyladenosine (m6Am), 5-methylcytosine (m5C), 5-hydroxymethylcytosine (hm5C), inosine (I), pseudouridine (Ψ), N1-methyladenosine (m1A), 2ʹ-O-methylation (Nm), N4-acetylcytidine (ac4C), N7-methylguanosine (m7G) and dihydrouridine (D). Different chemical modifications play distinct regulatory roles in RNA metabolism and function. For instance, m6A, the most abundant internal messenger RNA (mRNA) modification, influences multiple aspects of RNA metabolism, including stability, splicing, translation, localization and RNA secondary structure8,9,10,11. m5C in mRNA influences mRNA export, RNA stability and translation, whereas m5C in tRNA is essential for maintaining structural stability and translational fidelity12,13,14,15,16,17,18,19. Inosine occurs preferentially in double-stranded RNA (dsRNA) regions and affects codon recoding, splice-site choice, microRNA (miRNA) biogenesis and targeting efficiency20,21,22. Ψ is required for proper rRNA folding, tRNA structure stabilization and snRNP (small nuclear ribonucleoprotein) biogenesis23,24,25,26,27,28,29,30,31,32. In addition, introducing Ψ into mRNA increases protein production and alters translation33,34,35. m1A at position 58 in tRNA is conserved and vital for stabilizing tRNA tertiary structure, and m1A in mRNA influences translation36,37,38,39,40,41,42,43,44,45,46,47. Nm is essential for accurate and efficient protein synthesis48,49,50,51,52. Internal m7G increases mRNA translation efficiency and augments miRNA biogenesis53,54. ac4C in mRNA promotes translation, and ac4C in rRNA can affect rRNA biogenesis55,56,57,58,59.

RNA modification detection technologies provide not only resources for a comprehensive understanding of the epitranscriptome but also tools for functional studies. In this review, we therefore summarize current knowledge of existing RNA modification detection technologies and discuss the challenges they face. We group these technologies by detection throughput and principle into four classes: quantification methods, locus-specific detection methods, next-generation sequencing-based detection technologies and nanopore direct RNA sequencing-based technologies.

RNA modification quantification methods

Identifying and quantifying newly discovered modified nucleotides requires powerful quantification methods. Because modified nucleotides have chemical properties distinct from those of their unmodified counterparts, several RNA modification quantification methods have been established, including two-dimensional thin-layer chromatography (2D-TLC), dot blot and liquid chromatography–mass spectrometry (LC–MS). These approaches quantify the overall modification abundance in a given RNA species; because they provide no sequence information, they require highly purified RNA.

2D-TLC

2D-TLC is a widely used RNA modification detection method that separates nucleotides based on their distinct mobilities in the solvent60,61,62. Isolated RNA is first partially digested into oligonucleotides using RNase A, T1 or T2, and the fragments are then labeled with 32P using T4 polynucleotide kinase (T4 PNK). Next, 5ʹ-32P-NMPs are released by nuclease P1 digestion and separated by 2D-TLC. Each nucleotide is identified by comparing its retardation factor (Rf) values with those of standards (Fig. 1) and quantified by measuring the radioactivity of the corresponding spot on the TLC plate63,64,65. This approach is highly sensitive and requires only a small amount of RNA (50–200 ng). Consequently, it can be applied both to abundant non-coding RNAs (rRNA and tRNA) and to less abundant mRNA. In addition, it can detect RNA modifications in specific tRNA or snRNA species, which can be isolated by gel purification or hybridization methods. Furthermore, it does not require expensive instruments and is therefore inexpensive to run. However, this method also has drawbacks, including the need for radioactive reagents, bias caused by RNase digestion, and differences in 32P labeling efficiency among modified nucleotides63,66.
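As a minimal sketch of the identification step, the Python code below computes Rf values from migration distances and assigns a spot to the nearest standard in the two-dimensional (Rf1, Rf2) plane. The values in `STANDARD_RF` are hypothetical placeholders: real Rf values depend on the plate and solvent systems and must be measured from standards run under the same conditions. Nearest-neighbor assignment is an illustrative simplification, not a published procedure.

```python
import math

# Hypothetical (Rf in dimension 1, Rf in dimension 2) values for 5'-32P-NMP
# standards. Real values depend on the plate and solvent systems and must be
# measured from standards run under identical conditions.
STANDARD_RF = {
    "pA": (0.30, 0.40),
    "pm6A": (0.55, 0.45),
    "pG": (0.20, 0.25),
    "pC": (0.45, 0.70),
}


def rf(spot_distance_mm: float, solvent_front_mm: float) -> float:
    """Retardation factor: distance migrated by the spot / distance migrated by the solvent front."""
    if solvent_front_mm <= 0:
        raise ValueError("solvent front distance must be positive")
    return spot_distance_mm / solvent_front_mm


def assign_spot(rf1: float, rf2: float, standards: dict = STANDARD_RF) -> str:
    """Return the standard whose (Rf1, Rf2) pair is closest (Euclidean distance) to the spot."""
    return min(standards, key=lambda name: math.dist((rf1, rf2), standards[name]))


# A spot that moved 44 mm of an 80 mm front in dimension 1 and 36 mm of an
# 80 mm front in dimension 2 has Rf values (0.55, 0.45).
print(assign_spot(rf(44, 80), rf(36, 80)))
```

Once spots are assigned, the radioactivity of each spot is what feeds the quantification step described in the text.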

Fig. 1: Schemes of RNA modification quantification methods.

Technical flowchart of dot blot, 2D-TLC and LC–MS.

Dot blot

Dot blot assays detect and semiquantify modification levels in RNA using a modification-specific antibody. Isolated RNA is spotted directly onto a polyvinylidene fluoride (PVDF) or nitrocellulose membrane without electrophoretic size separation. The membrane is incubated with an antibody against the target modification, followed by a secondary antibody and signal detection (Fig. 1). Comparing signals between experimental groups yields semiquantitative information on modification levels. Because detection relies entirely on antibody recognition, the sensitivity and accuracy of this approach depend heavily on antibody specificity. In addition, the amount of starting material required, ranging from nanograms to micrograms, depends on the abundance of the modification of interest. Since the workflow is straightforward and inexpensive, this approach has been widely applied to various RNA species, including non-coding RNAs and mRNAs67,68. However, the lack of absolute quantification and locus information limits its wider application.
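The semiquantitative comparison described above can be sketched as follows. This assumes that each antibody dot is paired with a loading-control intensity for the same RNA spot (for example, a total-RNA membrane stain, which is an assumption and not specified in the text) and that a single background value applies to all dots. The numbers in the usage lines are made up.

```python
from statistics import mean


def normalized_signal(antibody: float, loading: float, background: float = 0.0) -> float:
    """Background-subtracted antibody dot intensity divided by its loading-control intensity."""
    if loading <= 0:
        raise ValueError("loading-control intensity must be positive")
    return max(antibody - background, 0.0) / loading


def group_level(dots, background: float = 0.0) -> float:
    """Mean normalized signal over replicate dots given as (antibody, loading) pairs."""
    return mean(normalized_signal(antibody, loading, background) for antibody, loading in dots)


def relative_level(sample_dots, control_dots, background: float = 0.0) -> float:
    """Semiquantitative modification level of a sample group relative to the control group."""
    return group_level(sample_dots, background) / group_level(control_dots, background)


# Made-up intensities: (antibody signal, loading-control signal) per dot.
control = [(1100, 1000), (1050, 950)]
knockdown = [(600, 1000), (575, 950)]
print(relative_level(knockdown, control, background=100))
```

The output is only a relative level between groups; as noted above, it gives neither absolute stoichiometry nor positional information.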

HPLC and LC–MS

High-performance liquid chromatography (HPLC) is a column chromatography technique that separates nucleosides according to their distinct polarities. Before HPLC analysis, RNA or oligonucleotides are digested and dephosphorylated into single nucleosides with nuclease P1 and alkaline phosphatase. A UV detector records the absorbance and retention time of each nucleoside, which are used to identify modified nucleosides and measure their abundance (Fig. 1). Compared with 2D-TLC, HPLC analysis is rapid and free of radiolabeling. However, because of the limited sensitivity of UV detection, this approach can detect only highly abundant modifications in abundant RNA species, such as rRNA, tRNA and synthetic RNAs, and requires a large amount of purified RNA (more than 1 µg)69.

To increase detection sensitivity, HPLC can be coupled with mass spectrometry70. As in HPLC analysis, RNA or oligonucleotides are completely digested into nucleosides and separated by reversed-phase column chromatography. The nucleosides are then ionized and fragmented into characteristic product ions in the mass spectrometer. Together, the retention time, precursor mass-to-charge ratio (m/z) and product ion identify a given nucleoside (Fig. 1). Nucleosides can be quantified using external standard curves run in the same batch60,71. The high sensitivity of triple quadrupole mass spectrometry gives this approach a detection limit in the low femtomolar range, and as little as 50 ng of starting material can suffice, allowing the identification and quantification of low-abundance modifications in mRNA and low-abundance ncRNA72,73. LC–MS has therefore become a benchmark for RNA modification detection and quantification. However, it depends on dedicated HPLC and mass spectrometry instruments. In addition, because it provides no sequence information, care must be taken to minimize contamination from abundant, highly modified rRNA and tRNA when detecting and quantifying modifications in mRNA or other less abundant RNA types.
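External standard-curve quantification can be sketched as an ordinary least-squares line fitted to each nucleoside standard and then inverted to convert a sample peak area into an amount. The calibration numbers below are hypothetical, and the sketch assumes a linear detector response across the calibrated range with no weighting; the modified-to-unmodified ratio (here m6A/A) is one common way to report the result.

```python
def fit_standard_curve(amounts, areas):
    """Least-squares fit of area = slope * amount + intercept for one nucleoside standard."""
    n = len(amounts)
    if n < 2:
        raise ValueError("need at least two standards")
    mean_x = sum(amounts) / n
    mean_y = sum(areas) / n
    sxx = sum((x - mean_x) ** 2 for x in amounts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(amounts, areas))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def amount_from_area(area, curve):
    """Invert a standard curve to convert a sample peak area into an amount."""
    slope, intercept = curve
    return (area - intercept) / slope


# Hypothetical calibration data (amounts in fmol, peak areas in arbitrary units).
m6a_curve = fit_standard_curve([1, 2, 4, 8], [10, 20, 40, 80])
a_curve = fit_standard_curve([100, 200, 400], [500, 1000, 2000])

# Hypothetical sample peak areas for m6A and A measured in the same batch.
m6a = amount_from_area(30, m6a_curve)
a = amount_from_area(1500, a_curve)
print(f"m6A/A = {100 * m6a / a:.2f}%")
```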

Locus-specific detection methods

The precise position of an RNA modification is important for functional studies. To date, several locus-specific RNA modification detection methods have been developed. By detection principle, they can be divided into four classes: (1) primer extension; (2) RNase H-based approaches; (3) electrospray ionization-mass spectrometry (ESI-MS)-based approaches; and (4) semiquantitative PCR- or qPCR-based approaches.

Primer extension

This reverse transcription-based approach has been used extensively to detect and localize various RNA modifications, including m1A, Ψ and m1G. A 5ʹ-labeled, target-specific RT primer is hybridized to the RNA of interest and extended by reverse transcriptase; the approach therefore requires prior knowledge of the modification type and the sequence of the target RNA. On an unmodified template, reverse transcriptase reaches the 5ʹ end of the RNA and generates full-length cDNA. When it encounters a modified nucleotide that blocks extension, it typically stops one nucleotide 3ʹ to the modified site. The RT products are then separated on denaturing polyacrylamide gels, and the position at which the truncated cDNA terminates indicates the modified site (Fig. 2a). This approach is very sensitive and can be applied to various RNA species, including rRNA, tRNA, snRNA and abundant mRNA42,43,74,75,76,77,78,79,80. In addition, because of the hybridization step, it is highly specific and does not require purified, homogeneous RNA as starting material. A major limitation is that it can detect only modifications, or their chemical adducts, that block reverse transcription; it is not suitable for RT-silent modifications such as m6A and m5C81,82,83.
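The arithmetic that converts a truncated cDNA length into an RNA position can be sketched as follows. The coordinate convention is an assumption made for illustration: RNA positions are numbered 1-based from 5′ to 3′, `primer_anchor` is the RNA position paired with the primer's 5′ end, and the cDNA length (including the primer) is read from a sequencing ladder run on the same gel. The `stop_offset` parameter reflects that RT usually stops one nucleotide 3′ to the modification but, depending on the modification and conditions, may copy it.

```python
def last_copied_position(primer_anchor: int, cdna_length: int) -> int:
    """RNA coordinate (1-based, numbered 5'->3') of the last template nucleotide copied.

    primer_anchor is the RNA position paired with the 5' end of the primer, i.e. the
    3'-most RNA nucleotide covered by the primer; cdna_length includes the primer.
    """
    if cdna_length < 1 or cdna_length > primer_anchor:
        raise ValueError("cDNA length must be between 1 and primer_anchor")
    return primer_anchor - cdna_length + 1


def inferred_modified_position(primer_anchor: int, cdna_length: int, stop_offset: int = 1) -> int:
    """Modified position, assuming RT stops `stop_offset` nucleotides 3' of the modification."""
    return last_copied_position(primer_anchor, cdna_length) - stop_offset


# A primer anchored at position 120 giving a 40-nt truncated product implies
# that position 81 was the last nucleotide copied and position 80 is modified.
print(inferred_modified_position(primer_anchor=120, cdna_length=40))
```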

Fig. 2: Schemes of locus-specific RNA modification detection methods.

a Technical flowchart of primer extension. The truncated cDNA product indicates where the modification occurs. b Technical flowchart of SCARLET, an RNase H-based approach; the modified and unmodified forms of the nucleotide of interest can be accurately quantified by TLC; TAP, thermosensitive alkaline phosphatase. c Technical flowchart of RTL-P, a semiquantitative PCR-based approach; Nm impedes RT under low-dNTP conditions and allows readthrough under high-dNTP conditions, and the Nm status can thus be determined by comparing the intensity of the longer PCR product with that of the shorter product. d Technical flowchart of SELECT, a qPCR-based approach; this approach exploits the fact that m6A hinders elongation by Bst and reduces the ligation efficiency of SplintR ligase to distinguish modified from unmodified RNA. e Technical flowchart of the qPCR-based Ψ detection approach; this approach exploits the fact that Ψ-CMC adducts induce mutations/deletions in cDNA, which alter the melting curves of the qPCR products. f Technical flowchart of the ESI-MS-based approach. Modifications that can be detected by the corresponding approaches are listed below.

RNase H-based approach

The RNase H-based approach is independent of reverse transcription and can therefore detect and quantify RNA modifications that do not affect Watson-Crick base pairing. In this method, purified RNA is cleaved by RNase H into two fragments on the 5ʹ side of the nucleotide of interest; cleavage specificity is conferred by annealing a site-specific 2′-O-methyl RNA–DNA chimeric oligonucleotide. The 3ʹ fragment is then purified, and its 5′ terminus is labeled with 32P. This fragment is completely digested into single nucleotides, which are resolved by TLC. Because only the nucleotide of interest carries the radiolabel, its modified and unmodified forms can be detected and accurately quantified by TLC analysis64,84. This approach is sensitive and powerful for quantifying the stoichiometry of modified nucleotides. However, because the chimeric oligonucleotide must base-pair with the target RNA sequence to guide RNase H cleavage, sequence information is required, and modifications that affect Watson-Crick base pairing cannot be detected or quantified by this method. In addition, to reduce false signals from other RNA sequences, homogeneously purified RNA, isolated by hybridization and gel purification, is used as the starting material. This method therefore requires a large amount of RNA and is limited to abundant non-coding RNAs, such as rRNA and snRNA64,84.

To quantify modifications in mRNA and lncRNA, a method named SCARLET (site-specific cleavage and radioactive labeling followed by ligation-assisted extraction and thin-layer chromatography) has been developed85. In SCARLET, a 116-nucleotide single-stranded DNA oligonucleotide is splint-ligated to the 32P-labeled RNA fragment of interest, and the product is then digested with RNase T1/A. Because splint ligation selectively captures the target fragment, RNA mixtures rather than a specific purified RNA can be used as starting material, so modifications in mRNA and lncRNA can also be quantified. The DNA oligonucleotide carrying the 32P-labeled nucleotide is recovered by size-based gel purification. The labeled nucleotide is then released by nuclease P1 digestion and analyzed by TLC (Fig. 2b). This approach has been applied to quantify m6A and Ψ modification status in mRNA85,86. Since it avoids reverse transcription, it can serve as an orthogonal method to validate modification sites identified by transcriptome-wide methods that depend on reverse transcription. As with other RNase H-based methods, SCARLET requires prior sequence knowledge to target a single specific nucleotide and cannot quantify modification status de novo. In addition, the need for radioactive reagents and the complicated procedure limit its wider application.
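The final TLC readout of RNase H-based methods such as SCARLET reduces to a modified fraction computed from two spot intensities. A minimal sketch follows, assuming the same background value applies to both spots and that the modified and unmodified forms are labeled and detected with equal efficiency (which, as noted for 2D-TLC, is not guaranteed for every modification). The replicate values in the usage line are made up.

```python
from statistics import mean, stdev


def stoichiometry(modified: float, unmodified: float, background: float = 0.0) -> float:
    """Fraction modified = modified / (modified + unmodified), after background subtraction."""
    mod = max(modified - background, 0.0)
    unmod = max(unmodified - background, 0.0)
    total = mod + unmod
    if total == 0:
        raise ValueError("no signal above background")
    return mod / total


def replicate_summary(spots, background: float = 0.0):
    """Mean and sample standard deviation of stoichiometry over (modified, unmodified) replicates."""
    values = [stoichiometry(m, u, background) for m, u in spots]
    spread = stdev(values) if len(values) > 1 else 0.0
    return mean(values), spread


# Made-up phosphorimager counts for three replicate TLC plates.
print(replicate_summary([(850, 250), (820, 280), (790, 230)], background=50))
```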

Semiquantitative PCR or qPCR-based approach

Like primer extension assays, semiquantitative PCR- or qPCR-based approaches rely on the fact that modified nucleotides impede reverse transcriptase extension. Unlike primer extension and RNase H-based approaches, they are free of radiolabeling and are therefore faster and easier to perform in the laboratory. To date, several such approaches have been developed and have successfully detected Nm, Ψ and m6A in diverse RNA species (Fig. 2c–e)87,88,89,90,91,92,93. In addition, owing to the high sensitivity and specificity of PCR and qPCR, these approaches do not require purified RNA as starting material and can be applied to various RNA species, including less abundant mRNA and lncRNA.

Nm blocks reverse transcriptase extension at a low dNTP concentration and allows readthrough at a high dNTP concentration82,94. Taking advantage of this property, researchers have developed a method referred to as RTL-P (reverse transcription at low dNTP concentrations followed by PCR). In this method, the RNA of interest is first reverse-transcribed at both low and high dNTP concentrations. The truncated and full-length cDNA are amplified by specific primers. The PCR products are analyzed by gel electrophoresis, and the Nm status can be determined by comparing the intensity of the longer PCR product with that of the shorter product (Fig. 2c)87. In addition to altering RT conditions, RT enzymes can also be engineered to facilitate modified nucleotide detection. For instance, an engineered thermostable KlenTaq DNA polymerase variant possesses reverse transcription activity and can discriminate Nm at normal dNTP concentrations. The combination of this engineered DNA polymerase with qPCR has achieved expeditious quantification of Nm88.
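The RTL-P comparison can be expressed as a ratio of ratios: the long/short product intensity ratio at low dNTP divided by the same ratio at high dNTP. This sketch assumes the shorter product is amplified from a region that does not include the candidate site, so its yield is insensitive to Nm, while the longer product requires readthrough. The decision threshold in `call_nm` is hypothetical and would need calibration against known modified and unmodified sites.

```python
def long_to_short(long_band: float, short_band: float) -> float:
    """Intensity ratio of the longer PCR product to the shorter one."""
    if short_band <= 0:
        raise ValueError("short-product intensity must be positive")
    return long_band / short_band


def rtlp_relative_ratio(low_dntp, high_dntp) -> float:
    """Long/short ratio at low dNTP divided by the long/short ratio at high dNTP.

    low_dntp and high_dntp are (long_band, short_band) intensity pairs. Values well
    below 1 suggest the site blocks RT only when dNTPs are limiting, as expected for Nm.
    """
    return long_to_short(*low_dntp) / long_to_short(*high_dntp)


def call_nm(low_dntp, high_dntp, threshold: float = 0.5) -> bool:
    """Hypothetical decision rule: call Nm when the relative ratio falls below `threshold`."""
    return rtlp_relative_ratio(low_dntp, high_dntp) < threshold


# Made-up band intensities: the long product drops sharply only at low dNTP.
print(call_nm(low_dntp=(200, 1000), high_dntp=(900, 1000)))
```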

In addition to the engineered thermostable KlenTaq DNA polymerase variant, two other DNA polymerases, Tth and Bst, also have reverse transcriptase activity and extend differently past m6A than past A, which allows locus-specific detection of m6A89,90,91. To further increase the sensitivity of m6A detection, the single-base elongation- and ligation-based PCR amplification method (SELECT) has been developed92,93. Because m6A both hinders the elongation activity of DNA polymerases and reduces the nick ligation efficiency of SplintR ligase, the amount of ligation product formed from m6A-containing RNA templates is dramatically reduced in SELECT, which significantly increases detection robustness. With its qPCR readout, SELECT can quantify the m6A fraction in a linear manner (Fig. 2d).
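A simplified way to turn a SELECT qPCR readout into an m6A fraction uses the ΔCt between the sample and a reference. This sketch rests on stated idealizations that are assumptions rather than the published analysis: the reference carries no m6A at the site and has the same input (for example, a demethylated or in vitro transcribed control), m6A blocks elongation and ligation completely, and amplification efficiency is known. Published analyses may instead rely on calibration standards.

```python
def relative_template(ct_sample: float, ct_reference: float, efficiency: float = 1.0) -> float:
    """Relative amount of amplifiable ligation product, (1 + E) ** -(Ct_sample - Ct_reference).

    efficiency = 1.0 means perfect doubling per cycle, giving the familiar 2 ** -dCt.
    """
    return (1.0 + efficiency) ** -(ct_sample - ct_reference)


def estimated_m6a_fraction(ct_sample: float, ct_reference: float, efficiency: float = 1.0) -> float:
    """Idealized m6A fraction: the share of templates that failed to yield ligation product.

    Assumes an m6A-free reference with equal input and complete blocking by m6A;
    neither assumption holds exactly in practice.
    """
    remaining = min(relative_template(ct_sample, ct_reference, efficiency), 1.0)
    return 1.0 - remaining


# A sample that crosses threshold one cycle later than the reference.
print(estimated_m6a_fraction(ct_sample=22.0, ct_reference=21.0))
```

Capping the relative amount at 1.0 prevents qPCR noise (a sample amplifying slightly earlier than the reference) from producing a negative fraction.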

In addition to truncations, the mutations and deletions induced by modified nucleotides can also be exploited for locus-specific detection. For instance, Ψ can be selectively labeled by N-cyclohexyl-N′-(2-morpholinoethyl)carbodiimide (CMC), and the resulting Ψ-CMC adducts interfere with Watson-Crick base pairing during reverse transcription. Under optimized RT conditions, reverse transcriptase reads through Ψ-CMC adducts and introduces mutations or deletions into the cDNA. These sequence changes alter the melting curves of the qPCR products, enabling locus-specific detection of Ψ (Fig. 2e)95.
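Melting-curve analysis usually reports the temperature at which the negative derivative of fluorescence, −dF/dT, peaks. The sketch below estimates that peak with central differences and compares two curves. The sigmoidal curves are simulated purely for illustration (the helper `simulated_melt_curve` and both Tm values are hypothetical); real data would come from the instrument, and the direction and size of any shift depend on the induced sequence changes.

```python
import math


def melting_temperature(temps, fluorescence):
    """Temperature at which -dF/dT (estimated by central differences) is maximal."""
    best_t, best_rate = None, float("-inf")
    for i in range(1, len(temps) - 1):
        rate = -(fluorescence[i + 1] - fluorescence[i - 1]) / (temps[i + 1] - temps[i - 1])
        if rate > best_rate:
            best_t, best_rate = temps[i], rate
    return best_t


def simulated_melt_curve(temps, tm, width=1.0):
    """Hypothetical sigmoidal melt curve centred on `tm`, for illustration only."""
    return [1.0 / (1.0 + math.exp((t - tm) / width)) for t in temps]


temps = [70 + 0.5 * i for i in range(41)]  # 70.0 to 90.0 degC in 0.5 degC steps
reference = simulated_melt_curve(temps, tm=80.0)    # product from an untreated template
cmc_treated = simulated_melt_curve(temps, tm=78.5)  # product carrying RT-induced changes
shift = melting_temperature(temps, reference) - melting_temperature(temps, cmc_treated)
print(f"Tm shift: {shift:.1f} degC")
```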

ESI-MS-based approach

Unlike the LC–MS-based nucleoside quantification strategy described above, this approach first digests isolated RNA into 5–15-nucleotide fragments with base-selective endoribonucleases, such as RNase T1, RNase A and RNase U2. The oligonucleotides are then separated by HPLC, and sequence ladders are generated by ESI-MS and used to reconstruct the sequence and identify modifications (Fig. 2f). This approach can therefore provide both site and stoichiometry information for the modifications of interest70,96,97,98,99. Compared with other locus-specific methods, the ESI-MS-based approach does not rely on prior sequence knowledge and can thus detect and quantify RNA modifications de novo. It has therefore been widely applied to modification detection in abundant RNAs, including tRNA, rRNA and snRNA100,101,102,103,104,105,106,107,108,109. Recently, it has also been used to detect modifications in miRNA and cap modifications in mRNA53,110. Its major limitation is the need for a highly sensitive ESI-MS instrument. In addition, given its detection principle, it requires a large amount of starting material and is thus suitable only for abundant RNA species.
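Two building blocks of this workflow can be sketched in a few lines: an idealized in silico RNase T1 digest (cleavage 3′ of every G, ignoring any effect of modifications on cleavage) that keeps fragments in the 5–15-nt range, and matching of an observed mass difference against modification mass shifts. The shift table lists standard monoisotopic differences for added CH2, C2H4 and C2H2O groups. Ψ is included with a 0 Da shift to make the point that it is mass-silent, so an unshifted fragment is consistent with either U or Ψ. The tolerance value is an assumption.

```python
# Monoisotopic mass differences (Da) for common modification types.
MASS_SHIFTS = {
    "methylation (+CH2)": 14.01565,
    "dimethylation (+C2H4)": 28.03130,
    "acetylation (+C2H2O)": 42.01057,
    "pseudouridine (mass-silent, 0 Da)": 0.0,
}


def rnase_t1_digest(seq: str, min_len: int = 5, max_len: int = 15):
    """Idealized RNase T1 digest: cut 3' of every G and keep fragments of a given length.

    Returns (start_index, fragment) pairs with 0-based start indices.
    """
    fragments, start = [], 0
    for i, base in enumerate(seq.upper()):
        if base == "G":
            fragments.append((start, seq[start:i + 1]))
            start = i + 1
    if start < len(seq):
        fragments.append((start, seq[start:]))
    return [(s, f) for s, f in fragments if min_len <= len(f) <= max_len]


def explain_mass_shift(observed_minus_expected: float, tol: float = 0.01):
    """List modification types whose mass difference matches the observed shift within `tol`."""
    return [name for name, delta in MASS_SHIFTS.items()
            if abs(observed_minus_expected - delta) <= tol]


print(rnase_t1_digest("AUCGAAUUCCAGUUG"))  # [(4, 'AAUUCCAG')]
print(explain_mass_shift(14.016))          # ['methylation (+CH2)']
```

Localizing the shift to a specific residue within the fragment is the job of the sequence ladder and is not modeled here.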

Next-generation sequencing-based detection technologies

Benefiting from advances in next-generation sequencing, an increasing number of RNA modification sequencing technologies have been developed. These technologies are powerful tools for mapping modified nucleotides transcriptome-wide and for elucidating the regulatory roles of RNA modifications. Chemical modifications alter the inherent features of the original nucleotides, including base-pairing behavior during reverse transcription, chemical and enzymatic reactivity, and binding affinity for specific proteins or antibodies. Coupled with next-generation sequencing, these properties can therefore be used to characterize RNA modifications throughout the transcriptome. Existing sequencing technologies can be divided by detection principle into four classes: (1) direct sequencing technology, (2) chemical-assisted sequencing technology, (3) antibody-based sequencing technology and (4) enzyme/protein-assisted sequencing technology (Table 1).

Table 1 Features of next-generation sequencing-based detection technologies.

Direct sequencing technology

For some modified nucleotides, the existence of chemical modifications alters canonical Watson-Crick base pairing during reverse transcription and further leads to truncation or misincorporation in cDNA synthesis. Therefore, this feature can be used to map modified nucleotides throughout the transcriptome.

For instance, in contrast to adenosine, inosine pairs with cytidine in reverse transcription, and thus, the A-to-I editing position can be identified by comparing the genomic DNA and RNA sequencing data and detecting A-to-G mismatch sites (Fig. 3a)111,112,113,114. Although this approach is widely applied to A-to-I editing site detection, caution is still needed to reduce the false positives introduced by SNPs, somatic mutations, pseudogenes and sequencing errors22.
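A minimal sketch of the comparison described above follows: an A site is reported when RNA reads show A-to-G mismatches while matched genomic DNA shows no G. It assumes base counts have already been extracted and oriented to the transcript strand. The depth and editing-level thresholds are hypothetical, and real pipelines add further filters (for example, for repeats, pseudogenes and read-position biases) to control the false positives mentioned in the text.

```python
from dataclasses import dataclass


@dataclass
class SiteCounts:
    """Base counts at one genomic position, oriented to the transcript strand."""
    position: int
    reference: str  # reference genome base
    rna_a: int
    rna_g: int
    dna_a: int
    dna_g: int


def call_a_to_i(sites, min_rna_depth: int = 10, min_dna_depth: int = 10, min_editing: float = 0.1):
    """Return (position, editing level) for A sites with RNA-only A-to-G mismatches."""
    calls = []
    for s in sites:
        if s.reference != "A":
            continue
        depth = s.rna_a + s.rna_g
        if depth < min_rna_depth or s.dna_a + s.dna_g < min_dna_depth:
            continue  # too little coverage to judge the site
        if s.dna_g > 0:
            continue  # G also present in genomic DNA: likely SNP or somatic variant
        level = s.rna_g / depth
        if level >= min_editing:
            calls.append((s.position, level))
    return calls


sites = [
    SiteCounts(100, "A", rna_a=60, rna_g=40, dna_a=30, dna_g=0),   # candidate editing site
    SiteCounts(200, "A", rna_a=50, rna_g=50, dna_a=15, dna_g=15),  # heterozygous SNP
    SiteCounts(300, "A", rna_a=5, rna_g=3, dna_a=20, dna_g=0),     # too few RNA reads
]
print(call_a_to_i(sites))  # [(100, 0.4)]
```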

Fig. 3: Schemes of direct sequencing technologies.

a Detection of A-to-I editing sites by detecting A-to-G mismatches between genomic DNA and RNA sequencing (RNA-seq) data from the same individuals. b Detection of modified nucleotides by utilizing demethylase treatment, altered RT conditions, engineered reverse transcriptase, or modified dNTPs to generate distinct RT signatures at modified sites between treated and untreated samples. Modifications that can be detected by the corresponding approaches are listed below.

Besides inosine, several other chemical modifications on the Watson-Crick face of the nucleobase can induce RT stops or misincorporations during cDNA synthesis, allowing base-resolution detection115. Unlike inosine, however, these modifications do not produce constant reverse transcription signatures; their signatures depend on both the surrounding sequence and the RT conditions116. Several improvements have been made to increase detection accuracy. For instance, treatment with E. coli AlkB demethylase and its mutants removes m1A, m3C, m1G, N2,N2-dimethylguanosine (m22G) and 3-methyluridine (m3U) from RNA before cDNA synthesis, so high-confidence methylation sites in the tRNA transcriptome can be identified by comparing sequencing data from parallel demethylase-treated and untreated samples (Fig. 3b)117,118,119,120. Moreover, RT signatures vary substantially under different reverse transcription conditions42,121. To improve detection sensitivity, an engineered HIV-1 reverse transcriptase that produces stronger RT signatures at m1A was therefore developed, allowing m1A detection in human mRNA122. RT systems, including enzymes and reaction conditions, can also be optimized to detect modified nucleotides that do not interfere with Watson-Crick base pairing. For instance, Nm, which is RT-silent under standard conditions, blocks cDNA synthesis when the concentration of dNTPs or Mg2+ in the RT reaction is limiting. By coupling this feature with next-generation sequencing, two methods, 2OMe-seq and MeTH-seq, have achieved transcriptome-wide Nm profiling at base resolution (Fig. 3b)123,124. As mentioned above, some DNA polymerases also possess reverse transcriptase activity and extend differently past m6A than past A.
To exploit this feature, KlenTaq DNA polymerase was subjected to directed evolution, and the evolved variant showed significantly increased error rates opposite m6A but not opposite unmodified A, enabling direct identification of m6A from the mutational signal in sequencing data (Fig. 3b)125. In addition, substituting 4SedTTP (dTTP in which the oxygen at the 4-position is replaced by selenium) for dTTP in the RT reaction can also facilitate m6A detection. Compared with 4SeT-A pairing, 4SeT-m6A pairing is unfavorable and causes cDNA synthesis to abort opposite m6A sites, thereby discriminating m6A from A. By further comparing samples with and without treatment with the m6A demethylase FTO, m6A can be precisely identified in the mammalian transcriptome at single-nucleotide resolution (Fig. 3b)126.
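The treated-versus-untreated comparison used with AlkB (and, analogously, with FTO above) can be sketched as a per-site difference in RT-signature rate. Here a "signature read" is assumed to be any read showing a mismatch and/or RT stop at the position; the depth and minimum-drop thresholds are hypothetical, and a real analysis would add a statistical test and replicate handling.

```python
def signature_rate(signature_reads: int, depth: int) -> float:
    """Fraction of reads carrying an RT signature (mismatch and/or stop) at a position."""
    return signature_reads / depth if depth else 0.0


def demethylase_sensitive_sites(untreated, treated, min_depth: int = 20, min_drop: float = 0.2):
    """Sites whose RT signature rate drops after demethylase treatment.

    untreated and treated map position -> (signature_reads, depth).
    Returns (position, drop) pairs sorted by position.
    """
    hits = []
    for pos, (sig_u, depth_u) in untreated.items():
        if pos not in treated:
            continue
        sig_t, depth_t = treated[pos]
        if depth_u < min_depth or depth_t < min_depth:
            continue
        drop = signature_rate(sig_u, depth_u) - signature_rate(sig_t, depth_t)
        if drop >= min_drop:
            hits.append((pos, drop))
    return sorted(hits)


# Made-up tRNA positions: 58 loses most of its signature after treatment,
# 30 is unaffected, and 9 is too poorly covered to evaluate.
untreated = {58: (60, 100), 30: (5, 100), 9: (4, 10)}
treated = {58: (8, 100), 30: (4, 100), 9: (0, 10)}
print(demethylase_sensitive_sites(untreated, treated))
```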

Chemical-assisted sequencing technologies

Chemical treatments are widely exploited to discriminate modified nucleotides from unmodified nucleotides in three ways: (1) installing biotin tags to enrich modified transcripts; (2) altering the base-pairing features to induce misincorporation or truncation in reverse transcription; and (3) chemical-induced cleavage followed by specific adaptor ligation (Fig. 4).

Fig. 4: Schemes of chemical-assisted sequencing technologies.

a Schematic diagrams and sequencing features of different technologies. Corresponding technologies are listed below. b Flowchart of m6A-label-seq.

The enrichment strategy facilitates the detection of modifications in low-abundance RNA species and of low-stoichiometry modifications. Some modified nucleotides can be selectively labeled by chemical reactions and then conjugated to biotin, enabling streptavidin enrichment. For instance, in borohydride reduction sequencing (BoRed-seq), total RNA is treated with NaBH4 and then exposed to low pH to generate abasic sites at m7G positions, which can be tagged with biotin for streptavidin pulldown (Fig. 4a)53. Direct labeling of m6A is difficult because its methyl group is chemically inert. To overcome this challenge, researchers developed m6A-SEAL-Seq (an FTO-assisted m6A selective chemical labeling method). In this method, m6A is first oxidized to N6-hydroxymethyladenosine (hm6A) by the demethylase FTO, and hm6A is then converted to N6-dithiolsitolmethyladenosine (dm6A) through DTT-mediated thiol addition. The free sulfhydryl group of dm6A allows biotin to be conjugated to m6A-modified transcripts, enabling streptavidin enrichment and sequencing (Fig. 4a)127.

Altering the base-pairing properties of nucleotides by chemical treatment has been widely applied to the transcriptome-wide detection of various modifications, including inosine, m5C, Ψ, m7G, ac4C, D and m6A. For inosine, direct sequencing is confounded by background noise. To overcome this limitation, inosine is selectively cyanoethylated with acrylonitrile, and the resulting N1-cyanoethylinosine (ce1I) blocks reverse transcription, truncating the cDNA. This reaction coupled with sequencing, termed inosine chemical erasing sequencing (ICE-seq), achieves base-resolution inosine detection across the transcriptome (Fig. 4a)128. Bisulfite treatment selectively converts unmethylated cytidine into uridine, producing C-to-T transitions in the sequencing data; transcriptome-wide m5C can thus be characterized by detecting nonconverted Cs (Fig. 4a)129,130. To reduce false positives caused by incomplete conversion, several improvements have been made, including optimizing bisulfite treatment conditions to increase deamination efficiency131,132, using ACT random hexamers devoid of Gs to avoid copying inefficiently deaminated RNA templates16, and developing robust computational pipelines to accurately identify m5C sites132,133. Ψ can be labeled by CMC on its Watson-Crick face, and the CMC-Ψ adducts stall reverse transcription, causing truncations in cDNA synthesis. By combining this reaction with next-generation sequencing, researchers developed Ψ-Seq, Pseudo-seq and PSI-seq, achieving base-resolution pseudouridine detection in yeast and mammalian transcriptomes134,135,136. To improve the robustness of Ψ detection, the reaction was adapted to use a synthetic CMC derivative, azido-CMC (N3-CMC), instead of CMC. The azido group enables conjugation with biotin through a click reaction.
Hence, the Ψ-containing RNA can be pre-enriched before sequencing, and this method is named CeU-seq86. The pre-enrichment step enables CeU-seq to identify thousands of Ψ sites in the mammalian transcriptome (Fig. 4a). In addition to CMC labeling, a recent work showed that Ψ can form a stable monobisulfite adduct upon bisulfite treatment and further leave a deletion signature at the exact modified sites, thereby providing an orthogonal Ψ detection strategy (Fig. 4a)131. As described above, m7G residues can be converted to abasic sites upon NaBH4 treatment and further recorded as misincorporations through reverse transcription and sequencing. Accordingly, the m7G Mutational Profiling sequencing (m7G-MaP-seq) can map internal m7G modifications at nucleotide resolution in tRNA and rRNA137. To achieve m7G detection in mRNA, m7G-seq conjugates a biotin molecule to the generated abasic sites, thus enabling pre-enrichment of m7G-containing RNA. Moreover, the biotinylated sites induce misincorporations during cDNA synthesis; therefore, transcriptome-wide base-resolution m7G mapping can be achieved (Fig. 4a)54. In addition, D can also be reduced by NaBH4 treatment, and the reduction product can be further labeled by Rho (rhodamine). Furthermore, the Rho-adducts block reverse transcriptase elongation and thus can be identified by analyzing induced RT-stops in sequencing data (Fig. 4a)138. ac4C can react with NaCNBH3 under acidic conditions, and the formed reduced nucleobase, N4-acetyltetrahydrocytidine, causes misincorporation during cDNA synthesis. Taking advantage of this reaction, ac4C-seq can map ac4C at single-nucleotide resolution (Fig. 4a)139. In addition to altering the structure of nucleotides by chemical reactions, the chemical group can also be introduced through metabolic labeling. 
For instance, when cells are fed with an S-adenosyl methionine (SAM) analog, Se-allyl-L-selenohomocysteine, cellular RNAs are modified with N6-allyladenosine (a6A) at sites that would otherwise carry m6A. The a6A-containing RNAs can then be enriched with a specific antibody, and a6A is converted to N1,N6-cyclized adenosine (cyc-A) through iodination-induced cyclization. Because cyc-A induces misincorporation during cDNA synthesis, m6A can be mapped at base resolution by detecting mutation signals in the sequencing data (Fig. 4b)140. Besides metabolic labeling, an allyl group can also be transferred enzymatically: members of the Dim1/KsgA family of dimethyltransferases specifically convert m6A into N6-allyl,N6-methyladenosine (a6m6A). Since cyclized a6m6A induces misincorporation during RT, this technology, named m6A-SAC-seq, achieves quantitative, transcriptome-wide mapping of m6A at single-nucleotide resolution141.
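For bisulfite-based m5C mapping, one common style of site calling tests whether the number of nonconverted Cs exceeds what incomplete conversion alone would produce. The sketch below uses an exact one-sided binomial tail for this. It assumes a known background nonconversion rate (for instance, estimated from an unmethylated spike-in, which is an assumption here). The depth, level and significance thresholds are hypothetical, and no multiple-testing correction is applied.

```python
from math import comb


def binomial_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), summed exactly."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))


def call_m5c(nonconverted: int, depth: int, nonconversion_rate: float,
             alpha: float = 0.01, min_depth: int = 20, min_level: float = 0.1):
    """Return (methylation level, p-value, is_call), or None if coverage is too low.

    nonconversion_rate is the background fraction of unmethylated C that escapes
    bisulfite conversion (an assumed, separately estimated input).
    """
    if depth < min_depth:
        return None
    level = nonconverted / depth
    p_value = binomial_tail(nonconverted, depth, nonconversion_rate)
    return level, p_value, (p_value < alpha and level >= min_level)


# 30 of 100 reads retain C at a site where 1% nonconversion is expected.
print(call_m5c(30, 100, nonconversion_rate=0.01))
```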

Compared with unmodified nucleotides, modified nucleotides differ in their susceptibility to chemical cleavage. Based on this principle, several sequencing methods have achieved transcriptome-wide profiling of Nm, m7G, m3C and Ψ. Nm is resistant to alkaline hydrolysis, because the 2′-O-methyl group removes the 2′-hydroxyl needed to attack the adjacent phosphodiester bond, and can therefore be mapped at single-base resolution by analyzing read-end information in sequencing data. Since RNA fragmentation is random, this strategy requires very high read coverage and is limited to highly abundant RNAs142,143,144. To overcome this limitation, ribose oxidation sequencing (RibOxi-seq) and Nm-seq have been developed. In these two methods, RNA is subjected to iterative oxidation–elimination–dephosphorylation (OED) cycles that remove unmodified 3′-terminal nucleotides; the cycles stop at 2′-O-methylated nucleotides, and ends that do not terminate in Nm cannot be ligated to adaptors during library construction145,146. This selective cleavage and ligation allows detection of low-stoichiometry 2ʹ-O-methylation sites. Similarly, m7G and m3C sites are susceptible to cleavage by NaBH4-aniline treatment, and the 5′-phosphate end generated by aniline cleavage can be exploited for selective ligation to enrich modified fragments147,148. This positive selection strategy facilitates transcriptome-wide detection of m7G and m3C. In addition, hydrazine-aniline treatment specifically cleaves RNA at m3C sites, whereas Ψ sites resist cleavage. On this basis, hydrazine-aniline cleavage sequencing (HAC-seq) and HydraPsiSeq have been developed to detect m3C and Ψ, respectively (Fig. 4a)149,150.
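The read-end strategy for Nm can be illustrated with a simple protection score: at each position, compare the number of cleavage events with the mean of its flanking positions, so that a protected bond shows a local dip. This is a simplified score in the spirit of those used by alkaline hydrolysis-based methods, not any published formula. How read ends are assigned to bonds depends on the library protocol and is assumed here to have been done already. The counts are made up.

```python
def protection_scores(cleavage_counts, flank: int = 6):
    """Simplified protection score per position: 1 - n_i / mean(flanking counts).

    cleavage_counts[i] is the number of read ends attributed to hydrolysis of the
    phosphodiester bond 3' of nucleotide i (an assumed, precomputed assignment).
    Positions within `flank` of either end are skipped. Scores near 1 indicate
    strong protection; scores are floored at 0.
    """
    scores = {}
    for i in range(flank, len(cleavage_counts) - flank):
        neighbours = cleavage_counts[i - flank:i] + cleavage_counts[i + 1:i + 1 + flank]
        mean_neighbour = sum(neighbours) / len(neighbours)
        if mean_neighbour > 0:
            scores[i] = max(0.0, 1.0 - cleavage_counts[i] / mean_neighbour)
    return scores


counts = [100] * 25
counts[12] = 5  # hypothetical Nm: the bond 3' of position 12 is rarely hydrolyzed
scores = protection_scores(counts)
print(max(scores, key=scores.get), round(scores[12], 2))  # 12 0.95
```

Because the score depends on local read-end counts, it becomes noisy at low coverage, which is the limitation the text describes.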

Antibody-based sequencing technologies

Antibody-based strategies have been exploited for transcriptome-wide mapping of several RNA modifications, including m6A/m6Am, m1A, hm5C, ac4C and m7G (Fig. 5)39,53,54,55,148,151,152,153,154. In this strategy, isolated RNA is first fragmented to 100–200 nt, and fragments containing the modification of interest are enriched by immunoprecipitation with a specific antibody. The enriched RNAs are subjected to high-throughput sequencing, and modified regions are identified by bioinformatic analysis. Robust enrichment makes antibody-based strategies highly sensitive to low-abundance modifications in mRNA and other rare RNA species.
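The enrichment step can be sketched as a comparison of immunoprecipitated (IP) and input read counts in fixed windows. This is a minimal illustration assuming reads have already been counted per window; the pseudocount, the log2 fold-change cut-off of 1 and the 10-read minimum are hypothetical, and dedicated peak callers use statistical models rather than a fixed fold-change threshold.

```python
import math


def log2_enrichment(ip: list[int], inp: list[int],
                    pseudocount: float = 1.0) -> list[float]:
    """Library-size-normalized log2(IP / input) for each window."""
    ip_total, inp_total = sum(ip), sum(inp)
    if ip_total == 0 or inp_total == 0:
        raise ValueError("both libraries need reads")
    out = []
    for a, b in zip(ip, inp):
        ip_norm = (a + pseudocount) / ip_total    # IP reads per library read
        inp_norm = (b + pseudocount) / inp_total  # input reads per library read
        out.append(math.log2(ip_norm / inp_norm))
    return out


def enriched_windows(ip: list[int], inp: list[int],
                     min_log2fc: float = 1.0,
                     min_ip_reads: int = 10) -> list[int]:
    """Indices of windows passing both fold-change and depth cut-offs."""
    fc = log2_enrichment(ip, inp)
    return [i for i, (f, a) in enumerate(zip(fc, ip))
            if f >= min_log2fc and a >= min_ip_reads]
```

Normalizing to the input library corrects for transcript abundance, so an enriched window reflects antibody capture rather than high expression; its width, like the fragment size, bounds the achievable resolution.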

Fig. 5: Schemes of antibody-based sequencing technologies.

a Schematic diagrams and sequencing features of RIP-seq, miCLIP and antibody-based technologies coupled with chemical/enzyme treatment technologies. Modifications that can be detected by the corresponding approaches are listed below. b Flowchart of m6Am-seq.

However, because of the fragment size, the resolution of the antibody-based strategy is limited to approximately 100–200 nt. Several improvements have been made to increase resolution. For instance, UV-induced RNA–antibody crosslinking leads to truncation or misincorporation at crosslinking sites during reverse transcription, allowing transcriptome-wide single-base resolution detection of m6A and m7G (Fig. 5a)155,156,157. Coupling antibody immunoprecipitation with enzymatic or chemical treatment can also both increase resolution and reduce false positives. For example, taking advantage of the fact that m1A-induced mutational RT signatures can be erased by AlkB demethylase treatment or Dimroth rearrangement, m1A-MAP, m1A-seq and m1A-IP-Seq have achieved base-resolution m1A methylome detection (Fig. 5a)42,122,158. Similarly, by exploiting selective in vitro demethylation of m6Am, m6Am-seq can discriminate m6Am from m6A and identify m6Am at base resolution (Fig. 5b)159.
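The logic of erasing an RT signature can be sketched as a per-site test: a true m1A site should show more mismatches in the untreated library than in the demethylase-treated one. This minimal sketch uses a one-sided Fisher exact test; the function names, the 0.05 minimum drop and the significance level are hypothetical, and published pipelines apply their own statistics and filters.

```python
import math


def fisher_greater(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher exact p-value for the 2x2 table
        [[a, b],   # untreated: mismatch reads, matching reads
         [c, d]]   # demethylase-treated: mismatch reads, matching reads
    testing whether the untreated row is enriched for mismatches."""
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = math.comb(n, col1)
    p = 0.0
    # Sum hypergeometric probabilities of tables at least as extreme as observed
    for x in range(a, min(row1, col1) + 1):
        p += math.comb(row1, x) * math.comb(n - row1, col1 - x) / denom
    return min(1.0, p)


def erased_by_demethylation(untreated: tuple[int, int],
                            treated: tuple[int, int],
                            alpha: float = 1e-3,
                            min_drop: float = 0.05) -> bool:
    """untreated/treated: (mismatch_reads, total_reads) at one site."""
    mu, nu = untreated
    mt, nt = treated
    if nu == 0 or nt == 0:
        return False  # no coverage in one library
    drop = mu / nu - mt / nt
    return drop >= min_drop and fisher_greater(mu, nu - mu, mt, nt - mt) < alpha
```

Requiring both a minimum drop and a significant p-value guards against calling sites whose small rate difference is significant only because of very deep coverage.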

Enzyme/protein-assisted sequencing technologies

In addition to antibody immunoprecipitation, some enzymes or RNA modification-related proteins can be used for affinity capture or editing of modification-containing transcripts, enabling transcriptome-wide RNA modification detection (Fig. 6). For instance, by feeding cells the cytidine analog 5-azacytidine (5-aza-C) or overexpressing a mutant Nsun2, a covalent bond can be formed between an m5C methyltransferase and its target sites. Based on this approach, Aza-IP (5-azacytidine-mediated RNA immunoprecipitation) and miCLIP (methylation iCLIP) enrich target sites by immunoprecipitation, thereby identifying the direct targets of m5C methyltransferases (Fig. 6a)160,161. Besides methyltransferases, reader proteins can also be used to target modified nucleotides. For example, in DART-seq (deamination adjacent to RNA modification targets), the cytidine deaminase APOBEC1 is fused to the m6A-binding YTH domain, which leads to C-to-U deamination at sites adjacent to m6A residues; m6A sites can then be identified from C-to-T mismatches in the sequencing data (Fig. 6b)162. Recently, the authors further integrated DART-seq with a single-cell RNA-sequencing platform to develop scDART-seq, which profiles the m6A methylome in single cells163. Some ribonucleases can also bind specific RNA modifications under certain conditions. For instance, in the presence of Ca2+, E. coli Endonuclease V (eEndoV) binds inosine in RNA instead of cleaving it. Taking advantage of this property, Endonuclease V immunoprecipitation enrichment sequencing (EndoVIPER-seq) enriches A-to-I edited transcripts from cellular RNA164.
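The DART-seq readout can be sketched as a scan for candidate adenosines whose downstream C shows excess C-to-T conversion in the APOBEC1-YTH sample relative to a control. This is a minimal illustration under several stated assumptions: the reference is written in the DNA alphabet of aligned reads, C-to-T counts per position are precomputed, the control is a sample lacking functional m6A recognition, and the deaminated C sits immediately 3′ of the candidate A (an RAC context). All names and thresholds are hypothetical, not the published pipeline.

```python
def candidate_m6a(ref: str,
                  ct_fusion: dict[int, tuple[int, int]],
                  ct_control: dict[int, tuple[int, int]],
                  min_cov: int = 10,
                  min_delta: float = 0.1) -> list[int]:
    """Return 0-based positions of A residues whose downstream C shows
    excess C-to-T conversion in the fusion sample over the control.

    ct_fusion / ct_control map a C position to (T_reads, total_reads)."""
    hits = []
    for i in range(1, len(ref) - 1):
        # Assumed RAC context: R (A/G) at i-1, candidate A at i, C at i+1
        if ref[i - 1] in "AG" and ref[i] == "A" and ref[i + 1] == "C":
            f = ct_fusion.get(i + 1, (0, 0))
            c = ct_control.get(i + 1, (0, 0))
            if f[1] < min_cov or c[1] < min_cov:
                continue  # both samples need coverage at the target C
            if f[0] / f[1] - c[0] / c[1] >= min_delta:
                hits.append(i)
    return hits
```

Subtracting the control conversion rate removes C-to-T changes that arise independently of m6A recognition, such as background deamination or genomic variants.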

Fig. 6: Schemes of enzyme/protein-assisted sequencing technologies.

Flowchart of a AZA-IP and m5C-miCLIP; b DART-seq; c MAZTER-seq and m6A-REF-seq.

Similar to chemically induced hydrolysis, some endonucleases show different cleavage efficiencies on modified and unmodified transcripts, which can be used to enrich modification-containing transcripts and identify modified nucleotides. For instance, the E. coli endoribonuclease MazF specifically cleaves the unmethylated 5ʹ-ACA-3ʹ motif but not the 5ʹ-m6ACA-3ʹ motif165. Taking advantage of this specificity, m6A-REF-seq (RNA-endoribonuclease-facilitated sequencing) and MAZTER-seq allow quantitative profiling of m6A at single-nucleotide resolution (Fig. 6c)166,167. However, a major limitation of this approach is that it detects only m6A in the ACA context, which accounts for a small portion (16–25%) of m6A-modified sites.
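The MazF-based readout can be sketched in two steps: locating ACA motifs, then estimating how many molecules escaped cleavage at each one. This minimal sketch assumes aligned reads from MazF-digested RNA have already been tallied as either ending at an ACA site (cleaved) or reading through it (uncleaved); the function names are hypothetical. In practice, cleavage efficiency also depends on sequence context, so the raw uncleaved fraction is only a rough proxy for stoichiometry.

```python
import re
from typing import Optional


def aca_sites(seq: str) -> list[int]:
    """0-based positions of the first A of every 5'-ACA-3' motif.
    A lookahead reports overlapping motifs (e.g. both in ACACA)."""
    return [m.start() for m in re.finditer(r"(?=ACA)", seq)]


def uncleaved_fraction(cleaved: int, read_through: int) -> Optional[float]:
    """MazF cuts unmethylated ACA but not m6A-CA, so the fraction of
    molecules left intact approximates m6A stoichiometry at the site."""
    total = cleaved + read_through
    return read_through / total if total else None
```

Only positions returned by `aca_sites` can be assessed at all, which is exactly the ACA-context limitation noted above.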

Nanopore direct RNA sequencing-based detection technology

Next-generation sequencing-based detection technologies have been widely applied for transcriptome-wide RNA modification detection. However, limited by read length (typically 50–300 nt) and by the distinct detection principles required for different modifications, these technologies cannot map diverse RNA modifications simultaneously. The Oxford Nanopore Technologies (ONT) sequencing platform shows promise for overcoming these challenges. Unlike next-generation RNA sequencing, nanopore sequencing reads RNA directly, without reverse transcription or PCR amplification, thus reducing the biases introduced by these steps168,169,170,171. Mechanistically, a motor protein drives single-stranded RNA through the nanopore, and the ionic current changes according to the set of k nucleotides residing within the pore (a k-mer; typically k = 5), which enables the nucleotide sequence to be decoded computationally (Fig. 7). Because chemical modifications and secondary structure also influence RNA translocation through the pore, they can in principle be inferred directly from the signal by computational algorithms172. In addition, nanopore reads can be long enough to cover full-length transcripts, enabling accurate identification of highly repetitive regions, splice isoforms and poly(A) tail length.
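The k-mer model implies that a single modified base perturbs several consecutive current levels, not just one. The short sketch below makes this concrete under the simplifying assumptions that k = 5 and that the current depends only on the k-mer occupying the pore; the function names are illustrative.

```python
def kmers(seq: str, k: int = 5) -> list[str]:
    """All overlapping k-mers of seq, in sequence order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def kmers_covering(seq: str, pos: int, k: int = 5) -> list[str]:
    """k-mers containing position pos; a modification at pos can
    therefore shift up to k consecutive current levels."""
    start = max(0, pos - k + 1)
    stop = min(pos, len(seq) - k)
    return [seq[i:i + k] for i in range(start, stop + 1)]
```

For example, `kmers_covering("GGACUACCA", 3)` returns `["GGACU", "GACUA", "ACUAC", "CUACC"]`: every one of these k-mers includes the C at position 3, so a modification there could alter all four signal levels, which is why detection algorithms reason over k-mer contexts rather than single bases.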

Fig. 7: Schemes of nanopore direct sequencing-based detection technology.

a Ionic current changes caused by a single RNA molecule passing through the nanopore. b Current fluctuation caused by modified nucleotides is distinct from that of unmodified nucleotides.

Robust computational analysis is vital for identifying RNA modifications from nanopore direct RNA sequencing data. There are two major analysis strategies: analysis of alterations in the raw signal (signal intensity, dwell time and trace) and analysis of base-called “error” features (base quality, mismatch frequency and deletion frequency)170. Modifications can then be identified either by algorithms previously trained on modified and unmodified k-mer contexts or by comparison with unmodified control samples. These computational algorithms have allowed the identification of several modifications, including m6A, m7G, m5C, hm5C, Ψ and Nm, by nanopore direct RNA sequencing169,172,173,174,175,176,177,178,179,180. In addition, owing to the long read length, the sequence and modification landscape of SARS-CoV-2 can be determined simultaneously by nanopore direct RNA sequencing181,182. However, nanopore RNA sequencing-based detection still faces many challenges and limitations. First, its sequencing error rate (~1–4%)168 is still much higher than that of next-generation sequencing (~0.1–1%). Because some algorithms exploit systematic base-calling errors to identify RNA modifications, a high background error rate complicates both base calling and modification identification. Second, the throughput of nanopore RNA sequencing is relatively low (1–3 Gb per flow cell), and direct RNA sequencing is costly. Several recent studies have shown that accurate modification detection requires high sequencing coverage (at least 30×)173,180; sequencing depth and cost therefore limit broader application of this strategy. Third, nanopore direct RNA sequencing requires a large amount of RNA (~500 ng poly(A)+ RNA), making it unsuitable for samples that are difficult to obtain.
Therefore, further efforts are needed to improve nanopore direct RNA sequencing, including the development of more powerful nanopore proteins and more robust computational algorithms.
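The comparative strategy (native sample versus an unmodified control) can be sketched at the signal level: at a given position, collect one current measurement per read from each sample and ask whether the two distributions differ. This minimal sketch uses a two-sample Kolmogorov-Smirnov statistic; the 30-read minimum mirrors the coverage figure quoted above, while the 0.3 cut-off and the function names are hypothetical. Published comparative tools use more elaborate statistics and also combine dwell time with intensity.

```python
from bisect import bisect_right


def ks_statistic(x: list[float], y: list[float]) -> float:
    """Maximum distance between the two empirical CDFs."""
    xs, ys = sorted(x), sorted(y)
    d = 0.0
    for v in xs + ys:
        fx = bisect_right(xs, v) / len(xs)  # fraction of x values <= v
        fy = bisect_right(ys, v) / len(ys)  # fraction of y values <= v
        d = max(d, abs(fx - fy))
    return d


def differs_from_control(native: list[float], control: list[float],
                         min_reads: int = 30, min_d: float = 0.3) -> bool:
    """Flag a position whose current distribution departs from control."""
    if len(native) < min_reads or len(control) < min_reads:
        return False  # too few reads for a reliable comparison
    return ks_statistic(native, control) >= min_d
```

Because a modification may be present in only a fraction of molecules, the native distribution is often a mixture of modified and unmodified signals, which is one reason high coverage is needed before the distributions can be told apart.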

Discussion

Because every existing detection method has both advantages and drawbacks, an appropriate strategy should be chosen by considering the abundance of the RNA modification, the RNA type of interest (rRNA, tRNA, mRNA or miRNA), the amount of starting material, and so on. For instance, direct sequencing methods preserve the modification status but are often confounded by other factors, including RNA secondary structure, SNPs and sequencing errors. Direct sequencing methods also require high sequencing depth to detect modifications in low-abundance RNAs. Antibody-based detection technologies are powerful and capable of detecting low-abundance modifications in rare RNA species. However, because of the enrichment step, stoichiometry information is lost, and large amounts of starting material are needed. In addition, because antibodies may have intrinsic biases toward certain RNA sequences and secondary structures, caution is needed to eliminate false-positive signals. Chemical treatment-based approaches have been widely applied to RNA modification detection owing to their high labeling specificity; for transcriptome-wide detection and quantification, these reactions must combine high labeling efficiency with mild reaction conditions. Metabolic labeling and approaches based on RNA modification-related proteins provide an alternative strategy to characterize modified residues but can only be applied to cultured cells. For all of these strategies, appropriate negative controls are necessary to reduce false positives, including manipulation of modification status (enzyme knockdown/knockout), chemical or enzymatic treatment, and differential RT conditions. Orthogonal locus-specific detection methods are also necessary to validate modification sites and status. Synthetic spike-in controls with known modification levels allow stoichiometry to be calibrated during sequencing122,183, and synthetic RNAs can also help reduce false positives in sequencing-based technologies.
For instance, one elegant study used an unmodified in vitro transcribed (IVT) RNA library resembling the endogenous transcriptome as a negative control and, through calibration, precisely and quantitatively mapped m6A and m5C184.
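Calibration against spike-ins can be illustrated as a simple standard curve: fit the observed signal (for example, a mutation rate) against the known stoichiometries of synthetic spike-ins, then invert the fit for endogenous sites. This minimal sketch assumes the signal responds linearly to stoichiometry, which would need to be verified for any real method; the function names are hypothetical.

```python
def fit_line(known: list[float], observed: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit of observed = slope * known + intercept."""
    n = len(known)
    mx, my = sum(known) / n, sum(observed) / n
    sxx = sum((x - mx) ** 2 for x in known)
    if sxx == 0:
        raise ValueError("spike-ins need at least two distinct stoichiometries")
    sxy = sum((x - mx) * (y - my) for x, y in zip(known, observed))
    slope = sxy / sxx
    return slope, my - slope * mx


def estimate_stoichiometry(signal: float, slope: float, intercept: float) -> float:
    """Invert the calibration line and clip to the physical range [0, 1]."""
    if slope == 0:
        raise ValueError("signal does not respond to stoichiometry")
    return min(1.0, max(0.0, (signal - intercept) / slope))
```

The intercept captures the background signal of unmodified RNA (the role played by unmodified controls), while the slope captures how efficiently a modification is converted into a detectable signal; both are needed to turn raw signal into stoichiometry.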

Nanopore direct RNA sequencing has been applied to the detection of several RNA modifications, showing promise for the simultaneous identification of distinct modifications in a single molecule. However, the accuracy and sensitivity of nanopore direct RNA modification sequencing are limited, and simultaneous detection of different modifications has not yet been achieved. Further efforts should be made to improve the instrument to reduce signal noise and to develop robust computational analyses that identify multiple modifications simultaneously. Finally, RNA modification detection technologies for low-input and single-cell samples are still urgently needed to understand the roles of RNA modifications in physiological and pathological processes. Integrating existing RNA modification detection strategies with single-cell RNA sequencing platforms shows promise. Collectively, recent progress in detection technologies promotes functional studies of RNA modifications, and we anticipate that further advancement will lead to a more comprehensive understanding of the epitranscriptome.