Profiling of RNA-binding protein binding sites by in situ reverse transcription-based sequencing

Xiao, Yu; Chen, Yan-Ming; Zou, Zhongyu; Ye, Chang; Dou, Xiaoyang; Wu, Jinjun; Liu, Chang; Liu, Shun; Yan, Hao; Wang, Pingluan; Zeng, Tie-Bo; Liu, Qinzhe; Fei, Jingyi; Tang, Weixin; He, Chuan

doi:10.1038/s41592-023-02146-w

Download PDF

Article
Open access
Published: 10 January 2024

Profiling of RNA-binding protein binding sites by in situ reverse transcription-based sequencing

Nature Methods volume 21, pages 247–258 (2024)Cite this article

18k Accesses
2 Citations
18 Altmetric
Metrics details

Subjects

Abstract

RNA-binding proteins (RBPs) regulate diverse cellular processes by dynamically interacting with RNA targets. However, effective methods to capture both stable and transient interactions between RBPs and their RNA targets are still lacking, especially when the interaction is dynamic or samples are limited. Here we present an assay of reverse transcription-based RBP binding site sequencing (ARTR-seq), which relies on in situ reverse transcription of RBP-bound RNAs guided by antibodies to identify RBP binding sites. ARTR-seq avoids ultraviolet crosslinking and immunoprecipitation, allowing for efficient and specific identification of RBP binding sites from as few as 20 cells or a tissue section. Taking advantage of rapid formaldehyde fixation, ARTR-seq enables capturing the dynamic RNA binding by RBPs over a short period of time, as demonstrated by the profiling of dynamic RNA binding of G3BP1 during stress granule assembly on a timescale as short as 10 minutes.

TREX reveals proteins that bind to specific RNA regions in living cells

Article Open access 19 February 2024

Transcriptome-wide identification of RNA-binding protein binding sites using seCLIP-seq

Article 23 March 2022

Robust single-cell discovery of RNA targets of RNA-binding proteins and ribosomes

Article 07 May 2021

Main

RBPs dynamically interact with their RNA targets to regulate RNA fate in all aspects, including transcription, splicing, modification, localization, translation and degradation¹. The dysfunction of RBPs or their binding to RNA substrates can lead to various defects or even diseases. Effective methods to capture RBP–RNA interactions, particularly dynamic or even transient interactions, are critical for a better understanding of RBP and its functional effect on target RNAs².

The widely used approaches to identify RBP targets are based on immunoprecipitation (IP) of the specific RBP along with its bound RNAs, either through direct RNA IP (RIP) or crosslinking IP (CLIP) assisted by covalent capture^{3,4,5,6,7,8,9,10,11,12,13,14,15}. Substrate RNAs bound by a specific RBP can be enriched through either RIP or CLIP using the antibody against the RBP, followed by high-throughput sequencing (seq) to profile RBP targets across the whole transcriptome. CLIP-seq captures RBP binding sites on substrate RNAs via covalent crosslinking. RNase treatment digests RBP-free regions of RNAs, increasing the resolution of binding site detection^{7,8,9,10,14,15}. CLIP-seq variants such as PAR-CLIP or eCLIP improve the crosslinking efficiency, specificity or binding site resolution^7,9. While effective and widely used, these methods also have limitations. They often require a large amount of starting materials due to the low IP efficiency; the ultraviolet (UV) crosslinking in CLIP-based methods is a low-efficiency chemical reaction. Recently reported tRIP-seq and LACE-seq can be applied in low-input samples but at the cost of reducing the library complexity^12,13.

TRIBE and STAMP type approaches fuse RBPs with an RNA base editor to introduce mutations nearby RBP binding sites, bypassing IP to identify RBP binding sites^{16,17,18,19,20,21}. These methods could be readily applied to study RBP binding in live cells and with limited materials down to single-cell level. Their deployments into research have offered new opportunities; however, these editing-based methods still have limitations. They require genome manipulation by inserting base editing proteins in germlines or cell lines, hindering their application in primary cells and tissues. Inducing editing protein expression typically takes roughly 24 hours or longer, which cannot be applied to monitor dynamic RNA binding by RBPs. These base editors have their own sequence preferences, potentially changing the native binding profile of the target RBP. While we were working on our method, RT&Tag, a method derived from the CUT&Tag strategy, was published^22,23. This method profiles RBP–RNA interaction by oligo(dT) primer-initiated reverse transcription (RT) and Tn5 tagmentation of the resulting full-length RNA–complementary DNA (cDNA) heteroduplex in isolated nuclei. RT&Tag can identify RBP binding in polyadenylated RNAs but is ineffective in nonpolyadenylated RNAs and cytoplasmic RBP binding. Due to the low efficiency of the Tn5 enzyme on heteroduplex, it requires 25,000–100,000 nuclei to obtain sufficient transcriptome-wide binding signals.

To overcome the limitations of existing methods, we introduce an assay of RT-based RBP binding site sequencing (ARTR-seq) to capture RBP–RNA interactions through in situ RT. We demonstrate that ARTR-seq sensitively profiles RBP targets with good sequencing quality, using as few as 20 cells or a single tissue section. Additionally, an imaging step can be readily built into the ARTR-seq procedure, providing direct spatial information of RBPs. With ARTR-seq, we show distinct binding patterns of splicing factors and the YTH family reader proteins of RNA N⁶-methyladenosine (m⁶A) modification. ARTR-seq unbiasedly detects RNA binding by RBPs in both cytoplasm and nucleus and measures RBP binding strength on RNA substrates. Furthermore, ARTR-seq could monitor dynamic RNA binding by G3BP1 during stress granule (SG) assembly on a small timescale of 10 minutes.

Results

Strategy and development of ARTR-seq

In ARTR-seq, we started with rapid formaldehyde fixation to preserve the cellular structure, followed by permeabilization of cell membranes (Fig. 1a(i)). We then targeted the reverse transcriptase (RTase) to the RBP of interest using corresponding antibodies (Fig. 1a(ii)). This involved delivering the primary antibody for RBP recognition (Fig. 1a(ii)1), followed by a secondary antibody to enhance the local antibody concentration, capitalizing on the potential for multiple secondary antibodies to bind a single primary antibody (Fig. 1a(ii)2). Subsequently, a fusion protein of protein A/G and RTase (pAG-RTase) was delivered to bind both primary and secondary antibodies, enabling site-specific attachment of RTase to the target RBP (Fig. 1a(ii)3). Each step was followed by thorough washing to remove any unbound antibodies or pAG-RTase.

**Fig. 1: ARTR-seq strategy and validation.**

After localizing RTase to the RBP, we initiated in situ RT at RBP binding sites by adding necessary RT components (Fig. 1a(iii)). To achieve efficient RT, we screened three commonly used RTases, including engineered Moloney murine leukemia virus (MMLV) RTase^24,25, human immunodeficiency virus RTase and a truncated version of engineered MMLV RTase (25–497) in the pAG-RTase fusion constructs with a 30-amino-acid linker (Extended Data Fig. 1a,b). By employing RT with quantitative polymerase chain reaction (RT–qPCR), we confirmed pAG-MMLV RTase (25–497) as the most active and selected it for subsequent studies (Fig. 1b and Extended Data Fig. 1c).

To identify all RBP binding sites without sequence bias, we applied random RT primers with an adapter tag for library construction, and extended the primer length from commonly used 6 nucleotides (nts) to 10 nts to enhance RT efficiency (Extended Data Fig. 1d). For effective cDNA enrichment, biotinylated dNTPs were introduced into cDNA products. After screening, we found that biotin-16-dUTP and biotin-16-dCTP exhibited the least hindrance on RT efficiency (Extended Data Fig. 1e). These were included in a 1:1 ratio with regular dTTP and dCTP, respectively, in the current ARTR-seq protocol. Following cDNA enrichment with streptavidin beads, we performed adapter ligation, library amplification and high-throughput sequencing to acquire the RBP binding profile (Fig. 1a(iv)). Note that after in situ RT, immunofluorescence imaging could be performed to reveal RBP subcellular localization without disturbing the subsequent library construction if the secondary antibody and pAG-RTase are fluorophore-modified.

Validation of ARTR-seq using PTBP1

To evaluate ARTR-seq, we applied ARTR-seq to PTBP1, a well-studied splicing factor with a variety of published CLIP-seq datasets for comparison. To verify the production of biotinylated cDNAs from in situ RT, we monitored the biotin group in the cDNA products by dot plot, confirming the incorporation of biotin and requirements of pAG-RTase and primary antibody for successful cDNA synthesis (Fig. 1c). With immunofluorescence staining, we further validated the colocalization of pAG-RTase, the secondary antibody and newly synthesized cDNA, and their signals largely disappeared on exclusion of the primary antibody, supporting the localized RT reaction performed by pAG-RTase tethered to the targeted RBP (Fig. 1d and Extended Data Fig. 1f). Note that the use of the secondary antibody increased the biotinylated cDNA yield (Fig. 1d and Extended Data Fig. 1f,g). Altogether, ARTR-seq specifically and effectively reverse transcribes RNAs near the targeted protein into biotinylated cDNA products.

We next tested ARTR-seq on PTBP1 using 40,000 HepG2 or HeLa cells, and compared the results with the published data from several known methods, namely CLIP, iCLIP, irCLIP, eCLIP, sCLIP, tRIP, LACE-seq and RT&Tag^{9,10,11,12,13,22,26,27}. We observed that ARTR-seq displayed a comparable or higher percentage of usable reads compared to published methods, indicating a high complexity of the ARTR-seq libraries (Extended Data Fig. 2a,b). Then, we calculated the correlation between biological replicates (R = 0.98 for both HepG2 and HeLa samples), and confirmed good reproducibility of ARTR-seq (Fig. 2a).

**Fig. 2: ARTR-seq captures binding sites of RBPs using as few as 20 cells.**

Further, we introduced input samples prepared by ARTR-seq with the omission of the primary antibody as controls to help filter out potential background signals from the nonspecific binding of the secondary antibody and RTase (Extended Data Fig. 2c). For PTBP1, we found that over 70% of usable reads and over 80% of ARTR-seq peaks were annotated to introns, with most exon peaks located within the 3′ untranslated region (3′ UTR), consistent with results reported by other methods^{10,12,13,26,27,28,29} (Fig. 2b and Extended Data Fig. 2d,e). The consensus motif of PTBP1 ARTR-seq peaks was identified as the canonical CU-enriched sequence, as known previously³⁰ (Fig. 2b). At the whole-transcriptome scale, ARTR-seq reads for PTBP1 piled up at the eCLIP peaks, while the input sample did not show such accumulation³¹ (Extended Data Fig. 3a,b). Additionally, we observed that more than 50% of genes identified by ARTR-seq were also detected by other methods (52% for eCLIP, 51% for LACE-seq and 82% for iCLIP). At the peak level, ARTR-seq successfully identified 41% of eCLIP-targeted peaks (Extended Data Fig. 3c). Examination of individual PTBP1 binding sites revealed similar read distribution and density between ARTR-seq and eCLIP or iCLIP results (Fig. 2c and Extended Data Fig. 3d). To further validate PTBP1 bindings captured by ARTR-seq, we knocked down PTBP1 in HepG2 cells using two distinct small-interfering RNAs (siRNAs) and performed ARTR-seq (Extended Data Fig. 3e). The reads located around the ARTR-seq peaks reduced accordingly on PTBP1 knockdown, indicating the high specificity of ARTR-seq (Fig. 2d).

Direct versus indirect binding sites detected by ARTR-seq

ARTR-seq identifies RBP binding by in situ RT, enabling the capture of RNAs directly bound by the RBP (direct targets) or potentially those spatially close to the RBP (indirect targets) (Extended Data Fig. 4a). To evaluate direct versus indirect targets, we used the splicing factor RBFOX2 as an example; RBFOX2 possesses a well-defined canonical binding motif ‘UGCAUG’^9,31. Peaks near the UGCAUG motifs likely represent direct targets, while those farther away may indicate indirect targets. We found more than 70% of ARTR-seq peaks were within 500 nts from UGCAUG. This percentage is slightly higher than that of eCLIP⁹. The two methods were comparable when the distance from peaks to UGCAUG was within 200 nts (Extended Data Fig. 4b). It is worth noting that RBFOX2 may have other noncanonical binding sites beyond the UGCAUG motif, as suggested by the similar percentage of distant RBFOX2 eCLIP peaks from this motif. Stringent cutoffs of signal values and q values for peaks increased confidence in identifying the direct targets, albeit at the expense of target numbers (Extended Data Fig. 4c,d). Furthermore, we also examined YTHDF2, an m⁶A binding protein³². Approximately 80% of YTHDF2 ARTR-seq peaks were within 300 nts from m⁶A sites identified by m⁶A-SAC-seq³³, comparable to that from the PAR-CLIP method³² (Extended Data Fig. 4e). These results indicate that the indirect interactions captured in ARTR-seq are likely limited. The percentage of direct targets identified by ARTR-seq is comparable to those observed in CLIP-based methods.

To further interrogate potential indirect targets identified in ARTR-seq, we limited the movement range of RTase by shortening the linker in pAG-RTase or omitting the secondary antibody (Extended Data Fig. 5a–c). We found shorter linkers reduced RT activity of pAT-RTase, indicating that shorter linkers might lead to a slowdown in the RTase kinetics (Extended Data Fig. 5d). In RBFOX2 ARTR-seq, the use of shorter linkers or omitting the secondary antibody resulted in decreased biotinylated cDNA yields but slightly increased read accumulation at RBFOX2 ARTR-seq peaks, indicating reduced RT efficiency but concentrated signals (Extended Data Fig. 5e–g). Moreover, we observed a little higher percentage (1.9–3.4%) of peaks within 500 nts of UGCAUG with a shorter linker or omitting the secondary antibody (Extended Data Fig. 5h). These findings indicate that restricting the RTase movement range tested here moderately reduced potential indirect RNAs captured by ARTR-seq. Optimal RT efficiency is another factor that needs to be considered when designing linkers.

Resolution of ARTR-seq

To assess the resolution of ARTR-seq, we examined the distribution of RBFOX2 peak centers around UGCAUG sites, and observed a clear enrichment with most peaks positioned within 200 nts flanking the UGCAUG motif (Extended Data Fig. 6a). Furthermore, we conducted a parallel analysis on YTHDF2. Compared to RBFOX2, we observed a similar but more enriched distribution for YTHDF2 around the corresponding m⁶A sites, further supporting the capability of ARTR-seq in capturing RBP binding sites (Extended Data Fig. 6b).

In an attempt to improve the resolution of binding site identification by ARTR-seq, we evaluated the impact of RNase treatment on RBFOX2 ARTR-seq. As expected, the stronger RNase treatment reduced the library fragment lengths (Extended Data Fig. 6c). We observed that the stronger RNase treatment led to a sharper enrichment of RBFOX2 ARTR-seq peaks around UGCAUG sites, indicating an improved resolution upon RNase treatment (Extended Data Fig. 6d). Through quantification of biotinylated cDNA, we found that samples with stronger RNase treatment exhibited lower RT efficiency (Extended Data Fig. 6e). Moreover, stronger RNase treatment markedly reduced the proportion of peaks located within 500 nts of the canonical UGCAUG motif. This suggests that the application of RNase may reduce reads from direct targets, thereby potentially elevating the ratio of nonspecific or indirect binding signals (Extended Data Fig. 6f). Overall, our studies revealed that RNase treatment could improve ARTR-seq resolution. The strength of RNase treatment in ARTR-seq needs to be optimized to achieve the desired balance between resolution and sensitivity, especially for samples with limited starting materials.

ARTR-seq detects PTBP1 binding sites with as few as 20 cells

The in situ RT-based ARTR-seq bypasses the IP step to minimize sample loss, potentially making it feasible for low cell number samples. To test this, we generated libraries for PTBP1 using different numbers of HepG2 cells and compared the results with published data from LACE-seq and RT&Tag of low cell number samples^13,22. The correlations remained strong for ARTR-seq libraries prepared from as few as 20 cells (Extended Data Fig. 7a). Additionally, ARTR-seq libraries exhibited a much higher percentage of usable reads compared to other methods when using comparable numbers of cells (Fig. 2e and Extended Data Fig. 7b,c). Furthermore, PTBP1 ARTR-seq presented a consistently high percentage of intronic reads, suggesting its effectiveness in capturing informative reads even with the limited starting materials (Extended Data Fig. 7d). We further subsampled libraries to an equal sequencing depth and examined their reads distribution at peaks identified in the corresponding bulk samples. Compared to LACE-seq, ARTR-seq exhibited a clearer accumulation at the peak center with a higher proportion of effective reads (Fig. 2f and Extended Data Fig. 7e). Visible ARTR-seq signal remained stable for libraries with different numbers of cells as exemplified in the Integrative Genomics Viewer (IGV) plot (Fig. 2g).

Because PTBP1 binds to a canonical CU-enriched sequence, we compared the CT percentages in usable reads of PTBP1 libraries constructed by different methods. We found that all the ARTR-seq libraries showed comparable or higher CT percentages compared to that of other methods^{10,13,26,27,28} (Fig. 2h). We further assessed the read distribution around CU-enriched regions and observed the stable read accumulation in ARTR-seq libraries of all cell numbers, peaking at the region center (Fig. 2i). Taken together, ARTR-seq can effectively and specifically capture the RBP binding sites, even with limited starting materials.

Application of ARTR-seq in mouse embryo sections

RBPs can have strong tissue-specific expression, or are only expressed in certain tissues rather than cultured cells. Identifying RBP binding sites in tissues remains technically challenging³⁴. IP-based methods require dissociating tissues into single cells for UV crosslinking, limiting their application to whole tissues, particularly embedded frozen tissues or formalin-fixed tissues. Editing-based methods require genetic modification and cannot be applied to patient tissues.

ARTR-seq offers an opportunity for identifying RBP binding sites in tissues. We studied RBFOX2 with a section of OCT-embedded E11 mouse embryo to validate the feasibility of ARTR-seq in tissue samples (Fig. 3a). We first confirmed the nuclear localization of RBFOX2 with the ARTR-seq built-in imaging (Fig. 3b). The ARTR-seq reads for mouse embryo tissue showed a high percentage of usable reads and good reproducibility between biological replicates (Supplementary Fig. 1a,b). Compared to the input, a higher percentage of usable reads from RBFOX2 ARTR-seq were mapped to introns, consistent with the known binding preference of RBFOX2 (ref. ³¹) (Supplementary Fig. 1c). RBFOX2 binding peaks were mostly located in introns and contained the canonical UGCAUG motif⁹ (Fig. 3c). Additionally, we observed that mouse tissue samples displayed a similar percentage of usable reads containing UGCAUG motifs to that of HepG2 cell samples, indicating comparable signal detection efficiency of ARTR-seq for tissues and cultured cells (Fig. 3d). Examination of individual binding sites further supported the recognition of UGCAUG by RBFOX2 (Fig. 3e). Overall, ARTR-seq can identify RBP binding sites in embedded tissue samples with high specificity.

**Fig. 3: ARTR-seq maps RBP binding sites in tissues.**

ARTR-seq profiles regulatory features of splicing factors

PTBP1 and RBFOX2 are well-known splicing factors, with PTBP1 belonging to the heterogeneous ribonucleoprotein (hnRNP) family³⁵. To show broader applicability of ARTR-seq, we also studied HNRNPC, another splicing factor belonging to the hnRNP family (Extended Data Fig. 8a). Consistent with the binding preference of the splicing factors, both reads (over 70%) and peaks (over 80%) from the ARTR-seq libraries of all three splicing factors (PTBP1, HNRNPC and RBFOX2) were mainly located in introns in HepG2 cells (Fig. 4a,b and Extended Data Fig. 8b). The RNA-binding motifs of RBFOX2 and HNRNPC were the canonical UGCAUG and U-rich sequences, respectively, consistent with the previous report³¹ (Fig. 4a,b).

**Fig. 4: RNA binding by splicing factors identified in ARTR-seq.**

To explore the association between splicing factor binding and splicing regulation, we identified the alternative splicing events by comparing the ENCODE (Encyclopedia of DNA Elements) RNA sequencing (RNA-seq) data from RBP-knockdown (KD) cells with those from control cells³⁶. We found most alternative splicing events were categorized as exon skipping (Fig. 4c). We then generated ‘splicing maps’ for exon skipping events³⁷ (Fig. 4d). The corresponding ARTR-seq peaks were predominantly enriched at upstream proximal introns of the included exons upon RBP-KD, at downstream proximal introns of the excluded exons upon RBFOX2-KD and at both upstream and downstream proximal introns of the included exons upon HNRNPC-KD, but not around native cassette exons and constitutive exons. We quantified relative RBP binding strength by ARTR-seq enrichment at the gene level, and observed that genes with higher ARTR-seq enrichment tend to present a higher splicing difference upon RBP-KD (Fig. 4e and Extended Data Fig. 8c). In addition to exon skipping, the number of included retained introns upon PTBP1-KD (491 events) outnumbered other splicing modes. With further inspection, we found that higher enrichment corresponded to higher splicing inclusion differences of retained introns, similar to the trend observed for exon skipping instances (Extended Data Fig. 8d). Altogether, ARTR-seq robustly captures distinctive binding patterns for different splicing factors, and the ARTR-seq enrichment could indicate differences in splicing.

ARTR-seq identifies binding features of m⁶A reader proteins

In addition to sequence recognition, RBPs can also target RNAs in a chemical modification-dependent manner. m⁶A modification is the most prevalent chemical modification in mammalian messenger RNA (mRNA), and m⁶A reader proteins can preferentially bind m⁶A-modified RNAs to regulate its processing and metabolism in both the nucleus and cytoplasm^{32,38,39,40,41}. We performed ARTR-seq for two cytosolic m⁶A readers YTHDF1 and YTHDF2, and a nuclear reader YTHDC1 in HeLa cells.

We first verified the subcellular localization of the three readers with ARTR-seq built-in imaging (Extended Data Fig. 9a). Sequencing data from ARTR-seq remained highly reproducible between replicates (Extended Data Fig. 9b). Over 80% of the peaks of the two cytoplasmic m⁶A readers (YTHDF1 and YTHDF2) were located in exons, whereas roughly 81% of the peaks of nuclear reader YTHDC1 were located in introns or intergenic regions, consistent with their distinct subcellular localization (Fig. 5a and Extended Data Fig. 9a,c). The high unique peak ratios observed for the three reader proteins (84.2% for YTHDC1, 34.3% for YTHDF1 and 47.5% for YTHDF2) are attributed to their unique subcellular localization; YTHDF1 and YTHDF2 display different sequences of the N-terminal low-complexity domains, which most likely affect their binding to different partner proteins and therefore different RNA targets⁴² (Extended Data Fig. 9d). We further investigated the much more abundant non-exonic peaks of YTHDC1, and found more than half of them located in repeat elements, with long interspersed nuclear elements (roughly 45%) being the most prevalent, consistent with a previous report⁴¹ (Fig. 5b). Analysis of exonic peak distribution along mRNA showed enrichment around stop codons for all these m⁶A readers, resembling the meta profile of m⁶A modifications, especially for YTHDF1 and YTHDF2 (ref. ³³) (Fig. 5c and Extended Data Fig. 9e).

**Fig. 5: ARTR-seq maps binding features of the selected m⁶A binding proteins.**

Further, we calculated the percentage of exonic peaks overlapping with m⁶A sites in polyadenylated RNAs identified by m⁶A-SAC-seq (ref. ³³). The ARTR-seq peaks for all three readers showed higher percentages than random peaks, comparable to the YTHDF2 peaks from PAR-CLIP³², supporting the m⁶A-dependent binding features of these three readers (Fig. 5d). We then analyzed the association between the m⁶A fraction and RBP binding strength, and observed that the group with higher m⁶A fractions showed higher RBP enrichment signals for YTHDF1 and YTHDF2, further suggesting ARTR-seq can measure the relative binding strength of RBPs (Fig. 5e). However, the association for YTHDC1 was weaker, potentially due to the limited number of exonic YTHDC1 peaks (Extended Data Fig. 9f). Overall, ARTR-seq captures different features of three m⁶A binding proteins in cytoplasm and nucleus.

Dynamic RNA binding of G3BP1 during SG assembly

SGs are membraneless organelles composed of proteins and RNAs and formed in response to stress. The RBP G3BP1 is the central node in the network of protein–RNA interaction during SG assembly^43,44. Under sodium arsenite (NaAsO₂) treatment, SGs could be observed after 13 min with a progressive increase in size over time, with most of the SG assembly completed by 40 min, providing a rapid stress response⁴⁵. However, whether RNA targets of G3BP1 vary during SG assembly has yet to be investigated.

Taking advantage of the potential high temporal resolution offered by fast formaldehyde fixation and low material requirements of ARTR-seq, we performed ARTR-seq for G3BP1 in HeLa cells with 0.5 mM NaAsO₂ treatment and monitored the SG assembly process at time intervals of 0, 10, 20 and 60 min poststress. We first visualized G3BP1 localization using immunofluorescence imaging, and confirmed the gradual condensation of G3BP1 into granules over time (Fig. 6a). The colocalization of G3BP1 and biotinylated cDNA products was further verified (Fig. 6b). Subsequently, the verified samples were used for ARTR-seq library construction and sequencing. We determined G3BP1 binding strength by calculating the ARTR-seq log₂ fold change (log₂FC) between G3BP1 and input samples at the gene level. Roughly 78% of G3BP1–RNA targets (log₂FC ≥ 1, P < 0.05) were no longer enriched at 60 min (T60) post-NaAsO₂ treatment (Fig. 6c). SG enrichment of RNA was previously assessed by sequencing RNAs isolated from NaAsO₂-induced SGs to quantify their relative localization within SGs⁴⁶. Through integrative analysis, we observed that G3BP1 targets at T60 showed notably higher SG enrichment compared to the starting point without stress (Fig. 6d). These results support the accuracy of ARTR-seq and revealed distinct RNA binding of G3BP1 in the presence and absence of stress. The functions of stress-induced G3BP1 targets (T60_only) were enriched to Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways of protein processing in the endoplasmic reticulum and human papillomavirus infection, consistent with previous observations^47,48 (Fig. 6e).

**Fig. 6: Dynamic RNA binding of G3BP1 during the assembly of SGs.**

To further explore the dynamic RNA targeting of G3BP1 over time, we calculated pairwise correlations of the G3BP1 binding strength among time points. The generally low correlation coefficients (R = 0.38–0.57) suggested distinct G3BP1 bindings at different time intervals (Extended Data Fig. 10a). RNAs were previously classified into SG-enriched RNAs and SG-depleted RNAs according to their SG enrichment⁴⁶. We found that during SG assembly, G3BP1 binding strength from ARTR-seq gradually increased for SG-enriched RNAs and decreased for SG-depleted RNAs, suggesting a shift of G3BP1 targets toward SG-enriched RNAs (Fig. 6f,g). Some RNAs displayed stable G3BP1 binding, while others showed dynamic G3BP1 binding across time intervals (Fig. 6h and Extended Data Fig. 10b,c). We then grouped these RNAs based on G3BP1 binding strength using the fuzzy c-means clustering algorithm. We found that G3BP1 binding strength for these RNAs displayed not only unidirectional trajectories of increasing or decreasing, but also transient changes during 60 minutes of NaAsO₂ treatment, suggesting rapid and dynamic cellular responses to stress (Fig. 6h,i and Extended Data Fig. 10d). Taken together, ARTR-seq unveiled the highly dynamic nature of G3BP1–RNA interactions during SG assembly, demonstrating its capability in tracking temporal changes of protein–RNA interactions with limited starting materials.

Discussion

In this work, we present ARTR-seq, a method that captures RBP binding sites using in situ RT by antibody-located RTase. ARTR-seq demonstrated high sensitivity and specificity, even when using as few as 20 cells or limited tissues. The procedure is compatible with immunofluorescence imaging, providing direct spatial information of the targeted proteins without affecting downstream sequencing. With ARTR-seq, we observed the unique binding characteristics of PTBP1, RBFOX2 and HNRNPC related to their splicing regulatory roles. ARTR-seq also detected the preferences of m⁶A reader proteins, YTHDF1, YTHDF2 and YTHDC1. Furthermore, we showed dynamic RNA binding of G3BP1 during SG assembly.

One advantage of ARTR-seq is the use of in situ RT to bypass the antibody-based IP step, thereby reducing material loss. ARTR-seq is also highly versatile and applicable for cell lines, tissues, and even clinical formaldehyde-fixed samples. Both inspired by CUT&Tag⁴⁹, ARTR-seq displays distinct advantages compared to the recently reported RT&Tag²². First, ARTR-seq uses random primers to unbiasedly capture local signals, while RT&Tag uses oligo(dT) primer for RT, potentially losing signals from nonpolyadenylated RNAs. Additionally, RT&Tag may experience reduced local resolution due to uniform RT initiation from the poly-A tail and long matured mRNA length (roughly 2,065 bp)⁵⁰, leading to coverage bias toward the RNA 3′ end. Second, Tn5 tagmentation on the RNA–cDNA heteroduplex is less efficient, hindering its applications when using limited starting materials. Third, ARTR-seq can be applied in various cellular compartments, whereas RT&Tag is limited to the isolated nucleus.

Investigations of dynamic RBP binding have been hindered by low UV-crosslinking efficiency, long incubation time and high material demands using the existing methods. Benefiting from highly efficient formaldehyde crosslinking and low starting material requirements, ARTR-seq excels at capturing transient RBP binding across various time intervals. In this work, we have demonstrated its application in capturing dynamic RNA binding of G3BP1 during SG assembly on a timescale of 10 minutes. We envision that the high temporal resolution of ARTR-seq will enable the investigation of dynamic or even transient RBP–RNA interaction in many other events.

Limitations

The good quality of the primary antibody is a prerequisite for ARTR-seq. For those RBPs without good quality antibodies, ARTR-seq may not accurately capture RBP–RNA interactions. However, the availability of a suitable antibody is a common challenge faced by all antibody-based methods. To overcome this limitation, strategies such as knocking in a tag protein in frame with the targeted RBP or expressing the tagged RBP could be used.

Formaldehyde fixation preserves biological samples at a high temporal resolution, but limitations exist, such as perturbing biomolecular condensates due to the faster protein–protein interaction dynamic than the fixation rate⁵¹. Strategies to increase the fixation rate, such as increasing the formaldehyde concentration or moderately raising the fixation temperature, can mitigate such artifacts. Like most other methods, ARTR-seq may face challenges when applied to low-abundance RBPs. Approaches such as increasing starting materials or RBP overexpression could be used. Additionally, unlike the editing-based methods, which are compatible with long-read sequencing, ARTR-seq typically shows short fragment lengths (averaging around 60 bp), hindering the identification of isoform-specific binding patterns (Extended Data Fig. 6c). Last, the linker length needs to be optimized when detecting direct versus indirect targets using ARTR-seq, and RNase treatment could be considered to obtain higher resolution binding sites.

Methods

Cell culture and stress treatment

HeLa cells (American Type Culture Collection (ATCC) catalog no. CCL-2) and HepG2 cells (ATCC, catalog no. HB-8065) were purchased from ATCC and cultured in DMEM medium (Gibco) supplemented with 10% fetal bovine serum (Gibco) and penicillin-streptomycin (Gibco). K562 cells (ATCC, catalog no. CCL-243) were obtained from ATCC and cultured in RPMI 1640 Medium (Gibco) supplemented with 10% (v/v) fetal bovine serum. Penicillin-streptomycin (Gibco) and 2 mM l-glutamine (Gibco). Cells were grown at 37 °C with 5% CO₂. For NaAsO₂ treatment, HeLa cells were grown to 90% confluence and replaced in the prewarmed DMEM medium containing 0.5 mM NaAsO₂, which was further maintained at 37 °C with 5% CO₂ for indicated times.

Expression and purification of recombinant protein A/G-RTase

The recombinant plasmids were constructed by assembly of pet28A vector, protein A/G (pAG), linkers of different lengths and RTase or the modified RTase with NEBuilder HiFi DNA Assembly Master Mix (NEB) or USER enzyme (NEB) following the manufacturer’s protocols. The Protein A/G DNA segment was amplified from the pAG/MNase plasmid (Addgene, catalog no. 123461). The engineered MMLV RTase was modified from the pCMV-PE2 plasmid (Addgene, catalog no. 132775). The recombinant proteins were expressed in BL21(DE3) Competent Escherichia coli (NEB) with isopropyl-β-d-thiogalactoside induction at 16 °C for 18 h. Cells were collected by centrifuge at 5,500g for 10 min and lysed in the buffer of 50 mM Tris-HCl pH 7.5, 300 mM NaCl and 1 mM PMSF with sonication at 10 s on and 10 s off setting for 10 min at 4 °C. The recombinant proteins were purified from the supernatant using HisTrap HP column (GE Healthcare), followed by an ion exchange chromatography column (GE Healthcare) on an AKTA Purifier 10 system (GE Healthcare) according to the manufacturer’s protocol, and then concentrated to about 20 mg ml⁻¹. The purified enzyme was supplemented with 40% glycerol and stored at −80 °C for future use.

RT–qPCR

RNA was reverse transcribed with the purified pAG-RTases or commercial RTases in reaction buffer (50 mM Tris-HCl, 150 mM NaCl, pH 7.5) at 37 °C for 15 min, and denatured at 85 °C for 5 min. qPCR was performed with FastStart Essential DNA Green Master (Roche) on LightCycler 96 System (Roche). The efficiency of RT was quantified using the delta quantitation cycle method.

Protein detection by Coomassie brilliant blue stain and western blot

The mammalian cell samples were lysed with cold RIPA buffer (Thermo Fisher Scientific) containing 1× protease inhibitor cocktail (Roche). The cell lysate was cleared with centrifugation at 15,000g for 10 min at 4 °C. The supernatant or purified protein was then mixed with LDS loading buffer (Bio-Rad) and boiled at 95 °C for 10 min. Denatured protein was loaded into 4–12% NuPAGE Bis-Tris gel (Thermo Fisher Scientific). For Coomassie brilliant blue stain, the gel was stained with Imperial Protein Stain (Thermo Fisher Scientific) and imaged by FluroChem R (Proteinsimple). For the western blot, the protein was transferred to the polyvinyl difluoride membrane from the gel. The membranes were blocked in 3% BSA (diluted in PBST (PBS with 0.1% Tween-20)) for 1 h at room temperature, incubated in a 1:1,000 diluted primary antibody solution at 4 °C overnight, washed four times with PBST (PBS with 0.1% Tween-20), and incubated in a 1:5,000 dilution of horseradish peroxidase (HRP)-conjugated secondary antibody for 1 h at room temperature if the primary antibody was not conjugated with HRP. The membranes were supplied with SuperSignal West Dura Extended Duration Substrate kit (Thermo Fisher Scientific) and imaged on the FluroChem R machine (Proteinsimple). Quantification was performed using ImageJ software (v.2.3.0).

Transfection

PTBP1 siRNA was purchased from Horizon Discovery/Dharmacon. Cells were seeded in 30% confluency. After incubation for 12 h, siRNA was transfected with RNAimax (Thermo Fisher Scientific) following the manufacturer’s manual. The fresh medium was changed at 6 h posttransfection. Cells were cultured for another 48 h, and the protein knockdown efficiency was quantified by western blot.

ARTR-seq

Cells were fixed to an imaging-compatible chamber with 1.5% paraformaldehyde (PFA) at room temperature for 10 min. To mitigate cell loss, 1.5% PFA crosslinking was applied instead of the commonly used 1% PFA crosslinking. The samples were then quenched with 125 mM glycine at room temperature for 5 min, washed twice with Dulbecco’s PBS (DPBS) and permeabilized with 0.5% Triton X-100 in DPBS on ice for 10 min. Each DPBS washing step involved 3 min of incubation at room temperature. Next, samples were washed twice with DPBS, blocked with the blocking buffer (1 mg ml⁻¹ UltraPure BSA, 0.2 U μl⁻¹ RNaseOUT in DPBS) at room temperature for 30 min and stained with the diluted primary antibody at room temperature for 1 h. The primary antibody was diluted with blocking buffer according to the manufacturer’s instructions for immunofluorescence or at a 1:200 dilution if no specific guidance was provided. For input samples, the primary antibody diluent was replaced by the blocking buffer. Subsequently, samples were stained with fluorophore-labeled secondary antibody (1:500 diluted in the blocking buffer) at room temperature for 30 min, followed by incubation with pAG-RTase (10 nM in the blocking buffer) for an additional 30 min. Cells were washed three times with DPBS after each staining step by shaking at room temperature for 3 min.

An RT reaction mixture was prepared by mixing 2 μM adapter-RT primer (5′-AGACGTGTGCTCTTCCGATCTNNNNNNNNNN-3′), 0.05 mM biotin-16-dUTP (Jena Bioscience), 0.05 mM biotin-16-dCTP (Jena Bioscience), 0.05 mM dTTP (Thermo Fisher Scientific), 0.05 mM dCTP (Thermo Fisher Scientific), 0.1 mM dATP (Thermo Fisher Scientific), 0.1 mM dGTP (Thermo Fisher Scientific), 1 U μl⁻¹ RNaseOUT (Thermo Fisher Scientific) in 50 μl buffer of DPBS supplemented with 3 mM MgCl₂. In situ RT was performed by immersing cells with the RT reaction mixture and incubating at 37 °C for 30 min, then stopped by adding 20 mM EDTA and 10 mM EGTA and incubating at room temperature for 3 min.

Next, cells were stained with 1:200 diluted biotin monoclonal antibody (BK-1/39), alexa fluor 488 (Thermo Fisher Scientific) in DPBS by incubation at room temperature for 1 h, followed by staining with 1 μg ml⁻¹ Hoechst 33342 dye (Thermo Fisher Scientific) at room temperature for 15 min. The samples were then imaged by Leica SP8 laser confocal microscope. The fluorescence intensity distribution on a line was quantified by ImageJ software.

After imaging, cells were digested with 1 mg ml⁻¹ proteinase K (Thermo Fisher Scientific) at 37 °C for 2 h. The nucleic acids were recovered by phenol-chloroform extraction (pH 8.0) and concentrated by ethanol precipitation. RNA was digested with 0.2 U μl RNase H (NEB) and 1:20 diluted RNase A/T1 (Thermo Fisher Scientific) in 50 μl of the RNase reaction buffer (50 mM Tris-HCl pH 7.5, 75 mM KCl, 10 mM MgCl₂, 10 mM DTT) at 37 °C for 1 h, followed by biotinylated cDNA enrichment using 10 μl preblocked Dynabeads MyOne Streptavidin C1 (Thermo Fisher Scientific) at room temperature for 20 min. The beads were preblocked with 1 μg μl⁻¹ UltraPure BSA (Thermo Fisher Scientific), 1 μg μl⁻¹ UltraPure Salmon Sperm DNA Solution (Thermo Fisher Scientific) and 1 μg μl⁻¹ Yeast transfer RNA (tRNA) (Thermo Fisher Scientific) with incubation at room temperature for 30 min before performing biotinylated cDNA enrichment.

Subsequently, the cDNA adapter ligation mixture was prepared by combining 50 mM Tris-HCl pH 7.5, 10 mM MgCl₂, 25% PEG 8000, 1 mM ATP, 1 U μl⁻¹ T4 RNA ligase 1 (NEB), and 5 μM of 3′ cDNA adapter (5′Phos-NNNNNNNNAGATCGGAAGAGCGTCGTGT-3′SpC3). The 3′ cDNA adapter ligation was performed by suspending the beads in the cDNA adapter ligation mixture and incubating at 25 °C for 16 h. The biotinylated cDNA was recovered using an elution buffer composed of 95% (v/v) formamide and 10 mM EDTA (pH 8.0) by boiling at 95 °C for 10 min, followed by ethanol precipitation. The cDNA was then dissolved in 10 μl of water.

For library amplification, 40 μl of mixture was prepared by mixing 1× NEBNext Ultra II Q5 Master Mix (NEB), 10 μl of cDNA solution and 0.5 μM Illumina sequencing primers, such as NEBNext Multiplex Oligos for Illumina (NEB catalog no. E7335S). The library PCR amplification followed this program: 98 °C for 30 s (98 °C for 10 s, 60 °C for 30 s, 65 °C for 45 s) for 13 cycles and 65 °C for 5 min; hold at 4 °C. The final libraries were purified using 6% Novex TBE Gel (Thermo Fisher Scientific) with size selection between 180 and 400 bp. Next-generation sequencing was carried out either at the University of Chicago Single Cell Immunophenotyping Core on an Illumina NextSeq 550 machine or Illumina NextSeq 2000 machine, or at the University of Chicago Genomics Facility on an Illumina NovaSeq 6000 platform.

RNase treatment in ARTR-seq

RNase treatment was incorporated into the ARTR-seq procedure with the following modifications: After permeabilization, Cells were incubated with 1 U μl⁻¹ RNase I (Thermo Fisher Scientific) at 37 °C for 5 min, followed by two washes with DPBS. For samples with strong RNase treatment, an additional RNase I treatment was conducted as previously described before initiating RT.

Dot blot

After the proteinase K digestion step in ARTR-seq, the total nucleic acids were recovered with Oligo Clean & Concentrator Kits (Zymo) to get rid of free biotinylated dNTP. The concentration of nucleic acids was measured by Nanodrop 8000 Spectrophotometer and adjusted to 50 ng µl⁻¹. Next, 1 µl of nucleic acids were loaded onto the Amersham Hybond- N+ membrane (GE Healthcare). Membranes were air-dried and crosslinked by UV strata linker 2400 at 150 mJ cm⁻² twice. The membranes were then blocked in 5% fatty-acid-free BSA in PBST at room temperature for 1 h, followed by incubation in streptavidin-HRP (Thermo Fisher Scientific) in PBST supplemented with 5% fatty-acid free BSA at room temperature for another 1 h. The membrane was washed with PBST four times before being supplied with SuperSignal West Dura Extended Duration Substrate kit (Thermo Fisher Scientific) and imaged by the FluroChem R machine (Proteinsimple).

ARTR-seq in the mouse embryo

C57 mouse embryo (E11) frozen tissue sections were purchased from Zyagen. The slide with frozen tissue sections was brought to room temperature for 10 min of incubation. The PAP pen was used to draw a circle around the mouse tissue on the slide, providing a thin film-like hydrophobic barrier for reagent incubation. Then the tissue was subjected to typical ARTR-seq procedures with the following change. The 2 μM adapter-barcoded RT primer (5′-AGACGTGTGCTCTTCCGATCT-(8 nt-barcode)-NNNNNNNNNN-3′) was applied for in situ RT.

ARTR-seq with low input

ARTR-seq was applied to 20 to 5,000 HepG2 cells with the following changes. 4% PFA was used to minimize cell loss for low-input samples. The 2 μM adapter-barcoded RT primer (5′-AGACGTGTGCTCTTCCGATCT-(8 nt-barcode)-NNNNNNNNNN-3′) was applied for in situ RT. After digestion of proteinase K, two biological replicates were pooled together for biotinylated cDNA enrichment, adapter ligation, library amplification and library sequencing. Sequence data were isolated based on the 8 nt barcode in adapter-barcoded RT primers.

Genome reference

Genome and the corresponding reference of Homo sapiens (GRCh38.p13, GENCODE Release 39), Mus musculus (GRCm39, GENCODE Release M29) and Drosophila melanogaster (BDGP6.32, Ensembl Release 107) were used for mapping the sequencing reads in this study. Ribosomal RNA (rRNA) reference sequences were downloaded from the National Center for Biotechnology Information (NCBI) for H. sapiens (NR_003285.3, NR_003286.4, NR_003287.4, NR_023363.1), M. musculus (NR_003278.3, NR_003279.1, NR_003280.2, NR_046156.1) and from FlyBase for D. melanogaster (5SrRNA-CR33353, 18SrRNA-CR45841, 5.8SrRNA-CR45842 and 28SrRNA-CR4584)

ARTR-seq primary data processing

Reads from the small cell number libraries containing cell barcodes were first demultiplexed with an in-house script using read 2. The adapter sequences were trimmed with Cutadapt⁵⁴ (v.4.2) using the parameter cutadapt–nextseq-trim=20 -a AGATCGGAAGAGCACACGTCTGAACTCCAG; the 8 nt unique molecular identifier sequences were moved and add to the read name for the further deduplication. An extra 4 nts at the reads’ 3′ end were removed from the adapter-free sequence to minimize mapping mismatch caused by the imperfect paired sequence in the random primer.

The reads were first mapped to the corresponding rRNA sequences using Bowtie2 (ref. ⁵⁵) (v.2.4.4) with parameters: –seedlen=15, and the mapped reads were discarded to avoid rRNA contamination. The remaining unmapped reads were mapped to the corresponding genome using STAR⁵⁶ (v.2.7.9a) with parameters: –readFilesCommand zcat–alignEndsType EndToEnd–genomeLoad NoSharedMemory–quantMode TranscriptomeSAM–alignMatesGapMax 15000–outFilterMultimapNmax 1–outFilterMultimapScoreRange 1–outSAMprimaryFlag AllBestScore–outSAMattributes All–outSAMtype BAM SortedByCoordinate–outFilterType BySJout–outReadsUnmapped Fastx–outFilterScoreMin 10–outFilterMatchNmin 24. Uniquely mapped reads were deduplicated to get the usable reads using UMI-tools⁵⁷ (v.1.1.2) with the parameter, –method unique. The usable reads were assigned to genomic regions with RNASeQC⁵⁸ (v.2.4.2) using default parameters. Deduplicated reads were assigned to genes with featureCounts⁵⁹ (v.2.0.3) for the calculation of Pearson’s correlation coefficient between biological replicates. For visualization in IGV⁶⁰ (v.2.13.1), .bam files of the usable reads were converted to bigWig with bamCoverage in the deepTools suite⁶¹ (v.3.5.1) with normalization by its respective sequencing depth using the parameters –normalizeUsing BPM–binSize 1. All the sample tracks were set to the same scale for display, except for the additional instruction noted in the legend.

Peaking calling

For peak calling, we first split the usable reads in one library into two .bam files containing reads aligned to the positive and negative strands, respectively. We used macs3 (ref. ⁶²) to identify peaks with default parameters, except for adding ‘–keep-dup all–nomodel –extsize 30’. macs3 gives the fold enrichment (signal value) and P value based on Poisson distribution, and corrects the P values for multiple comparison using the Benjamini–Hochberg correction. The peaks located in two strands were called separately using the corresponding strand read in the input libraries as background. The two peak files for one library were later combined. To generate the consensus motif for peaks, we first extended 20 nts to both upstream and downstream, and the overrepresented sequences were generated using findMotifsGenome.pl in the HOMER suite⁵² (v.4.11) with parameters: -rna -S 10 -len 5,6,7,8,9. Specifically, for motif generation for peaks in mouse tissue, the peak genomic coordinates were converted from mm39 to mm10 using liftOver from the UCSC Genome Browser⁶³. Peaks were assigned to specific genomic regions with in-house scripts, and the peaks overlapping two genomic regions were assigned to the region of longer overlapping base pairs. The peaks from the reader YTHDC1 were further assigned to repeats and other regions with annotatePeaks.pl in the HOMER suite.

Subsampling

To calculate the percentage of usable reads at different sequencing depths, we subsample the uniquely mapped reads with the samtools view in the Samtools suite⁶⁴ (v.1.16.1). For the comparison between small cell number input libraries for different methods, the sizes of all libraries were reduced to that of the smallest library. Specifically, instead of directly subsampling the fastq files, we subsampled the uniquely mapped reads to calculate the usable read percentage of each library.

Alternative splicing identification

The differential alternative splicing events of each gene were identified using rMATS (v.4.1.2). The RBP-knockdown RNA-seq libraries bam files and the corresponding control libraries’ .bam files with the annotation of ENCODE4 v.1.2.1 GRCh38 V29 were downloaded from the ENCODE and were analyzed by rMATS for the identification of five alternative splicing modes, including skipped exon, mutually exclusive exons, alternative 3′ splice site, alternative 5′ splice site and retained introns. Events of FDR ≥ 0.05 were discarded for the subsequent analysis.

ARTR-seq enrichment level at the gene level

To calculate the ARTR-seq enrichment at the gene level, we divided the reads in one library into two groups by whether they were in one specific gene, and had a pair of in–out read numbers for each of the IP and Input libraries. For each gene, we generated two-by-two tables for all the combinations of in–out read numbers between IP and Input libraries. The ARTR-seq enrichment for a gene is defined as the common odds ratio of the tables with significance determined by the Cochran–Mantel–Haenszel chi-squared test.

Data visualization and statistical analysis

Read heatmaps and profiles were generated with plotHeatmap and plotProfile in the deepTools suite⁶¹ (v.3.5.1), using genomic coordinates unless otherwise indicated. The splicing maps of splicing factors are generated by RBP-Maps³⁷ with default parameters in the ‘Plotting peaks’ mode (–peak), and the hg19 coordinates of native cassette exons and constitutive exons were downloaded from the software GitHub deposit. The peak genomic coordinates of the peaks for the splicing factors were first converted from GRCh38 to hg19 using liftOver from the UCSC Genome Browser⁶³. The random regions are random exonic regions with the same length as pooled ARTR-seq peaks from the three m⁶A reader proteins, generated by bedtools shuffle in the BEDTools suite⁶⁵ (v.2.30.0).

The meta distributions of binding peaks were generated by the R package Guitar⁶⁶ (v.2.16.0). All statistical analyses were performed with R⁶⁷, and all the plots were generated by the R package ggplot2 (ref. ⁶⁸) (v.3.4.1).

Quantification of ARTR-seq signal at the gene level

To analyze G3BP1 binding strength at the gene level, ARTR-seq reads were counted for genes in both G3BP1 and paired input samples, and FCs and significance between G3BP1 and input were determined by DESeq2 (ref. ⁶⁹). Only genes with the read sum equal to or greater than ten for G3BP1 and input samples were considered. RNA targets of G3BP1 were defined as those with a FC ≥ 2 and P < 0.05. Both FC and P value were calculated by DESeq2 with the default setting.

Clustering analysis of G3BP1 ARTR-seq signal

To track the changing pattern of G3BP1 binding single during the SG assembly, we used log₂FC (G3BP1/input) of genes to represent the G3BP1 binding signal, and performed fuzzy c-means clustering analysis on log₂FC by the Mfuzz package⁷⁰ (v.2.54.0). Only genes with the top 50% of the greatest standard deviation (s.d.) of log₂FC were considered, and the log₂FC values were scaled by z score before clustering. The cluster number was determined by the ‘Dmin’ function in the Mfuzz package. Clustering was calculated by the ‘mfuzz’ function in the Mfuzz package with 10,000 iterations with Euclidean distance as the clustering method. The membership values indicate the degree of association of genes with their respective clusters.

Functional enrichment analysis

KEGG enrichment analysis was carried out to compare G3BP1–RNA targets at different time points using the ‘compareCluster’ function in the clusterProfiler package⁵³ (v.4.4.4). The KEGG terms with adjusted P values less than 0.05 were visualized.

Statistics and reproducibility

Unless otherwise stated, a two-tailed Student’s t-test or Wilcoxon test were performed to assess the statistical significance between groups. The resulting P values are indicated in the figure or legends. For boxplots, the box represents the 25th to 75th percentiles with a line at the median, whiskers to 1.5 times the interquartile range, a dot at the mean (if applicable) and outliers omitted. Immunofluorescence imaging experiments were repeated in at least two biological samples with consistent results.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability

All the sequencing data generated in this study have been deposited in the NCBI’s Gene Expression Omnibus (GEO) under the accession number GSE226161. Previously published data are available under accession numbers GSE42701 (CLIP-seq²⁶), ENCSR384KAN and ENCSR981WKN (eCLIP²⁸), E-MTAB-3108 (iCLIP²⁷), GSE78832 (irCLIP¹⁰), GSE137925 (LACE-seq¹³), GSE92995 (sCLIP¹¹), DRA005743 (tRIP-seq¹²) and GSE195654 (RT&Tag²²). The data were downloaded and processed as described in the articles. The processed .bam files of RNA-seq data for knockdown HNRNPC, PTBP1 and RBFOX2, along with their corresponding control data, were downloaded from ENCODE portal²⁸ under the accession numbers of ENCSR052IYH, ENCSR305XWT, ENCSR767LLP, ENCSR104ABF, ENCSR064DXG and ENCSR603TCV. The published PAR-CLIP data and the corresponding peaks for YTHDF2 are available under the GEO accession number GSE49339. The m⁶A modification sites identified by m⁶A-SAC-seq are available under the GEO accession number GSE198246. Source data are provided with this paper.

Code availability

Codes for processing ARTR-seq data are available in the following GitHub repository: https://github.com/mingming-cgz/ARTR-seq.

References

Gerstberger, S., Hafner, M. & Tuschl, T. A census of human RNA-binding proteins. Nat. Rev. Genet. 15, 829–845 (2014).
Article CAS PubMed Google Scholar
Gebauer, F., Schwarzl, T., Valcarcel, J. & Hentze, M. W. RNA-binding proteins in human genetic disease. Nat. Rev. Genet. 22, 185–198 (2021).
Article CAS PubMed Google Scholar
Lerner, M. R. & Steitz, J. A. Antibodies to small nuclear RNAs complexed with proteins are produced by patients with systemic lupus erythematosus. Proc. Natl Acad. Sci. USA 76, 5495–5499 (1979).
Article CAS PubMed PubMed Central ADS Google Scholar
Tenenbaum, S. A., Carson, C. C., Lager, P. J. & Keene, J. D. Identifying mRNA subsets in messenger ribonucleoprotein complexes by using cDNA arrays. Proc. Natl Acad. Sci. USA 97, 14085–14090 (2000).
Article CAS PubMed PubMed Central ADS Google Scholar
Ule, J. et al. CLIP identifies Nova-regulated RNA networks in the brain. Science 302, 1212–1215 (2003).
Article CAS PubMed ADS Google Scholar
Licatalosi, D. D. et al. HITS-CLIP yields genome-wide insights into brain alternative RNA processing. Nature 456, 464–469 (2008).
Article CAS PubMed PubMed Central ADS Google Scholar
Hafner, M. et al. Transcriptome-wide identification of RNA-binding protein and microRNA target sites by PAR-CLIP. Cell 141, 129–141 (2010).
Article CAS PubMed PubMed Central Google Scholar
Konig, J. et al. iCLIP reveals the function of hnRNP particles in splicing at individual nucleotide resolution. Nat. Struct. Mol. Biol. 17, 909–915 (2010).
Article PubMed PubMed Central Google Scholar
Van Nostrand, E. L. et al. Robust transcriptome-wide discovery of RNA-binding protein binding sites with enhanced CLIP (eCLIP). Nat. Methods 13, 508–514 (2016).
Article PubMed PubMed Central Google Scholar
Zarnegar, B. J. et al. irCLIP platform for efficient characterization of protein-RNA interactions. Nat. Methods 13, 489–492 (2016).
Article CAS PubMed PubMed Central Google Scholar
Kargapolova, Y., Levin, M., Lackner, K. & Danckwardt, S. sCLIP-an integrated platform to study RNA-protein interactomes in biomedical research: identification of CSTF2tau in alternative processing of small nuclear RNAs. Nucleic Acids Res. 45, 6074–6086 (2017).
Article CAS PubMed PubMed Central Google Scholar
Masuda, A. et al. tRIP-seq reveals repression of premature polyadenylation by co-transcriptional FUS-U1 snRNP assembly. EMBO Rep. 21, e49890 (2020).
Article CAS PubMed PubMed Central Google Scholar
Su, R. et al. Global profiling of RNA-binding protein target sites by LACE-seq. Nat. Cell Biol. 23, 664–675 (2021).
Article CAS PubMed Google Scholar
Blue, S. M. et al. Transcriptome-wide identification of RNA-binding protein binding sites using seCLIP-seq. Nat. Protoc. 17, 1223–1265 (2022).
Article CAS PubMed Google Scholar
Lorenz, D. A. et al. Multiplexed transcriptome discovery of RNA-binding protein binding sites by antibody-barcode eCLIP. Nat. Methods 20, 65–69 (2023).
Article CAS PubMed Google Scholar
McMahon, A. C. et al. TRIBE: hijacking an RNA-editing enzyme to identify cell-specific targets of RNA-binding proteins. Cell 165, 742–753 (2016).
Article CAS PubMed PubMed Central Google Scholar
Brannan, K. W. et al. Robust single-cell discovery of RNA targets of RNA-binding proteins and ribosomes. Nat. Methods 18, 507–519 (2021).
Article CAS PubMed PubMed Central Google Scholar
Nguyen, D. T. T. et al. HyperTRIBE uncovers increased MUSASHI-2 RNA binding activity and differential regulation in leukemic stem cells. Nat. Commun. 11, 2026 (2020).
Article CAS PubMed PubMed Central ADS Google Scholar
Xu, W., Rahman, R. & Rosbash, M. Mechanistic implications of enhanced editing by a HyperTRIBE RNA-binding protein. RNA 24, 173–182 (2018).
Article CAS PubMed PubMed Central Google Scholar
Flamand, M. N., Ke, K., Tamming, R. & Meyer, K. D. Single-molecule identification of the target RNAs of different RNA binding proteins simultaneously in cells. Genes Dev. 36, 1002–1015 (2022).
CAS PubMed PubMed Central Google Scholar
Meyer, K. D. DART-seq: an antibody-free method for global m⁶A detection. Nat. Methods 16, 1275–1280 (2019).
Article CAS PubMed PubMed Central Google Scholar
Khyzha, N., Henikoff, S. & Ahmad, K. Profiling RNA at chromatin targets in situ by antibody-targeted tagmentation. Nat. Methods 19, 1383–1392 (2022).
Article CAS PubMed PubMed Central Google Scholar
Kaya-Okur, H. S. et al. CUT&Tag for efficient epigenomic profiling of small samples and single cells. Nat. Commun. 10, 1930 (2019).
Article PubMed PubMed Central ADS Google Scholar
Anzalone, A. V. et al. Search-and-replace genome editing without double-strand breaks or donor DNA. Nature 576, 149–157 (2019).
Article CAS PubMed PubMed Central ADS Google Scholar
Potter, R. J. & Rosenthal, K. High fidelity reverse transcriptases and uses thereof. US patent US7056716B2 (2006).
Coelho, M. B. et al. Nuclear matrix protein Matrin3 regulates alternative splicing and forms overlapping regulatory networks with PTB. EMBO J. 34, 653–668 (2015).
Article MathSciNet CAS PubMed PubMed Central Google Scholar
Xue, Y. et al. Direct conversion of fibroblasts to neurons by reprogramming PTB-regulated microRNA circuits. Cell 152, 82–96 (2013).
Article CAS PubMed PubMed Central Google Scholar
The ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57–74 (2012).
Fred, R. G., Tillmar, L. & Welsh, N. The role of PTB in insulin mRNA stability control. Curr. Diabetes Rev. 2, 363–366 (2006).
Article CAS PubMed Google Scholar
Xue, Y. et al. Genome-wide analysis of PTB-RNA interactions reveals a strategy used by the general splicing repressor to modulate exon inclusion or skipping. Mol. Cell 36, 996–1006 (2009).
Article CAS PubMed PubMed Central Google Scholar
Van Nostrand, E. L. et al. A large-scale binding and functional map of human RNA-binding proteins. Nature 583, 711–719 (2020).
Article PubMed PubMed Central ADS Google Scholar
Wang, X. et al. N⁶-methyladenosine-dependent regulation of messenger RNA stability. Nature 505, 117–120 (2014).
Article PubMed ADS Google Scholar
Ge, R. et al. m⁶A-SAC-seq for quantitative whole transcriptome m⁶A profiling. Nat. Protoc. 18, 626–657 (2023).
Article CAS PubMed Google Scholar
Hafner, M. et al. CLIP and complementary methods. Nat. Rev. Methods Prim. 1, 20 (2021).
Article CAS Google Scholar
Dvinge, H. Regulation of alternative mRNA splicing: old players and new perspectives. FEBS Lett. 592, 2987–3006 (2018).
Article CAS PubMed Google Scholar
Luo, Y. et al. New developments on the Encyclopedia of DNA Elements (ENCODE) data portal. Nucleic Acids Res. 48, D882–D889 (2020).
Article CAS PubMed Google Scholar
Yee, B. A., Pratt, G. A., Graveley, B. R., Van Nostrand, E. L. & Yeo, G. W. RBP-Maps enables robust generation of splicing regulatory maps. RNA 25, 193–204 (2019).
Article CAS PubMed PubMed Central Google Scholar
Shi, H., Wei, J. & He, C. Where, when, and how: context-dependent functions of RNA methylation writers, readers, and erasers. Mol. Cell 74, 640–650 (2019).
Article CAS PubMed PubMed Central Google Scholar
Wang, X. et al. N⁶-methyladenosine modulates messenger RNA translation efficiency. Cell 161, 1388–1399 (2015).
Article CAS PubMed PubMed Central Google Scholar
Roundtree, I. A. et al. YTHDC1 mediates nuclear export of N⁶-methyladenosine methylated mRNAs. eLlife 6, e31311 (2017).
Article Google Scholar
Liu, J. et al. N⁶-methyladenosine of chromosome-associated regulatory RNA regulates chromatin state and transcription. Science 367, 580–586 (2020).
Article CAS PubMed PubMed Central ADS Google Scholar
Zou, Z., Sepich-Poore, C., Zhou, X., Wei, J. & He, C. The mechanism underlying redundant functions of the YTHDF proteins. Genome Biol. 24, 17 (2023).
Article CAS PubMed PubMed Central Google Scholar
Yang, P. et al. G3BP1 is a tunable switch that triggers phase separation to assemble stress granules. Cell 181, 325–345 e328 (2020).
Article CAS PubMed PubMed Central Google Scholar
Protter, D. S. W. & Parker, R. Principles and properties of stress granules. Trends Cell Biol. 26, 668–679 (2016).
Article CAS PubMed PubMed Central Google Scholar
Wheeler, J. R., Matheny, T., Jain, S., Abrisch, R. & Parker, R. Distinct stages in stress granule assembly and disassembly. eLlife 5, e18413 (2016).
Article Google Scholar
Khong, A. et al. The stress granule transcriptome reveals principles of mRNA accumulation in stress granules. Mol. Cell 68, 808–820 e805 (2017).
Article CAS PubMed PubMed Central Google Scholar
Chou, R. H. & Huang, H. Sodium arsenite suppresses human papillomavirus-16 E6 gene and enhances apoptosis in E6-transfected human lymphoblastoid cells. J. Cell. Biochem. 84, 615–624 (2002).
Article PubMed Google Scholar
Sun, H. et al. Sodium arsenite-induced learning and memory impairment is associated with endoplasmic reticulum stress-mediated apoptosis in rat hippocampus. Front. Mol. Neurosci. 10, 286 (2017).
Article PubMed PubMed Central Google Scholar
Henikoff, S. & Ahmad, K. In situ tools for chromatin structural epigenomics. Protein Sci. 31, e4458 (2022).
Article CAS PubMed PubMed Central Google Scholar
Lopes, I., Altab, G., Raina, P. & de Magalhaes, J. P. Gene size matters: an analysis of gene length in the human genome. Front. Genet. 12, 559998 (2021).
Article CAS PubMed PubMed Central Google Scholar
Irgen-Gioro, S., Yoshida, S., Walling, V. & Chong, S. Fixation can change the appearance of phase separation in living cells. eLife 11, e79903 (2022).
Article CAS PubMed PubMed Central Google Scholar
Heinz, S. et al. Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities. Mol. Cell 38, 576–589 (2010).
Article CAS PubMed PubMed Central Google Scholar
Wu, T. et al. clusterProfiler 4.0: a universal enrichment tool for interpreting omics data. Innov. 2, 100141 (2021).
CAS Google Scholar
Martin, M. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet. J. 17, 10–12 (2011).
Article Google Scholar
Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012).
Article CAS PubMed PubMed Central Google Scholar
Dobin, A. et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29, 15–21 (2013).
Article CAS PubMed Google Scholar
Smith, T., Heger, A. & Sudbery, I. UMI-tools: modeling sequencing errors in unique molecular identifiers to improve quantification accuracy. Genome Res. 27, 491–499 (2017).
Article CAS PubMed PubMed Central Google Scholar
Graubert, A., Aguet, F., Ravi, A., Ardlie, K. G. & Getz, G. RNA-SeQC 2: efficient RNA-seq quality control and quantification for large cohorts. Bioinformatics 37, 3048–3050 (2021).
Article CAS PubMed PubMed Central Google Scholar
Liao, Y., Smyth, G. K. & Shi, W. featureCounts: an efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics 30, 923–930 (2014).
Article CAS PubMed Google Scholar
Robinson, J. T. et al. Integrative Genomics Viewer. Nat. Biotechnol. 29, 24–26 (2011).
Article CAS PubMed PubMed Central Google Scholar
Ramirez, F., Dundar, F., Diehl, S., Gruning, B. A. & Manke, T. deepTools: a flexible platform for exploring deep-sequencing data. Nucleic Acids Res. 42, W187–W191 (2014).
Article CAS PubMed PubMed Central Google Scholar
Zhang, Y. et al. Model-based analysis of ChIP-Seq (MACS). Genome Biol. 9, R137 (2008).
Article PubMed PubMed Central Google Scholar
Kent, W. J. et al. The human genome browser at UCSC. Genome Res 12, 996–1006 (2002).
Article CAS PubMed PubMed Central Google Scholar
Danecek, P. et al. Twelve years of SAMtools and BCFtools. Gigascience 10, giab008 (2021).
Article PubMed PubMed Central Google Scholar
Quinlan, A. R. & Hall, I. M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842 (2010).
Article CAS PubMed PubMed Central Google Scholar
Cui, X. et al. Guitar: an R/Bioconductor package for gene annotation guided transcriptomic analysis of RNA-related genomic features. BioMed. Res. Int. 2016, 8367534 (2016).
Article PubMed PubMed Central Google Scholar
R Core Team. R: A Language and Environment for Statistical Computing (R Foundation for Statistical Computing, 2014); https://www.R-project.org/
Wickham, H. ggplot2: Elegant Graphics for Data Analysis (Springer, 2009).
Love, M. I., Huber, W. & Anders, S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 15, 550 (2014).
Article PubMed PubMed Central Google Scholar
Kumar, L. & Futschik, M. E. Mfuzz: a software package for soft clustering of microarray data. Bioinformation 2, 5–7 (2007).
Article PubMed PubMed Central Google Scholar
Oscorbin, I. P., & Filipenko, M. L. M-MuLV reverse transcriptase: selected properties and improved mutants. Comput. Struct. Biotechnol. J 19, 6315–6327 (2021).
Article CAS PubMed PubMed Central Google Scholar

Download references

Acknowledgements

We thank all He laboratory members for the discussions. We thank P. Faber and other staff of the University of Chicago Genomics Facility for sequencing support. This work was supported by US National Institutes of Health grant no. RM1 HG008935 (C.H.). C.H. is an investigator of the Howard Hughes Medical Institute. The funders had no role in study design, data collection and analysis, decision to publish or preparation of the paper.

Author information

These authors contributed equally: Yu Xiao, Yan-Ming Chen.

Authors and Affiliations

Department of Chemistry, The University of Chicago, Chicago, IL, USA
Yu Xiao, Yan-Ming Chen, Zhongyu Zou, Chang Ye, Xiaoyang Dou, Chang Liu, Shun Liu, Hao Yan, Pingluan Wang, Tie-Bo Zeng, Qinzhe Liu, Weixin Tang & Chuan He
Department of Biochemistry and Molecular Biology, The University of Chicago, Chicago, IL, USA
Yu Xiao, Yan-Ming Chen, Zhongyu Zou, Chang Ye, Xiaoyang Dou, Jinjun Wu, Chang Liu, Shun Liu, Pingluan Wang, Tie-Bo Zeng, Qinzhe Liu, Jingyi Fei & Chuan He
Institute for Biophysical Dynamics, The University of Chicago, Chicago, IL, USA
Yu Xiao, Yan-Ming Chen, Zhongyu Zou, Chang Ye, Xiaoyang Dou, Jinjun Wu, Chang Liu, Shun Liu, Hao Yan, Pingluan Wang, Tie-Bo Zeng, Qinzhe Liu, Jingyi Fei, Weixin Tang & Chuan He
Howard Hughes Medical Institute, Chicago, IL, USA
Yu Xiao, Yan-Ming Chen, Zhongyu Zou, Chang Ye, Xiaoyang Dou, Chang Liu, Shun Liu, Pingluan Wang, Tie-Bo Zeng, Qinzhe Liu & Chuan He

Authors

Yu Xiao
View author publications
You can also search for this author in PubMed Google Scholar
Yan-Ming Chen
View author publications
You can also search for this author in PubMed Google Scholar
Zhongyu Zou
View author publications
You can also search for this author in PubMed Google Scholar
Chang Ye
View author publications
You can also search for this author in PubMed Google Scholar
Xiaoyang Dou
View author publications
You can also search for this author in PubMed Google Scholar
Jinjun Wu
View author publications
You can also search for this author in PubMed Google Scholar
Chang Liu
View author publications
You can also search for this author in PubMed Google Scholar
Shun Liu
View author publications
You can also search for this author in PubMed Google Scholar
Hao Yan
View author publications
You can also search for this author in PubMed Google Scholar
Pingluan Wang
View author publications
You can also search for this author in PubMed Google Scholar
Tie-Bo Zeng
View author publications
You can also search for this author in PubMed Google Scholar
Qinzhe Liu
View author publications
You can also search for this author in PubMed Google Scholar
Jingyi Fei
View author publications
You can also search for this author in PubMed Google Scholar
Weixin Tang
View author publications
You can also search for this author in PubMed Google Scholar
Chuan He
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

Y.X. and C.H. conceived and designed the study. Y.X. and C.H. designed and supervised the experiments with the help from Y.-M.C. Y.-M.C. and Y.X. performed computation analysis with help from C.Y., X.D. and S.L. Y.X. performed most experiments with help from Y.-M.C., Z.Z., J.W., H.Y., P.W., T.-B.Z., Q.L., J.F. and W.T. Y.X., Y.-M.C. and C.H. wrote the paper with input from C.Y., X.D. and C.L. All authors discussed the results and commented on the paper.

Corresponding author

Correspondence to Chuan He.

Ethics declarations

Competing interests

C.H. is a scientific founder, a member of the scientific advisory board and equity holder of Aferna Green, Inc. and AccuaDX Inc. and a scientific cofounder and equity holder of Accent Therapeutics. The remaining authors declare no competing interests.

Peer review

Peer review information

Nature Methods thanks Peng Jin, Yuanchao Xue and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Primary Handling Editor: Lei Tang and Hui Hua, in collaboration with the Nature Methods team.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 ARTR-seq setup and condition optimization.

a, Functional domains of the engineered MMLV RTase (H8Y, D200N, T306K, W313F, T330P, D524G, L603W)^24,25. The MMLV RTase (full length) is composed of three domains: polymerase (red), connection (purple) and RNase H (yellow). The RNase H domain and the first 24 N-terminal residues were omitted in MMLV RTase (25-497) to improve its RT activity⁷¹. b, Coomassie bright blue staining of three purified pAG-RTase fusion constructs with a link length of 30 amino acids. c, qRT-PCR analysis for ACTB, METTL14 and RBM15 showing the relative RT activity of three tested purified pAG-RTase fusion proteins. Two commercial RTases SuperScript II and SuperScript III were loaded as positive controls. n = 3 biological replicates. d, qRT-PCR showing the relative RT efficiency using the indicated random primers. pAG-MMLV RTase fusion protein (25-497) was used in this analysis. n = 2 biological replicates. e, qRT-PCR showing the effects of different biotinylated dNTPs on the relative RT efficiency using pAG-MMLV RTase (25-497). Biotin-16-dUTP and biotin-16-dCTP exhibited the least hindrance to RT efficiency. Both were used in the ARTR-seq procedure by mixing with regular dTTP and dCTP at a 1:1 ratio. n = 1 biological replicate. f, Immunofluorescence (IF) imaging of the secondary antibody (2nd Ab, yellow), pAG-RTase (red), newly synthesized cDNA (green), and nucleus (blue) for RBFOX2 ARTR-seq. Scale bars, 10 μm. g, qPCR analysis to measure relative cDNA yields of ARTR-seq samples. n = 2 biological replicates.

Source data

Extended Data Fig. 2 Reads comparison between ARTR-seq and other methods.

a, Bar plots showing numbers (left) and percentages (right) of the usable reads and reads filtered after each processing step for libraries constructed by using ARTR-seq, CLIP, eCLIP, iCLIP, irCLIP, LACE-seq, sCLIP, tRIP-seq, and RT&Tag, respectively. The usable reads are defined as reads uniquely mapped to the genome and remained after PCR deduplication. b, Percentages of usable reads in subsampled uniquely mapped reads from PTBP1 libraries constructed by ARTR-seq, CLIP, eCLIP, iCLIP, irCLIP, LACE-seq, sCLIP, and tRIP-seq, respectively. c, Snapshots from Integrative Genomics Viewer (IGV) showing the read coverage of ARTR-seq libraries. The read coverage of each library was normalized by its respective sequencing depth. All tracks were set to the same scale. Regions with higher read coverage in PTBP1 ARTR-seq libraries compared to the input libraries represent true positive RBP binding signals (enriched, orange); regions with lower or comparable read coverage indicate background noise signals (not enriched, blue). Input libraries were applied to help filter out not-enriched regions. d, A bar plot showing the usable reads distribution in the intronic (purple), intergenic (grey) and exonic (green) regions for libraries constructed by using ARTR-seq, CLIP, eCLIP, iCLIP, irCLIP, LACE-seq, sCLIP, and tRIP-seq, respectively. About 30% of usable reads for the ARTR-seq input samples were located in introns. e, Meta distributions of PTBP1 ARTR-seq peaks along mRNA transcripts and flanking 1 kb regions.

Extended Data Fig. 3 Comparison between ARTR-seq and other methods.

a. The signal profiles and heatmaps of read density from ARTR-seq library reads at the eCLIP-identified PTBP1 peaks in HepG2 cells²⁸. b, Signal profiles and heatmaps of read density from ARTR-seq and LACE-seq library at the eCLIP-identified PTBP1 peaks in K562 cells^13,28. c, Heatmaps exhibiting the transcriptome-wide pairwise overlap of PTBP1-targeted genes (top) or peaks (bottom) among libraries from ARTR-seq, eCLIP²⁸, and LACE-seq¹³ and iCLIP²⁷ using the same cell line. Notably, the iCLIP data from the HeLa S3 cell line was compared with ARTR-seq using the HeLa cell line and LACE-seq using the HeLa cell line. The overlap proportion was determined as the number of detected genes (or peaks) overlapped between sample A and sample B divided by the total number of detected genes (or peaks) in sample A. The maximum gap between overlapping peaks was set at 200 nt. The overlap proportion of genes (or peaks) and the cell line of sample A were labeled in the corresponding position. d, IGV snapshots showing the read coverage of ARTR-seq libraries corresponding to Fig. 2c. The read coverage of each library was normalized by its respective sequencing depth. According to the ARTR-seq library types (input and PTBP1), the tracks were adjusted to distinct scales. e, Western blot (left) and quantification (right) displaying PTBP1 protein levels in control (siCtrl) and PTBP1 knockdown (siPTBP1) HepG2 cells. GAPDH was used as an internal control for normalization.

Source data

Extended Data Fig. 4 The direct versus indirect binding in ARTR-seq.

a, A schematic diagram illustrating the simplified direct and indirect targets of the RNA binding protein (RBP). The symbol “X” represents the interacting protein or complex of the RBP. b, Cumulative curves displaying the proportion of RBFOX2 peaks with peak centers located within a certain absolute distance on the genome from the nearest RBFOX2 canonical motif ‘UGCAUG’ for both ARTR-seq and eCLIP²⁸ (ENCODE: ENCFF871NYM). c, Boxplots showing ARTR-seq peaks exhibiting reduced signal values (top) and q-values (bottom) as the absolute distance to the nearest ‘UGCAUG’ site increases. From left to right, n = 2164, 1263, 489, 677 and 869. d, A bar plot illustrating the impact of signal value cutoffs and q-value cutoffs on the proportion of RBFOX2 peaks within an absolute distance of 500 nts from the closest RBFOX2 canonical motif ‘UGCAUG’. The number of remaining peaks was labeled at the top of the bar after applying the cutoffs. e, Cumulative curves exhibiting the proportion of YTHDF2 peaks with peak centers located within a certain absolute distance on the transcriptome from the nearest m⁶A sites identified by m⁶A-SAC-seq³³ for both ARTR-seq and PAR-CLIP³².

Extended Data Fig. 5 Optimizations for reducing potential indirect binding in ARTR-seq.

a, A schematic diagram demonstrating the binding of protein A/G-reverse transcriptase (pAG-RTase), secondary antibody (2nd Ab), and the primary antibody (1st Ab). b, A schematic diagram showing the constructs of pAG-RTases with different amino acid (aa) linker lengths: 3 aa for pRT3, 13 aa for pRT13, and 30 aa for pRT30. c, Coomassie bright blue staining of the purified pRT3, pRT13 and pRT30. d, qRT-PCR analysis for GAPDH, ACTB, METTL14 and RBM15 showing the in-vitro RT efficiency of pAG-RTases with different linker lengths. n = 2 biological replicates. e, qPCR analysis to quantify relative cDNA yields of RBFOX2 ARTR-seq samples. n = 2 biological replicates. f, Signal profiles of ARTR-seq read density at RBFOX2 ARTR-seq peaks and flanking 0.3 kb. g, A snapshot from IGV showing signals of ARTR-seq libraries. h, Cumulative curves displaying the proportion of the top 3000 RBFOX2 peaks (with the highest signal values) with peak centers located within a certain absolute distance from the nearest ‘UGCAUG’ for ARTR-seq libraries constructed under different conditions.

Source data

Extended Data Fig. 6 The resolution of ARTR-seq.

a, Density plots showing the distribution of peak centers within a 400-nt window in genome flanking the RBFOX2 canonical binding motif ‘UGCAUG’ for RBFOX2 and PTBP1 (negative control) ARTR-seq libraries. b, Density plots showing the distribution of peak centers within a 400-nt window in transcriptome flanking m⁶A sites in HeLa cells identified by m⁶A-SAC-seq³³ for the YTHDF2 ARTR-seq libraries. The distributions in a,b are split into three groups based on the peak signal values. c, Density plots exhibiting the distribution of fragment length for ARTR-seq libraries with or without RNase treatment. d, Density plots showing the distribution of RBFOX2 peak centers within a 400-nt window flanking the RBFOX2 canonical binding motif ‘UGCAUG’ for ARTR-seq libraries without RNase treatment, with weak RNase I treatment and with strong RNase I treatment. The distribution is split into three groups based on the peak signal values. e, qPCR analysis to quantify the relative cDNA yields of ARTR-seq. n = 2 biological replicates. f, Cumulative curves displaying the proportion of the top 3000 RBFOX2 peaks (with the highest signal values) with peak centers located within a certain absolute distance from the nearest ‘UGCAUG’ for ARTR-seq libraries constructed with or without RNase treatment.

Source data

Extended Data Fig. 7 Application of ARTR-seq using low input samples.

a, ARTR-seq replicate correlations for usable reads per gene normalized to coverage (RPM) for PTBP1 with different numbers of HepG2 cells. The color scale shows the point density. The coefficient R and P-values were given by the two-tailed Pearson’s correlation. b, Bar plots showing numbers (left) and percentages (right) of the usable reads and reads filtered after each processing step for libraries constructed from different cell numbers by ARTR-seq, LACE-seq¹³, and RT&Tag²². The libraries generated by the same method are linked with the line and indicated in the same color. c, Percentages of usable reads in subsampled uniquely mapped reads from PTBP1 libraries constructed from different numbers of cells by ARTR-seq, LACE-seq¹³, and RT&Tag²², respectively. Different methods are indicated by colors. d, A bar plot showing the usable reads distribution in the intronic (purple), intergenic (grey) and exonic (green) regions of libraries constructed from different numbers of cells by ARTR-seq and LACE-seq, respectively¹³. The libraries generated by the same method are linked with the line and indicated with the same color. e, The signal profile and heatmap of read density in LACE-seq with different numbers of cells at LACE-seq-identified PTBP1 peaks¹³.

Extended Data Fig. 8 ARTR-seq-detected binding signals of splicing factors.

a, Bar plots showing numbers (left) and percentages (right) of the usable reads and reads filtered in each processing step for ARTR-seq libraries of PTBP1, RBFOX2 and HNRNPC. b, A bar plot showing the usable reads distribution in the intronic (purple), intergenic (grey) and exonic (green) regions for ARTR-seq libraries of PTBP1, RBFOX2 and HNRNPC. c, Cumulative curves and boxplots (inside) showing the absolute value of splicing difference upon RBFOX2 (left) or HNRNPC (right) knockdown. d, Cumulative curves and boxplots (inside) showing the absolute value of splicing differences of included RI upon PTBP1 knockdown. In c,d, RBP-regulated genes were divided into three groups according to their enrichment in ARTR-seq, including no enrichment (No, 0 ≤ enrichment ≤ 1), low enrichment (Low, 1 < enrichment ≤ 2) and high enrichment (High, 2 < enrichment). The sample size in c,d was indicated below the respective box. P-values in c,d were determined using the two-tailed Student’s t-test of indicated group versus ‘no enrichment’ group.

Extended Data Fig. 9 Binding features of m⁶A reader proteins detected by ARTR-seq.

a, IF imaging showing the subcellular localization of YTHDF1, YTHDF2, and YTHDC1 in HeLa cells. Scale bars, 5 μm. b, ARTR-seq replicate correlations for usable reads per gene normalized to coverage (RPM) for YTHDF1, YTHDF2, and YTHDC1. The color scale shows the point density. The coefficient R and P-values were given by the two-tailed Pearson’s correlation. c, Distribution of usable reads in the intronic (purple), intergenic (grey) and exonic (green) regions for ARTR-seq libraries of the individual m⁶A binding proteins. d, Venn plot illustrating overlap of peaks identified by ARTR-seq for YTHDF1, YTHDF2 and YTHDC1. e, Aggregation profiles showing the meta distributions of binding peaks along mRNA transcript detected in two biological ARTR-seq replicates for YTHDF1 (green), YTHDF2 (purple), and YTHDC1 (orange). f, Cumulative curves and boxplots (inside) showing the log₂ peak enrichment of ARTR-seq targets for YTHDC1. YTHDC1 peaks were divided into four groups according to the modification fraction of the containing m⁶A (sum value) quantified by m⁶A-SAC-seq³³. The peaks without m⁶A were categorized in one group (No), and other peaks were divided into three groups with an equal number of peaks, including low m⁶A fraction (Low), medium m⁶A fraction (Medium) and high m⁶A fraction (High). The sample size was indicated below the respective box. P-values were determined by the two-tailed Student’s t-test of indicated group versus ‘no m⁶A’ group.

Extended Data Fig. 10 G3BP1 binding at different time intervals during SG assembly captured by ARTR-seq.

a, The Pearson correlation heatmap among time intervals of ARTR-seq results based on G3BP1 binding strength. G3BP1 binding strength is defined as ARTR-seq read log₂FC(G3BP1/input). Pairwise correlation coefficients were indicated as circle size and noted in each circle. b, A heatmap exhibiting stable G3BP1 binding strength of selected RNAs at different time intervals, organized by hierarchical clustering. RNAs were ranked from small to large according to the standard deviation (SD) of G3BP1 binding intensity over different time intervals, and the top 5% of RNAs were selected for clustering (n = 677). The dendrogram was constructed using complete linkage based on Euclidean distance. c, IGV snapshots (top) of two RNAs with stable G3BP1 binding strength in ARTR-seq, with each panel normalized by CPM. d, IGV snapshots showing RNAs with gradually decreased (left) and increased (right) G3BP1 binding strength. Each panel was normalized by CPM. Heatmaps (bottom) in c,d show G3BP1 binding strength in ARTR-seq with the size of the circle representing its absolute value.

Supplementary information

Supplementary Information

Supplementary Fig. 1 and Tables 1–3.

Reporting Summary

Source data

Source Data Fig. 1

Unprocessed dot blot.

Source Data Fig. 1

Statistical source data.

Source Data Extended Data Fig. 1

Unprocessed gel.

Source Data Extended Data Fig. 1

Statistical source data.

Source Data Extended Data Fig. 3

Unprocessed western blots.

Source Data Extended Data Fig. 3

Statistical source data.

Source Data Extended Data Fig. 5

Unprocessed gel.

Source Data Extended Data Fig. 5

Statistical source data.

Source Data Extended Data Fig. 6

Statistical source data.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Xiao, Y., Chen, YM., Zou, Z. et al. Profiling of RNA-binding protein binding sites by in situ reverse transcription-based sequencing. Nat Methods 21, 247–258 (2024). https://doi.org/10.1038/s41592-023-02146-w

Download citation

Received: 05 March 2023
Accepted: 07 December 2023
Published: 10 January 2024
Issue Date: February 2024
DOI: https://doi.org/10.1038/s41592-023-02146-w

This article is cited by

Structure-based prediction and characterization of photo-crosslinking in native protein–RNA complexes
- Huijuan Feng
- Xiang-Jun Lu
- Chaolin Zhang
Nature Communications (2024)

Subjects

Abstract

Similar content being viewed by others

Main

Results

Strategy and development of ARTR-seq

Validation of ARTR-seq using PTBP1

Direct versus indirect binding sites detected by ARTR-seq

Resolution of ARTR-seq

ARTR-seq detects PTBP1 binding sites with as few as 20 cells

Application of ARTR-seq in mouse embryo sections

ARTR-seq profiles regulatory features of splicing factors

ARTR-seq identifies binding features of m6A reader proteins

Dynamic RNA binding of G3BP1 during SG assembly

Discussion

Limitations

Methods

Cell culture and stress treatment

Expression and purification of recombinant protein A/G-RTase

RT–qPCR

Protein detection by Coomassie brilliant blue stain and western blot

Transfection

ARTR-seq

RNase treatment in ARTR-seq

Dot blot

ARTR-seq in the mouse embryo

ARTR-seq with low input

Genome reference

ARTR-seq primary data processing

Peaking calling

Subsampling

Alternative splicing identification

ARTR-seq enrichment level at the gene level

Data visualization and statistical analysis

Quantification of ARTR-seq signal at the gene level

Clustering analysis of G3BP1 ARTR-seq signal

Functional enrichment analysis

Statistics and reproducibility

Reporting summary

Data availability

Code availability

References

Acknowledgements

Author information

Authors and Affiliations

Contributions

Corresponding author

Ethics declarations

Competing interests

Peer review

Peer review information

Additional information

Extended data

Supplementary information

Source data

Rights and permissions

About this article

Cite this article

Share this article

This article is cited by

Search

Quick links

ARTR-seq identifies binding features of m⁶A reader proteins