Dynamic repertoire of a eukaryotic transcriptome surveyed at single-nucleotide resolution

Wilhelm, Brian T.; Marguerat, Samuel; Watt, Stephen; Schubert, Falk; Wood, Valerie; Goodhead, Ian; Penkett, Christopher J.; Rogers, Jane; Bähler, Jürg

doi:10.1038/nature07002

Letter
Published: 18 May 2008


Nature volume 453, pages 1239–1243 (2008)


Abstract

Recent data from several organisms indicate that the transcribed portions of genomes are larger and more complex than expected, and that many functional properties of transcripts are based not on coding sequences but on regulatory sequences in untranslated regions or non-coding RNAs^1–9. Alternative start and polyadenylation sites and regulation of intron splicing add further dimensions to the rich transcriptional output^10,11. This transcriptional complexity has been sampled mainly using hybridization-based methods under one or a few experimental conditions. Here we applied direct high-throughput sequencing of complementary DNAs (RNA-Seq), supplemented with data from high-density tiling arrays, to globally sample transcripts of the fission yeast Schizosaccharomyces pombe, independently of available gene annotations. We interrogated transcriptomes under multiple conditions, including rapid proliferation, meiotic differentiation and environmental stress, as well as in RNA processing mutants, to reveal the dynamic plasticity of the transcriptional landscape as a function of environmental, developmental and genetic factors. High-throughput sequencing proved to be a powerful and quantitative method to sample transcriptomes deeply at maximal resolution. In contrast to hybridization, sequencing showed little, if any, background noise and was sensitive enough to detect widespread transcription in >90% of the genome, including traces of RNAs that were either not robustly transcribed or rapidly degraded. The combined sequencing and strand-specific array data provide rich condition-specific information on novel, mostly non-coding transcripts, untranslated regions and gene structures, thus improving the existing genome annotation. Sequence reads spanning exon–exon or exon–intron junctions give unique insight into a surprising variability in splicing efficiency across introns, genes and conditions. Splicing efficiency was largely coordinated with transcript levels, and increased transcription led to increased splicing in test genes. Hundreds of introns showed such regulated splicing during cellular proliferation or differentiation.


Main

To analyse the S. pombe transcriptome at the best possible resolution, we used the Illumina 1G platform to directly sequence cDNA synthesized from poly(A)-enriched RNA. This approach kept the proportion of sequence reads from ribosomal RNA low (<10%) without biasing against messenger RNAs with short poly(A) tails¹². We obtained >23 million reads with an average length of 39.1 base pairs (bp), representing ∼60 genome lengths, from cells proliferating exponentially in rich medium. In addition, we acquired >99 million reads of transcriptomes from five stages of meiotic differentiation, representing an additional ∼190 genome lengths (Supplementary Table 1). Sequence reads were mapped back to both the spliced and the unspliced reference genome¹³ to determine the number of reads covering each genomic base-pair position. Approximately 60% of all reads mapped uniquely to one genomic region over their full length, and >85% of the reads mapped uniquely over 90% of their length. The remaining reads either mapped to repeated sequences or were of poor quality. RNA expression levels determined from sequence-read numbers correlated strongly with those determined from hybridization signals, indicating that sequencing provides quantitative data on transcript levels (Fig. 1a).

Figure 1: **Quantitation of sequence coverage.**

On average, the 5% of transcripts present at the lowest steady-state levels in rapidly proliferating cells¹² accumulated ∼777 sequence-read hits and showed 94.9% coverage, indicating that the transcriptome was sampled deeply enough to detect even genes with low expression levels. We modelled sequencing depth for rapidly proliferating cells: given the expression scores for all annotated genes, the model predicts that 99% of these genes have >50% sequence-read coverage (Supplementary Fig. 1). In agreement with this prediction, we obtained >50% sequence-read coverage for 99.3% of all annotated genes. The 41 genes with <50% coverage included 20 transposon-related long terminal repeats and 13 dubious genes or pseudogenes (Supplementary Table 2). With cDNA microarrays, only 80–90% of genes yield measurable signals in proliferating cells¹⁴, whereas the remaining genes are highly expressed only under specific conditions such as meiosis or stress^15,16. These data suggest that the sequencing approach is sensitive enough to detect basal ‘transcriptional noise’ from genes that are not actively expressed.

As expected, intergenic regions were hit by fewer sequence reads than coding regions (Figs 1b and 2a). However, we obtained sequence data from ∼94% and >99% of the nuclear and mitochondrial genomes, respectively, suggesting that almost the entire genome is transcribed to some degree. This is consistent with the considerable overlap and complexity among different transcripts reported for other eukaryotes⁹. Reverse transcription followed by polymerase chain reaction (RT–PCR) controls verified that even intergenic regions with poor sequence-read coverage reflect expressed RNAs rather than technical noise from spurious sequences (Supplementary Fig. 2). Thus, our sequence data provide direct evidence for widespread transcription, in line with the suggestion that as much as 90% of all RNA polymerase II (Pol II) initiation events represent transcriptional noise¹⁷. Taken together, unlike hybridization-based approaches, sequencing appears to produce little or no background noise, and the dynamic range of detected transcripts is limited only by sequencing depth.

Figure 2: **Analyses of novel transcripts.**

To verify and compare the sequence data with an established platform, we used Affymetrix chips containing 25-mer probes tiled at ∼20-nucleotide intervals across both strands of the S. pombe genome. We interrogated transcriptomes under a wide range of conditions (Supplementary Table 1), thus independently sampling gene expression at lower resolution but with strand-specific information (Fig. 2a).

The combined sequence and hybridization data revealed hundreds of novel transcribed regions. To distinguish between separate transcripts and extensions to known gene structures, we analysed tiling-chip data from a prp2 splicing-factor mutant¹⁸ along with sequence ‘trans-reads’ spanning unannotated splice junctions (Figs 2a and 3d). Combined with manual curation, these analyses helped to refine annotated gene structures, including 75 revisions of protein-coding regions and the identification of ∼20 new introns in known genes. Conservative data analysis also revealed 453 novel transcripts, only 26 of which seemed to code for small proteins (<150 amino acids); 37 of the apparently non-coding transcripts overlapped known genes in the antisense direction (Supplementary Table 3). The 427 non-coding RNAs had an average length of ∼825 nucleotides and a GC content similar to that of the 135 annotated non-coding RNAs but higher than that of intergenic regions overall (33.0% versus 30.6%; P < 2 × 10⁻¹⁶, Wilcoxon test). The non-coding RNAs included the elusive, recently discovered Ter1 telomerase RNA^19,20, which was induced during meiosis (SPNCRNA.214; Supplementary Table 3). Expression of 14 non-coding RNAs was independently confirmed by RT–PCR (Supplementary Fig. 3). This analysis revealed bidirectional transcription across all tested regions, including the well-characterized nmt1 gene, although most regions showed more transcripts from one strand. Given the ubiquitous transcription throughout the genome, the novel transcripts described here probably only hint at the true level of transcriptional complexity.

Figure 3: **Analyses of transcript structures.**

Sequence-read numbers across the newly identified transcribed regions were lower than those across annotated coding regions (Fig. 1b). Only 13 of the novel transcripts were evident from the tiling-chip data in proliferating cells. Another 79 were substantially expressed only under specific conditions, most notably during meiosis or quiescence (Fig. 2b, c and Supplementary Table 3). The antisense RNAs were particularly enriched for highly regulated transcripts, many of which peaked during the meiotic divisions (Fig. 2c). To test whether some of the newly identified regions reflect cryptic transcripts that are degraded in the nucleus, we analysed RNA isolated from an rrp6 mutant defective in nuclear exosome function^21,22. In this mutant, 36 of the novel transcripts were expressed highly enough to become evident on tiling chips as well (Supplementary Table 3). These data raised the possibility that many newly identified regions are strongly transcribed but rapidly degraded by different surveillance systems²¹. To test this hypothesis, we globally measured Pol II occupancy, which reflects transcriptional activity¹². Overall, Pol II occupancy across the new regions was comparable to that of the 10–20% of genes with the lowest levels of transcription (Fig. 2d). We conclude that most newly identified regions were not robustly expressed in proliferating cells. Nevertheless, the sequencing approach was sensitive enough to detect transcriptional traces below the detection limit of hybridization-based approaches.

The combined sequence and hybridization data provided a rich source for analysing transcript structures at maximal resolution. High densities of overlapping transcripts can confound the sequence data. In addition, read numbers decrease towards the 5′ ends, reflecting oligo(dT) priming (Figs 1b and 3a), which makes it difficult to determine the transcript lengths of long genes accurately. The hybridization data are less affected by these issues because they distinguish transcriptional direction and do not show any 5′ bias (Fig. 3a and Supplementary Fig. 4). Together, the two approaches provided complementary data on untranslated regions (UTRs) for most S. pombe genes (Supplementary Table 4). For many other genes, which were mostly expressed at low levels and did not pass our confidence cutoffs, the UTRs could be mapped by visual inspection. UTRs determined by hybridization or sequencing agreed well with each other and with the previously known UTRs (Supplementary Table 5). The median 5′- and 3′-UTR lengths determined by hybridization were 152 and 169 nucleotides, respectively, with a mean combined length of 465 nucleotides (Fig. 3b). Thus, the UTRs of fission yeast are substantially longer than those of budding yeast, which show a mean combined length of 211 nucleotides⁵.

We compared UTR-length distributions for different functional categories (Fig. 3c). The most stable transcripts¹² had short 5′ UTRs, whereas the least stable transcripts had long 5′ and 3′ UTRs, which may contain regulatory signals for RNA turnover. An analysis of Gene Ontology (GO) categories with significantly longer or shorter UTRs (Fig. 3c) uncovered similarities to budding yeast⁵. For example, transcripts encoding protein kinases and membrane proteins had long 5′ UTRs, whereas ribosome-biogenesis genes had short 5′ UTRs in both yeasts, indicating that UTR-length distributions show some conservation in these distantly related yeasts.

Sampling UTR lengths under different conditions allowed detection of transcript-size regulation (Supplementary Table 4). Our data confirmed the known transcripts with alternate start sites or polyadenylation sites produced from cig2 and wos2, respectively^23,24. Using a conservative approach, we identified 27 additional transcripts with alternate start sites during meiosis or stress (Figs 2b and 3d, and Supplementary Table 6). Alternate polyadenylation sites were more abundant, affecting ∼187 transcripts (Supplementary Table 6). Transcription-termination sites were generally less well defined than start sites and also varied across different conditions (Fig. 3d and Supplementary Fig. 5).

The resolution of the tiling chips was insufficient for analysing splicing, owing to the small size of most introns (<100 nucleotides). The sequence data, however, provided unprecedented insights into splicing of the 45.4% of S. pombe genes that contain introns¹³. Both unspliced and spliced transcripts were present in the total RNA preparations. Accordingly, we also obtained reads covering introns, albeit at lower numbers than for exons (Figs 1b and 3a). Importantly, sequencing provided direct evidence for splicing through ‘trans-reads’ spanning exon–exon junctions. These trans-reads confirmed ∼93% of predicted introns and greatly reduced the number of unsupported gene structures. We found no evidence for alternative splicing in S. pombe.

To estimate splicing efficiencies, we determined normalized numbers of sequence reads spanning exon–exon and the corresponding exon–intron junctions for all introns (Supplementary Table 7). Because this calculation of splicing efficiency uses relative read numbers, it is internally normalized for expression levels and sequencing depth. Median numbers of spliced transcripts were only ∼2-fold higher than numbers of the corresponding unspliced transcripts, suggesting that a surprisingly large proportion of cellular mRNAs is unprocessed (Supplementary Table 7). Average splicing efficiency was similar for different intron positions within genes (Supplementary Fig. 6). Splicing efficiency varied strongly, however, among different genes and conditions. A conservative analysis uncovered 254 genes (314 introns) that were more efficiently spliced during meiotic differentiation than in proliferating cells (Supplementary Table 8). These genes included 9 of 12 known meiotically spliced genes²⁵; the 3 remaining genes showed increased meiotic splicing that fell below our cutoff. Such ‘regulated’ splicing was evident in all five differentiation stages tested, but was most prevalent during meiotic prophase and nuclear divisions (Fig. 4a). In some genes all introns showed regulated splicing, whereas in others only selected introns were regulated (Supplementary Table 8); this finding was robust to lowering the cutoff. The median proportion of introns per gene showing regulated splicing was 50%, and regulated splicing showed no preference for specific intron positions.

Figure 4: **Dynamics of splicing efficiency reflects transcription.**

The surprisingly large, yet conservative, list of genes with increased meiotic splicing was highly enriched for genes showing increased transcript levels during meiosis²⁶ (P ∼ 2 × 10⁻²⁰, hypergeometric test). Coordinated increases of meiotic gene expression and splicing were also directly evident from the sequence data (Fig. 4b). Moreover, meiotic transcripts showed similar profiles for gene expression and splicing efficiency during meiosis (Supplementary Fig. 7). A reciprocal analysis uncovered 478 genes (559 introns) that were more efficiently spliced in proliferating cells than during meiosis (Fig. 4a and Supplementary Table 8). This list was enriched for genes highly expressed in proliferating cells¹⁶, including ribosomal-protein genes (P < 2 × 10⁻⁷, hypergeometric test). These data suggest that increased transcription can promote splicing. Indeed, splicing efficiency was significantly correlated with mRNA levels (Fig. 4c). Moreover, a functional analysis revealed widespread relationships between expression levels and splicing efficiency in proliferating cells (Supplementary Table 9). For example, highly expressed genes (such as those repressed during stress¹⁵) and conserved genes¹⁶ were more efficiently spliced than genes induced during stress or S. pombe-specific genes.
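The enrichment P values above come from one-sided hypergeometric tests. Below is a minimal standard-library Python sketch of such a test. The function name and the toy counts are illustrative assumptions, because the gene totals behind the reported P values are not given in this section.

```python
from math import comb


def hypergeom_enrichment_p(N: int, K: int, n: int, k: int) -> float:
    """One-sided hypergeometric P value, P(X >= k).

    N: total genes considered (the 'universe')
    K: genes in the reference set, e.g. genes induced during meiosis
    n: genes in the test list, e.g. genes more efficiently spliced in meiosis
    k: genes found in both lists
    """
    total = comb(N, n)
    upper = min(K, n)
    # math.comb returns 0 when asked for more items than exist, so impossible terms vanish
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total  # exact integer arithmetic until this final division


# Toy example: 5 of 10 genes drawn, 5 of the 10 are 'induced', and all 5 drawn overlap.
print(hypergeom_enrichment_p(N=10, K=5, n=5, k=5))  # 1/252, about 0.00397
```

Keeping the sums in Python integers avoids rounding error in the tail. If SciPy is available, the result should agree with `scipy.stats.hypergeom.sf(k - 1, N, K, n)`.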

To test directly whether increased transcription can lead to increased splicing, we activated transcription of the meiotically spliced spo6 and spn7 genes in two ways. We either placed them under the control of an ectopic regulatable promoter, or overexpressed the transcription factor Mei4, which activates spo6 and spn7 (ref. 27) and has been implicated in the regulation of meiotic splicing²⁸. The proportion of spliced transcripts increased after activating transcription, using either the ectopic promoter or the native transcription factor (Fig. 4d; Supplementary Fig. 7). We conclude that activation of transcription itself is sufficient to promote splicing of these meiotically spliced genes, without a specific requirement for the meiotic factor Mei4. This finding raises the possibility that transcriptional and splicing efficiencies are mechanistically linked. Taken together, our results reveal a surprising genome-wide regulation of splicing, largely reflecting transcript levels during proliferation or differentiation. These data point to a global, condition-specific coupling between splicing efficiency and transcription, which may help to optimize and streamline gene expression programmes.

Methods Summary

Strains and experimental conditions are listed in Supplementary Table 1. cDNA for sequencing and array hybridization was prepared using oligo(dT) or random primers, respectively. For sequencing, fragments of 120–170 bp were attached to the FlowCell at an average concentration of 3 pM, amplified isothermally, and sequenced using Solexa reversible-terminator chemistry on the Illumina Genome Analyser. Sequence reads were mapped to the reference genome using BLAT. Analyses of tiling-chip data were based on the Bioconductor package ‘tilingArray’²⁹.

Online Methods

cDNA preparation for high-throughput sequencing

All cDNA samples for Illumina were prepared by first treating ∼1 mg of total RNA for 30 min with amplification-grade RNase-free DNase (Invitrogen), according to the manufacturer’s protocols. Poly(A)-enriched RNA was then prepared using an oligo(dT) selection kit (Oligotex Direct mRNA miniKit, Qiagen). The resulting RNA was converted to double-stranded cDNA using a cDNA synthesis kit (Superscript choice system for cDNA synthesis, Invitrogen), primed by an oligo(dT) primer. RNA samples from the pooled meiotic time points were subjected to amplification by in vitro transcription (IVT) after a poly(A)-enrichment step as described above.

DNA libraries were prepared following the manufacturer’s instructions (Illumina). DNA was sheared by nebulization, followed by simultaneous end repair and phosphorylation using T4 DNA polymerase, the Klenow fragment of DNA polymerase I and T4 PNK. DNA was recovered after each stage using QIAquick PCR purification columns (Qiagen). The repaired fragments were 3′-adenylated using Klenow exonuclease-minus (Illumina) and purified using a MinElute PCR purification column (Qiagen). Illumina adaptors were ligated to the adenylated ends of the fragments. The ligation products were gel-purified on a 2% TAE (Tris-acetate-EDTA)-agarose gel (Certified Low-Range Ultra Agarose, Bio-Rad), stained with ethidium bromide and visualized on a Dark Reader (Clare Chemical). A range of fragment sizes (120–170 bp) was excised from the gel and extracted using a QIAquick gel extraction kit. Seventeen rounds of PCR amplification were performed using primers complementary to the ligated adaptors and compatible with the oligonucleotides attached to the FlowCell. DNA was recovered using a QIAquick PCR purification column, quantified on a Nanodrop-1000 spectrophotometer and diluted to a working concentration of 10 nM in TE (Tris-EDTA).

Sequencing data processing and expression scores

FASTQ files of sequence reads were converted into FASTA files. Each read was trimmed at the position of its first N, and reads shorter than 15 bp after trimming were discarded. All remaining FASTA sequences were matched back to the S. pombe genome using BLAT (tilesize 8, oneoff 1) in parallel on the Sanger Institute computer farm. All FASTA reads were also matched back, as above, to a spliced genome from which all known or predicted intron sequences had been removed. The matches to the spliced and unspliced genomes were then combined into a complete, non-redundant set used for subsequent analysis.
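The read pre-processing described above can be sketched in a few lines of Python: FASTQ is converted to FASTA, each read is trimmed at its first N, and reads shorter than 15 bp are discarded. This is an illustrative reimplementation, not the pipeline actually run at the Sanger Institute. It assumes plain four-line FASTQ records, uses hypothetical function names, and leaves out the BLAT mapping step.

```python
import io
from typing import Iterator, TextIO, Tuple

MIN_LEN = 15  # reads shorter than this after trimming are discarded


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (read_id, sequence) pairs from a stream of four-line FASTQ records."""
    while True:
        header = handle.readline().rstrip()
        if not header:
            return
        seq = handle.readline().rstrip()
        handle.readline()  # '+' separator line
        handle.readline()  # quality line, not needed for FASTA output
        yield header[1:], seq


def trim_at_first_n(seq: str) -> str:
    """Keep only the bases before the first ambiguous base call (N)."""
    pos = seq.upper().find("N")
    return seq if pos == -1 else seq[:pos]


def fastq_to_filtered_fasta(handle: TextIO, min_len: int = MIN_LEN) -> str:
    """Convert FASTQ to FASTA text, trimming at the first N and dropping short reads."""
    records = []
    for read_id, seq in read_fastq(handle):
        trimmed = trim_at_first_n(seq)
        if len(trimmed) >= min_len:
            records.append(f">{read_id}\n{trimmed}\n")
    return "".join(records)


# Usage: r2 is trimmed to 'ACGT' (4 bp) and is therefore discarded.
example = io.StringIO(
    "@r1\nACGTACGTACGTACGTACGT\n+\nIIIIIIIIIIIIIIIIIIII\n"
    "@r2\nACGTNACGTACGTACGTACG\n+\nIIIIIIIIIIIIIIIIIIII\n"
)
print(fastq_to_filtered_fasta(example), end="")
```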

For Fig. 1a, an expression score was assigned to every genomic base-pair position according to how many sequence reads covered that position. The log₂ of the score for each base-pair position was then plotted using R/Bioconductor. The numbers of sequence reads drop towards the 5′ end of long genes, so gene-level scores were calculated in a way that avoids bias against long genes. The per-base expression scores were summed over only the 3′-most 300 bp of each coding region (or over its entire length if the coding region was <300 bp), and the sum was divided by the length used.
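The gene-level score described above can be sketched as follows. This is an illustration, not the authors' code. It assumes 0-based, half-open coordinates, treats each mapped read as one contiguous interval (ignoring splicing and multi-mapping reads), and uses hypothetical function names. The 3′ end of a minus-strand coding region lies at its lower genomic coordinate, so for those genes the 300-bp window is taken from the left-hand end.

```python
from typing import List, Sequence, Tuple

WINDOW = 300  # bp at the 3' end of each coding region used for the gene score


def per_base_scores(genome_len: int, reads: Sequence[Tuple[int, int]]) -> List[int]:
    """Number of mapped reads covering each position; reads are [start, end) intervals."""
    scores = [0] * genome_len
    for start, end in reads:
        for pos in range(start, end):
            scores[pos] += 1
    return scores


def gene_score(scores: Sequence[int], cds_start: int, cds_end: int,
               strand: str, window: int = WINDOW) -> float:
    """Mean per-base score over the 3'-most `window` bp of the CDS [cds_start, cds_end)."""
    used = min(window, cds_end - cds_start)
    if strand == "+":
        region = scores[cds_end - used:cds_end]      # 3' end is at the right
    else:
        region = scores[cds_start:cds_start + used]  # 3' end is at the left
    return sum(region) / used


scores = per_base_scores(100, [(0, 50), (10, 60)])
print(gene_score(scores, 0, 100, "+", window=20))  # 0.0: no reads in the last 20 bp
print(gene_score(scores, 0, 100, "-", window=20))  # 1.5
```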

Expression-chip hybridization and processing

Total RNA was isolated as described¹⁴, and 0.3 μg of RNA was labelled using the standard Affymetrix GeneChip eukaryotic hybridization protocols. Hybridizations were performed on Affymetrix Yeast 2.0 GeneChip arrays. Scanning was performed on a GeneChip Scanner 3000, and data extraction was carried out using Affymetrix GCOS 1.4 (Figs 1a and 4c).

Tiling-chip labelling, hybridization and normalization

Total RNA was isolated as described¹⁴. Labelling and hybridization to the Affymetrix GeneChip S. pombe Tiling 1.0FR arrays were performed as described⁵. Affymetrix CEL files were normalized using the ‘normalizeByReference’ function from the Bioconductor package ‘tilingArray’ (http://www.bioconductor.org)²⁹. In this procedure, the individual hybridization behaviour of every probe was corrected using the signal of three genomic DNA hybridizations. Genomic DNA was extracted, labelled and hybridized to the Affymetrix GeneChip S. pombe Tiling 1.0FR arrays as described⁵. A second normalization step was applied using the signals of intergenic probes as a reference. Finally, between-array normalization and variance-stabilizing transformation were applied using the Bioconductor package ‘vsn’.

Pol II ChIP–chip analysis

Chromatin immunoprecipitation (ChIP) was performed as described¹² using an antibody specific for the Pol II C-terminal domain (4H8, Abcam). The immunoprecipitated material and input control were amplified in two steps as described³¹. During the second step, dUTP was added to the PCR mix for subsequent fragmentation of the products. Fragmentation and labelling of the amplified products were performed using the GeneChip WT double-stranded DNA terminal labelling kit (Affymetrix). The duplicate immunoprecipitated samples and corresponding input material were hybridized to four separate Affymetrix GeneChip S. pombe Tiling 1.0FR arrays. The log₂ signals of the probes on the input arrays were subtracted from the log₂ signals on the Pol II arrays. The two normalized Pol II data sets were averaged and smoothed using a five-probe moving average. Average gene profiles were created using R and Bioconductor.
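The input normalization and smoothing steps can be sketched in Python as shown below. The sketch assumes raw, positive probe signals that are already ordered along the chromosome and paired by replicate. The text does not say how the first and last probes were handled, so truncating the five-probe window at the ends is an assumption, as are the function names.

```python
import math
from typing import List, Sequence


def log_ratio(ip: Sequence[float], inp: Sequence[float]) -> List[float]:
    """Per-probe log2(Pol II signal) minus log2(input signal)."""
    return [math.log2(a) - math.log2(b) for a, b in zip(ip, inp)]


def moving_average(values: Sequence[float], window: int = 5) -> List[float]:
    """Centred moving average; the window is truncated at either end of the series."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - half):i + half + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


def pol2_profile(ip_reps: Sequence[Sequence[float]],
                 input_reps: Sequence[Sequence[float]],
                 window: int = 5) -> List[float]:
    """Input-normalize each replicate, average the replicates probe by probe, then smooth."""
    ratios = [log_ratio(ip, inp) for ip, inp in zip(ip_reps, input_reps)]
    averaged = [sum(col) / len(col) for col in zip(*ratios)]
    return moving_average(averaged, window)
```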

Data visualization along genomic coordinates

The tiling-chip data were visualized using the ‘plotAlongChrom’ function⁵ (Figs 2b and 3d). The sequence data were visualized using an in-house R script (Figs 2a and 3a). Normalized sequence scores were generated by dividing the sequencing expression score at a given base-pair position by the sum of the expression scores at that position across all sequenced conditions.
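The per-position normalization amounts to dividing each condition's score by the total across conditions at that position. Below is a minimal sketch of that calculation. The condition labels are hypothetical, and returning 0 where no condition has any reads is an assumption, since the text does not address that case.

```python
from typing import Dict, List


def normalize_across_conditions(scores: Dict[str, List[int]]) -> Dict[str, List[float]]:
    """Divide each condition's per-base score by the total over all conditions.

    Positions with no reads in any condition are set to 0.0 (an assumption).
    """
    conditions = list(scores)
    n = len(scores[conditions[0]])
    totals = [sum(scores[c][i] for c in conditions) for i in range(n)]
    return {
        c: [scores[c][i] / totals[i] if totals[i] else 0.0 for i in range(n)]
        for c in conditions
    }


print(normalize_across_conditions({"proliferating": [1, 0, 3], "M1": [3, 0, 1]}))
# {'proliferating': [0.25, 0.0, 0.75], 'M1': [0.75, 0.0, 0.25]}
```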

Novel transcript analysis using tiling-chip data

The normalized data were smoothed using a five-probe moving average. Breakpoints in the probe signals along genomic coordinates were then determined using a dynamic programming algorithm that finds a globally optimal fit of a piecewise-constant expression profile along genomic coordinates²⁹. Segments were selected for visual analysis if they were ≥100 bp long, had a median probe signal above the 75th percentile of the chip and lay outside any annotation. To screen for antisense transcripts, similar criteria were applied, except that the segments had to overlap annotated genes on the opposite strand (Supplementary Table 3).
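The segmentation step fits a piecewise-constant profile by dynamic programming. The Python sketch below shows the idea for a fixed number of segments k, minimizing the residual sum of squares. It is a simplified illustration, not the ‘tilingArray’ implementation, which also has to choose the number of segments and scale to whole chromosomes. The O(k·n²) loop is only suitable for short probe series.

```python
from typing import List, Sequence


def segment(y: Sequence[float], k: int) -> List[int]:
    """Least-squares piecewise-constant fit of y with exactly k segments.

    Returns the indices at which segments 2..k begin (the breakpoints).
    """
    n = len(y)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and len(y)")
    # Prefix sums give the residual sum of squares of any slice y[i:j] in O(1).
    s = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, v in enumerate(y):
        s[i + 1] = s[i] + v
        s2[i + 1] = s2[i] + v * v

    def rss(i: int, j: int) -> float:
        return s2[j] - s2[i] - (s[j] - s[i]) ** 2 / (j - i)

    inf = float("inf")
    # cost[m][j]: best RSS for y[:j] split into m segments
    # back[m][j]: start index of the last of those m segments
    cost = [[inf] * (n + 1) for _ in range(k + 1)]
    back = [[0] * (n + 1) for _ in range(k + 1)]
    cost[0][0] = 0.0
    for m in range(1, k + 1):
        for j in range(m, n + 1):
            for i in range(m - 1, j):
                c = cost[m - 1][i] + rss(i, j)
                if c < cost[m][j]:
                    cost[m][j], back[m][j] = c, i
    # Walk back from the end to recover the segment start positions.
    starts, j = [], n
    for m in range(k, 0, -1):
        j = back[m][j]
        starts.append(j)
    return sorted(starts)[1:]  # drop the first segment's start (always 0)


print(segment([1, 1, 4, 4, 4, 9, 9], k=3))  # [2, 5]
```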

Novel transcript analysis using sequence data

Stretches of contiguous expression in intergenic regions were identified after removing all UTRs (see below) from the intergenic search space. Novel transcribed regions were required to have a length of ≥70 bp and an average sequence-expression score of ≥5 reads per bp. All predicted novel transcripts were then visually validated to remove inaccurate UTRs before a final manual curation (Supplementary Table 3).
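Below is a minimal sketch of the intergenic region calling. It assumes per-base scores like those used for Fig. 1a, plus a boolean mask marking intergenic positions from which all UTRs have already been removed. Reading ‘contiguous expression’ as ‘consecutive positions with a non-zero score’ is an assumption. The function name is hypothetical, and the visual validation and manual curation steps are not captured.

```python
from typing import List, Optional, Sequence, Tuple

MIN_LEN = 70         # minimum region length (bp)
MIN_MEAN_SCORE = 5   # minimum average sequence-expression score (reads per bp)


def call_novel_regions(scores: Sequence[int], searchable: Sequence[bool],
                       min_len: int = MIN_LEN,
                       min_mean: float = MIN_MEAN_SCORE) -> List[Tuple[int, int]]:
    """Return [start, end) runs of contiguous non-zero score within the searchable
    (intergenic, UTR-excluded) positions that pass the length and depth cut-offs."""
    regions: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for pos in range(len(scores) + 1):  # one extra step closes a run at the end
        inside = pos < len(scores) and searchable[pos] and scores[pos] > 0
        if inside and start is None:
            start = pos
        elif not inside and start is not None:
            length = pos - start
            if length >= min_len and sum(scores[start:pos]) / length >= min_mean:
                regions.append((start, pos))
            start = None
    return regions
```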

Expression profiling analysis of the novel transcripts

Expression profiles of the novel transcribed regions determined by sequencing and tiling chips were visually inspected from their expression across the 12 biological conditions tested (Supplementary Table 3). For the clustering analysis of Fig. 2c, a Wilcoxon rank sum test was used to determine if the probe signals in each new transcribed region were significantly greater than the signals of a reference set containing probes located outside of any annotated regions in any condition. An expression score was defined as –log₂ of the P-value of this test.
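The expression score can be sketched with a one-sided Wilcoxon rank-sum test. The section does not say whether exact or approximate P values were used. This standard-library sketch therefore assumes the normal approximation without tie or continuity correction. R's `wilcox.test`, for example, uses exact P values for small samples without ties, so values can differ. The floor on P is also an assumption, added only to keep the logarithm finite, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for t in range(i, j + 1):
            ranks[order[t]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def ranksum_p_greater(x: Sequence[float], ref: Sequence[float]) -> float:
    """One-sided rank-sum P value that x tends to be larger than ref.

    Uses the normal approximation without tie or continuity correction.
    """
    n1, n2 = len(x), len(ref)
    ranks = average_ranks(list(x) + list(ref))
    w = sum(ranks[:n1])
    mean = n1 * (n1 + n2 + 1) / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean) / sd
    return 0.5 * math.erfc(z / math.sqrt(2))  # upper-tail standard normal probability


def expression_score(region: Sequence[float], reference: Sequence[float]) -> float:
    """-log2 of the P value; P is floored at 1e-300 so the logarithm stays finite."""
    return -math.log2(max(ranksum_p_greater(region, reference), 1e-300))
```

The same score is used again below for the alternate 5′- and 3′-end analysis.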

UTR determination using tiling-chip data

CEL files were processed as for novel transcripts. Each UTR boundary was set at the breakpoint closest to the start of the annotated gene at which the median of the four probes immediately upstream of the breakpoint was lower than that of the four probes immediately downstream. If no breakpoint could be defined in this way and a breakpoint was present <50 bases inside the coding region, the UTR length was set to 1. UTRs called inside neighbouring genes or sharing UTR boundaries with neighbouring genes were discarded. UTRs >1,000 nucleotides were also discarded, because visual inspection of the data showed that they were highly enriched in wrong calls (Supplementary Table 4).
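Below is a sketch of the tiling-array UTR call for the 5′ end of a plus-strand gene. It assumes that breakpoints are given as probe indices at which a new segment begins, for example from the segmentation step described under novel transcript analysis. Several details are assumptions, as is the function name:

- The fallback rule for breakpoints inside the coding region does not re-check the probe medians.
- The filter for overlaps with neighbouring genes is omitted.
- 3′ ends and minus-strand genes would be handled as mirror images.

```python
from statistics import median
from typing import Optional, Sequence

FLANK = 4         # probes on each side of a breakpoint compared by their medians
MAX_UTR = 1000    # longer calls were discarded as unreliable
INSIDE_CDS = 50   # breakpoint window inside the CDS that yields UTR = 1


def five_prime_utr(probe_pos: Sequence[int], signal: Sequence[float],
                   breakpoints: Sequence[int], gene_start: int) -> Optional[int]:
    """5' UTR length of a plus-strand gene from segment breakpoints.

    probe_pos: sorted genomic positions of probes; signal: their smoothed signals;
    breakpoints: probe indices at which a new segment begins.
    """
    lengths = []
    for b in breakpoints:
        up = signal[max(0, b - FLANK):b]
        down = signal[b:b + FLANK]
        if up and down and probe_pos[b] <= gene_start and median(up) < median(down):
            lengths.append(gene_start - probe_pos[b])
    if lengths:
        utr = min(lengths)  # qualifying breakpoint closest to the annotated start
        return utr if utr <= MAX_UTR else None
    if any(0 < probe_pos[b] - gene_start < INSIDE_CDS for b in breakpoints):
        return 1
    return None  # no usable breakpoint: UTR not called
```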

UTR determination using sequence data

UTR lengths were determined by scanning outwards from each end of every gene for a break in the transcribed region, defined as a position with a sequence score of 0 or 1. If no position with a score of 0 was found before the start or end of the neighbouring region, a score of 1 was used as the cutoff instead. If no break was found using either cutoff, the UTR was recorded as undetermined (Supplementary Table 4).
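The outward scan for the sequence-based UTR calls can be sketched as shown below for the downstream (3′) side of a plus-strand gene. The upstream side and minus-strand genes are mirror images. Coordinates are 0-based, and the function name is hypothetical.

```python
from typing import Optional, Sequence


def downstream_utr_length(scores: Sequence[int], gene_end: int,
                          neighbour_start: int) -> Optional[int]:
    """3' UTR length of a plus-strand gene ending at gene_end (0-based, exclusive),
    scanning towards the next feature, which starts at neighbour_start."""
    for cutoff in (0, 1):  # look for a score of 0 first, then fall back to a score of 1
        for pos in range(gene_end, neighbour_start):
            if scores[pos] <= cutoff:
                return pos - gene_end
    return None  # no break found with either cutoff: UTR undetermined
```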

Alternate 5′- and 3′-end analysis using tiling-chip data

Genes with UTRs containing several breakpoints caused by ‘steps’ in the decreasing probe signals moving away from the gene boundaries were automatically selected from 12 biological conditions. A Wilcoxon rank sum test was then used to determine if the probe signals in each region were significantly greater than the signals of a reference set containing probes located outside of any annotated regions in any condition. A score was defined as –log₂ of the P-value of this test. Candidate regions with scores >10 in ≥12 conditions were selected for visual inspection (Supplementary Table 6).

Splicing analysis using sequence data

The initial BLAT results generated a set of sequence reads with gaps relative to the reference sequence (that is, potential spliced reads). Some matches in this data set were spurious, caused by poly(A/T) tracts splitting reads between two distant genomic regions. These were filtered out by limiting the maximum sequence spanned by a trans-read to ≤1 kb. The remaining trans-reads were compared with all known and predicted introns for intron validation. Trans-reads that did not span any known intron were clustered on the basis of their splice junctions; putative junctions had to agree within ±1 bp to belong to the same cluster. Clusters were ranked by the number of novel trans-reads they contained. A conservative set of 33,466 reads in clusters of ≥6 reads each (defining 485 potential splice sites) was manually curated. ‘False-positive’ trans-read clusters that did not seem to reflect splicing were mostly within complex repeated regions, and some may reflect errors in the original genome sequence.
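The trans-read filtering and clustering can be sketched as follows. Several details are assumptions that the text does not specify, and the function names are hypothetical:

- Each trans-read is represented by a hypothetical (chromosome, gap start, gap end) tuple.
- A junction is treated as known only if its coordinates match a known intron exactly.
- Each cluster is anchored on its first member, and a junction joins the cluster only if both of its coordinates lie within ±1 bp of that member.

```python
from typing import List, Set, Tuple

Junction = Tuple[str, int, int]  # (chromosome, gap start, gap end) of one trans-read

MAX_SPAN = 1000   # longer gaps are treated as spurious (e.g. split by poly(A/T) tracts)
MIN_READS = 6     # only clusters with at least this many reads are kept for curation


def cluster_junctions(junctions: List[Junction], tol: int = 1) -> List[List[Junction]]:
    """Group junctions whose start and end both lie within ±tol bp of a cluster's first member."""
    clusters: List[List[Junction]] = []
    for jn in sorted(j for j in junctions if 0 < j[2] - j[1] <= MAX_SPAN):
        for cluster in clusters:
            ref = cluster[0]
            if ref[0] == jn[0] and abs(ref[1] - jn[1]) <= tol and abs(ref[2] - jn[2]) <= tol:
                cluster.append(jn)
                break
        else:  # no existing cluster matched
            clusters.append([jn])
    return clusters


def candidate_splice_sites(junctions: List[Junction], known_introns: Set[Junction],
                           tol: int = 1, min_reads: int = MIN_READS) -> List[List[Junction]]:
    """Cluster junctions not matching a known intron, rank by size and keep large clusters."""
    novel = [j for j in junctions if j not in known_introns]
    ranked = sorted(cluster_junctions(novel, tol), key=len, reverse=True)
    return [c for c in ranked if len(c) >= min_reads]
```

Comparing each read with every existing cluster keeps the sketch simple but is quadratic in the number of clusters. A genome-scale version would index clusters by chromosome and position.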

Regulated splicing was determined by calculating the ratio of reads spanning exon–exon junctions (EE) to reads spanning the two corresponding exon–intron junctions (2EI) (Supplementary Table 7). The latter count was divided by two to normalize for its relative frequency. To obtain a conservative estimate of regulated splicing, the EE:EI ratio in one condition had to be at least fivefold higher than the EE:EI ratio in another stage. Junctions covered by <2 sequence reads in any condition were not considered. We identified genes with ≥5-fold higher splicing ratios in any meiotic-differentiation stage (M1 to M5) than in rapidly proliferating cells. We also identified genes with ≥5-fold higher splicing ratios in rapidly proliferating cells than in ≥1 meiotic-differentiation stage (Fig. 4a). Where ratios could not be calculated because of zero values, additional analyses used absolute read numbers, with the following conservative criteria:
- Where EE = 0 in rapidly proliferating cells, the EE in ≥1 meiotic-differentiation stage was required to be >6.
- With EE = 1 or 2 but EI = 0 in rapidly proliferating cells, the EE in ≥1 meiotic-differentiation stage was required to be 8 or 9, respectively.
- With EE ≥3 in rapidly proliferating cells, the EE in ≥1 meiotic-differentiation stage was required to be ≥5-fold higher than in rapidly proliferating cells. For introns spliced more efficiently in rapidly proliferating cells, the threshold was ≥20-fold instead, to account for the greater sequencing depth in that condition (Supplementary Table 8).
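Below is a sketch of the ratio-based part of this rule, under two assumptions:

- The ‘2EI’ count is taken as the sum of reads over the 5′ and 3′ exon–intron junctions, so dividing by two averages them.
- The <2-read filter is applied to every junction count in every condition supplied.

The zero-count criteria based on absolute read numbers are not implemented. The condition labels and function names are hypothetical.

```python
from typing import Dict, Optional, Tuple

FOLD = 5.0       # minimum fold difference in EE:EI ratio between two conditions
MIN_READS = 2    # junctions with fewer reads in any condition are not considered

# Read counts for one intron in one condition:
# (EE, EI at the 5' exon-intron junction, EI at the 3' intron-exon junction)
Counts = Tuple[int, int, int]


def splicing_ratio(ee: int, ei_5: int, ei_3: int) -> Optional[float]:
    """EE : EI, where EI is the summed exon-intron count divided by two."""
    ei = (ei_5 + ei_3) / 2
    return ee / ei if ei > 0 else None


def more_spliced(intron: Dict[str, Counts], cond: str, ref: str,
                 fold: float = FOLD, min_reads: int = MIN_READS) -> bool:
    """True if the intron's EE:EI ratio in `cond` is at least `fold` times that in `ref`."""
    if any(n < min_reads for counts in intron.values() for n in counts):
        return False  # low-coverage junctions are left to the absolute-count criteria
    # With min_reads >= 1 both EI counts are positive, so neither ratio is None here.
    return splicing_ratio(*intron[cond]) >= fold * splicing_ratio(*intron[ref])


intron = {"proliferating": (10, 10, 10), "M3": (60, 2, 2)}
print(more_spliced(intron, "M3", "proliferating"))  # True: ratio 30.0 vs 1.0
print(more_spliced(intron, "proliferating", "M3"))  # False
```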

Measurement of splicing efficiency by quantitative RT–PCR

To test the relationship between transcription rate and splicing efficiency (Fig. 4d and Supplementary Fig. 7), the uracil-inducible urg1 promoter was integrated upstream of spo6 and spn7 (ref. 30). Cells were grown in exponential phase for 16 h in minimal medium (MM) in the absence of uracil. A cell sample was then harvested, and uracil was added to the remaining culture at a final concentration of 2 mg ml⁻¹. Further cell samples were harvested 15 min and 60 min after uracil addition.

spo6 and spn7 are putative targets of Mei4 and were induced in a strain overexpressing Mei4 under the control of the nmt1 promoter²⁷. This strain (Supplementary Table 1) was grown in the presence of thiamine to early exponential phase. A cell sample was then harvested, and the remaining cells were diluted and grown for 18 h in the absence of thiamine.

Primers were designed inside exons 1 and 2 of spo6 and across the exon 1/intron 1 and exon 2/intron 2 junctions. Similarly, primers were designed inside exons 1 and 4 of spn7 and across the exon 1/intron 1 and intron 3/exon 4 junctions. RNA was extracted and qRT–PCR was performed as described³⁰. The data were normalized to the signal of the fba1 control gene. No signals above background levels were detected in control reactions without reverse transcriptase.

Curation methods

Novel transcribed regions were converted to GFF3 format and visualized in the context of the existing annotation using the Artemis software and methods described previously¹³. The corresponding sequence plots were examined, and discrete features were designated ‘non-coding RNAs’. Manual inspection of the strand-specific tiling-array data identified several ‘antisense’ transcripts. ‘Non-coding RNAs’ were inspected for the presence of methionine-containing ORFs >60 amino acids, which identified three protein-coding genes. Less discrete features, which may correspond to transcriptional noise and occurred mainly in low-complexity regions, were designated ‘miscellaneous features’. Some transcribed features were clearly related to their proximal genes and were curated as 5′ and 3′ UTRs (occasionally intron-containing).
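A simple scan for open reading frames longer than 60 amino acids could look like the following. It assumes that ‘methionine-containing ORFs’ means ORFs beginning with an ATG start codon, uses the stop codons of the standard genetic code, and reports minus-strand coordinates on the reverse-complemented sequence. `long_orfs` is a hypothetical helper, not the curation tool used here; ORFs lacking an in-frame stop before the end of the sequence are ignored.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def long_orfs(seq: str, min_aa: int = 60) -> list[tuple[str, int, int, int]]:
    """Return (strand, start, end, length_aa) for ATG-initiated ORFs longer than min_aa.

    Coordinates are 0-based and end-exclusive (stop codon included); for the "-"
    strand they refer to the reverse-complemented sequence.
    """
    seq = seq.upper()
    hits = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None  # first ATG in this frame still waiting for a stop codon
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    # Codons from the ATG up to, but not including, the stop
                    length_aa = (i - start) // 3
                    if length_aa > min_aa:
                        hits.append((strand, start, i + 3, length_aa))
                    start = None
    return hits
```

Because only the first ATG in each frame opens an ORF, nested downstream start codons are not reported separately, which is sufficient for a simple length filter.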

Sequence trans-reads obtained from proliferating cells validated 3,796 of the 4,811 known and predicted introns, and trans-reads obtained only from meiotic cells validated an additional 666 introns (Supplementary Table 7). The remaining 349 introns either were in poorly expressed genes with insufficient sequence reads or were not spliced under any of the conditions tested. Among the latter, manual inspection combined with homology searches and intron branch-point, acceptor and donor consensus-sequence data allowed us to refine 25 protein-coding gene structures and to delete 6 unsupported intron-containing genes. A number of the introns confirmed by trans-reads had not previously been annotated in the database. These ‘false negative’ introns were mapped onto the genomic sequence and used to identify 22 new genes and to revise a further ∼60 gene structures.
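Intron validation by trans-reads, which we take to be reads whose alignment spans an exon–exon junction, can be sketched as a set comparison between annotated introns and the gaps in split-read alignments. The sketch mirrors the partition described above (3,796 + 666 + 349 = 4,811) and also collects supported but unannotated junctions, analogous to the ‘false negative’ introns. The tuple layouts, 1-based inclusive coordinates and the `min_reads=1` support threshold are assumptions; a real pipeline would derive the aligned blocks from spliced alignments, for example from the `N` operations in SAM CIGAR strings.

```python
from collections import Counter
from typing import Iterable

Intron = tuple[str, int, int]    # (chromosome, first intron base, last intron base)
Blocks = list[tuple[int, int]]   # aligned blocks of one read, 1-based inclusive


def junctions_from_read(chrom: str, blocks: Blocks) -> list[Intron]:
    """Each gap between consecutive aligned blocks of a split read is a candidate intron."""
    return [(chrom, prev_end + 1, next_start - 1)
            for (_, prev_end), (next_start, _) in zip(blocks, blocks[1:])]


def intron_support(reads: Iterable[tuple[str, Blocks]]) -> Counter:
    """Count junction-spanning reads per candidate intron."""
    counts: Counter = Counter()
    for chrom, blocks in reads:
        counts.update(junctions_from_read(chrom, blocks))
    return counts


def classify_introns(annotated, prolif_reads, meiotic_reads, min_reads=1):
    """Split annotated introns by where they are supported and collect unannotated ones."""
    annotated = set(annotated)
    prolif = intron_support(prolif_reads)
    meiotic = intron_support(meiotic_reads)
    in_prolif = {i for i in annotated if prolif[i] >= min_reads}
    meiotic_only = {i for i in annotated - in_prolif if meiotic[i] >= min_reads}
    unconfirmed = annotated - in_prolif - meiotic_only
    unannotated = {i for i in set(prolif) | set(meiotic)
                   if i not in annotated and prolif[i] + meiotic[i] >= min_reads}
    return in_prolif, meiotic_only, unconfirmed, unannotated
```

Checking proliferating cells first and meiotic cells second reproduces the order of the counts above, so an intron supported in both conditions is attributed to proliferating cells only.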

All these alterations have been incorporated into the S. pombe gene database GeneDB (http://www.genedb.org/genedb/pombe/). The new transcribed regions are listed in Supplementary Table 3, and the corrected gene structures are listed at http://www.genedb.org/genedb/pombe/coordChanges.jsp.

Accession codes

Data deposits

Raw data are available from ArrayExpress (http://www.ebi.ac.uk/arrayexpress) under accession numbers E-MTAB-5 (sequence data) and E-MTAB-18 (array data). Transcript data-plots are available from our TranscriptomeViewer at http://www.sanger.ac.uk/PostGenomics/S_pombe/.

References

1. Yamada, K. et al. Empirical analysis of transcriptional activity in the Arabidopsis genome. Science 302, 842–846 (2003).
2. Bertone, P. et al. Global identification of human transcribed sequences with genome tiling arrays. Science 306, 2242–2246 (2004).
3. Stolc, V. et al. A gene expression map for the euchromatic genome of Drosophila melanogaster. Science 306, 655–660 (2004).
4. Carninci, P. et al. The transcriptional landscape of the mammalian genome. Science 309, 1559–1563 (2005).
5. David, L. et al. A high-resolution map of transcription in the yeast genome. Proc. Natl Acad. Sci. USA 103, 5320–5325 (2006).
6. Li, L. et al. Genome-wide transcription analyses in rice using tiling microarrays. Nature Genet. 38, 124–129 (2006).
7. The ENCODE Project Consortium. Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. Nature 447, 799–816 (2007).
8. Kapranov, P. et al. RNA maps reveal new RNA classes and a possible function for pervasive transcription. Science 316, 1484–1488 (2007).
9. Kapranov, P., Willingham, A. T. & Gingeras, T. R. Genome-wide transcription and the implications for genomic organization. Nature Rev. Genet. 8, 413–423 (2007).
10. Blencowe, B. J. Alternative splicing: new insights from global analyses. Cell 126, 37–47 (2006).
11. Hughes, T. A. Regulation of gene expression by alternative untranslated regions. Trends Genet. 22, 119–122 (2006).
12. Lackner, D. H. et al. A network of multiple regulatory layers shapes gene expression in fission yeast. Mol. Cell 26, 145–155 (2007).
13. Wood, V. et al. The genome sequence of Schizosaccharomyces pombe. Nature 415, 871–880 (2002).
14. Lyne, R. et al. Whole-genome microarrays of fission yeast: characteristics, accuracy, reproducibility, and processing of array data. BMC Genomics 4, 27 (2003).
15. Chen, D. et al. Global transcriptional responses of fission yeast to environmental stress. Mol. Biol. Cell 14, 214–229 (2003).
16. Mata, J. & Bähler, J. Correlations between gene expression and gene conservation in fission yeast. Genome Res. 13, 2686–2690 (2003).
17. Struhl, K. Transcriptional noise and the fidelity of initiation by RNA polymerase II. Nature Struct. Mol. Biol. 14, 103–105 (2007).
18. Potashkin, J., Li, R. & Frendewey, D. Pre-mRNA splicing mutants of Schizosaccharomyces pombe. EMBO J. 8, 551–559 (1989).
19. Leonardi, J., Box, J. A., Bunch, J. T. & Baumann, P. TER1, the RNA subunit of fission yeast telomerase. Nature Struct. Mol. Biol. 15, 26–33 (2008).
20. Webb, C. J. & Zakian, V. A. Identification and characterization of the Schizosaccharomyces pombe TER1 telomerase RNA. Nature Struct. Mol. Biol. 15, 34–42 (2008).
21. Bickel, K. S. & Morris, D. R. Silencing the transcriptome’s dark matter: mechanisms for suppressing translation of intergenic transcripts. Mol. Cell 22, 309–316 (2006).
22. Harigaya, Y. et al. Selective elimination of messenger RNA prevents an incidence of untimely meiosis. Nature 442, 45–50 (2006).
23. Borgne, A., Murakami, H., Ayté, J. & Nurse, P. The G1/S cyclin Cig2p during meiosis in fission yeast. Mol. Biol. Cell 13, 2080–2090 (2002).
24. Munoz, M. J., Daga, R. R., Garzon, A., Thode, G. & Jimenez, J. Poly(A) site choice during mRNA 3′-end formation in the Schizosaccharomyces pombe wos2 gene. Mol. Genet. Genomics 267, 792–796 (2002).
25. Averbeck, N., Sunder, S., Sample, N., Wise, J. A. & Leatherwood, J. Negative control contributes to an extensive program of meiotic splicing in fission yeast. Mol. Cell 18, 491–498 (2005).
26. Mata, J., Lyne, R., Burns, G. & Bähler, J. The transcriptional program of meiosis and sporulation in fission yeast. Nature Genet. 32, 143–147 (2002).
27. Mata, J., Wilbrey, A. & Bähler, J. Transcriptional regulatory network for sexual differentiation in fission yeast. Genome Biol. 8, R217 (2007).
28. Malapeira, J. et al. A meiosis-specific cyclin regulated by splicing is required for proper progression through meiosis. Mol. Cell. Biol. 25, 6330–6337 (2005).
29. Huber, W., Toedling, J. & Steinmetz, L. M. Transcript mapping with high-density oligonucleotide tiling arrays. Bioinformatics 22, 1963–1970 (2006).
30. Watt, S. et al. urg1: a uracil-regulatable promoter system for fission yeast with short induction and repression times. PLoS ONE 3, e1428 (2008).
31. Bernstein, B. E., Humphrey, E. L., Liu, C. L. & Schreiber, S. L. The use of chromatin immunoprecipitation assays in genome-wide analyses of histone modifications. Methods Enzymol. 376, 349–360 (2004).

Acknowledgements

We thank K. Gould and M. Yamamoto for strains, W. Huber and R. Durbin for advice, and J. Mata, W. Huber, V. Pancaldi, D. Stemple, J.-R. Landry and D. Lackner for comments on the manuscript. B.T.W. was supported by Sanger Postdoctoral and Canadian NSERC fellowships, and S.M. by a fellowship for Advanced Researchers from the Swiss National Science Foundation. This research was funded by Cancer Research UK (grant number C9546/A6517), by the Wellcome Trust and by DIAMONDS, an EC FP6 Lifescihealth STREP (LSHB-CT-2004-512143).

Author Contributions B.T.W., S.M. and J.B. designed and supervised the research and discussed the results; S.W. performed most experiments with help from B.T.W. and S.M.; B.T.W. and S.M. analysed the data with help from F.S., V.W., C.J.P. and J.B.; I.G. and J.R. helped with sequencing; and J.B. drafted the manuscript.

Author information

Present addresses: Institut de Recherche en Immunologie et en Cancérologie (IRIC), Montreal, H3C 3J7, Canada (B.T.W.); Department of Genetics, Evolution and Environment, University College London, London WC1E 6BT, UK (S.M., S.W., F.S. and J.B.); School of Biological Sciences, University of Liverpool, L69 7ZB, UK (I.G.); EMBL-European Bioinformatics Institute, Hinxton, Cambridge, CB10 1SD, UK (C.J.P.).
Brian T. Wilhelm and Samuel Marguerat: These authors contributed equally to this work.

Authors and Affiliations

Cancer Research UK Fission Yeast Functional Genomics Group, Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1HH, UK
Brian T. Wilhelm, Samuel Marguerat, Stephen Watt, Falk Schubert, Valerie Wood, Ian Goodhead, Christopher J. Penkett, Jane Rogers & Jürg Bähler

Corresponding author

Correspondence to Jürg Bähler.

Supplementary information

Supplementary Figures 1–7 with legends (PDF 244 kb)
Supplementary Table 1 (XLS 19 kb)
Supplementary Table 2 (XLS 19 kb)
Supplementary Table 3 (XLS 88 kb)
Supplementary Table 4 (XLS 2247 kb)
Supplementary Table 5 (XLS 69 kb)
Supplementary Table 6 (XLS 45 kb)
Supplementary Table 7 (XLS 847 kb)
Supplementary Table 8 (XLS 60 kb)
Supplementary Table 9 (XLS 17 kb)

About this article

Cite this article

Wilhelm, B., Marguerat, S., Watt, S. et al. Dynamic repertoire of a eukaryotic transcriptome surveyed at single-nucleotide resolution. Nature 453, 1239–1243 (2008). https://doi.org/10.1038/nature07002

Received: 17 March 2008
Accepted: 15 April 2008
Published: 18 May 2008
Issue Date: 26 June 2008
DOI: https://doi.org/10.1038/nature07002
