INTRODUCTION
Endogenous antisense RNAs (asRNAs) are products of DNA-dependent RNA polymerase initiated from antisense promoters. They at least partially overlap a functional RNA (sense RNA), which may or may not be coding. The overlapping regions of sense and antisense RNAs are fully complementary, so they have the potential to form perfectly matched double-stranded RNAs (dsRNAs). asRNAs are usually much less abundant than the corresponding sense RNAs, and next-generation sequencing has identified many new species of asRNAs (1, 2).
Known examples of RNA regulation of gene expression in bacteria involve a variety of small noncoding RNAs and asRNAs (3–7). In particular, asRNAs and RNase III regulate plasmid and toxin gene expression. The Escherichia coli ColE1 plasmid replication origin carries an asRNA that is complementary to the DNA replication primer and inhibits plasmid replication (8–11). Other well-known asRNA-regulated systems are the type I toxin-antitoxin (TA) genes (12). In type I TA systems, including hok-sok in the R1 plasmid (13, 14) and ldrD-rdlD in the E. coli genome (15), a small asRNA gene lies opposite, but overlapping, a gene encoding a toxic peptide. The small asRNA inhibits expression of the toxin by at least partially base pairing with the toxin RNA. RNase III, an endoribonuclease that cleaves dsRNAs (16) to generate 5′-phosphate and 3′-hydroxyl termini with a characteristic 2-nucleotide (nt) 3′ overhang (17, 18), regulates both the plasmid replication system (10) and the type I TA systems (19, 20). Exhaustive digestion of dsRNAs by RNase III produces small dsRNAs of ∼14 bp (21).
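To make the geometry of such a product concrete, the sketch below excises a duplex from a perfectly matched sense/antisense pair so that each strand carries a 2-nt 3′ overhang. The function names, the 21-nt window, and the example sequence are hypothetical choices for illustration. The window length is not a model of where RNase III actually cuts.

```python
# Hypothetical helpers: model a staggered cut that leaves 2-nt 3' overhangs.
RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def revcomp(rna: str) -> str:
    """Reverse complement of an RNA sequence written in the A/C/G/U alphabet."""
    return rna.translate(RNA_COMPLEMENT)[::-1]


def staggered_duplex(sense: str, start: int, length: int = 21, overhang: int = 2):
    """Return (top, bottom) strands, both written 5'->3', of a duplex excised from a
    perfect sense/antisense pair with `overhang`-nt 3' overhangs at both ends.

    top    = sense[start : start + length]
    bottom = antisense of sense[start - overhang : start + length - overhang]
    The strands pair over length - overhang bp (19 bp for a 21-nt window).
    """
    if start < overhang or start + length > len(sense):
        raise ValueError("window falls outside the sense sequence")
    top = sense[start:start + length]
    bottom = revcomp(sense[start - overhang:start + length - overhang])
    return top, bottom


sense = "GGAUCCUAGCAUGCCAGUUACGAUCGGA"  # hypothetical 28-nt sense RNA
top, bottom = staggered_duplex(sense, start=3)
paired = len(top) - 2
# The first 19 nt of each strand are mutually complementary; the last 2 nt of
# each strand are the unpaired 3' overhangs.
assert bottom[:paired] == revcomp(top[:paired])
print(top)
print(bottom)
```

Shifting the antisense window 2 nt toward the top strand's 5′ end is what makes both overhangs fall at 3′ ends.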
Bacterial genomes produce many asRNAs from protein-coding genes. Using a whole-genome tiling microarray, the Church group discovered that a large percentage of the E. coli genome is transcribed in both directions (22), although technical artifacts in reverse transcription steps could also generate spurious antisense signals (23). Multiple groups subsequently used deep sequencing to study the transcriptomes of bacterial genomes (6). Lasa et al. found significantly more antisense reads among short (<50-nt) RNA deep-sequencing reads than among long RNA reads in Staphylococcus aureus (2). Their findings suggested that asRNAs are widely transcribed across the genome of Gram-positive bacteria but are degraded together with sense RNAs into small RNAs of <50 nt by RNase III. Lioliou et al. used a catalytically inactive RNase III mutant to pull down RNase III-bound RNAs and identified RNase III-bound asRNAs for 44% of annotated genes in S. aureus (24). More recently, deep sequencing of immunoprecipitated dsRNAs in an RNase III cleavage-deficient mutant strain revealed that RNase III cleaves sense and antisense RNA pairs in E. coli (25). Transcription termination by Rho has been implicated in restricting antisense transcription (26, 27).
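The comparison made by Lasa et al. reduces to a simple tally. The sketch below is a minimal version that assumes each read has already been classified as sense or antisense relative to the annotated gene it overlaps. The function name and toy reads are hypothetical; only the 50-nt cutoff comes from the text above.

```python
def antisense_fraction_by_size(reads, cutoff=50):
    """Fraction of antisense reads among short (< cutoff nt) and long (>= cutoff nt) reads.

    `reads` is an iterable of (length_nt, is_antisense) pairs; is_antisense is assumed
    to have been assigned beforehand by comparing each read's strand with the
    annotated gene it overlaps.
    """
    tallies = {"short": [0, 0], "long": [0, 0]}  # size class -> [antisense, total]
    for length, is_antisense in reads:
        size_class = "short" if length < cutoff else "long"
        tallies[size_class][0] += int(is_antisense)
        tallies[size_class][1] += 1
    return {
        size_class: (anti / total if total else float("nan"))
        for size_class, (anti, total) in tallies.items()
    }


# Hypothetical toy reads: (length in nt, maps antisense to an annotated gene?)
toy_reads = [(22, True), (30, True), (41, False), (120, False), (200, True), (95, False)]
print(antisense_fraction_by_size(toy_reads))
```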
Despite the consensus that asRNAs are ubiquitous in bacteria, their biological functions and physiological significance are not well understood. There are only a few examples of asRNAs regulating protein-coding genes (28). One study suggested that asRNAs are mainly transcriptional noise arising from spurious promoters (29). In contrast, two operons overlapping in their 5′ regions were shown to antagonize each other’s expression in Listeria monocytogenes, representing an antisense RNA gene regulation model termed the “excludon” (30). Whether widespread asRNAs are gene regulators or mostly transcriptional noise, and what role RNase III plays in asRNA gene regulation, remain to be investigated in E. coli.
The Tombusvirus p19 protein captures short interfering RNAs (siRNAs; ∼21-nucleotide small dsRNAs) to protect the virus against the antiviral effects of RNA interference in plants (31, 32). We previously found (33) that ectopic expression of p19 in E. coli captures ∼21-nucleotide small dsRNAs generated from overlapping exogenous long hairpin RNAs. These small RNA duplexes, which apparently are intermediate degradation products of RNase III, were termed pro-siRNAs (for prokaryotic siRNAs). pro-siRNA levels were greatly reduced in the absence of p19 or in RNase III-deficient bacteria expressing p19. Precipitating p19 from bacterial cells coexpressing p19 and either ∼500-nt sense and antisense sequences or a similarly sized sense-antisense stem-loop of an exogenous gene enabled us to isolate and purify pro-siRNAs that specifically and efficiently knock down the exogenous gene when transfected into mammalian cells (33–35). pro-siRNAs mapped to multiple sequences in the exogenous target gene.
In this study, we engineered E. coli cells expressing p19 but no exogenous sequences and captured ∼21-nucleotide dsRNAs derived from the bacterial genome (referred to as p19-captured dsRNAs). We hypothesized that these short dsRNAs represent p19-stabilized RNase III cleavage intermediates of overlapping endogenous sense and antisense transcripts, so that capturing them could provide a useful method for characterizing labile endogenous dsRNAs. Because p19-captured dsRNAs also contain bona fide RNase III cleavage sites, they could be used to identify the target sequence preferences of RNase III.
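If both strands of a p19-captured duplex are sequenced and mapped, the strands can be paired by coordinate, and their ends mark the cuts that produced them. The sketch below assumes equal-length strands with 2-nt 3′ overhangs, so a minus-strand read starts 2 nt to the left of its plus-strand partner. The `Read` type and both function names are hypothetical, and this is not the analysis pipeline used in this study.

```python
from collections import defaultdict
from typing import NamedTuple


class Read(NamedTuple):
    strand: str  # "+" or "-"
    start: int   # 0-based leftmost genomic coordinate (as in BAM/BED, for either strand)
    length: int  # read length in nt


def pair_duplexes(reads, overhang=2):
    """Pair equal-length plus- and minus-strand reads whose coordinates fit one duplex
    with `overhang`-nt 3' overhangs at both ends: minus.start == plus.start - overhang."""
    minus_by_pos = defaultdict(list)
    for r in reads:
        if r.strand == "-":
            minus_by_pos[(r.start, r.length)].append(r)
    pairs = []
    for r in reads:
        if r.strand == "+":
            # .get() does not insert missing keys into the defaultdict
            for m in minus_by_pos.get((r.start - overhang, r.length), []):
                pairs.append((r, m))
    return pairs


def implied_cut_sites(plus, minus):
    """Boundaries (x = bond between genomic positions x-1 and x) implied by both ends
    of each strand of one paired duplex, tagged with the strand they lie on."""
    return {
        ("+", plus.start), ("+", plus.start + plus.length),
        ("-", minus.start), ("-", minus.start + minus.length),
    }


reads = [Read("+", 100, 21), Read("-", 98, 21), Read("-", 300, 21)]
for plus, minus in pair_duplexes(reads):
    print(sorted(implied_cut_sites(plus, minus)))
```

For the paired reads above, the implied boundaries are 100 and 121 on the plus strand and 98 and 119 on the minus strand. Each opposite-strand pair of boundaries is offset by the 2-nt overhang, which is the staggered cut expected from RNase III.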
DISCUSSION
Here, we developed a method to capture endogenous small dsRNAs (∼21 to 22 bp) by ectopic expression of Tombusvirus p19 in E. coli. Deep sequencing of p19-captured dsRNAs and total rRNA-depleted RNA suggested that clusters of short dsRNAs arise from duplexes of at least 21 bp, formed by overlapping sense and antisense transcripts, that are processed into short dsRNAs by RNase III. p19 capture stabilized labile dsRNA products, enabling us to detect dsRNAs with high sensitivity. asRNAs were transcribed from most genes, as previously noted (2, 25, 56), but with a wide range of abundance (Fig. 4d). The abundance of captured dsRNAs correlated with asRNA reads (Fig. 4e). Although some of the less abundant asRNAs and dsRNAs may represent transcriptional noise, the most abundant p19-captured dsRNA clusters we identified agreed well with asRNAs identified in other studies by deep sequencing, by assignment of antisense transcription start sites (46) and operons (47), and by dsRNA capture with an anti-dsRNA antibody (25). These clusters are therefore likely to arise from genuine rather than spurious transcription (Fig. 4f and h). Our method confirmed hundreds of previously identified asRNAs and identified potentially hundreds of new such loci in E. coli (see Table S1 in the supplemental material). Our data should provide a valuable resource for studying asRNAs in E. coli. The p19-captured dsRNA, RNase III in vitro digestion, and RNA deep-sequencing data sets have been formatted for convenient viewing in the UCSC genome browser (files can be downloaded from http://www.pro-sirna.com/lab/data/). To our knowledge, Table S3 provides the largest collection of in vivo bacterial RNase III cleavage sites in perfectly matched dsRNAs, which should be a useful resource for future studies of the function of RNase III in E. coli.
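The text does not state which statistic underlies the correlation between captured dsRNAs and asRNA reads (Fig. 4e). Because abundances span a wide range, a rank correlation is one reasonable choice. The sketch below computes Spearman's rho from per-gene counts. The counts are invented, and the choice of Spearman is an assumption.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation, computed as the Pearson correlation of average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


# Hypothetical per-gene counts (not data from this study)
dsrna_reads = [0, 3, 12, 40, 150, 7]
asrna_reads = [1, 5, 30, 25, 400, 9]
print(round(spearman_rho(dsrna_reads, asrna_reads), 3))
```

Ranking first means a handful of very highly expressed loci cannot dominate the statistic, which suits counts that vary over orders of magnitude.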
A major advantage of p19 capture is that it is performed in bacterial cells with intact RNase III, potentially avoiding secondary effects of RNase III deficiency in the RNase III mutant cells used in some studies (24, 25). This method could be readily adapted to study asRNAs in other bacterial species without requiring the generation of an rnc null mutant, which is lethal in certain species, such as Bacillus subtilis (57). RNase III degrades perfect dsRNAs generated by the pairing of sense and antisense transcripts but can also cleave structured RNAs that contain perfectly or imperfectly paired double-stranded regions (e.g., rRNA precursor [37] and R1.1 RNA of T7 phage [58]). There is no simple way to separate the antisense-dependent effects of RNase III from its effects on such structured RNAs. However, p19 binds only perfectly paired ∼21-nt dsRNAs (44), such as would arise from antisense transcripts pairing with sense transcripts, and not the imperfect duplexes that would arise in structured regions of RNA. This provides a specific way to capture antisense transcripts that pair with sense transcripts in cis.
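The pairing requirement can be expressed as a sequence-level check. p19 itself recognizes duplex length and end structure rather than running such a test, so the filter below is only a hypothetical proxy. It accepts two 21- or 22-nt strands only when they form Watson-Crick pairs over their whole core and leave 2-nt 3′ overhangs. It rejects the mismatches and G-U pairs common in structured regions. The function name and size range are assumptions.

```python
WATSON_CRICK = {"A": "U", "U": "A", "G": "C", "C": "G"}


def is_perfect_p19_substrate(top, bottom, overhang=2, sizes=(21, 22)):
    """Crude, hypothetical filter for siRNA-like duplexes.

    Both strands are written 5'->3'. Returns True only if the strands have the same
    length in `sizes` and pair perfectly (Watson-Crick only, no G-U wobble) over
    their core, leaving `overhang`-nt unpaired 3' ends on each strand.
    """
    if len(top) != len(bottom) or len(top) not in sizes:
        return False
    core = len(top) - overhang
    # top[i] (read 5'->3') pairs with bottom[core - 1 - i] (read from the other end).
    return all(WATSON_CRICK.get(top[i]) == bottom[core - 1 - i] for i in range(core))
```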
The most abundant p19-captured dsRNA clusters, most of which were also found in other studies, are the least likely to be caused by transcriptional noise. Shorter asRNAs were generally detected only in RNase III-deficient bacteria, suggesting that asRNA transcription and RNase III degradation of dsRNAs promote more efficient sense RNA decay (Fig. 6 and S3).
cspD appears to be an example of RNase III-regulated protein production mediated by a cis-acting asRNA (Fig. 7). RNase III might be essential for degrading cspD sense mRNA. The cspD asRNA covers a substantial region of the sense RNA, and the resulting dsRNA might mask cleavage sites of other RNases (e.g., RNase E) and stabilize the cspD sense RNA. A similar mechanism, in which an asRNA stabilizes the sense RNA by impeding RNase degradation, has been described for the gadY small RNA, which stabilizes the overlapping gadX mRNA (59). To further confirm asRNA and RNase III regulation of cspD gene expression, the antisense promoter at this locus could be cloned and modified by mutagenesis, and the resulting effects on CspD protein expression could be tested in future work.
Table S4 compares our method with previous methods that have identified RNase III targets (2, 24, 25, 56, 60, 61). The enzymatic treatments and other tools used in previous methods might introduce unknown biases. For example, the J2 anti-dsRNA antibody has a known sequence preference (62). Nonetheless, we were surprised to observe a GC bias in the middle section of p19-captured small dsRNAs isolated from E. coli (Fig. 8b) and of RNase III-digested dsRNAs followed by p19 pulldown (Fig. 3e). Previous studies on the binding preference of p19 focused on its dsRNA length selection and showed that p19 binds 21-bp synthetic dsRNAs with high affinity (dissociation constant [Kd] in the picomolar to nanomolar range) without any obvious sequence bias (32, 43, 44). Thus, the GC bias of p19 appears to be subtle and might only be detected when thousands of sequences are tested, as in our study (Fig. 3). A previous study found that p19 interacts with the phosphate groups localized to the central portion of an siRNA (43). We hypothesize that ∼21-nt dsRNAs with abundant GC pairs in the middle section form a more stable A-form helix with certain features preferred by p19. This GC bias might have biological implications for the function of p19 as an RNA silencing suppressor for plant tombusviruses. For example, the genomic GC content is 33.6% for tomato and 47% for tomato bushy stunt virus, raising the intriguing possibility that p19 prefers virus-derived siRNAs over endogenous siRNAs of the plant host and thereby selectively protects the virus from RNA silencing.
Bacterial RNase III was previously thought to recognize structural features (A-form helix) of dsRNA rather than a sequence motif (63). However, we found hot spots in p19-captured endogenous and exogenous 21- to 22-bp duplexes, which were caused, at least in part, by bacterial RNase III. Sequence analysis of the large data sets of endogenous dsRNAs we retrieved revealed a strong preference for AU-rich sequences in the 3 nt on either side of the cleavage sites and for GC enrichment in the overhangs (Fig. 3e and 8b). This suggests that E. coli RNase III prefers to cut on either side of two GC-rich nucleotides flanked by AU-rich regions. This finding is generally consistent with the consensus sequences of RNase III cleavage sites in single-stranded structured RNAs presented by Nicholson (64). Moreover, introducing GC pairs adjacent to an RNase III cleavage site conferred RNase III resistance (64, 65). Recently, Altuvia et al. sequenced 5′-monophosphorylated RNA fragments from both WT and rnc mutant E. coli and identified 1,003 RNase III cleavage sites, revealing that the 2-nt overhangs between the 2 cleavage sites involve at least one G/C (56), consistent with our findings (Fig. 3e and 8b). However, only 2 of the ∼4,000 RNase III cleavage sites that we identified with high confidence overlapped the sites found by Altuvia et al. (51) (Table S3). This low concordance suggests that our method identified a distinctive set of RNase III cleavage sites in perfectly paired dsRNAs that could be missed by previous methods, whose sites derive mainly from single-stranded structured RNAs. It also highlights another major limitation of our method: it cannot identify RNase III cleavage sites in single-stranded RNAs with intramolecular secondary structure.
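The flank analysis amounts to tallying bases at fixed offsets around each cut. The sketch below assumes that cut sites are given as bond boundaries on one strand of a sequence written in the RNA alphabet. The names and the toy example are hypothetical.

```python
from collections import Counter


def flank_composition(seq, sites, flank=3):
    """Base counts at offsets -flank..flank-1 around cut boundaries in `seq`.

    A site x denotes the bond between seq[x-1] and seq[x]; offset -1 is the
    nucleotide 5' of the cut and offset 0 the nucleotide 3' of it. Sites too close
    to either end of `seq` are skipped. Sites on the other strand are assumed to
    have been converted to coordinates on a reverse-complemented sequence beforehand.
    """
    counts = {off: Counter() for off in range(-flank, flank)}
    for x in sites:
        if x - flank < 0 or x + flank > len(seq):
            continue
        for off in range(-flank, flank):
            counts[off][seq[x + off]] += 1
    return counts


def au_fraction(counter):
    """Fraction of A or U among the bases tallied at one offset."""
    total = sum(counter.values())
    return (counter["A"] + counter["U"]) / total if total else float("nan")


# Hypothetical toy sequence; sites 1 and 11 are too close to the ends and are skipped.
comp = flank_composition("GGGAAAUUUCCC", [6, 1, 11])
print({off: au_fraction(c) for off, c in comp.items()})
```

Per-offset AU fractions computed this way could then be compared with the genome-wide background, which is how an AU-rich flank or a GC-rich overhang would stand out.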
Surprisingly, although current models propose that Dicer, an RNase III family enzyme, cuts dsRNAs from the 3′ end in a phased manner without bias (66), our in vitro digestion data (E4 in Fig. 2) showed that human Dicer also produced many internal short RNA peaks and has some cleavage bias. In fact, recent studies have shown sequence preferences for RNase III class enzymes, including Mini-III in Bacillus subtilis (67), yeast Rnt1p (68), Dicer-like enzymes in Paramecium (69), and Aquifex aeolicus RNase III (70). A GC bias was also found in plant virus-derived siRNAs (71). Therefore, sequence bias may be a general property of RNase III enzymes. Further analysis of p19-captured dsRNAs in additional bacterial species may help to unravel the mechanisms underlying the sequence bias of RNase III class enzymes. Because type II CRISPR systems use RNase III to process their guide RNAs, any RNase III sequence bias could influence which genes or cleavage sites are efficiently targeted by CRISPR.
In summary, our study presents a new method for identifying and studying asRNAs and RNase III products in E. coli that could be adapted to study other bacteria. To identify asRNA loci in the bacterial genome, it is better to express p19 from a genomic locus than from a plasmid, because the plasmid itself can generate abundant dsRNAs. p19-captured small dsRNA clusters mark genomic loci where overlapping sense and antisense transcription occurs in E. coli. However, for this method to work, the overlapping sense and antisense transcripts must form dsRNAs, and those dsRNA regions must be processed into short dsRNAs of ∼21 bp. The GC preferences of both E. coli RNase III and p19 may have contributed to the hot spots we identified in p19-captured small dsRNAs. Despite certain limitations and biases, the p19-capture method is useful for confirming that dsRNAs are formed and cleaved inside bacterial cells and for revealing exact RNase III cleavage sites within perfectly matched dsRNAs. Our study indicates that RNase III controls dsRNA abundance in bacteria. More work is needed to understand the role of asRNAs in bacteria and the consequences of failing to clear efficiently the dsRNAs that form.