hnRNP L has been well documented to control the splicing of the CD45 gene in both mouse and human T cells (9–12). However, the dramatic developmental defect observed in hnRNP L-deficient thymocytes, together with the high abundance of this protein in T cells (12, 26) (Fig. 1a), suggests that hnRNP L controls the expression of a large set of functionally important genes. Therefore, to begin to understand the physiological impact of hnRNP L on T cell function, we performed cross-linking and immunoprecipitation followed by high-throughput sequencing (CLIP-seq) in primary human CD4+ T cells (20, 21).
All previous studies of hnRNP L in T cells have shown this protein to function similarly in resting and activated cell states, with no data suggesting a widespread change in the binding specificity of this protein in response to T cell stimulation (
10,
27). Nevertheless, because our goal is to understand the role of hnRNP L in promoting T cell function, we performed CLIP in parallel in quiescent (resting) cells and in cells activated through the T cell receptor, as these two conditions represent critical states of T cell physiology. Briefly, purified CD4
+ T cells were obtained from three healthy donors. For each donor, half the cells were stimulated in culture with antibodies against CD3 and CD28 (T cell receptor and coreceptor), while the other half were maintained in medium alone. Direct protein-RNA interactions were fixed in living cells by treatment with UV light, which induces covalent cross-links between proteins and the RNAs to which they are directly bound (
21). Cells were then lysed; RNA was fragmented to a size range of 30 to 110 nt; and hnRNP L RNA complexes were stringently purified using a well-described antibody to endogenous human hnRNP L (see Fig. S1 in the supplemental material). The efficiency of the immunoprecipitation and the consistency of hnRNP L expression in resting and stimulated CD4
+ T cells are shown in
Fig. 1a. Following isolation of the hnRNP L RNA complexes from cells, RNAs were released from the protein, tagged with RNA linkers, and subjected to high-throughput sequencing (see Materials and Methods).
hnRNP L RNA interaction profiles in T cells.
We obtained a total of ∼200 million reads from the 3 pools of resting CD4
+ cells and ∼100 million reads from the stimulated samples (
Fig. 1b). In each case, more than 80% of reads mapped unambiguously to the genome, corresponding to a final total of 13 to 15 million unique alignments (
Fig. 1b; see also Table S1 in the supplemental material). Of these unique aligned reads (i.e., “CLIP tags”), ∼23% mapped within protein-coding transcripts (
Fig. 1b, RefSeq alignments), 6% to established noncoding RNAs, 19% to antisense RNAs, and the remaining 51% to mitochondrial RNAs or RNAs deriving from intergenic regions of the genome (see Table S2 in the supplemental material). We note that the numbers of unique alignments, as well as the genomic distributions of reads, are virtually identical for the resting and stimulated samples despite the 2-fold differential in raw reads. We thus conclude that the sequencing depth of the stimulated samples is essentially a saturating sampling of hnRNP L binding and that the increased sequencing depth from the resting samples provides little extra discovery. We also note that the majority of intergenic alignments represented isolated reads, suggesting that these are due to spurious binding events and/or background noise in the sequencing (see Table S2 in the supplemental material).
Because our primary interest is to understand the role of hnRNP L in shaping protein expression in T cells, we focused on those reads within protein-coding transcripts (
Fig. 1b, RefSeq alignments). In order to identify a reliable binding profile of hnRNP L within transcripts, we defined binding sites empirically, using an algorithm similar to published methods that accounts for transcript length and expression (
22) (see Materials and Methods). To identify sites of reproducible hnRNP L RNA interaction, we required that a binding site be represented in at least two of three biological replicates. By this criterion we observed, in total, 49,619 sites of hnRNP L binding in resting CD4
+ cells and 47,137 in anti-CD3- and anti-CD28-stimulated cells (
Fig. 1b). We note that the overlap between biological samples was high: ∼85% of total peaks met the requirement of being present in at least two of the replicates (see below). Moreover, on average, each site was supported by 8 to 12 reads, although a subset of sites were supported by many more (see Table S3 in the supplemental material).
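The two-of-three replicate requirement described above can be illustrated with a short Python sketch. This is a minimal illustration under stated assumptions, not the pipeline used in this study: the site representation as a (chromosome, strand, start, end) tuple, the function names, and the use of simple interval overlap to decide whether two replicates support the same site are all hypothetical choices.

```python
# Hypothetical sketch: keep binding sites supported by >= 2 of 3 replicates.
# A site is assumed to be a tuple (chrom, strand, start, end), half-open.

def overlaps(a, b):
    """True if two sites share chromosome and strand and overlap by >= 1 nt."""
    return a[0] == b[0] and a[1] == b[1] and a[2] < b[3] and b[2] < a[3]

def reproducible_sites(replicates, min_replicates=2):
    """Return every site that is supported by at least `min_replicates`
    replicates, counting the replicate in which it was called."""
    kept = []
    for i, sites in enumerate(replicates):
        for site in sites:
            # Count how many *other* replicates contain an overlapping site.
            others = sum(
                any(overlaps(site, other) for other in other_sites)
                for j, other_sites in enumerate(replicates)
                if j != i
            )
            if 1 + others >= min_replicates:
                kept.append(site)
    return kept

# Toy example with three replicates (coordinates are invented).
replicates = [
    [("chr1", "+", 100, 150)],
    [("chr1", "+", 120, 170)],
    [("chr2", "-", 10, 50)],
]
print(reproducible_sites(replicates))
```

In this sketch, overlapping sites from different replicates are reported separately; a real analysis would likely merge them into a single region before counting.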
As expected from general predictions of hnRNP function in pre-mRNA splicing, the majority of the binding sites we identify occur within proximal (within 300 nt of an exon) and distal intronic regions (
Fig. 1c). Furthermore, hnRNP L binding sites are depleted within coding exons but are enriched in 3′ UTR exons (
Fig. 1c), in agreement with previously identified roles for hnRNP L in the regulation of 3′-end processing and the modulation of miRNA regulation (
5,
28). Finally, hexamer enrichment analysis reveals a strong preference for CA repeat elements, as evidenced both in the 2 most enriched hexamers and by multiple sequence alignment of the top 20 enriched hexamers (
Fig. 1d and
e; see also Table S4 in the supplemental material). Such a bias toward CA repeats is anticipated from previous biochemical studies of the binding specificity of hnRNP L (
29). In sum, the concurrence of the locations and sequence bias of the CLIP-identified hnRNP L binding sites with those from previous studies, together with the presence of sites of known hnRNP L RNA regulatory interactions within CLIP-derived binding profiles (see below), provides confidence that we have reliably identified major binding sites of hnRNP L across the transcriptome of CD4
+ T cells.
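For readers unfamiliar with hexamer enrichment analysis, the general idea can be sketched in Python as follows. This is only an illustration of one common approach (comparing hexamer frequencies in binding sites against a background set, with a pseudocount); the function names, the pseudocount, and the choice of background are assumptions and do not describe the exact procedure used for Fig. 1d.

```python
from collections import Counter

def kmer_counts(seqs, k=6):
    """Count all overlapping k-mers across a list of sequences."""
    counts = Counter()
    for seq in seqs:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts

def kmer_enrichment(site_seqs, background_seqs, k=6, pseudocount=1.0):
    """Rank k-mers by the ratio of their frequency in binding sites to
    their frequency in background sequences (assumed scoring scheme)."""
    fg = kmer_counts(site_seqs, k)
    bg = kmer_counts(background_seqs, k)
    n_kmers = 4 ** k  # number of possible k-mers over A, C, G, T
    fg_total = sum(fg.values()) + pseudocount * n_kmers
    bg_total = sum(bg.values()) + pseudocount * n_kmers
    scores = {
        kmer: ((fg[kmer] + pseudocount) / fg_total)
        / ((bg[kmer] + pseudocount) / bg_total)
        for kmer in fg
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Toy example: a CA-repeat site against an unrelated background sequence.
ranked = kmer_enrichment(["CACACACACACA"], ["ACGTTGCAAGCTTAGC"])
print(ranked[:2])
```

With real data, the background would typically be drawn from comparable transcript regions so that composition differences between introns and exons do not dominate the ranking.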
Previous studies from our lab and others have used Jurkat cells, an immortalized T cell line, to investigate the function of hnRNP L in T cell biology (
9–11,
30). In order to correlate our findings in primary CD4
+ cells to Jurkat cells and to determine the utility of Jurkat cells for future mechanistic studies of hnRNP L function, we performed CLIP analysis in parallel with that described above using JSL1 Jurkat cells (see Fig. S1 in the supplemental material). As with the CD4
+ cells, we used triplicate biological samples of JSL1 cells grown in medium alone (resting) or stimulated with the phorbol ester PMA, which mimics T cell signaling in these cells (
31). In these experiments, we collected a total of 51 million and 68 million reads from the resting and stimulated cells, respectively, from which we defined 41,440 binding sites in resting cells and 32,156 binding sites in stimulated cells by using the criteria described for CD4
+ cells (
Fig. 2a). Notably, the distribution of transcript features bound by hnRNP L in JSL1 cells is similar to that in CD4
+ cells (
Fig. 2b). Additionally, the sequence motifs enriched within hnRNP L binding profiles are consistent both with previous experiments (
29) and with the results for CD4
+ primary T cells (
Fig. 2c and
d). Interestingly, using expression data for resting and stimulated JSL1 cells from previous studies (
24), we find that there is no general correlation between the density of CLIP tags aligning to a gene and its overall expression level (see Fig. S2 in the supplemental material). This lack of correlation between CLIP detection and gene expression indicates that the abundance of CLIP tags reflects the binding preference of hnRNP L rather than transcript abundance.
CLIP-seq identifies consistent binding profiles in JSL1 and CD4+ T cells.
Given the similarity between the sequence features and genomic annotations of the hnRNP L binding profiles obtained in CD4
+ and JSL1 T cells, we asked how consistent the binding of hnRNP L was between cell types and growth conditions. By calculating the percentage of total overlapping nucleotides for the two cell types, or for the two conditions, we find significantly greater overlap between the hnRNP L CLIP samples from the four cell populations than between randomized binding profiles (
Fig. 3a). For each cell type, we also investigated the number of peaks in resting cells that fell within 50 nt of a peak in the corresponding stimulated cells (
Fig. 3b and
c). Strikingly, for both CD4
+ and JSL1 cells, at least one-third of the peaks are shared between the resting and stimulated conditions by this criterion. For a further ∼50% of binding sites, which we define as “biased,” we observe reads in both cell states, although these reads reach significance thresholds under only one of the two conditions. Indeed, at most ∼20% of hnRNP L binding sites in any cell population appear to be truly “condition specific,” in that reads are identified in only one of the growth states investigated. While this minority population of condition-specific binding events may be of interest (see below), our data clearly demonstrate that the bulk of hnRNP L binding is conserved between primary and cultured T cells as well as between resting and stimulated states. Specifically, we identify a set of 4,585 common hnRNP L binding regions that are present in all four cell populations analyzed (see Table S5 in the supplemental material). These common regions occupy 2,460 genes in the T cell transcriptome. Importantly, among these common hnRNP L binding sites, we observe the two best characterized hnRNP L functional sites of interaction, namely, the ESS1 regulatory element in
CD45 exon 4 (
9) (
Fig. 3d) and an autoregulatory intronic site in
HNRNPL (
32) (
Fig. 3e).
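The three categories of peaks used above (shared, biased, and condition specific) can be expressed as a simple decision rule, sketched here in Python. The sketch assumes a single chromosome and strand, half-open (start, end) intervals, and that "reads in both cell states" means at least one read from the other condition falling inside the peak; these details, like the function names, are assumptions made for illustration only.

```python
def gap(a, b):
    """Distance in nt between two (start, end) intervals; 0 if they overlap."""
    return max(0, max(a[0], b[0]) - min(a[1], b[1]))

def classify_peak(peak, other_peaks, other_read_positions, window=50):
    """Classify a peak from one condition relative to the other condition.

    shared:             a significant peak lies within `window` nt
    biased:             no such peak, but reads from the other condition
                        fall inside this peak
    condition-specific: no reads from the other condition in this peak
    """
    if any(gap(peak, other) <= window for other in other_peaks):
        return "shared"
    start, end = peak
    if any(start <= pos < end for pos in other_read_positions):
        return "biased"
    return "condition-specific"

# Toy examples with invented coordinates.
print(classify_peak((100, 150), [(180, 220)], []))     # peak 30 nt away
print(classify_peak((100, 150), [(500, 550)], [120]))  # reads but no peak
print(classify_peak((100, 150), [(500, 550)], [400]))  # nothing nearby
```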
hnRNP L binds transcripts from the Wnt and TCR signaling pathways.
Given the presence of known targets of hnRNP L regulatory function among the common binding regions, we focused on this set of 4,585 binding events to identify new functional targets of hnRNP L and to begin to understand how this protein influences T cell development and function. First, we analyzed the KEGG pathways enriched among the common target genes. Strikingly, genes involved in Wnt signaling (P, 1.67E−4) and T cell receptor (TCR) signaling (P, 0.0011) are among the most overrepresented pathways for hnRNP L-bound transcripts (
Table 1). Importantly, Wnt signaling is critical for thymic development (
33), while TCR signaling is essential for both the development and the function of T cells (
34). We also analyzed biological process GO terms with DAVID, which revealed a strong enrichment of terms related to transcription and RNA-based gene regulation among common hnRNP L-bound transcripts (
Table 1). Together, these analyses suggest that hnRNP L may broadly affect T cell function both directly, by regulating key signaling pathways, and indirectly, by altering the expression of other DNA- and RNA-binding proteins that control gene expression.
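To make the logic of a pathway overrepresentation test concrete, the following Python sketch computes a plain hypergeometric upper-tail probability. This is related to, but not the same as, the modified Fisher exact statistic reported by annotation tools such as DAVID, and it is not the calculation that produced the P values in Table 1. The function name and the numbers in the example are hypothetical.

```python
from math import comb

def hypergeom_enrichment_p(n_background, n_pathway, n_targets, n_overlap):
    """Probability of observing at least `n_overlap` pathway genes when
    `n_targets` genes are drawn without replacement from a background of
    `n_background` genes, of which `n_pathway` belong to the pathway."""
    upper = min(n_pathway, n_targets)
    total = comb(n_background, n_targets)
    tail = sum(
        comb(n_pathway, k) * comb(n_background - n_pathway, n_targets - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total

# Toy example: 10 background genes, 4 in the pathway, 3 bound targets,
# 2 of which are pathway members.
print(hypergeom_enrichment_p(10, 4, 3, 2))
```

In practice the choice of background gene set (for example, all genes expressed in T cells rather than all annotated genes) strongly affects the resulting P value, and multiple-testing correction is needed when many pathways are tested.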
Novel targets of hnRNP L-dependent splicing regulation.
There are numerous mechanisms by which the binding of hnRNP L to a transcript may influence its expression, including regulation of transcription, stability, and efficiency of processing. Because hnRNP L is best characterized as a splicing regulatory protein, we focused this study on determining new targets of hnRNP L splicing regulation. We first identified several instances in which common hnRNP L binding regions (as defined above) were located in introns flanking known alternative exons, and we then assayed the inclusion of these exons in JSL1 cells depleted of hnRNP L (
Fig. 4a). In agreement with the prediction from
Table 1 that hnRNP L regulates genes involved in TCR signaling, T cell development, and RNA synthesis and processing, we find that hnRNP L depletion significantly alters the inclusion of known variable exons in the genes encoding the RNA-binding protein PUM2 (
Fig. 4b) and the transcription factors NFAT5, Bcl11A, and TCF3, which are involved in T cell developmental and activation pathways (
35–37) (
Fig. 4c to
e). We also observe hnRNP L-dependent alternative splicing of the mitogen-activated protein (MAP) kinase TAK1 and the GTPase ACAP1, which regulate NF-κB signaling in response to immune stimulation (
38,
39), and of CCAR1, a coactivator required for Wnt-dependent gene activation (
40) (
Fig. 4f to
h). For all these genes, inclusion of the variable exon either regulates overall protein expression (NFAT5 and CCAR1) or alters the domain structure of the protein (PUM2, Bcl11A, TCF3, TAK1, and ACAP1) (see Discussion). Therefore, hnRNP L-regulated splicing of these genes is likely to impact T cell development and signaling, in agreement with the prediction from
Table 1 and the phenotype of hnRNP L thymic deletion mice (
12).
The case of CCAR1 is particularly interesting, since we discovered that the binding of hnRNP L is in fact not in an intron but rather in an unannotated poison exon (i.e., an exon containing a stop codon). The fact that hnRNP L strongly represses this CCAR1 poison exon, together with our previous data on hnRNP L-mediated repression of CD45 exon 4 (
9), suggests that although binding of hnRNP L to exons is rare (
Fig. 1c and
2b), these events represent robust repressive activity of hnRNP L. Consistent with this, we identify ∼60 genes that contain common hnRNP L binding sites within or overlapping an exon (see Table S5 in the supplemental material). For five of the hnRNP L-bound exons tested, we find that inclusion of the variable exon is markedly increased upon hnRNP L depletion (
Fig. 5). Importantly, these hnRNP L-regulated exons include those in genes encoding splicing factors (ZRANB2), cell surface receptors (SPG11, IL2RG), intracellular signaling proteins (ARAP1), and a transcription coactivator (SS18), all of which have potential roles in T cell biology.
5′ splice site (5′ss) strength is a determinant of hnRNP L function.
In addition to their functional implications, the newly identified targets of hnRNP L-mediated splicing regulation presented in Fig. 4 and 5 demonstrate the mechanistic breadth of hnRNP L function. While exon binding appears to correlate with hnRNP L-dependent repression (
Fig. 5), we observe no clear correlation between intron binding and hnRNP L-dependent splicing regulation. For instance, reduction of hnRNP L levels increases the inclusion of the variable exon of PUM2, whereas it decreases the inclusion of the variable exon in Bcl11A, despite binding on either side of the exon in both instances. Conversely, hnRNP L appears to enhance variable-exon inclusion whether it is bound to the upstream (NFAT5) or the downstream (TAK1) intron. Moreover, ∼50% of exons containing or flanked by common hnRNP L binding sites that we tested for splicing displayed no change in inclusion in response to hnRNP L depletion (see Table S6 in the supplemental material). This lack of defined correlation between binding location and function is consistent both with our previous studies demonstrating that factors in addition to the location of hnRNP L binding determine its functional impact on splicing (
19) and with other studies that have revealed that CLIP-defined binding sites for hnRNPs are not strong predictors of splicing regulation (
41,
42).
To determine if we could increase our ability to utilize the CLIP-defined hnRNP L binding sites to identify novel targets of hnRNP L-mediated splicing regulation, we grouped the 27 exons tested by a variety of parameters, such as intron length, position of the CLIP site, and splice site strength (see Table S6 in the supplemental material). Strikingly, we find that hnRNP L-dependent splicing regulation correlates best with the strength of the 5′ splice site of the alternative exon. Specifically, no alternative exons with 5′ss scores of 10 or greater (MaxEnt [
43]) were regulated by hnRNP L, even when multiple common binding sites were detected close to the variable exon (see, e.g., DIAPH1 and ATM in Table S6 in the supplemental material). In contrast, all of the hnRNP L-regulated exons had 5′ss scores less than 9.5, and 70% of the alternative exons with scores less than 9.5 exhibited hnRNP L-dependent regulation (see Table S6). Notably, no other single feature encompassed all of the 14 validated hnRNP L regulatory events with a discovery rate of 70% or more.
To further validate the relevance of 5′ splice site strength, we tested an additional 14 exons in functionally important genes for hnRNP L-dependent splicing regulation (see Table S6 in the supplemental material). These exons were chosen with a range of 5′ss scores, including two in the window between 9.5 and 10 that was not represented in our initial exon set. In agreement with our predictions, we find that neither exon with a 5′ss score above 9.9 exhibits changes in splicing upon depletion of hnRNP L, while 8 of the 12 exons with 5′ss scores less than 9.9 are regulated by hnRNP L (
Fig. 6; see also Table S6 in the supplemental material). Therefore, we conclude that 5′ss strength is an important criterion in determining regulation by hnRNP L and can be applied to CLIP-identified physical targets to increase the discovery power of functional targets of hnRNP L-regulated splicing. Importantly, using these criteria, we have identified a total of 20 previously unrecognized targets of hnRNP L-mediated splicing regulation, all of which are genes implicated in critical signaling and gene expression pathways in T cells, thus providing further insight into the functional role of hnRNP L in T cell biology.
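The way a 5′ss score cutoff can be used to prioritize CLIP-identified candidates is sketched below in Python. The sketch assumes that a MaxEnt 5′ss score has already been computed for each exon by an external tool and that each exon has been tested experimentally; the dictionary layout, the function name, and the toy values are hypothetical and are not taken from Table S6.

```python
def regulation_rate_by_5ss(exons, cutoff=9.5):
    """Split tested exons at a 5'ss score cutoff and return the fraction
    of regulated exons below and at-or-above the cutoff.

    Each exon is assumed to be a dict with a precomputed "score" (float)
    and a "regulated" flag (bool) from the depletion experiment.
    """
    weak = [e for e in exons if e["score"] < cutoff]
    strong = [e for e in exons if e["score"] >= cutoff]

    def rate(group):
        # Fraction regulated; None if the group is empty.
        if not group:
            return None
        return sum(e["regulated"] for e in group) / len(group)

    return rate(weak), rate(strong)

# Toy data with invented scores and outcomes.
tested = [
    {"name": "exon_a", "score": 7.2, "regulated": True},
    {"name": "exon_b", "score": 8.9, "regulated": True},
    {"name": "exon_c", "score": 9.1, "regulated": False},
    {"name": "exon_d", "score": 10.3, "regulated": False},
    {"name": "exon_e", "score": 10.8, "regulated": False},
]
weak_rate, strong_rate = regulation_rate_by_5ss(tested)
print(weak_rate, strong_rate)
```

The same function can be rerun with different cutoffs (for example, 9.9) to see how sensitive the discovery rate is to the exact threshold.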
Condition specificity of hnRNP L binding.
Our analysis of the transcriptome-wide binding of hnRNP L has thus far been focused on the binding sites that are present in all four T cell populations tested, since these reveal much about the ubiquitous role of hnRNP L in T cell biology. However, as mentioned above, we did identify a subset of hnRNP L RNA interactions in both cell types that are condition specific, occurring either entirely in resting samples or entirely in stimulated samples, with no reads observed under the opposite condition (
Fig. 3b and
c). To further investigate the nature of these condition-specific events, we analyzed changes in gene expression for these resting-state-specific and stimulated-state-specific binding sites, using gene expression data that we had obtained previously for JSL1 cells (
24). Remarkably, we find that the majority of condition-specific sites are in genes whose expression does not differ significantly between resting and stimulated samples, demonstrating that the difference in association with hnRNP L is not a secondary consequence of differential gene expression (
Fig. 7a and
b). We also find that these condition-specific binding sites maintain the general bias toward CA repeats that is seen in the common sites (
Fig. 7c and
d; see also Table S4 in the supplemental material), although this bias is less dramatic, particularly within the stimulation-specific peaks (see Table S4 in the supplemental material and Discussion). While the possibility of direct condition-specific regulation of hnRNP L binding is not inconsistent with previous studies in T cells, there are no data to directly support such a model. Moreover, we find that the discovery of condition-specific peaks is diminished the more stringently we require biological replication of a binding site (see Table S7 in the supplemental material). Therefore, it remains possible that only a minor subset of the condition-specific peaks we have defined here truly represent signal-regulated changes in the binding of hnRNP L, while the majority reflect false positives due to limited sequencing depth and biological noise.