A system to identify and enrich rare IFN+ cells.
Influenza virus only rarely triggers IFN expression in infected cells (
10–12), a fact that poses a challenge for the study of IFN induction in single cells. Therefore, we developed a method to identify and enrich rare IFN+ cells by creating A549 cells that carried IFN reporters consisting of a type I (
IFNB1) or type III (
IFNL1) promoter driving expression of a cell-surface protein (LNGFRΔC [
50,
51]) followed by a fluorescent protein (
Fig. 1A). Cells that activate the IFN reporters can be enriched by magnetic-activated cell sorting (MACS) or identified by flow cytometry. The reporters were efficiently activated by infection with a strain of Sendai virus (
52) that potently induces IFN (see Fig. S1A in the supplemental material), and activation of the type I and type III IFN reporters was highly correlated in our cells (Fig. S1B; further validated by the single-cell transcriptomics below). Therefore, for the rest of this paper, we use “IFN expression” to refer to combined expression of type I and III IFNs.
We generated a stock of A/WSN/1933 (H1N1) influenza virus (here referred to as WSN) directly from reverse genetics plasmids (
53), and passaged this stock at a low multiplicity of infection (MOI). This process ensures that the viral stock is relatively “pure,” with only low levels of the large internal deletions and other defects that arise in stocks passaged at a high MOI (
54). As described in the next subsection, our stock actually consisted of a mix of two viruses: wild-type WSN and a variant of this virus that carries synonymous viral “barcodes” near the termini of each gene. This viral stock activated the IFN reporter in ∼0.5% of infected cells (
Fig. 1B), a frequency roughly comparable to that reported in prior studies (
11,
12). We also validated that MACS for the cell surface protein driven by the IFN reporter enriched the IFN+ cells by >50-fold (see Fig. S3).
Combined transcriptomics and virus sequencing of single infected cells.
We developed the approach shown in
Fig. 2 to obtain the entire transcriptome and the full sequences of all viral genes in single cells. First, we generated the viral stock described in the previous subsection, which consisted of a mix of wild-type WSN and a “synonymously barcoded” variant that contained two engineered synonymous mutations near each terminus of each gene (see File S2). These viral barcodes allow us to identify coinfections from single-cell transcriptomic data (
12) and provide a control for PCR artifacts during full-length sequencing of viral transcripts (see below). We used this viral stock to infect A549 IFN reporter cells (
Fig. 2A) at a dose that led to detectable viral transcription in ∼25% of cells (this moderately low MOI reasonably balances our desire to limit the number of coinfections with the cost of performing transcriptomics on uninfected cells). From 12 to 13 h postinfection, we used MACS to enrich cells that activated the IFN reporter. To ensure the presence of IFN-negative cells, we added back nonenriched cells to ∼10% of the total. We also added uninfected canine cells to ∼5% of the total as a control for multiplets and to estimate the background amount of viral mRNA detected in truly uninfected cells.
We processed the cells on a commercially available platform (
55) that isolates cells in droplets and reverse transcribes polyadenylated mRNAs to append a unique cell barcode to all cDNAs in each droplet and a unique molecular identifier (UMI) to each cDNA molecule (
Fig. 2B). Because influenza virus mRNAs are polyadenylated (
56), this process appends cell barcodes to viral as well as cellular mRNAs. Furthermore, because virtually the entire influenza virus genome is transcribed, the cell-barcoded cDNA spans almost all 13,581 nucleotides in the segmented viral genome: the only portions not covered are one universally conserved nucleotide upstream of the transcription start site (
57) and 17 to 22 highly conserved nucleotides downstream of the polyadenylation site (
56) in each of the eight viral gene segments.
We used a portion of the cell-barcoded cDNA for standard single-cell transcriptomics by Illumina 3′-end sequencing (
Fig. 2C), but we also took a portion and enriched for full-length viral molecules by PCR (
Fig. 2D). We performed PacBio sequencing on these full-length viral cDNAs to generate high-accuracy circular consensus sequences (CCSs) (
58). These CCSs retain the cell barcodes, and with sufficient sequencing depth, we obtained CCSs from multiple unique UMI-tagged cDNAs for each viral gene in each cell. Because most cells were infected by just one or two virions, we were able to build a consensus of CCSs for each viral gene in each cell to determine the sequence(s) of these virions. Combining this information with the 3′-end sequencing determined the entire transcriptome and the full sequences of the infecting virions in single cells (
Fig. 2E).
Transcriptomic analyses of single IFN+ and IFN− influenza virus-infected cells.
We obtained transcriptomes for 1,614 human (A549) cells and 50 of the uninfected canine cells that were spiked into the experiment as a control (
Fig. 3A). We also obtained 12 transcriptomes with a mix of human and canine transcripts; from the number of such mixed cell-type transcriptomes, we estimate (
59) that ∼11% of the transcriptomes were derived from multiple cells. To remove some of these multiplets along with low-quality droplets, we filtered out transcriptomes with unusually high or low numbers of cellular transcripts, as is commonly done in analysis of single-cell transcriptome sequencing (scRNA-seq) data (
60). After this filtering, we retained 1,490 human cells for further analysis (
Fig. 3B).
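The multiplet estimate above can be illustrated with a short calculation. The sketch below assumes that human and canine cells enter droplets independently at Poisson rates; whether this matches the exact procedure of reference 59 is an assumption on our part, and the function and variable names are hypothetical. The counts are those reported above (1,614 human, 50 canine, and 12 mixed transcriptomes).

```python
import math

def multiplet_fraction(n_human, n_canine, n_mixed):
    """Estimate the fraction of cell-containing droplets that hold more than one cell.

    Assumes (for this sketch) that cells of each species are loaded into
    droplets independently at Poisson rates mu_h and mu_c.
    """
    # Among canine-containing droplets, the odds of also holding a human cell
    # equal exp(mu_h) - 1, which gives mu_h; the same logic gives mu_c.
    mu_h = math.log((n_canine + n_mixed) / n_canine)
    mu_c = math.log((n_human + n_mixed) / n_human)
    mu = mu_h + mu_c
    # P(exactly one cell | at least one cell) for a Poisson with mean mu.
    p_single = mu * math.exp(-mu) / (1.0 - math.exp(-mu))
    return 1.0 - p_single

estimate = multiplet_fraction(n_human=1614, n_canine=50, n_mixed=12)
print(f"estimated multiplet fraction: {estimate:.3f}")
```

Under these assumptions, the estimate comes out at roughly 0.11, in line with the ∼11% quoted above. Most multiplets contain two human cells and therefore cannot be seen directly, which is why the estimate is far larger than the observed fraction of mixed transcriptomes.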
To identify infected cells, we examined the fraction of each transcriptome derived from virus (
Fig. 3C). As expected, only a small fraction (∼0.7%) of transcripts in the uninfected canine cells was viral; this low-level background was likely from lysed cells that released ambient viral mRNA. We tested whether each cell contained significantly more viral transcripts than expected under a Poisson model given this background fraction and classified 290 human cells as definitively infected with influenza virus (
Fig. 3C). We classified the other cells as uninfected, although it is possible that some were infected with virions that produced very little mRNA. The distribution of the amounts of viral mRNA across infected cells is shown in the inset in
Fig. 3C. As in our prior work (
12), the distribution is extremely heterogeneous: many infected cells had only a few percent mRNA derived from virus, but viral mRNA comprised more than half the transcriptome of a few cells.
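The infection call described above can be sketched as a one-sided Poisson test against the background fraction. This is a minimal illustration rather than the exact procedure used here: the significance threshold `alpha`, the function names, and the example UMI counts are all hypothetical, and only the ∼0.7% background fraction comes from the text.

```python
import math

def poisson_upper_tail(k, lam):
    """Return P(X >= k) for X ~ Poisson(lam), summing upper-tail terms in log space."""
    if k <= 0:
        return 1.0
    total = 0.0
    i = k
    while True:
        term = math.exp(i * math.log(lam) - lam - math.lgamma(i + 1))
        total += term
        # Past the mode the terms shrink monotonically, so stop once negligible.
        if i > lam and term <= 1e-16 * total:
            break
        i += 1
    return min(total, 1.0)

def is_infected(viral_umis, total_umis, background=0.007, alpha=0.01):
    """Call a cell infected if it has more viral UMIs than background explains.

    background: fraction of viral transcripts seen in uninfected canine cells.
    alpha: hypothetical significance threshold chosen for this sketch.
    """
    expected = background * total_umis
    return poisson_upper_tail(viral_umis, expected) < alpha

# Hypothetical cells, each with 5,000 total UMIs (about 35 viral UMIs expected by chance).
print(is_infected(viral_umis=100, total_umis=5000))
print(is_infected(viral_umis=38, total_umis=5000))
```

The same test, applied per gene with gene-specific background fractions, would give presence/absence calls for individual viral genes.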
We called the presence or absence of each viral gene in each infected cell, again using a Poisson model parameterized by background fractions estimated from uninfected canine cells. We called presence/absence of genes rather than transcripts, since the two genes that encode multiple transcripts (M1/M2 from the M gene, and NS1/NS2 from the NS gene) do so via alternative splicing that leaves both isoforms with the same termini, making them indistinguishable by 3′-end sequencing.
Figure 3D (top) shows that 162 of 290 infected cells expressed all eight genes (see Fig. S4 for frequencies of individual genes). This measured frequency of infected cells expressing all eight genes is slightly higher than in our own prior work using the WSN strain (
12) and slightly to substantially higher than that reported in studies by others using different viral strains or methodologies (
15,
43,
61,
62).
The amount of viral mRNA was lower in cells that failed to express viral genes (
Fig. 3D, bottom). However, viral burden remained highly variable even after conditioning on the number of viral genes: some cells that failed to express one or even two genes still derived >50% of their mRNA from virus, while other cells that expressed all genes had only a few percent mRNA from the virus (
Fig. 3D, bottom). Consistent with our prior work (
12), despite the wide variation in absolute expression of viral genes, their relative expression was fairly consistent (
Fig. 3E) and generally matched values from older bulk studies (
63).
By examining the synonymous viral barcodes near the 3′ termini of transcripts, we determined that 38% of infected cells were coinfected with wild-type and synonymously barcoded virions (Fig. 3F; cells called coinfected if a binomial test rejected the null hypothesis that 95% of viral mRNA is from one viral barcode variant). Coinfections by two virions of the same barcode variant cannot be detected in this way, so the true coinfection rate is higher: from Fig. 3F, we estimate (59) that 63% of infected cells were coinfected. Interestingly, this coinfection rate is higher than expected from the relative numbers of infected and uninfected cells (Fig. 3C) if infection were Poisson. This discrepancy could arise if the MACS for IFN+ cells also enriched coinfected cells, if infection was not truly Poisson, or if coinfection complemented otherwise transcriptionally defective virions to increase the likelihood that we identify a cell as infected. The first explanation seems unlikely, as there was no tendency for coinfected cells to express more IFN (see Fig. S5). Therefore, we favor the latter two explanations, both of which have been demonstrated for other viruses (
64,
65). The moderately high rate of coinfection may also explain why more cells in our experiment expressed all eight viral genes compared to those in some prior studies, as a coinfecting virion can complement a missing viral gene.
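Two calculations in this paragraph can be sketched in code: the binomial test used to call a cell coinfected by both barcode variants, and the coinfection rate expected if virions were Poisson distributed across cells. Both are illustrations under stated assumptions. The threshold `alpha`, the function names, and the example UMI counts are hypothetical, and using 290 infected cells among the 1,490 retained human cells as the infected fraction is our reading of the comparison, which also ignores any effect of the MACS enrichment.

```python
import math

def binom_upper_tail(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p), with each term computed in log space."""
    total = 0.0
    for i in range(max(k, 0), n + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                   + i * math.log(p) + (n - i) * math.log(1.0 - p))
        total += math.exp(log_pmf)
    return min(total, 1.0)

def is_coinfected(wt_umis, barcoded_umis, major_fraction=0.95, alpha=0.01):
    """Reject the null that major_fraction of viral mRNA comes from one variant.

    alpha is a hypothetical threshold chosen for this sketch.
    """
    n = wt_umis + barcoded_umis
    minor = min(wt_umis, barcoded_umis)
    return binom_upper_tail(minor, n, 1.0 - major_fraction) < alpha

def expected_coinfection_given_infected(infected_fraction):
    """P(>= 2 virions | >= 1 virion) if virions per cell are Poisson distributed."""
    m = -math.log(1.0 - infected_fraction)  # mean virions per cell
    return 1.0 - m * math.exp(-m) / (1.0 - math.exp(-m))

# Hypothetical barcode-informative UMI counts for two cells.
print(is_coinfected(wt_umis=160, barcoded_umis=40))
print(is_coinfected(wt_umis=192, barcoded_umis=8))

expected = expected_coinfection_given_infected(290 / 1490)
print(f"expected coinfection among infected cells: {expected:.3f}")
```

Under these assumptions, the Poisson expectation is close to 10% of infected cells, well below the 63% estimated from the barcode data.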
We next examined expression of IFN and ISGs (
Fig. 3G and Fig. S6). More than 20% of infected cells were IFN+, indicating that the MACS successfully enriched IFN+ cells far beyond their initial frequency. The expression of type I and type III IFN was highly correlated in single cells, justifying our decision to collapse both classes under the single label of “IFN” in the analyses that followed (see Fig. S7). Few (∼1.3%) uninfected cells were IFN+; these few might have arisen because the MACS enriched for rare cells that activated IFN in response to nonviral ligands (66–68) or because some cells that we classified as uninfected were actually infected at low levels. The difference in the frequency of IFN positivity between infected and uninfected cells in Fig. 3G was highly significant (P < 10⁻⁵, Fisher’s exact test). Many more cells expressed ISGs than IFN itself (Fig. S6A). The IFN+ cells were a subset of the ISG+ cells: IFN+ cells always expressed ISGs, but many ISG+ cells did not express IFN (Fig. S6B). These results are consistent with the established knowledge that IFN is expressed primarily in cells that directly detect infection but that ISGs are also expressed via paracrine signaling in other cells (
1,
2).
Finally, we qualitatively examined how expression of viral genes, IFN, and ISGs relates to the overall structure of the high-dimensional transcriptomic data. Figure S8 shows unsupervised t-SNE clustering (
69) of the cells. Cells expressing high levels of viral genes, IFN, and ISGs clustered together, and most of the structure in the t-SNE plot that is not associated with these genes involves uninfected and IFN− cells.
Full genotypes of viruses infecting single IFN+ and IFN− cells.
We next used PacBio sequencing (
Fig. 2D; see also File S3) to determine the full sequences of the viral genes in single infected cells. We obtained >200,000 high-quality PacBio CCSs that mapped to an influenza virus gene and contained a cell barcode and UMI (see Fig. S9). The synonymous viral barcodes at both termini of each gene enabled us to confirm that PCR strand exchange was rare (see Fig. S10), meaning that the vast majority of CCSs correctly link the sequence of the transcript to cell barcodes and UMIs that identify the cell and molecule of origin.
After calling the presence/absence of each viral gene in each cell as described in the previous section, we called mutations if they were found in at least two CCSs originating from different mRNAs (unique UMIs) and at least 30% of all CCSs for that gene in that cell. For cells coinfected with both viral barcode variants, we called mutations separately for each viral variant. This strategy reliably identifies mutations in virions that initiate infection of cells infected with at most one virion of each viral barcode variant (∼75% of infected cells) as well as high-abundance mutations in cells coinfected with multiple virions of the same viral barcode. It will not identify mutations that arise within a cell after the first few rounds of viral genome replication, since such mutations will not reach 30% frequency in that cell. Therefore, analogous to somatic variant calling in tumor sequencing (
70,
71), there is a limit to our detection threshold: we cannot identify mutations that occur in just a small fraction of transcripts in a cell.
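The mutation-calling rule described above (support from at least two distinct UMIs and at least 30% of all CCSs for that gene in that cell) can be sketched as follows. The data layout, function name, and example records are hypothetical; in a coinfected cell, the function would be applied separately to the CCSs of each viral barcode variant.

```python
from collections import defaultdict

def call_mutations(ccs_records, min_umis=2, min_fraction=0.30):
    """Call mutations for one viral gene in one cell (one viral barcode variant).

    ccs_records: list of (umi, mutations) pairs, one per CCS, where mutations
    is an iterable of mutation labels observed in that CCS.
    """
    n_ccs = len(ccs_records)
    ccs_count = defaultdict(int)   # number of CCSs carrying each mutation
    umis = defaultdict(set)        # distinct UMIs (mRNA molecules) per mutation
    for umi, mutations in ccs_records:
        for mut in set(mutations):
            ccs_count[mut] += 1
            umis[mut].add(umi)
    return {
        mut for mut in ccs_count
        if len(umis[mut]) >= min_umis and ccs_count[mut] / n_ccs >= min_fraction
    }

# Hypothetical CCSs: A100G is seen in 3 of 5 CCSs from 2 UMIs; C5T in one CCS only.
records = [
    ("AAAC", {"A100G"}),
    ("CCGT", {"A100G"}),
    ("GGTA", set()),
    ("TTAC", {"C5T"}),
    ("AAAC", {"A100G"}),
]
print(call_mutations(records))
```

Requiring distinct UMIs guards against errors introduced when a single cDNA molecule is amplified and sequenced repeatedly, while the 30% threshold excludes low-frequency variants that arise late within the cell.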
We were able to call the sequences of all expressed viral genes in the majority of infected cells (see Fig. S11). We were most effective at calling full viral genotypes in cells that expressed large amounts of viral mRNA and were infected by only one viral barcode variant (Fig. S11), but we also called full genotypes for many cells that had low viral burden or were coinfected by both viral barcode variants.
The 150 cells for which we called the full viral genotypes are shown in
Fig. 4 (see also File S4). Visual inspection of this figure reveals a wealth of information. For instance, the cell with the highest viral burden (cell 1 in
Fig. 4, which has 65% of its mRNA from virus) was infected by a virion that expressed unmutated copies of all eight genes and did not induce detectable IFN. However, 12 of the other 13 cells with at least 50% of their mRNA from virus were infected by virions that had a mutation or failed to express a gene, and five of these cells produced IFN. As expected, all cells infected by virions that failed to express a component of the viral polymerase complex (PB2, PB1, PA, or NP) expressed small amounts of viral mRNA, since they were limited to primary transcription using incoming proteins (e.g., cell 132 and cell 143). The two cells that expressed the most IFN (cell 13 and cell 123) lacked the viral NS gene that encodes the virus’s primary IFN antagonist (
24,
25). Many other IFN+ cells had different defects, such as large internal deletions (e.g., cell 5 and cell 89) or amino acid mutations (e.g., cell 9, cell 28, and many others).
However,
Fig. 4 also reveals stochasticity that is independent of viral genotype. This stochasticity sometimes acts to the detriment of the virus and sometimes to the detriment of the cell. As an example of the former case, expressing unmutated copies of all eight genes did not guarantee high viral gene expression and successful innate immune evasion: for instance, the unmutated virion that infected cell 139 only managed to express viral mRNA to 6% of the total transcriptome, and the unmutated virion that infected cell 105 still induced IFN. But in other cases, the stochasticity allows a defective virus to still escape immune recognition. For instance, there are a number of cells (e.g., cell 62 and cell 78) that did not activate IFN despite being infected by virions that failed to express NS.
Viral defects associated with viral gene expression and IFN induction in single cells.
To systematically assess viral features associated with infection outcome, we divided the 150 cells in
Fig. 4 into those that expressed unmutated copies of all eight genes (disregarding synonymous mutations) and those that did not.
Figure 5A shows that the 49 cells infected by full unmutated virions had a significantly tighter distribution of the amount of viral mRNA per cell than the other 101 cells as quantified by the Gini index (
72) (see also File S5). Therefore, viral defects are a major contributor to the heterogeneity in viral transcriptional burden.
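For readers unfamiliar with the Gini index, the following sketch shows one standard way to compute it from per-cell values. It illustrates only the index itself, not the statistical comparison between the two groups of cells, and the function name and example values are hypothetical.

```python
def gini(values):
    """Gini index of non-negative values: 0 for perfect equality, near 1 for extreme inequality."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        raise ValueError("need at least one value and a positive sum")
    # Rank-weighted sum form of the mean absolute difference definition.
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1) / n

# Hypothetical fractions of mRNA derived from virus in two groups of cells.
tight = [0.10, 0.12, 0.11, 0.09]
spread = [0.01, 0.02, 0.05, 0.60]
print(gini(tight), gini(spread))
```

A lower Gini index corresponds to a tighter distribution of viral mRNA per cell, which is the sense in which the index is used above.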
Some viral defects also contribute to IFN induction. Specifically, cells infected by incomplete or mutated virions expressed IFN more frequently than cells infected by virions that expressed unmutated copies of all genes (
Fig. 5B), although this difference was not statistically significant (
P = 0.12, Fisher’s exact test). However, the association was significant for certain classes of viral defects: absence of NS and amino acid mutations in PB1 were significantly enriched in IFN+ cells, and amino acid mutations in NS and deletions in HA were weakly enriched (Fig. 5C). The only trend that remained significant at a false discovery rate (FDR) of 10% was absence of NS. This lack of statistical significance after FDR correction could be due to the relatively modest number of fully sequenced infected cells (just 150). The validation experiments in the next section show that many of the viral mutations in IFN+ cells do in fact increase the rate of IFN induction.
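The enrichment tests in this section are Fisher’s exact tests on 2 × 2 tables (defect present or absent versus IFN+ or IFN−). The sketch below is a minimal two-sided implementation using only the standard library. The function name is hypothetical and the example table is a small textbook-style case, not data from this study, since the underlying counts are given in the figures rather than in the text.

```python
import math

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test P value for the 2 x 2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = math.comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell given fixed margins.
        return math.comb(row1, x) * math.comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum all tables that are no more probable than the observed one.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, p)

# Hypothetical table: rows are defect present/absent, columns are IFN+/IFN−.
print(fisher_exact_two_sided(3, 1, 1, 3))
```

With several classes of defect tested at once, the resulting P values would then need a multiple-testing correction, as in the FDR analysis described above.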
One other interesting trend emerged from the single-cell data. There was no difference in the amounts of viral mRNA between IFN+ and IFN− cells that expressed NS (Fig. 5D). But among cells that lacked NS, cells with more viral mRNA were significantly more likely to be IFN+ (Fig. 5D); this finding is elaborated on in the validation experiments below. Overall, the absence of reduced viral gene expression in IFN+ cells suggests that autocrine IFN signaling typically occurs too late to suppress viral transcription and that the well-known inhibitory effect of IFN against influenza virus depends mainly on paracrine signaling.
Validation that viral defects in single IFN+ cells often increase IFN induction.
To test if the viral defects identified in single IFN+ cells caused increased IFN expression, we used reverse genetics to generate bulk stocks of viruses with some of these defects.
The viral defect most strongly associated with IFN induction was failure to express the NS gene (
Fig. 4 and
5C). Although it is sometimes possible to use complementing cells to generate influenza viruses lacking a specific gene (
73,
74), we were unable to generate viruses that lacked NS. The NS gene encodes two proteins (NS1 and NS2), the first of which is influenza’s primary innate immune antagonist (
24,
25). We therefore mimicked the absence of NS by creating a mutant virus (which we term “NS1stop”) that had multiple stop codons early in the NS1 coding sequence.
The single-cell data also showed that amino acid substitutions in proteins encoded by the PB1 and NS genes were enriched in IFN+ cells (
Fig. 4 and
5C), and so we created mutant viruses with some of these substitutions: PB1-D27N, PB1-G206S, PB1-K279R, PB1-T677A, NS1-A122V, and NS2-E47G.
Finally, prior work has suggested that virions with internal deletions in the polymerase genes can induce higher levels of IFN (
16,
38–42). Although such deletions are not significantly enriched among IFN+ cells in our single-cell data (Fig. 5C), there was a coinfected IFN+ cell where one viral variant had a deletion in PB1 spanning nucleotides 385 to 2163 (cell 5 in
Fig. 4). We therefore created a virus carrying this deletion and propagated it in cells constitutively expressing PB1 protein.
We tested the rate of IFN induction by each viral stock using the reporter cells.
Figure 6 shows that five of the eight mutant viral stocks induced IFN more frequently than a wild-type viral stock. The strongest IFN induction was by the NS1stop virus, but the PB1 internal deletion and three of the point mutant viruses (PB1-D27N, PB1-T677A, and NS1-A122V) also induced IFN significantly more frequently than the wild type. The other three point mutants (PB1-G206S, PB1-K279R, and NS2-E47G) did not increase IFN induction, an unsurprising finding since we expected some mutations without an IFN-enhancing effect to be found in IFN+ cells by chance. Overall, the results in Fig. 6 validate that the viral defects in single IFN+ cells often cause increased IFN production.
However, IFN induction remained stochastic even for the most potently IFN-inducing viral mutants.
Figure 6 shows flow cytometry data (see also Fig. S12), which are themselves single-cell measurements, albeit ones that do not report the viral genotype. As can be seen from these data, none of the mutant viral stocks induced IFN in more than 20% of infected cells. Of course, these mutant virus stocks are themselves genetically heterogeneous, as many virions will have additional defects similar to those revealed by our single-cell sequencing of the “wild-type” viral stock. However, our single-cell data show that IFN induction was stochastic even for infections that shared the same defect, such as absence of NS (e.g., compare cell 62 and cell 69 in
Fig. 4). Therefore, the experiments in this section not only validate some specific viral defects that increase IFN induction but also show that induction remains stochastic even with these defects.
The IFN-inducing viral defects act by diverse mechanisms.
Some of the viral defects in IFN+ cells are easy to reconcile with existing knowledge: for instance, NS1 is the virus’s primary IFN antagonist (
24,
25), and internal deletions are prevalent in immunostimulatory viral stocks (
16,
38–42). Other defects are more surprising: for instance, it is not obvious why amino acid mutations in PB1 increase IFN induction. We therefore designed experiments to interrogate some of these defects in more detail.
We first focused on one of the strongest trends from the single-cell data: increased viral gene expression was associated with increased IFN induction when the infecting virion failed to express NS, but not otherwise (
Fig. 5D). To confirm this observation, we performed a flow cytometry analysis of the reporter cells infected by different immunostimulatory viral mutants to examine the association between expression of a viral gene product (HA protein) and IFN induction. Consistent with the single-cell data, cells that expressed more HA were much more likely to turn IFN+ when infected with the NS1stop or NS1-A122V mutants but not when infected with any of the other viral variants (Fig. 7). This result suggests that when there are high levels of viral transcription, NS1 becomes more important as a buffer against detection of viral products.
We hypothesized that the immunostimulatory mutations to PB1 might cause the viral polymerase to produce aberrant products, in line with recent work showing that mutations to PB2 can lead to the generation of aberrant RNAs that trigger RIG-I (
35,
36). To investigate if the PB1 mutations might perturb polymerase activity, we examined their location in a structural model of the polymerase complex (
Fig. 8A). The IFN-enhancing PB1 mutation T677A occurs at the tip of a helix that interacts with the 3′ terminus of the RNA template as it enters the channel above the active site, whereas the IFN-enhancing D27N mutation is deeper in the polymerase, close to the binding pocket of the 5′ terminus of the template. Therefore, both mutations could plausibly alter the polymerase’s interactions with the RNA template.
To test if the PB1 mutations affect activity, we transfected 293T cells with plasmids that express wild-type or mutant PB1 protein along with the other proteins in the polymerase complex (PB2, PA, and NP) and full-length viral RNA (vRNA) for the NA segment. Both polymerase mutations increased IFN expression in this assay (
Fig. 8B), indicating that they have an immunostimulatory effect in the context of an active viral polymerase even when other viral components are absent. We next directly measured polymerase activity on the full-length vRNA template by extracting total RNA and quantifying replication (vRNA) and transcription (mRNA) products by primer extension. Both immunostimulatory PB1 mutations had activities that were significantly different from those of the wild type, despite being expressed at wild-type protein levels (
Fig. 8B). Specifically, the T677A mutant had higher levels of both activities, whereas the D27N mutant had reduced levels of both—although D27N still retained activity far in excess of that of a control active-site mutant (
Fig. 8C). We speculated that the mutations might alter polymerase processivity, leading to accumulation of aberrant RNA products that activate the innate immune system (
35,
38–42). We therefore repeated the activity assays using a short 246-nucleotide template (
35) in place of the full-length NA vRNA (
Fig. 8D). On this shorter template, the activity of the D27N mutant was now similar to that of the wild type, while the activity of the T677A mutant remained higher than that of the wild type (although not significantly so in three biological repeats). Therefore, the two immunostimulatory PB1 mutations have distinct effects on the polymerase: D27N appears to reduce processivity, thereby favoring shorter RNA products, whereas T677A increases overall activity, which could also lead to accumulation of aberrant RNA products.
Overall, the results in this section show that the diverse range of immunostimulatory viral defects identified in single cells act by diverse processes, demonstrating that viral variation influences not only the rate of IFN induction but also the factors that contribute to this induction.