INTRODUCTION
Decades of research have demonstrated that viruses are key biological components of aquatic ecosystems (
1). Through top-down control, viruses influence microbial community structure and the movement of organic matter
via the “viral shunt” and “viral shuttle” (
2–4). Viruses also modulate the metabolism of infected cells by hijacking host cell physiology and through virus-encoded metabolic genes (
5), which can have far-reaching implications for biogeochemical cycles (
6,
7). As many of Earth’s aquatic ecosystems are faced with the unprecedent changes in climate (
8,
9), there is growing interest in how environmentally relevant viruses are influenced by climate change, such as how climate modulates viral diversity and production rates (
10). The winter period in temperate freshwater lakes is particularly vulnerable to climate change, where one apparent response is the decline in ice cover duration and thickness (
9,
11). In fact, the frequency of ice-free winters is projected to increase globally in current climate trajectories (
12). Seasonally ice-covered lakes are not biologically dormant during the winter (
13), and yet there is relatively little documented about viral activity during this period (
14).
The Great Lakes contain about 20% of the global freshwater supply (
15) and have substantial contributions to regional socioeconomic services (
16). Winter in the Laurentian Great Lakes is undergoing pronounced changes due to a warming climate (
17). Since the 1970s, Lake Erie has exhibited significant declines in wintertime ice cover extent and duration (
18,
19). Furthermore, ice cover extent drastically alters microbial community structure by shifting the winter ecosystem from one characterized by dense accumulations of filamentous diatoms (
20–22) to one dominated by picoplankton (
23). Putative phage gene fragments (
24) and RNA virus sequences (
25) have been detected during the mid-winter in Lake Erie, but there has been no comprehensive characterization of winter viral activity in the Great Lakes to date. In addition, the relationship between viral community activity and ice cover extent has not previously been explored. Viral community composition and activity are shaped by the surrounding microbial community (
26,
27). Given that ice cover can influence microbial population structure, it is reasonable to expect a relationship between ice cover extent and viral activity. Viral activity could also be directly impacted by other ice cover-dependent factors as well (e.g., the effect of UV exposure on viral decay rates (
28,
29)). Characterizing active viral populations during the winter is an important step toward understanding how virus-mediated processes may be impacted by changing ice conditions.
Targeted gene amplification and shotgun metagenomic approaches have been informative in the study of viruses in the Great Lakes (
24,
30,
31), but a limitation of DNA-based approaches is that they do not reveal virus activity (i.e., replicating viruses) or capture RNA viruses. Metatranscriptomics can facilitate both the discovery of RNA viruses and estimate
in situ activity levels of viruses using transcript abundance as a proxy (
32). Here, we surveyed metatranscriptomes from filtered (>0.2 µm) samples collected from Lake Erie surface waters during two contrasting winters: one winter with high-ice cover (2019, 95% mean maximum ice cover) and a subsequent relatively ice-free winter (2020, 15% mean maximum ice cover) (
33). Samples from the spring months following the ice-free winter were also collected and served as an outgroup in this study. Using hallmark genes as proxies for putative viral populations, we identified phylogenetically diverse viral communities that were transcriptionally active in the winter surface waters. Viral community activity metrics (alpha and beta diversity indices) revealed significant differences in community composition and viral richness (i.e., the number of different viral hallmark gene variants) between the ice-covered and ice-free samples. In addition, viral activity metrics were correlated with microbial community activity metrics, namely with microbial richness and the proportion of diatom reads within the winter libraries. This suggested that the differences in viral activity between the winters may in part be linked to differences in the surrounding microbial community. Although the cause(s) of the observed differences in viral community activity remain uncertain, they coincided with the substantial variation in ice cover extent between the winters.
RESULTS
Preliminary comparison of viral community activity between whole-water and net-concentrated samples
Beta diversity analysis based on the relative transcript abundance of viral hallmark genes showed that the composition of active viral communities was significantly different between the whole-water (no filtration) and net-concentrated (64 µm opening) samples (ANOSIM R = 0.91, P = 0.001) (Fig. S1A). The total number of putative viral hallmark gene variants detected within a library was significantly lower in the net-concentrated relative to the whole-water samples (ANOVA, P < 0.001) (Fig. S1B). Examination of the relative abundance of viral hallmark genes within the libraries revealed that phage (Gp23), NCLDV (MCP), and RNA virus (RdRp) transcripts were detected in each whole-water sample, but that phage and NCLDV transcripts were relatively absent in the net-concentrated metatranscriptomes (Fig. S1C). Since we observed more hallmark gene variants in the unfiltered samples, both in terms of the number of hallmark genes detected and the different virus types detected (phage, NCLDV, and RNA viruses), we focused on the whole-water samples and the net-concentrated metatranscriptomes are not discussed here further.
Physiochemical parameters and phytoplankton abundance estimates
The whole-water samples spanned temporal, climatic, and spatial gradients. Whole-water samples were collected during February–March of 2019 (
n = 4), February–March of 2020 (
n = 10), and May–June of 2020 (
n = 6) across 12 sites (Fig. S2; Table S3). The winter of 2019 was considered a high-ice year (95% mean maximum ice cover) while the winter of 2020 was largely ice-free (15% mean maximum ice cover) (
33). Ice cover ranged from 90% to 100% at the 2019 winter (i.e., February/March) sample sites and was 0% at all 2020 winter sample sites (Table S3). Ice thickness ranged from approximately 2.5 to 15 cm at the ice-covered sites. The concentration of dissolved ammonia, chlorine, silicate, and soluble reactive phosphorus and the silica:nitrate ratio were not significantly different between 2019 (ice-covered) and 2020 (ice-free) winters (two-tailed unpaired
t-test,
P > 0.1) (Fig. S3; Table S4). Dissolved nitrate was on average significantly higher in the ice-covered relative to the ice-free winter (
P = 0.03) (Fig. S3). Chlorophyll
a (both >20 µm and >0.22 µm size classes) and diatom cell abundance were on average lower in the ice-free samples, but not significantly so (
P > 0.1) (Fig. S4; Table S5). An in-depth analysis on phytoplankton dynamics is outside of the scope of this study, but details regarding phytoplankton taxonomy and distribution in these samples can be found in the associated study by Zepernick et al. (
35).
Signatures of active viral communities within the winter surface water metatranscriptomes
Viral hallmark genes detected in the metatranscriptomes were interpreted to represent transcriptionally active or cell-associated viral populations. This was presumed because (i) DNA viruses must be replicating intracellularly to synthesize mRNA and (ii) samples were filtered (>0.2 µm) prior to nucleic acid extraction at a size cutoff that would enrich for cell-associated RNA viruses. Hereafter we refer to the detected virus populations as “active” with the consideration that this approach may detect cell-associated RNA virus genomes that are not replicating.
Putative phage (Gp23), NCLDV (MCP), and RNA virus (RdRp) hallmark genes were detected in the winter surface water samples (
Fig. 1A). Overall, active winter viral communities displayed high hallmark gene richness (i.e., the number of hallmark gene variants detected) and were phylogenetically diverse (
Fig. 1A through D). The majority of Gp23 detected in the winter metatranscriptomes
via read mapping (439 predicted gene sequences) were most closely related to uncultured myophage environmental sequences with uncertain hosts as opposed to the established cultured clades, although some Gp23 were related to isolated cyanophage (two sequences) and
Pelagibacter phage (six sequences) (
Fig. 1B; Table S6). The NCLDV MCP detected in the winter libraries (427 sequences) represented five of the six established NCLDV orders (
Fig. 1C) (71), most of which were assigned to the
Imitervirales (395 sequences) (Table S6). The relatively higher
Imitervirales MCP richness may reflect the broad host range of this group (92, 93). The phylogenetic diversity of RdRp present in the winter libraries (332 sequences) was similarly high, spanning the known phylum-level diversity of the
Orthornavirae (
Fig. 1D; Table S6). Furthermore, only a single lysogenic marker (integrase) was considered viral based on its top BLASTp hit to uncultured freshwater
Caudoviricetes (GenBank accession:
CAB5220699.1) and was only present in one library in the ice-free conditions (MT6, 02/14/2020).
The viral hallmark gene transcript pool was consistently dominated by Gp23 and RdRp transcripts across both winters, although MCP representation increased in the spring, specifically the May 1–22 samples (Fig. S5). Only eight single-stranded DNA virus replicase markers were detected with low representation in the water column libraries (Table S2; Fig. S5) and they were not analyzed further.
Viral community activity metrics differed between the ice-covered and ice-free conditions
Alpha and beta diversity metrics were calculated based on the detected viral hallmark genes to compare viral community activity between the two winters. Community activity was highly dissimilar between the ice-covered and ice-free conditions when comparing the relative abundance (TPM) of viral hallmark genes (73% dissimilar on average, SIMPER analysis). Furthermore, non-metric multidimensional scaling (nMDS) plots revealed that viral community composition clustered by season and that composition was significantly different between the winters based on both hallmark gene relative abundance (ANOSIM R = 0.79,
P = 0.003) and presence-absence matrices (ANOSIM R = 0.86,
P = 0.002) (
Fig. 2A). Mean viral community evenness (Pielou’s) and diversity (Shannon’s H) were not significantly different between seasons (
Fig. 2B; Table S7). While community evenness was high in all winter samples (Pielou’s J > 0.6), evenness was on average lower in the ice-free samples. Mean hallmark richness, however, was significantly lower (Tukey HSD
P < 0.001) in the ice-covered condition compared to the ice-free (
Fig. 2B). In other words, there were significantly fewer viral hallmark genes detected
via read mapping in the ice-covered winter samples. Viral hallmark richness was still significantly lower when differences in sequencing depth were accounted for (i.e., when hallmark gene counts were normalized to library size) (Tukey HSD
P < 0.001).
Viral community activity was significantly different between winters when examining virus hallmark types individually. The separate Gp23, MCP, and RdRp communities each clustered by season and were significantly different between the ice-covered and ice-free winters based on relative abundance (Fig. S6). Similarly, the mean hallmark gene richness of all three virus types was significantly lower in the ice-covered winter relative to the ice free (Fig. S7; Table S8). Overall, active viral communities displayed pronounced differences in diversity when compared between the ice-covered and ice-free winter samples and these differences were observed in a broad range of virus types.
Differences in viral community activity were correlated with microbial community activity metrics
Collectively, viral hallmark genes comprised a larger portion of the transcript pool (i.e.
, total TPM) in the ice-free samples compared to the ice-covered (
Fig. 3A). Higher relative viral transcript abundance in the ice-free conditions coincided with a lower abundance of the bloom-forming diatoms (
Bacillariophyta) in terms of both diatom relative transcript abundance (
Fig. 3B) and cell counts (Fig. S4). Conversely, prokaryotic transcripts had higher representation in the ice-free conditions (13%–36% versus 30%–67%, respectively) (
Fig. 3B). In fact, collective prokaryotic relative abundance was significantly negatively correlated with diatom relative abundance across the winter samples (Spearman’s ρ = −0.83,
P < 0.001) (Table S9). Moreover, both prokaryotic (
rpoB) (Tukey HSD
P < 0.001) and eukaryotic (
rpb1) (Tukey HSD
P = 0.02) hallmark gene richness were lower in the ice-covered winter (Fig. S8).
The shifts in microbial community metrics correlated with the observed differences in viral hallmark gene abundance and richness. Viral hallmark gene relative transcript abundance was significantly negatively correlated with diatom relative transcript abundance (ρ = −0.82,
P < 0.001) (
Fig. 3C) and positively so with prokaryotic abundance (Table S9). Furthermore, viral richness was positively correlated with microbial community richness (RpoB/Rpb1) (ρ = 0.93,
P = 2.2e-16) (
Fig. 3D). Altogether, lower diatom representation in the ice-free metatranscriptomes was associated with higher microbial community richness and prokaryotic representation, and, in turn with higher viral richness and representation.
Hallmark genes with the largest contributions to the dissimilarity between winter conditions
Although viral richness and representation (collective TPM) were higher in the ice-free conditions, only a few individual viral markers increased in relative abundance and contributed to the dissimilarity between the ice cover conditions. Out of the total 1,525 viral hallmark genes, only 11 had greater than 0.5% contribution to average dissimilarity and amounted to the cumulative 10% average dissimilarity between the ice cover conditions based on the SIMPER analysis (Table S10). While this cut-off is somewhat arbitrary, it captures the viral markers with the largest shifts in relative abundance between the winters, and the remaining markers have low individual contributions. Eight of the eleven markers of interest (four Gp23 and four RdRp) had higher average representation in the ice-free relative to the ice-covered winter where they “spiked” in relative transcript abundance in the ice-free conditions, while the remaining three (one Gp23 and two RdRp) had the opposite trend (
Fig. 4). Notably, the three markers that had higher average representation in the ice-covered winter had an overall low relative abundance (
Fig. 4), which is in line with the overall low viral TPM in the ice-covered winter samples noted previously.
All five of the top Gp23 contributors were placed within a clade containing an uncultured freshwater Gp23 sequence with an uncertain host range (GenBank accession CAB5226458.1). The RdRp of interest spanned three RNA virus phyla (
Fig. 4). Interestingly, three RdRp markers were related to diatom viruses within the
Bacillarnavirus (order
Picornavirales) (Fig. S9) and diatom colony-associated viruses within the
Narnaviridae (phylum
Lenarviricota) (Fig. S10) and
Totiviridae (phylum
Duplornaviricota) (Fig. S11). To further infer virus-host pairs, we assessed correlations in relative transcript abundance between the cellular marker genes (
rpoB/
rpb1) and the virus hallmark genes, but no significant interactions (
α < 0.01) were found between the diatom Rpb1 and putative diatom virus RdRp markers.
Major capsid protein transcript abundance was consistently low throughout both winters, and subsequently, no single MCP marker had a large contribution to the dissimilarity between winters. However, MCP had higher representation in the spring samples (May 1–22) relative to both winters largely due to the relative increase in two MCPs. Their dominance was also likely responsible for the lower evenness within the MCP communities in the spring samples (Fig. S7). Although host range is uncertain, these MCPs resembled isolated Chloroviruses (order
Algavirales) that infect green algae and
Mimiviridae (order
Imitervirales) isolates that infect amoeba (
Fig. 1C). It was noteworthy that the Chlorovirus-like MCP had a significant positive correlation with a Rpb1 annotated as Chlorophyta (
Trebouxiophyceae) (
α < 0.01, cor >0.8).
DISCUSSION
Viral activity during the winter months in the Great Lakes is largely undocumented. Here, viral hallmark genes were detected in metatranscriptomes from two contrasting winters in Lake Erie, indicating that there are viral communities “active” in the surface waters. Our findings expanded the known diversity of viruses that have been detected during the winter in Lake Erie. In one of the few studies to quantify viruses in winter samples collected from the Great Lakes, Matteson et al. enumerated putative cyanophage in mid-winter samples and found them to be a substantial component of the standing stock of viruses (
24). However, the qPCR-based detection approach that was used only characterized a narrow group of viruses (cyanomyophage of
Synechococcus spp.) and could not reveal if those viruses were transcriptionally active. While our metatranscriptomic approach also detected cyanomyophage-like hallmark genes, they represented a small component of the total myophage phylogenetic diversity. Instead, most myophage markers detected here belonged to clades of putative aquatic viruses with unknown hosts. It was previously also hypothesized that phage may “overwinter” as prophage in the Lake Erie water column (
24). While we do not dismiss lysogeny as a component of Lake Erie winter microbial communities, we detected little evidence of lysogenic lifestyles based on the transcripts assembled here. Metagenomic sequencing may enable the assembly of phage genomes that could help clarify the prevalence of temperate phage in Lake Erie.
Diverse populations of putative eukaryotic viruses within the NCLDV and RNA virus groups were also discovered in the winter samples. Previously, only 11 dsRNA viruses within the
Partitiviridae (
Pisuviricota),
Totiviridae (
Duplornaviricota), and
Birnaviridae (
incertae sedis) families were identified in a single Lake Erie metatranscriptome by Edgar et al. (
25), whereas here we detected hundreds of putative RNA viruses belonging to all five currently established phyla. While perhaps a limitation of sequencing depth and sample size, Edgar et al. detected no NCLDV transcripts using a similar hallmark gene approach (
25). Our survey, however, revealed putative NCLDV spanning most of the established diversity (five of the six NCLDV orders). Regarding viral discovery, one consideration when using a hallmark gene approach is that viral sequence detection is limited by the diversity of the database (i.e., highly divergent viruses may not be detected by sequence homology searches). In addition, metatranscriptomes represent “snapshots” of the environment at the time and conditions sampled; therefore, the true diversity of active viruses in the surface waters was likely not captured.
Based on the putative viral sequences we identified, active virus communities were compositionally distinct between winters and viral richness was significantly lower in the ice-covered winter. Our findings showed that the observed differences in active viral communities strongly correlated with the shifts in the microbial community, suggesting a direct relationship between active viral and microbial community composition. Similar to previous observations (
23), we saw evidence of a decrease in the magnitude of the winter diatom bloom in the ice-free conditions compared to the ice-covered state determined by both the representation in the metatranscriptomes (diatom relative transcript abundance) and cell count data. Other studies in aquatic environments have attributed lower viral abundance or richness in the winter to lower potential host pool abundance (
51,
52), and, in principle, viral diversity and abundance are shaped by the surrounding susceptible host diversity and density (
53). The drivers behind the observed differences in viral activity between winters are uncertain, but perhaps the decrease in diatom representation and increase in microbial richness contributed to the greater viral richness in the ice-free winter metatranscriptomes.
The higher levels of viral activity (viral TPM) in the ice-free winter relative to the ice-covered winter were primarily due to increased representation of a few hallmark gene variants. In other words, most viral hallmark genes did not display higher relative transcript abundance in ice-free conditions and were instead consistently low. Interestingly, the subset of viral hallmark genes that did display higher relative transcript abundance contained hallmark genes similar to those of diatom-associated RNA viruses. This included one RdRp related to lytic
Bacillarnavirus (order
Marnaviridae) isolates that infect marine diatoms (
54), as well as RNA viruses associated with an environmental diatom colony (
55). Culture-dependent studies have shown titers of some diatom viruses increased with the proliferation of diatom blooms and have been speculated to control the bloom crash (
56), but also coexist during the bloom course (
57). None of the putative diatom RNA viruses interacted with diatom (or any) Rpb1 markers based on our abundance-based network analysis; thus, the hosts of these putative RNA viruses remain uncertain.
We note that NCLDV did not have large contributions to the differences in active viral community composition between ice-cover states and that overall NCLDV displayed relatively low transcript abundance during both winters. Low counts (gene copy number) of various algal viruses (
Algavirales) have been detected during the wintertime in previous studies surveying the Great Lakes (
58,
59). Our findings support the idea that NCLDV activity is generally low during winter months in the Great Lakes, but continued winter surveys are needed to resolve this.
In all, we show that diverse viral populations were an active component of the winter surface waters in Lake Erie, emphasizing the need for any future winter surveys to account for viruses when investigating microbial dynamics during the winter. While we documented distinctions in viral activity between the ice-covered and ice-free winter samples, we cannot definitively attribute those differences to ice cover extent. However, we posit that the stark difference in ice cover extent (0% versus 90%–100% cover) is likely a contributing factor to the variation in microbial and viral activity since ice cover is considered a “master variable” due to its influence on biological, biogeochemical, and physical processes in freshwater lakes (
17). Aspects such as the reduced active virus richness observed in the ice-covered samples could be characteristic of ice-covered waters in Lake Erie, but ultimately further winter sampling efforts spanning temporal and climatic gradients are needed to resolve trends in viral activity associated with ice cover extent.
ACKNOWLEDGMENTS
We acknowledge the Kenneth & Blaire Mossman Endowment to UTK. The work conducted by the U.S. Department of Energy Joint Genome Institute, a Department of Energy Office of Science User Facility, is supported by the Office of Science of the U.S. Department of Energy under contract DE-AC02-05CH11231. We are thankful to the members of the U.S. Coast Guard Cutter Neah Bay and the MV Orange Apex along with Dan Peck and Thijs Frenken for assistance with sample collection and processing. We thank Derek Niles of Orange Force Marine Ltd. for coordinating sample collection during spring 2020 in the face of COVID disruptions that prevented conventional surveillance of the Great Lakes by federal agencies. Finally, we thank George Bullerjahn, Christa Pennachhio (DOE JGI), and Justin Chaffin for their collaboration.
This work was supported by the National Institutes of Health, NIEHS grant 1P01ES02328939-01, National Science Foundation grant OCE-1840715 (R.M.L.M. and S.W.W.), NSERC grant RGPIN-2019–03943 (R.M.L.M.), funding from the Simons Foundation (735077), funding from the US Department of Energy, Office of Science, Office of Biological and Environmental Research, Genomic Science Program Grant to JPG, under Award Number DE-SC0020362, and JGI project 503851 (S.W.W. and R.M.L.M.).
The project was designed by S.W.W. and R.M.L.M. RNA extractions were performed by B.N.Z. B.N.Z. provided the metatranscriptome assembly and gene predictions. Data analysis, statistics, and interpretation of results were carried out by E.R.D. Manuscript drafted by E.R.D. All authors reviewed and edited the final version of the manuscript.