Viromes from melted sea ice, sea-ice brine, and cryopeg brine.
In May 2017, we collected five samples near Utqiaġvik, Alaska: one cryopeg brine sample from below the floor of the Barrow Permafrost Tunnel (
9); three samples from sections of a sea-ice core (upper sea ice, middle sea ice, and lower sea ice) in landfast first-year sea ice; and one sample of sea-ice brine drained into a sackhole in the sea ice (
Fig. 1). Counts of virus-like particles (VLP) indicated that the cryopeg brine had 1.2 × 10
8 VLP ml
−1, while VLP counts from sea-ice brine and bulk (melted) sea ice were much lower (1.2 × 10
5 to 2.1 × 10
5 VLP ml
−1 [
39]); thus, we could not address whether viruses were concentrated in the liquid phase (brine) of sea ice (see Table S1 in the supplemental material in reference
40). Similarly, the concentrations of viral DNA extracted from the sea ice-derived samples were also below the detection limit (<0.1 ng/μl) of Qubit 3.0 (Thermo Fisher Scientific). Therefore, all samples were subjected to low-input, quantitative viral metagenomic sequencing, which can recover viruses from samples with DNA concentrations as low as 100 fg/μl (
35–37). The recovered DNA was used for short-read metagenomic sequencing library construction (see Materials and Methods) to yield five viral metagenomes (viromes) with a total of 355,904,145 reads (Table S2 [
40]). The quality-controlled reads of each sample were assembled individually, resulting in 7,404 and 2,577 contigs of ≥5 and ≥10 kb, respectively, across the full data set (Table S2 [
40]).
VirSorter (
41) predicted a total of 1,692 viral contigs of ≥5 kb in size, almost all of which (1,686 of 1,692; categories 1, 2, and 3) were not identified as a prophage and so were interpreted to derive from extrachromosomal prophages, lytic infections, or partially assembled prophages (Table S2 [
40]) (
41). Of these 1,686 contigs, most (1,393) were high-confidence viruses belonging to category 1 or 2 that we then dereplicated into 1,305 viral populations using currently accepted cutoffs that approximate species-level taxonomy (
42–45). On average, about 20.0% (range of 1.3 to 70.1% for five viromes) of the quality-controlled reads were recruited to these viral populations (Table S2 [
40]), which were about 2-fold higher than the typical read recruitments in the Global Ocean Viromes data set (
23,
42). From each of the 1,305 populations the longest contig was selected for all further investigations, except for the taxonomic and host predictions, which were restricted to the 476 populations with sequences of ≥10 kb in length. Rarefaction curves were constructed (see Materials and Methods) and showed that the sequencing depth of the cryopeg sample was sufficient to capture the diversity of long viral populations present, and that the sea-ice viromes were slightly undersampled (Fig. S1 [
40]). Thus, additional, unsampled viral populations likely exist in the sea-ice environment.
Viral communities comprise mostly novel genera and differ across habitats.
We next considered how viruses in these underexplored extreme environments compared to known viruses. Because viruses lack any single, universally shared gene, we established taxonomy using gene-sharing analysis from viral sequences ≥10 kb in length using vConTACT v2 (
46,
47). In our data set, that meant comparing shared gene sets from 476 viral populations (166 from cryopeg plus 310 from sea-ice populations) to genomes from 2,304 known viruses in the NCBI RefSeq database (version 85) (
Fig. 2). Such analyses produce viral clusters (VCs), which represent approximately genus-level taxonomic assignments (
46,
47). The 2,304 RefSeq viral genomes formed 370 VCs (Table S3 [
40]). Of the 476 viral populations in our data, 255 (83 cryopeg plus 172 sea ice) were grouped into 97 genera/VCs (VC with ≥2 viruses), while only 56 (33 cryopeg plus 23 sea ice) of these 255 populations formed VCs (32 genera/VCs) with RefSeq viral genomes and could be assigned taxonomy. The other 221 (of 476) populations (83 cryopeg plus 138 sea ice) remained isolated as singletons (i.e., as single-member VC) due to having no or few shared genes with other viruses (
Fig. 2). Together, only 12% (56 of 476) of viral populations could be assigned taxonomy. These results indicate that myriad unique viruses inhabit cryopeg brine and sea-ice environments. Comparing viral clusters in the cryopeg to those in the sea-ice samples showed that only five VCs (of 97 total VCs) were shared between the two sample types (Fig. S2 [
40]), which implies largely distinct viral communities between cryopeg and sea-ice habitats. The shared VCs suggest similarities in viral reproduction, as genus-level viral communities typically reflect the broad characteristics of viral replication and packaging (
46), as opposed to ecological distinctions (see below). Four of these five shared VCs contained viral populations exclusive to cryopeg brine and sea-ice environments (i.e., not available in the RefSeq database; Fig. S2 [
40]), while the other one included sequences most similar to a temperate
Myxococcus phage Mx8 (VC_248_0; Table S3 [
40]) in the RefSeq database (
48). We then determined the number of viral genera (i.e., VCs) in each sample by mapping QC reads to viral populations in each VC (see Materials and Methods). Cryopeg brine harbored 167 populations (representing 47 VCs), whereas the sea-ice samples contained 54 (30 VCs), 51 (24 VCs), 169 (48 VCs), and 255 (44 VCs) populations for the upper, middle, and lower sea ice and sea-ice brine, respectively (Fig. S3 [
40]).
We then looked more closely at the population (approximately species) level to compare viral communities in the cryopeg and sea-ice environments. We first calculated the abundance of each of the 1,305 viral populations (≥5 kb in length) by read mapping (see Materials and Methods). The 100 most abundant populations (summed across all five samples) were used to compare viral community compositions in the five viromes by a heatmap (
Fig. 3A). The results showed that the viral community in cryopeg brine comprised different populations from those in the sea-ice brine and bulk sea-ice samples, with the exception of one shared population (
Fig. 3A and Table S4 [
40]), and that the four sea-ice samples shared most of their populations. Principal coordinate analysis (PCoA) of the viral population abundances also showed that the viral community in cryopeg brine was distinct from the sea-ice communities (
Fig. 3B). From the sea-ice environment, the upper sea-ice, middle sea-ice, and sea-ice brine samples clustered together (
Fig. 3B), whereas lower sea ice grouped separately from the sea-ice sample cluster.
These results indicate that the cryopeg brine supports a unique viral community (consistent with a previous study [
15]), which may be due to this cryopeg system having been separated from the atmosphere and meteoric water for at least 14,000 years and possibly much longer (
9,
17,
18). Viral communities inhabiting the middle and top layers of the ice core were similar to each other and to that of sea-ice brine, as expected given that this brine was drained from these two sea-ice layers. The separation can be attributed to the ice-algal bloom and associated bacteria, as well as some physiochemical factors in the bottom portion of the ice (i.e., high concentrations of chlorophyll, phaeopigment, bacteria, EPS, POC, and PON; Table S1 [
40]). These results indicate that viral communities vary across vertical positions of the sea-ice core, which is consistent with previous reports on varied bacterial and microalgal communities at different ice depths (
49,
50).
Viruses infect dominant microbes in subzero and salty habitats.
We next examined microbial communities in these samples to identify potential hosts. Microbial community structures, generated by 16S rRNA gene amplicon sequencing (see Materials and Methods), showed all samples to be dominated by
Bacteroidia of the phylum
Bacteroidetes and
Gammaproteobacteria of the phylum
Proteobacteria (Fig. S4 and Table S5 [
40]). Based on their most abundant genera, the microbial community in the cryopeg brine differed from those in the sea-ice environments (Fig. S4 [
40]). While the cryopeg community was dominated by
Marinobacter (relative abundance of 53.0%),
Gillisia (21.1%), and
Demequina (7.7%), these groups were of minor relative abundance (0.1 to 1.1%) in the sea-ice samples (Fig. S4 [
40]). Instead, sea-ice communities were dominated by
Paraglaciecola (3.4 to 21.5%),
Reinekea (1.2 to 12.8%),
Colwellia (4.9 to 8.2%),
Polaribacter (8.4 to 36.6%), and
Glaciecola (0.5 to 2.5%), which were all rare (≤0.1%) in the cryopeg brine (Fig. S4 [
40]). Many of these abundant genera contain species that are halotolerant or psychrotolerant halophilic and/or psychrophilic, such as
Marinobacter halophilus (
51),
Colwellia psychrotropica (
52), and
Glaciecola punicea (
53). In addition, some members of
Marinobacter can utilize aromatic and aliphatic hydrocarbons as sole carbon and energy sources (e.g., see references
54–56), and
Glaciecola can produce cold-active and salt-tolerant enzymes to hydrolyze xylan and cellulose (e.g., see references
57–59). These results indicate that the bacterial communities in cryopeg brine and sea-ice ecosystems are dominated by taxa that are likely equipped with genomic adaptations to subzero and salt conditions and that play important carbon-cycling roles in these extreme environments.
We then explored the potential impacts of viruses on these abundant microbes by linking viruses
in silico to their hosts. Hosts were predicted using two methods, one based on the viral and host nucleotide sequence similarity (
60) and the other based on genome composition (
61). Because the accuracy of host prediction increases with viral contig length (
61), only the 476 viral populations of ≥10 kb in size were used for host prediction. The sequence similarity method (BLASTN) predicted hosts for only 13 populations (Table S6 [
40]), whereas the sequence composition method (using the software VirHostMatcher [
61]) linked many more populations (
n = 141) to microbial hosts (Table S7 [
40]). Hosts at higher taxonomic levels (i.e., phylum and class levels) were most reliably linked to populations, similar to previous reports (
60,
61). Both methods consistently predicted phylum- and class-level hosts for the same ten populations and genus-level hosts for five of these ten populations (Table S6 and S7 [
40]). Although only 29.6% of viral populations could be linked to a host, these host predictions indicated that viruses in cryopeg brine and sea-ice samples infect microbes inhabiting these extreme environments.
The predicted host phyla (identified by the sequence composition method) that were most abundant included
Proteobacteria,
Bacteroidetes, and
Actinobacteria (Fig. S5 and Table S7 [
40]). Interestingly,
Actinobacteria-linked viruses were abundant (12.7%; Fig. S5 [
40]) in cryopeg brine but were not detected in sea-ice environments (except at low abundance [0.3%] in sea-ice brine), indicating that
Actinobacteria-associated viral predation happens more often in cryopeg brine than in sea-ice environments. This conclusion is consistent with the higher relative abundance of
Actinobacteria in cryopeg brine (12.3%) than in sea-ice environments (0.4 to 2.5%; Fig. S5 [
40]). The predicted host genera (identified by the sequence similarity method) that were most abundant in these samples included
Marinobacter,
Glaciecola, and
Colwellia (Table S6 [
40]), many members of which are tolerant of low temperature and high salinity, as discussed above. The relative abundance of
Marinobacter-associated viral populations was high (12.9%) in cryopeg brine (Table S6 [
40]), including one (i.e., CBIW_NODE_98) with the third highest coverage (coverage of 684) among all populations in this sample (Table S4 [
40]). This finding is consistent with the dominance (53.0%) of the bacterial genus
Marinobacter in this cryopeg brine sample (Fig. S4 [
40]) and in other samples of cryopeg brine from the same location (
9). The
Glaciecola-associated virus SI3M_NODE_10 was the second most abundant (relative abundance, 14.8 to 18.0%) viral population in both middle and upper sea ice viromes (Tables S4 and S6 [
40]), while members of
Glaciecola only accounted for 1.4 to 1.9% of microbial profiles in these samples (Fig. S4 [
40]).
Colwellia-linked virus SI3M_NODE_1 was the sixth most abundant viral population and had a relative abundance of 1.3% in middle sea ice, where the microbial community was dominated by the genus
Colwellia, with 8.3% relative abundance (Fig. S4 [
40]). Given the high relative abundances of
Marinobacter,
Glaciecola, and
Colwellia, as well as their associated viruses, we predict that these viruses influence the responses of their hosts to cold and salty conditions and contribute to ecosystem function in these extreme environments.
Overall, the microbial profiles and host predictions suggested that viruses have infected the abundant microbial groups and, thus, might have an important role in cryopeg and sea-ice ecosystems by influencing their hosts.
Virus-encoded cold and salt tolerance genes potentially impact their hosts.
We next sought to identify virus-encoded auxiliary metabolic genes (AMGs) that might directly modify host cold or salt adaption during infection. Briefly, 22,820 predicted genes from the 1,305 viral populations (length, ≥5 kb) were queried against functional databases, which resulted in about half (11,002) matching (bit score, ≥60) annotated sequences in KEGG, UniRef, or InterProScan databases (Table S8 [
40]). Within these, 76 were identified as possible AMGs (see Materials and Methods), which derived from 293 viral genes in 138 viral populations (Table S9 [
40]) and included 61 AMGs that were reported previously (
23,
62). Among those not seen before, we explored whether they may offer new insights into how viruses manipulate microbial metabolisms. For example, 3 new AMGs included
Cellulase,
CHB_HEX_C_1, and
GDP_Man_Dehyd that we hypothesized mediate sugar metabolisms in their hosts during infection (Table S9 [
40]). However, we focused on fatty acid desaturase (
FAD) genes due to their potential function in cellular tolerance of extreme conditions. Specifically, in diverse microbes,
FAD genes are known to desaturate cell membrane lipids by introducing double bonds into the hydrocarbon chains of fatty acids (
63), so as to increase membrane fluidity (
64) to help cells cope with stresses, including low temperature (
65,
66) and high salinity (
67,
68).
From four viral populations, we identified 12
FAD genes (Table S9 [
40]) that we had high confidence were encoded by viruses due to flanking viral genes (Fig. S6 [
40]). We then determined the evolutionary history of 11 virus-encoded
FAD genes (
vFAD; one was removed due to potential recombination; see Materials and Methods) by phylogenetic analysis with 217 and 263 microbial
FAD (
mFAD) genes from the concurrently sampled microbial metagenomes and NCBI’s nr database, respectively. This tree resolved
FAD genes that belonged to the same bacterial or eukaryotic phylum into distinct clades (Fig. S7 [
40]), which implied that these
FAD genes had not been transferred across different bacterial phyla or between bacteria and eukaryotes. To look at this more closely, we used a subset of the
FAD genes (see Materials and Methods) and found patterns similar to those observed in the above “all-sequence” tree, specifically, four clades of
vFAD, which we designated A to D (
Fig. 4). These results suggest that host-to-phage
FAD transfers occurred at least four times, which is consistent with multiple transfers observed in cyanobacteria and their cyanophages for the AMGs
PstS (
69),
PsbA and
PsbD (
70), and
PTOX (
71).
Interestingly, one sea-ice-originated
vFAD gene (SB_NODE_339_gene_12) clustered with 17
mFAD genes from the
Proteobacteria phylum in clade D (
Fig. 4), which we interpret to represent a case of phage-mediated
FAD gene transfers among
Proteobacteria members. In addition, this
vFAD sequence shared a high amino acid identity (93.6%) to a gammaproteobacterial
FAD sequence (SB.02_scaffold_119364_c1_1), which was detected in a microbial metagenome from the sea-ice brine, indicating a likely
FAD-gene transfer between this gammaproteobacterium-virus pair. Another sea-ice-originated
vFAD gene (SB_NODE_17_gene-11) clustered with
mFAD genes from
Bacteroidetes (
n = 5) and unclassified bacteria (
n = 4) in clade B (
Fig. 4). These results were further supported by a matching host prediction of the viral population (SB_NODE_17) to
Bacteroidetes members (Table S7 [
40]). Taken together, this evidence supports that this
FAD was transferred from
Bacteroidetes to its phage, and that this transfer event was probably host range limited.
The
vFAD from cryopeg brine formed two distinct clades, A and C, that did not contain any
mFAD (
Fig. 4). Therefore, the origin of the cryopeg
vFAD is less clear. It is possible that these
FAD genes have been acquired from low-abundance hosts whose
FAD genes were not detected in our data set due to the limitation of sequencing depth, or that they have diverged significantly from any potential “source” microbial taxon in the ancient cryopeg. Interestingly, all identified cryopeg
vFAD could be linked to only two viruses (five copies each; Fig. S6 [
40]). It is likely that multiple copies of
FAD in each virus were acquired either from different hosts or from the same host over a long period of time. These results suggest that gene transfers occurred both from host to virus and virus to host multiple times. Our findings are consistent with previous reports of virus-mediated
FAD gene transfers among microbial hosts (
72) as well as numerous photosynthesis genes being transferred within and between cyanobacteria (
69,
70,
73).
We next analyzed the evolutionary dynamics of
vFAD and
mFAD homologs across lineages (see Materials and Methods). Among the major drivers of microbial evolution is selection pressure. We detected significant differences in the ratio of nonsynonymous to synonymous evolutionary changes (
dN/
dS) (
ω) between
vFAD and
mFAD clades (Table S10 [
40]), indicating that
vFAD and
mFAD were under different selection pressures. Purifying selection removes potentially deleterious mutations from the genetic background of random neutral mutations. The
vFAD genes appear to be under purifying selection, implying that they remain functional. The power of this inference depends upon the availability of closely related, known functional cellular homologs: if they are variably available across the data set, such homologs can lead to artifacts (
74). However, our evaluation of the data for a large
dS confirmed that variability was not responsible for the observed differences in selection pressure (Table S11 [
40]). Thus, we envisage that the microbes have elevated dosages of
FAD gene expression that provides advantages to the microbial host, leading to a fitter lineage in nature. This interpretation is in line with previous reports that demonstrated the role of selection pressure in shaping the survival of microbes (
75–77).
Indeed, several lines of
in silico analyses suggested these 12
vFAD genes are functional (
Fig. 4). First, the deduced amino acid sequences of the 12
vFAD revealed that they encoded the well-conserved three histidine cluster motifs (Fig. S8 [
40]). Second, 11 of the 12
vFAD encoded the three conserved histidine motifs (Fig. S8A [
40]), with the 12th
vFAD (SB_NODE_339_gene-12) encoding other histidine cluster motifs (Fig. S8B [
40]). Third, analysis of predicted protein sequences from the 12
vFAD using Phyre2 (
78) showed that all had 100% confidence scores (except one with 99.8%) that linked them to a stearoyl-coenzyme A (CoA) desaturase, 4ZYO_A, which introduces a double bond at the Δ9 position (the bond between carbons 9 and 10) of saturated fatty stearoyl- or palmitoyl-CoA (
79). Based on motif sharing with cyanobacterial FAD proteins and 4ZYO_A, we deduced the
vFAD from SB_NODE_339_gene-12 might introduce the double bond at carbon Δ6, while the other 11 vFAD might work with carbon Δ9 for desaturation (
80–82).
Together, these results suggest that these viruses from cryopeg brine and sea-ice environments obtained
FAD from their host through HGT and encode AMGs capable of altering host membrane fluidity. Although future experiments are necessary to validate the activity and function of these virus-encoded proteins, and information (e.g., lytic versus lysogenic lifestyles) for these vFAD-carrying phages is limited, the evidence that currently exists suggests they are functional, leading us to hypothesize a conceptual model for their impacts (
Fig. 5). Specifically, we propose that
vFAD-carrying phages help their hosts to desaturate membrane fatty acids and increase membrane fluidity, and that these changes could increase host tolerance to cold and salt conditions over the course of an infection cycle. For the virus, we posit that this ability alters host membrane structure to promote viral infection and/or virion assembly and release via host lysis or membrane budding, as seen in other systems (
83).