DNA extraction, qPCR, and 16S rRNA gene sequencing.
DNA extractions, qPCR, and 16S rRNA gene sequencing were described previously by Vuillemin et al. (
17). In brief, subcores were sampled aseptically with sterile syringes in a UV-sterilized DNA/RNA clean HEPA-filtered laminar flow hood. DNA was extracted from 10 g of sediment transferred into 50 ml of lysing matrix E tubes (MP Biomedicals) containing silica glass beads and homogenized for 40 s at 6 m/s using a FastPrep-24 5G homogenizer (MP Biomedicals) in the presence of 15 ml of preheated (65°C) sterile-filtered extraction buffer (76 vol% 1 M NaPO
4 [pH 8], 15 vol% 200-proof ethanol, 8 vol% MoBio lysis buffer solution C1, and 1 vol% SDS). The samples were incubated at 99°C for 2 min, frozen overnight at −20°C, thawed, and frozen again at −20°C overnight, followed by an additional incubation at 99°C for 2 min and a second homogenization step using the settings described above. After the second homogenization, the samples were centrifuged for 15 min, and the supernatants were concentrated to a volume of 100 μl using 50-kDa Amicon centrifugal filters (Millipore). Coextracted PCR-inhibiting humic acids and other compounds were removed from the concentrated extract using the PowerClean Pro DNA cleanup kit (MoBio). Extraction blanks were included alongside the samples to assess laboratory contamination during the extraction process.
DNA was quantified fluorometrically using a Qubit system with a double-stranded DNA (dsDNA) high-sensitivity kit (Life Technologies). DNA templates were used for qPCR amplifications with updated 16S rRNA gene primer pair 515F (5′-GTG YCA GCM GCC GCG GTA A-3′) and 806R (5′-GGA CTA CNV GGG TWT CTA AT-3′) as previously described (
35). All qPCRs were set up in 20-μl volumes with 4 μl of the DNA template, 20 μl of SsoAdvanced SYBR green supermix (Bio-Rad), 4.8 μl of nuclease-free H
2O (Roche), 0.4 μl of primers (10 μM; biomers.net), and 0.4 μl of MgCl
2 and carried out on a CFX-Connect qPCR machine (Bio-Rad) for gene quantification. For 16S rRNA genes, we ran 40 PCR cycles of two steps corresponding to denaturation at 95°C for 15 s and annealing and extension at 55°C for 30 s. All qPCRs were set up in 20-μl volumes with 4 μl of the DNA template. Gel-purified amplicons were quantified in triplicate using QuantiT dsDNA reagent (Life Technologies) and used as a standard. An EpMotion 5070 automated liquid handler (Eppendorf) was used to set up all qPCRs and prepare the standard curve dilution series spanning from 10
7 to 10
1 gene copies. Reaction efficiency values in all qPCR assays were between 90% and 110% with
R2 values of >0.95% for the standards. Barcoded PCR for Illumina sequencing was performed using the custom primer dual-indexed approach that targets the V4 hypervariable region of the 16S rRNA gene using dual-indexed 16S rRNA gene primer pair 515F (5′-
GTGYCAGCMGCCGCGGTAA-3′)/806R (
GGACTACNVGGGTWTCTAAT) (
35). Barcoded V4 hypervariable regions of amplified 16S rRNA genes were sequenced on an Illumina MiniSeq platform according to a previously established protocol (
35). Bioinformatic processing of these previously published sequence data was described previously by Vuillemin et al. (
17) in detail.
Long-term incubation setup.
Prior to setting up the incubations, the subcores were sampled with sterile syringes using the sample aseptic technique used for DNA extraction. For each sample depth, 7 g of abyssal clay was placed into sterile 20-ml glass flasks and incubated with 4 ml of sterile artificial seawater composed of either H
218O (97% atomic enrichment) or unlabeled artificial seawater. Vials were crimp sealed, with an oxygenated headspace of approximately 10 ml, and incubated at 8°C. The artificial seawater was different from the porewater at depth because there was no added nitrate, but there was also no added ammonia, which should be similar to
in situ conditions where ammonia is generally below the limit of detection (
17). Oxygen was measured continuously throughout the incubations using noninvasive fiber-optic measurements as described previously (
36). Small fluctuations in the oxygen measurements in the killed-control and experimental incubations were likely due to temperature fluctuations of the incubator itself (±1°C) since the noninvasive fiber-optic oxygen sensor spots are temperature sensitive (
36). Oxygen consumption was detectable over 18 months in slurries consisting of sediment and sterile artificial seawater (see
Fig. S1 in the supplemental material), suggesting the presence of actively respiring microbes. The rate of oxygen consumption was calculated by subtracting the starting concentration of O
2 from the final concentration of O
2 and dividing this by the number of days for the incubation.
We used quantitative stable isotope probing (qSIP) to measure the atoms percent
18O enrichment of actively growing microbial taxa as described previously (
17,
37). In brief, after 7- and 18-month incubations, DNA was extracted and subjected to cesium chloride (CsCl) density gradient centrifugation. The same 16S rRNA gene 515F/806R primers (described above) were used for qPCR (described above) to determine density shifts in the peak DNA of buoyant density (BD) for each incubation. 16S rRNA gene amplicons from each fraction resulting from density gradient fractionation were Illumina sequenced as described previously (
17). To identify contaminants that may have entered during the fractionation process, we also included in the sequencing run extraction blanks from the SIP fractionation. OTUs containing sequences from extraction blanks were removed. Excess atoms percent
18O enrichment were calculated for each OTU (including OTU_6, corresponding to the subseafloor
Thalassospira strains) according to the equations for quantifying per-OTU atomic enrichment.
The number of doublings for the Thalassospira OTU (OTU_6) detected at the 18-month time point was calculated using the qPCR-normalized relative abundance of the 16S rRNA genes at 0 and 18 months. The number of generations was calculated as X = 2nX0, Where X and X0 are the final and initial cell densities, respectively, and n is the number of generations.
Enrichments, cultivation, and subcultivation.
After 18 months of incubation in sterile
18O-labeled artificial seawater, 25 μl of the slurry was plated onto solid medium (10 mg/ml yeast extract and 8 mg/ml agar in artificial seawater [30 mM MgCl
2·6H
2O, 16 mM MgSO
4·7H
2O, 2 mM NaCO
3, 10 mM KCl, 9 mM CaCl
2, 450 mM NaCl]), and after 2 days of incubation in the dark at room temperature, abundant colonies were observed growing on the surface of the petri dishes (
Fig. S1). No colonies were observed to grow on control petri dishes that received 25 μl of the
18O-labeled artificial seawater slurry incubated for 18 months using starting material from the autoclaved sediment (killed controls). This indicated that the colony-forming bacteria were from the sediment and not contaminants introduced during the experimental setup of the incubations. We attempted to culture chemoheterotrophic microbes directly from the collected sediment samples under the same conditions, but no CFU were observed on the petri dishes, even after several months of incubation. Thus, long-term incubation of the sediment at 8°C simply in the presence of added water apparently stimulated the activity of many subseafloor bacteria to a point at which they were able to grow on the surface of a petri dish.
A total of 21 colonies were picked from petri dishes containing colonies from the 3-mbsf and 6-mbsf slurries (10 colonies picked from the 3-mbsf slurry and 11 colonies picked from the 6-mbsf slurry). These colonies were streaked individually onto separate new petri dishes, and a single colony was picked from each of the separate petri dishes (representing the original CFU from the enrichment) and grown in sterile liquid medium (10 mg/ml yeast extract and 8 mg/ml agar in artificial seawater [30 mM MgCl
2·6H
2O, 16 mM MgSO
4·7H
2O, 2 mM NaCO
3, 10 mM KCl, 9 mM CaCl
2, 450 mM NaCl]). Single colonies were then grown in liquid medium. A portion of each of these colonies was used for DNA extraction and genome sequencing, and the remaining volume was frozen as glycerol stocks. Growth rates were determined in experiments with 20-ml crimp-sealed glass flasks containing 0.1 ml of the glycerol stock inoculated into 10 ml liquid medium (10 mg/ml yeast extract in artificial seawater [30 mM MgCl
2·6H
2O, 16 mM MgSO
4·7H
2O, 2 mM NaCO
3, 10 mM KCl, 9 mM CaCl
2, 450 mM NaCl]) with a 10-ml headspace and gentle shaking. The optical density (OD) at 600 nm was measured once every 30 min with a spectrophotometer, and growth rates were calculated from the exponential phase using the formula μ = [ln(
X) − ln(
X0)]/(
t −
t0), where
X and
X0 are the final and initial optical densities (at 600 nm), respectively, and
t and
t0 are the times at which those optical densities were made (final and initial, respectively). Under these conditions, the growth rates of the subseafloor
Thalassospira strains were similar across all three clades and ranged from 0.064 to 0.31 h
−1 (
Fig. S5). The growth experiments for all subseafloor and type strains were performed alongside one another in the same bacteriological growth medium formulation, at the same temperature, in identically constructed and prepared flasks, and at the same shaking speed.
Genome sequencing, de novo assembly, and annotation.
DNA was extracted from the isolates grown in liquid culture until the end of exponential phase as described above. After reaching stationary phase, cultures were pelleted via centrifugation, and the supernatant was decanted. The cell pellets were resuspended in a preheated (65°C) sterile-filtered extraction buffer (76 volume % 1 M NaPO4 [pH 8], 15 volume % 200-proof ethanol, 8 volume % MoBio lysis buffer solution C1, and 1 volume % SDS), added to lysing matrix E tubes (MP Biomedicals) containing silica glass beads, and homogenized for 40 s at 6 m/s using a FastPrep-24 5G homogenizer (MP Biomedicals). The samples were centrifuged for 15 min, and the dissolved high-molecular-weight DNA in the supernatant was concentrated to a volume of 100 μl using 50-kDa Amicon centrifugal filters (Millipore). The concentrated extract was cleaned of proteins and other nongenomic DNA organic matter using the PowerClean Pro DNA cleanup kit (MoBio). Extraction blanks were added alongside the samples to assess laboratory contamination during the extraction process. Genomic libraries were prepared using the Nextera XT DNA library prep kit (Illumina). Quality control and quantification of the libraries were performed on an Agilent 2100 Bioanalyzer system using high-sensitivity DNA reagents and DNA chips (Agilent Genomics). Genomic libraries were sequenced to a depth of ca. 100× coverage using a high-output paired-end 2-by-150 sequencing reagent kit (Illumina).
In addition to Illumina sequencing, the high-molecular-weight genomic DNA was sequenced using the Nanopore MinION system. Sequencing libraries for the MinION system were prepared using the ligation sequencing kit (Oxford Nanopore Technologies) according to the manufacturer’s instructions. Barcoded libraries were sequenced on the MinION system using a Flongle R9 flow cell and base called and demultiplexed using the MinIT system with ont-minit-release v19.12.5 and ont-guppy-for-minit v3.2.10 for base calling (Oxford Nanopore Technologies).
A hybrid assembly was performed using both the short (Illumina)- and long (Nanopore)-read sequencing data using Unicycler (v.0.4.0), which uses
de novo-assembled Illumina data from SPADES to polish the
de novo-assembled contigs obtained from Nanopore data using RACON (
39). We used the default settings available in Unicycler to coassemble the Illumina data (forward and reverse fastq files) and Nanopore data (single fastq file), which consisted of the following commands: ./unicycler -1 isolate-6-1_illumina_R1_001.fastq -2 isolate-6-1_illumina_R2_001.fastq -l isolate-6-1_Nanopore.fastq –spades_path ./SPAdes-3.13.0-Darwin/bin/spades.py –racon_path ./racon/build/bin/racon –bowtie2_build_path ./bowtie2-build –bowtie2_path ./bowtie2 -o out. The combined assemblies of Illumina and Nanopore data resulted in a relatively low number of contigs (9 to 12 per genome) and a predicted genome completeness of 100% for nearly all genomes (
Table S1). Genome completeness was determined using CheckM (
40), using the family
Rhodospirillaceae as the reference group (which contains all species within the genus
Thalassospira). Genomes were annotated using RASTk (
41).
Core genome phylogenetic analyses.
The core genome was defined as the set of orthologous genes that were shared in all subseafloor and “type strain”
Thalassospira genomes. NCBI genome accession numbers for the type strain genomes of
Thalassospira (those isolated from surface-world habitats) are as follows:
NZ_JAATJD010000001.1 for
T. tepidiphila,
NZ_JPWA00000000.1 for
T. xianhensis,
NZ_ATWN01000001.1 for
T. lucentensis,
NZ_FTON01000032.1 for
T. xiamenensis,
NZ_AMRN01000001.1 for
T. profundimaris,
NZ_FNTU01000002.1 for
T. permensis,
NZ_JFKB00000000.1 for
T. alkalitolerans,
NZ_CP031555.1 for
T. indica,
NZ_CP024199.1 for
T. marina,
NZ_PGTS00000000.1 for
T. povalilytica,
NZ_NXGX00000000.1 for
T. lohafexi,
NZ_JFKA00000000.1 for
T. mesophila, and
NZ_JRJE00000000.1 for
T. australica. Orthologous genes between the subseafloor and type strains were defined as those sharing >30% amino acid similarity to the collective suite of genes within the type strain
Thalassospira xiamenensis M-5.
T. xiamenensis M-5 was chosen as the reference genome for this purpose because it is the only publicly available genome of a cultivated
Thalassospira isolate that is completely closed and represents a single chromosome and a 190-kb plasmid. A total of 1,809 orthologous genes (defined as being present as only a single copy in each genome) were identified that are carried by all
Thalassospira strains that had >30% sequence similarity to genes within the
T. xiamenensis M-5 genome. We define these genes as orthologs, as opposed to paralogs, because they are present as only a single copy in each genome. Paralogs were identified in
Thalassospira genomes by performing similarity searches for genes (BLASTp queries) that had similarity to a
T. xiamenensis gene (BLASTp subjects) and then removing subjects (potential core genes) that had hits from multiple queries from the same query genome. Only
T. xiamenensis genes (BLASTp subjects) that had similarity from a single BLASTp query across all
Thalassospira strains were kept as orthologs. This should select only single-copy genes (genes that have not been duplicated within the genomes). Because the genes have homology between the different genomes and are present as a single copy across all genomes compared, we interpret them to be orthologs. Each of these 1,809 orthologous genes was individually aligned between all
Thalassospira strains using MUSCLE (
42), and the 1,809 individual alignments were then concatenated into a single core genome alignment for the subsequent phylogenomic analysis using Geneious Prime (version 2019.2.1). After the concatenation of all core genes, the total size of the core genome alignment was 1,817,073 nucleotide characters, with 34 taxa (21 subseafloor strain and 13 type strain taxa). A maximum likelihood phylogeny was created using PhyML (
43) with a generalized time-reversible (GTR) model of evolution and 100 bootstrap replicates, which was implemented in SeaView (
44). The resulting phylogenetic tree and the concatenated core genome alignment were used as inputs for subsequent ClonalFrameML and
dN/
dS ratio (ratio of nonsynonymous to synonymous mutations) analyses. Full-length 16S rRNA gene sequences were amplified from the isolated subseafloor bacteria from DNA extracted from cell pellets (using the protocol described above) using PCR (30 cycles of 95°C for 30 s, 60°C for 30 s, and 72°C for 2 min) with universal PCR primers 27F (5′-
AGAGTTTGATCMTGGCTCAG-3′) (
45) and 1391R (5′-
GACGGGCGGTGTGTRCA-3′) (
46) and Sanger sequenced, as a quality control measure. Quality control and assembly of the Sanger-sequenced 16S genes (forward and reverse sequences, each ca. 850 bp in length) were performed in CLC Genomics Workbench using the default base-calling quality and
de novo assembly settings for Sanger sequence data. Phylogenetic analyses were performed on the Sanger-sequenced 16S sequences with maximum likelihood methods as described above.
The contributions of mutations and recombination to the genomic diversity in the concatenated core genome alignment, the number of recombination events (imports) per genome, and the positions of recombination hot spots were investigated using ClonalFrameML (
26). This approach allows the assessment of homologous recombination within the core genome only (e.g., genes shared between all compared genomes) and does not assess nonhomologous gene acquisition (e.g., horizontal gene transfer) or gene loss events. Nucleotides unaffected by homologous recombination are referred to as unimported, and nucleotides subject to recombination are referred to as imported (
26). ClonalFrameML provides the relative ratio of recombination to mutation (
R/θ), the mean length of recombined DNA (δ), and the mean divergence of imported DNA (ν). These results were used to calculate the relative contribution of recombination versus mutation to the overall genomic diversity (
r/
m), using the formula
r/
m = (
R/θ) × δ × ν. ClonalFrameML was performed in three separate runs, containing a core genome alignment that contained (i) all genomes, (ii) only the subseafloor genomes, and (iii) only the type strains. The resulting
r/
m values from these three groups (presented in
Table 1) were then used to interpret the relative importance of mutations compared to recombination in the separate groups (e.g., type strains versus subseafloor strains). The
r/
m analysis provided by ClonalFrameML is capable of identifying recombination between closely related strains where the mean ν value can be as low as 0.031 (
26). The mean ν value between our closely related subseafloor strains is slightly lower but comparable (ν = 0.026) (
Table 1). The mean divergence of imported DNA that we describe is like those described in the original ClonalFrameML methods paper, suggesting that the
r/
m values that we calculated can detect recombination between the closely related subseafloor strains.
In addition to calculating sites and rates of recombination in the core genome, ClonalFrameML also estimates the ancestral sequences at internal nodes of the clonal genealogy and any missing base calls in the observed sequences. The reconstruction of ancestral sequence states is performed using maximum likelihood, and the ClonalFrame model can be thought of as a hidden Markov model (HMM) when the ancestral and descendant genomes for each branch of the clonal genealogy have been observed or reconstructed (
26). The hidden state of the HMM records whether each nucleotide was subject to recombination or not on the branch connecting the two genomes. We acknowledge that drawing inference under the resulting ancestral recombination graph is a notoriously complex statistical problem (
26). Instead, here, we use ClonalFrameML only to assess within-group recombination (e.g., between species within the genus
Thalassospira), and thus, our analysis cannot assess the influence of external recombination (from species outside the genus
Thalassospira).
The
dN/
dS ratio in the core genome alignment (global ω ratio) was estimated using HyPhy v2.2.4 (
47) and applying the adaptive branch site random-effects likelihood (aBSREL) approach (
48) to all branches in all subfamilies. The
dN/
dS calculation is based on an in-frame codon alignment of the core genome, together with the corresponding core genome phylogeny. As described previously by Smith et al. (
48), aBSREL tests for each branch in the phylogeny (e.g., each genome in the core genome alignment) and whether a proportion of sites have evolved under positive selection and will infer the optimal number of ω classes (
dN/
dS) for each branch. aBSREL takes into consideration that different branches may feature evolutionary patterns with differing complexities and hence may be better modeled by more or fewer ω classes. To this end, aBSREL uses AICc (small-sample AIC [Akaike information criterion]) to infer the optimal number of ω for each branch in the core genome phylogeny. Tests for statistical significance of branches under positive selection (e.g., genomes with a significant
dN/
dS ratio) in aBSREL are based on likelihood ratio test (LRT) distributions, whereby a null model is defined in which no positive selection is allowed on a branch, and the LRT is used to determine whether the null model can be rejected (
48). The aBSREL algorithm that calculates the global ω (
dN/
dS) ratio does not provide accurate results for positive selection when highly similar sequences are included in the in-frame codon alignment (
48). Therefore, we calculated the
dN/
dS ratios from the in-frame codon core genome alignment with aBSREL 10 separate times, with each run containing one selected genome from each of the three subseafloor clades: (i)
T. xiamenensis strain Neogene, (ii)
T. xiamenensis strain Miocene, and (iii) “
Ca. Thalassospira pliocenensis.” The results for each of these separate aBSREL runs are summarized in
Table S2. By running aBSREL in this manner (multiple times using only one representative of highly similar subseafloor clades), statistically significant
dN/
dS values were obtained for each subseafloor genome (
Table S2). The 10 different aBSREL run output files produced from the HyPhy program (.json files) were then visualized in HyPhy Vision (
http://vision.hyphy.org/aBSREL), a Web graphical user interface that allows visualization of branch site selection,
P values, and global ω (
dN/
dS) classes for each branch (genome) from the in-frame codon core genome alignment.
Low
dS values may produce unreliable
dN/
dS ratios. aBSREL does not calculate separate
dN or
dS values. We investigated how the magnitude of
dS in the subseafloor strains compared to that of the type strains using the FitMG94 workflow in HyPhy (
https://github.com/veg/hyphy-analyses/tree/master/FitMG94) to fit the Muse-Gaut model of DNA sequence evolution (
49) to the core genome alignment, which reports the number of synonymous and nonsynonymous substitutions per nucleotide site for each branch on the tree. The FitMG94 workflow also uses a corrected empirical estimator (CF3x4) that provides improved estimates of several parameters in the evolutionary model (
50). The results of FitMG94 showed that there was no significant difference in the magnitude of
dS in the type strains (0.02 to 0.36) compared to the subseafloor strains (0.001 to 0.07) (
P = 0.59 by a two-sided
t test).