Environmental context of the three sediment locations.
We investigated CPR in marine sediment cores from three locations (Fig. 1A): Mohns Ridge (MR; part of the Arctic Mid-Ocean Ridge), core MR-GS14-GC08; North Pond (NP), core NP-U1383E; and organic-rich sediments of the Costa Rica Margin (CR) at site U1379, core CR-U1379B. For these cores, concomitant metagenome and 16S rRNA gene amplicon sequencing data have been published (45–47) and are available for detailed analyses of CPR. Seafloor depths at these sites range from 127 to 4,435 m. Sediments at these sites accumulate at nearly constant rates and lack additional energy inputs, such as hydrothermal venting, and episodic disturbances, such as the mass-wasting events that occur in trenches. Based on the existing geochemical profiles (see Fig. S1 in the supplemental material) and previously described zone definitions (45–47), the investigated sediments span redox zones ranging from the oxic zone down to the sulfate reduction zone (Fig. 1), although they do not form a coherent time series of sediments deposited at a single location. The four sediment horizons from the MR core represent the upper oxic zone (10 cm), the oxic-anoxic transition zone (OATZ; 100 cm), the nitrate-ammonium transition zone (NATZ; 160 cm), and the Mn reduction zone (250 cm) (Fig. 1B). Four oligotrophic sediment horizons from the NP core were also included: two in the upper oxic zone (100 cm and 1,000 cm) and two at the transitions between oxic and anoxic sediments, namely, the OATZ (~2,200 cm) and the anoxic-oxic transition zone (AOTZ; 2,950 cm below seafloor [bsf]) (Fig. 1E). Finally, four horizons from the CR core (47) within the interval of 2 to 9 m below the seafloor were included (Fig. 1H). Sulfate is expected to be the dominant electron acceptor in this interval, based on three observations. (i) Porewater sulfate decreased with depth (Fig. 1H), although the sulfate depletion depth (~40 m below seafloor [mbsf]) lies below the sediment layers investigated in this study. (ii) The sulfur isotopic composition of sulfate (δ³⁴S of SO₄²⁻) increased with depth (48), indicating biological sulfate reduction. (iii) Dissolved Mn in the porewater, the product of Mn oxide reduction, decreased rather than increased with depth in the investigated interval (Fig. S1C), indicating that Mn oxides were no longer an important terminal electron acceptor. This extensive data set therefore allowed us to capture CPR diversity and genomes across a wide range of redox conditions that develop during early diagenesis.
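The three observations above all rest on the direction of a depth trend. As a minimal sketch of how such trends could be checked, the following assumes each profile is given as paired depth and value lists and uses the sign of an ordinary least-squares slope; the function names and the decision rule are hypothetical illustrations, not the authors' analysis.

```python
from typing import Sequence


def ols_slope(depths: Sequence[float], values: Sequence[float]) -> float:
    """Ordinary least-squares slope of value versus depth."""
    n = len(depths)
    if n != len(values) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_d = sum(depths) / n
    mean_v = sum(values) / n
    sxx = sum((d - mean_d) ** 2 for d in depths)
    if sxx == 0:
        raise ValueError("depths must not all be identical")
    sxy = sum((d - mean_d) * (v - mean_v) for d, v in zip(depths, values))
    return sxy / sxx


def sulfate_reduction_dominant(
    depths: Sequence[float],
    sulfate: Sequence[float],
    d34s_sulfate: Sequence[float],
    dissolved_mn: Sequence[float],
) -> bool:
    """Hypothetical rule mirroring observations (i) to (iii): sulfate decreases,
    d34S of sulfate increases, and dissolved Mn decreases with depth."""
    return (
        ols_slope(depths, sulfate) < 0
        and ols_slope(depths, d34s_sulfate) > 0
        and ols_slope(depths, dissolved_mn) < 0
    )
```

A slope sign over four horizons is a qualitative summary, not a significance test; with more samples, a regression with confidence intervals would be the natural extension.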
Only part of the CPR diversity can be captured by 16S rRNA gene amplicon sequencing.
We first examined the presence and diversity of CPR in the metagenome sequencing data. By extracting and classifying putative 16S rRNA gene reads with phyloFlash (49), we found that CPR bacteria accounted for 2.0 to 4.2% of the total microbial community in MR core GC08 (Fig. 1B), 0.4 to 28% in NP (Fig. 1E), and 0.3 to 1.6% in CR (Fig. 1H). The variations in the relative abundance of CPR among sediment layers, especially among the NP layers, were not artifacts of differing metagenome sequencing depths, because the total numbers of 16S rRNA gene reads detected in the metagenome data sets were comparable across samples (Table S1). In contrast, the concomitant 16S rRNA gene amplicon sequencing data contained only 26 (1.1% of the total recovered operational taxonomic units [OTUs]), 27 (0.4% of the total), and 3 (0.3% of the total) CPR-affiliated OTUs at NP, MR, and CR, respectively. These CPR OTUs accounted for <2% of the total microbial communities at all three sites (Fig. 1B, E, and H), lower than the metagenome-based estimates. This discrepancy may arise because amplicon sequencing underestimates the relative abundance of CPR: the divergent, often intron-containing 16S rRNA genes of most CPR can evade detection by universal primers (10). Nevertheless, these results suggest that CPR are usually part of the rare fraction of the community (<5% of the total prokaryotic community in most layers, irrespective of the detection method) yet are widespread (frequently encountered) in marine sediments of various redox zones, consistent with reports from other marine sediments such as hadal trench sediments in the Pacific Ocean (41, 42).
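As a minimal sketch of the relative-abundance calculation above, assume each sample is summarized as counts of classified 16S rRNA gene reads per taxonomy path, and that CPR reads can be recognized by a phylum label such as "Patescibacteria" (the name used for CPR in the SILVA taxonomy). The data layout and function name are hypothetical.

```python
from typing import Mapping


def cpr_fraction(read_counts: Mapping[str, int], cpr_label: str = "Patescibacteria") -> float:
    """Fraction of classified 16S rRNA gene reads assigned to CPR.

    read_counts maps a semicolon-separated taxonomy path
    (e.g. "Bacteria;Patescibacteria;Paceibacteria") to a read count.
    """
    total = sum(read_counts.values())
    if total == 0:
        return 0.0
    # A read is counted as CPR if the CPR label appears as one rank of its path.
    cpr = sum(
        count for taxon, count in read_counts.items()
        if cpr_label in taxon.split(";")
    )
    return cpr / total
```

The same function could be applied to an amplicon OTU table, which is one way to put the metagenome- and amplicon-based estimates side by side.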
Metagenome sequencing also revealed a higher diversity of CPR than did 16S rRNA gene amplicon sequencing. Based on the taxonomic classification of the 16S rRNA gene reads in the metagenome data, CPR of the classes “Candidatus Paceibacteria,” “Ca. Gracilibacteria,” “Ca. Microgenomatia,” “Ca. Saccharimonadia,” “Ca. ABY1,” and “Ca. WWE3” were present at the three sites. Across the 12 sediment layers with metagenome data, “Ca. Paceibacteria” was generally the most abundant CPR class, although “Ca. Gracilibacteria” dominated at 22 mbsf in NP (Fig. 1G), and “Ca. Microgenomatia” and “Ca. ABY1” dominated in the three deeper layers of CR (Fig. 1I). In contrast, 16S rRNA gene amplicon sequencing detected only members of the classes “Ca. Paceibacteria” and “Ca. Gracilibacteria” (Fig. 1C and F; see Fig. S2 for the phylogenetic placements of individual OTUs): “Ca. Paceibacteria” was the dominant CPR class (~80% of the total CPR community) in most sediment layers of MR core GC08 (Fig. 1C), whereas “Ca. Gracilibacteria” dominated in most of the examined NP layers (Fig. 1F). This discrepancy in CPR community structure between the two methods again suggests that amplicon sequencing captures only a subset of the total CPR population in complex communities.
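Statements such as "~80% of the total CPR community" refer to shares renormalized within CPR rather than within the whole community. A minimal sketch of that renormalization, assuming class-level counts for one layer are available; the function name and input format are hypothetical.

```python
from typing import Dict, Mapping, Tuple


def dominant_cpr_class(class_counts: Mapping[str, int]) -> Tuple[str, float]:
    """Return the most abundant CPR class in one layer and its share of all CPR reads.

    class_counts maps CPR class names to read (or OTU) counts for a single layer.
    """
    total = sum(class_counts.values())
    if total == 0:
        raise ValueError("no CPR reads in this layer")
    # Shares are relative to the CPR community only, not to all prokaryotes.
    shares: Dict[str, float] = {name: n / total for name, n in class_counts.items()}
    top = max(shares, key=shares.get)
    return top, shares[top]
```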
Novelty of CPR MAGs recovered from marine sediments.
To further characterize CPR in these sediments, we recovered CPR genomes from the existing metagenome sequencing data of these sites. We obtained four CPR MAGs from Mohns Ridge (MR_Bin143, MR_Bin147, MR_Bin1662, and MR_Bin1762), two from North Pond (NP_Bin194 and NP_Bin050), and five from the Costa Rica site (CR_Bin034, CR_Bin039, CR_Bin047, CR_Bin053, and CR_Bin021) (Table 1). These 11 CPR MAGs have small genomes (0.39 to 1.10 Mbp) consisting of 12 to 153 scaffolds (Table 1). They encode 462 to 1,264 coding sequences, with coding densities of 86.2 to 92.0% (Table 1). Based on the set of 43 single-copy genes universal to CPR bacteria (10), all MAGs except MR_Bin1762 and CR_Bin021 were estimated to be >97.7% complete with <2.3% redundancy (Table 1). Because CPR bacteria lack some of the single-copy genes found in non-CPR bacteria (10), completeness estimates based on the bacterial single-copy gene set automatically selected by CheckM (50) were lower (61.0 to 80.3%) (Table 1). All of these CPR MAGs except CR_Bin039 contain a reconstructed 16S rRNA gene and thus can be regarded as high-quality MAGs.
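Marker-based completeness and redundancy can be defined in several ways. A minimal sketch, assuming the common convention that completeness is the fraction of marker genes found at least once and redundancy is the fraction found in more than one copy; this is not necessarily the exact formula of the tools cited above, and the function and inputs are hypothetical.

```python
from collections import Counter
from typing import Iterable, Set, Tuple


def completeness_redundancy(
    marker_hits: Iterable[str], marker_set: Set[str]
) -> Tuple[float, float]:
    """Percent completeness and redundancy of one MAG against a marker set.

    marker_hits lists the marker name for every gene in the MAG that matched
    a marker (a name appears twice if that marker is present in two copies).
    """
    if not marker_set:
        raise ValueError("marker_set must not be empty")
    # Ignore hits to markers outside the chosen set.
    copies = Counter(h for h in marker_hits if h in marker_set)
    present = sum(1 for m in marker_set if copies[m] >= 1)
    duplicated = sum(1 for m in marker_set if copies[m] > 1)
    n = len(marker_set)
    return 100.0 * present / n, 100.0 * duplicated / n
```

Scoring the same hits against a larger marker set that includes genes CPR lack raises the denominator without adding hits, which is why a generic bacterial set yields lower completeness for CPR genomes.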
The novelty of these CPR MAGs was evident from phylogenetic analyses based on (i) 14 concatenated ribosomal proteins (Fig. 2A) and (ii) the 16S rRNA gene (Fig. 2B). For consistency with the current literature, we adopted the updated genomic distance-based classification scheme of GTDB (51) to name CPR lineages. Although six CPR classes were present at the three sites according to the bulk metagenome data (Fig. 1), all 11 recovered CPR MAGs were classified as members of novel genera, families, or even orders within the class “Ca. Paceibacteria.” This novelty was also supported by the GTDB classification: all the CPR MAGs showed <80% average nucleotide identity (ANI) to their most similar genomes in the GTDB (R07-RS207) database and had relative evolutionary divergence (RED) values of 0.74 to 0.83. In particular, NP_Bin050 represented a new order (named o__JAHCSD01 in GTDB R07-RS207) most closely related to the order Portnoybacterales (Fig. 2A). MR_Bin147 belonged to a new genus in the family GWB1-50-10 of the order UBA6257, an order composed mainly of groundwater MAGs previously classified as Jorgensenbacteria (20, 52) (Fig. 2A). The other CPR MAGs belong to the order “Ca. Paceibacterales” and represent novel lineages at the family or genus level. MR_Bin1662 and MR_Bin1762 formed a new family, provisionally named “Candidatus Bathypaceibacteraceae” (Fig. 2A). Three MAGs from the CR core (CR_Bin039, CR_Bin053, and CR_Bin034) also formed a new family, provisionally named “Candidatus Sedimentipaceibacteraceae” (Fig. 2A). NP_Bin194 represented a new genus within the family RBG-13-36-15, and MR_Bin143 represented a new genus in the family UBA10102, which is composed mainly of genomes previously classified as Wildermuthbacteria. Finally, CR_Bin047 represented a new genus within the family GWA2-38-27 (Fig. 2A).
The 16S rRNA gene phylogeny, which was broadly congruent with the concatenated ribosomal protein tree (Fig. 2B), supported the phylogenetic novelty of these CPR MAGs. It also showed that close relatives of these MAGs had previously been detected in clone libraries from anoxic marine sediments at various locations, including the Gulf of Mexico (53), Shimokita subseafloor sediments (54), and Angola Basin sediments (55), as well as from ferromanganese nodules in the South Pacific Gyre (56), although quantitative information about their abundance at these sites is lacking. The phylogenetic relatedness of CPR from marine sediments in different oceanographic regions may reflect shared habitat or niche preferences of these CPR groups or the availability of their hosts.
After the 16S rRNA gene amplicon sequences were placed in the 16S rRNA gene phylogenetic tree, it was clear that three of the four CPR MAGs from MR were also captured by amplicon sequencing: MR_Bin1662 showed a 100% match with OTU_152, MR_Bin1762 matched OTU_556, and MR_Bin147 corresponded to OTU_525. It was also clear that the recovered CPR genomes represent only a subset of the CPR diversity revealed by amplicon sequencing. Although most CPR MAGs from NP and CR contained a 16S rRNA gene sequence, none of them closely matched the amplicon-derived OTUs from these sediments. The absence of these novel CPR MAGs from the amplicon data may be due to the divergence of their 16S rRNA genes, which have more mismatches (one or more) with the forward PCR primer than those of the MAGs recovered from the MR sediments (see Fig. S3 in the supplemental material). Alternatively, these CPR may be rare taxa, so that stochasticity and local heterogeneity caused them to be missed during PCR amplification. “Ca. ABY1” and “Ca. Microgenomatia,” two important CPR classes that prevail in the three investigated sediment cores, especially in the CR core (Fig. 1I), were absent from our genome inventory but have been recovered from Mariana Trench hadal sediments (40). This indicates that the CPR lineages detected by bulk metagenome sequencing are indeed part of the microbiome of the vast marine sedimentary environment. The mismatch between the 16S rRNA gene sequences of the MAGs and the amplicon data also suggests that as-yet unknown CPR bacteria remain to be discovered in global marine sediments (for an example, see reference 57) and highlights the necessity of genome recovery for discovering novel CPR bacteria.
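Primer-template mismatches of the kind summarized in Fig. S3 can be counted by expanding each primer position through the IUPAC degeneracy code and checking whether the aligned template base is allowed. A minimal sketch, assuming the template region has already been aligned to the primer without gaps; the 515F-Y sequence below is a widely used universal forward primer given only as an example and is not necessarily the primer used for these data sets.

```python
# IUPAC nucleotide codes mapped to the bases each one allows.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def count_mismatches(primer: str, template: str) -> int:
    """Count positions where the template base is not allowed by the primer base.

    Both sequences are 5'->3' on the same strand and of equal length (no gaps).
    RNA-style U is treated as T; any other non-ACGT template character
    (e.g. N or '-') counts as a mismatch.
    """
    if len(primer) != len(template):
        raise ValueError("primer and aligned template must have equal length")
    mismatches = 0
    for p, t in zip(primer.upper(), template.upper().replace("U", "T")):
        if t not in IUPAC[p]:
            mismatches += 1
    return mismatches


# Example primer (515F-Y); illustrative only.
PRIMER_515F_Y = "GTGYCAGCMGCCGCGGTAA"
```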
Varied preferred niches of marine sediment CPR MAGs.
Based on genome coverage, the CPR populations represented by the 11 MAGs occurred mainly in anoxic sediment layers at their source locations. The four MAGs from MR were present mainly in anoxic sediments of the NATZ or the Mn reduction zone below the oxygen depletion depth (Fig. 3A), the two MAGs from NP occurred mainly in the AOTZ (29.50 mbsf), and the five MAGs from CR were detected in the uppermost sequenced horizon (2.0 mbsf) and, to a lesser extent, in the deepest horizon (~9.0 mbsf) (Fig. 3A). Read mapping across the three sites indicated that individual CPR MAGs were generally specific to their core of origin (Fig. 3A); the exception was the five CR MAGs, which were also detectable in the AOTZ of core NP-U1383E (Fig. 3A). Similar site specificity of CPR genomes has recently been reported in groundwater and freshwater lake environments by large-scale metagenomic surveys (23, 58).
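Per-layer genome coverage is usually derived from read mapping. A minimal sketch, assuming per-base depths for one MAG in each layer are already available (for example, from `samtools depth -a`, which also reports zero-depth positions); the 0.5 breadth threshold for calling a genome "detected" and the function names are hypothetical choices, not values from this study.

```python
from statistics import fmean
from typing import Dict, Mapping, Optional, Sequence, Tuple


def coverage_stats(depths: Sequence[int]) -> Tuple[float, float]:
    """Mean depth and breadth (fraction of positions with depth > 0) for one genome."""
    if not depths:
        raise ValueError("empty depth profile")
    breadth = sum(1 for d in depths if d > 0) / len(depths)
    return fmean(depths), breadth


def primary_niche(
    per_layer_depths: Mapping[str, Sequence[int]], min_breadth: float = 0.5
) -> Optional[str]:
    """Layer with the highest mean depth among layers where the genome is detected."""
    detected: Dict[str, float] = {}
    for layer, depths in per_layer_depths.items():
        mean_depth, breadth = coverage_stats(depths)
        if breadth >= min_breadth:  # hypothetical detection threshold
            detected[layer] = mean_depth
    if not detected:
        return None
    return max(detected, key=detected.get)
```

Requiring a minimum breadth guards against a few reads from conserved regions making a genome look present in a layer where it is absent.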
Consistent with their distributions, growth of the 11 CPR populations, inferred from the index of replication (iRep; values >1 indicate proliferation) (59), was detected only in their primary niches (i.e., the sediment layer with the highest coverage of a given genome). The highest iRep values of the four MR MAGs occurred in either the NATZ or the Mn reduction zone, those of the two NP MAGs in the AOTZ, and those of the five CR MAGs in the uppermost anoxic horizon (Fig. 3B). The calculated iRep values (1.2 to 2.2) suggest that these CPR populations were actively replicating in their primary niches at the time of sampling.
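iRep estimates the ratio of coverage near the replication origin to coverage near the terminus from draft genomes by sorting windowed coverage values. A simplified sketch of that idea, assuming sliding-window coverages have already been computed; it omits the GC-bias correction and quality filters of the published method (59), and the function name and 5% trim default are our assumptions.

```python
import math
from typing import Sequence


def irep_like(window_coverages: Sequence[float], trim: float = 0.05) -> float:
    """Simplified iRep-style replication index from windowed genome coverage.

    Sorts windows by coverage, trims the extremes, fits a least-squares line to
    log2(coverage) versus rank, and returns 2 ** (fitted max - fitted min).
    """
    # Zero-coverage windows are dropped so that log2 is defined.
    values = sorted(c for c in window_coverages if c > 0)
    k = int(len(values) * trim)
    values = values[k:len(values) - k]
    n = len(values)
    if n < 2:
        raise ValueError("too few windows after trimming")
    ys = [math.log2(v) for v in values]
    xs = range(n)
    mean_x = (n - 1) / 2
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    # The fitted line spans ranks 0..n-1, so its rise in log2 units is slope * (n - 1).
    return 2 ** (slope * (n - 1))
```

Uniform coverage gives a value of 1 (no replication signal); a genome whose sorted coverage doubles from trough to peak gives a value near 2.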
CPR are present in a wide range of redox niches but may not be directly selected by prevailing electron acceptors.
Through amplicon sequencing and genome binning, we detected CPR in a variety of redox niches in marine sediments, in which the dominant electron acceptor shifts from oxygen to nitrate, Mn oxides, and sulfate. Although none of the CPR MAGs was recovered from oxic sediment metagenomes, CPR 16S rRNA genes were detected in the investigated oxic sediments, especially in cores MR-GS14-GC08 and NP-U1383E, which contain extensive oxic zones (Fig. 1). We estimated the absolute abundance of CPR in these two cores as the product of the total cell number (approximated as the sum of archaeal and bacterial 16S rRNA gene abundances) (46) and the relative abundance of CPR shown in Fig. 1. In both the MR and NP cores, the estimated absolute abundance of CPR exceeded 10⁵ cells g⁻¹ in the oxic zones (Fig. 4). Among the CPR genomes detected in MR sediments, MR_Bin147 was present in the oxic zone of GC08 (Fig. 3), even though this genome lacks genes encoding the cytochrome o ubiquinol oxidase (complex IV) involved in oxygen reduction (see results below). Most known CPR organisms have been detected in oxygen-limited or anoxic environments (33, 60), although some have occasionally been reported in oxic groundwater (24, 60–62), freshwater lakes (58), and soils (63). Our survey suggests that CPR as a whole are present in marine sediments throughout early diagenesis, even in bulk oxygenated sediments, whereas individual genomes prefer different anoxic layers in different cores. Whether they live in the presence of oxygen or inhabit anoxic microenvironments within sediment particles remains unknown. Considering that (i) the recovered CPR genomes do not appear to encode metabolic pathways sensitive to changes in external redox conditions (i.e., shifts in the dominant terminal electron acceptor) and (ii) they occupy narrow niches (as indicated by the restricted distributions of both their presence and their active replication [iRep values]), the distribution of CPR bacteria in marine sediments may not be directly selected by the prevailing electron acceptors. A similar lack of direct environmental dependence of CPR has been observed in a recent large-scale genome survey of groundwater (24).
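The absolute-abundance estimate above is a simple product. A minimal sketch, assuming both 16S rRNA gene abundances are expressed per gram of sediment and treating each gene copy as one cell, as the estimate in the text does; the function name and the numbers in the usage note are hypothetical.

```python
def cpr_absolute_abundance(
    archaeal_16s_per_g: float,
    bacterial_16s_per_g: float,
    cpr_relative_abundance: float,
) -> float:
    """Estimated CPR cells per gram: (archaeal + bacterial 16S copies) x CPR fraction.

    cpr_relative_abundance is a fraction between 0 and 1 (e.g. 0.02 for 2%).
    Each 16S rRNA gene copy is treated as one cell (an assumption).
    """
    if not 0.0 <= cpr_relative_abundance <= 1.0:
        raise ValueError("relative abundance must be a fraction between 0 and 1")
    total_cells = archaeal_16s_per_g + bacterial_16s_per_g
    return total_cells * cpr_relative_abundance
```

For instance, a hypothetical layer with 1 × 10⁶ archaeal and 4 × 10⁶ bacterial copies per gram and a 2% CPR share would give about 1 × 10⁵ CPR cells g⁻¹. Because bacteria can carry several 16S rRNA gene copies per genome, this approach may overestimate cell numbers.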
Limited energy metabolism and biosynthetic capacities in marine sediment CPR.
The CPR organisms recovered from marine sediments are likely fermentative organotrophs with simplified energy metabolism. None of the recovered MAGs encode hydrogenases, including the cytoplasmic bidirectional group 3 [NiFe] hydrogenase that has been proposed to pump protons, building a proton motive force that supports ATP generation in CPR (33). All marine sediment CPR MAGs except MR_Bin147 also lack ATP synthase genes (Fig. 5), indicating that they cannot conserve energy via a proton motive force. Instead, they may synthesize ATP by fermentation through partial glycolysis and substrate-level phosphorylation (33), similar to the recently characterized CPR bacterium Vampirococcus lugosii (64). This is supported by the presence of glycolysis-related genes in all sediment CPR MAGs recovered in this study, which would enable the degradation of glucose to pyruvate and further to acetate, although genes for a few intermediate steps were missing, likely because the CPR genomes are incomplete (Fig. 5). The key glycolytic intermediate fructose 6-phosphate can be supplied by the pentose phosphate pathway, which is encoded in most of the recovered CPR MAGs (Fig. 5). We also note the lack of a complete tricarboxylic acid (TCA) cycle, although some CPR MAGs encode a small subset of its enzymes, likely for biosynthetic purposes. Like their close relatives from other environments (33), the marine sediment CPR MAGs lack a respiratory electron transport chain, as evidenced by the absence of NADH dehydrogenase (complex I) and complexes II to IV of the oxidative phosphorylation pathway (Fig. 5), suggesting that they are nonrespiring.
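To illustrate how little ATP a strictly fermentative lifestyle yields, the following sketch tallies substrate-level phosphorylation per glucose. It assumes textbook Embden-Meyerhof-Parnas glycolysis and, for the optional pyruvate-to-acetate branch, an acetyl phosphate/acetate kinase route yielding one ATP per acetate; the route actually used by these CPR is not resolved here, and the step table is illustrative.

```python
from typing import List, Tuple

# (step, ATP change per glucose) under textbook EMP glycolysis.
GLYCOLYSIS_STEPS: List[Tuple[str, int]] = [
    ("hexokinase (glucose -> glucose 6-phosphate)", -1),
    ("phosphofructokinase (fructose 6-P -> fructose 1,6-bisphosphate)", -1),
    ("phosphoglycerate kinase (x2 triose phosphates)", +2),
    ("pyruvate kinase (x2 phosphoenolpyruvate)", +2),
]

# Assumed branch: pyruvate -> acetyl-CoA -> acetyl phosphate -> acetate,
# with acetate kinase regenerating one ATP per acetate (two per glucose).
ACETATE_BRANCH: List[Tuple[str, int]] = [
    ("acetate kinase (x2 acetyl phosphate)", +2),
]


def net_atp_per_glucose(to_acetate: bool = False) -> int:
    """Net ATP from substrate-level phosphorylation per glucose."""
    steps = GLYCOLYSIS_STEPS + (ACETATE_BRANCH if to_acetate else [])
    return sum(delta for _, delta in steps)
```

Under these assumptions, glycolysis to pyruvate nets 2 ATP per glucose and continuing to acetate nets 4, far less than organisms coupling respiration to an ATP synthase can obtain.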
Although various glycoside hydrolases (GHs) have been detected in some CPR MAGs from groundwater (22, 65), and their expression by some CPR has been detected in subseafloor sediments (66), our CPR MAGs encode very few GHs. Six MAGs have a GH1 (for hydrolyzing carbohydrate moieties), and GH3, GH57 (starch), GH63, and GH130 (mannose) were also detected, but in fewer than three CPR MAGs (Fig. 5), suggesting that marine sediment CPR have very limited saccharolytic capacities. Instead, most of the detected carbohydrate-active enzymes (CAZymes) belong to glycosyltransferase families 2 and 4, which are associated with glycolipid synthesis, similar to CPR genomes from groundwater (21). Marine sediment CPR MAGs also lack the nitrate reductase genes previously reported in genomes recovered from hadal sediments (43). Although the copper-containing nitrite reductase (NirK) has been found in some CPR genomes (22, 33), none of the marine sediment CPR genomes contain this gene. Therefore, whether these CPR affect the nitrogen cycle in marine sediments remains unclear.
Most of the sediment CPR genomes lack complete biosynthetic pathways for amino acids, lipids, cofactors, and nucleotides. Of the 20 proteinogenic amino acids, genes for the biosynthesis of only lysine and arginine are present (Fig. 5), indicating that these CPR may need to obtain the remaining amino acids from the external environment or from their presumed host cells. Similarly, for nucleotide synthesis, they encode only 4 to 14 genes for purine and pyrimidine metabolism (Fig. 5). The CPR genomes have no recognized genes for the synthesis of lipids or cofactors. They also lack genes for flagellar biosynthesis (only 1 of the 46 required genes is encoded) (Fig. 5), indicating that they are probably nonmotile in the sediment environment. In addition, these genomes lack ABC-type transporters and encode few (<5) genes of the Sec secretion system (Fig. 5). However, like other CPR genomes (33), all sediment CPR MAGs encode an extensive set of genes for peptidoglycan biosynthesis, suggesting that they may have intact cell walls.
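Statements such as "only 1 of the 46 required genes is encoded" amount to a pathway completeness score. A minimal sketch, assuming each pathway is represented as a set of required gene identifiers (for example, KEGG Orthology IDs) and a genome as the set of identifiers annotated in it; the identifiers, function names, and the 0.8 threshold for calling a pathway complete are hypothetical.

```python
from typing import Dict, Set


def pathway_completeness(required: Set[str], annotated: Set[str]) -> float:
    """Fraction of a pathway's required genes that are annotated in a genome."""
    if not required:
        raise ValueError("pathway must list at least one required gene")
    return len(required & annotated) / len(required)


def incomplete_pathways(
    pathways: Dict[str, Set[str]], annotated: Set[str], threshold: float = 0.8
) -> Set[str]:
    """Names of pathways whose completeness falls below the (hypothetical) threshold."""
    return {
        name for name, required in pathways.items()
        if pathway_completeness(required, annotated) < threshold
    }
```

A fixed threshold is a simplification: a short pathway missing one gene may drop below it even if the gap is filled by an unannotated or alternative enzyme.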
Given the multiple auxotrophies and small sizes of the recovered genomes, we anticipate an episymbiotic lifestyle for the sediment CPR, similar to that of their nonmarine relatives (34, 64, 67, 68). The lack of ABC transporters may force these CPR to rely on host cells to obtain the substrates necessary for their metabolism. MR_Bin147 carries genes for hemolysin synthesis and a hemolysin transporter, which could export hemolysin to the surface of host cells and contribute to disruption of the host cell wall and membrane and release of cell contents (Fig. 6). Similar to CPR genomes from other environments (27, 34, 69, 70), marine sediment CPR genomes harbor genes for type IV pilus synthesis (Fig. 6), which may give the membrane-bound translocation complex access to environmental double-stranded DNA (71). In addition, a competence-related integral membrane protein, ComEC, is encoded; ComEC plays a role in the uptake of host DNA (72), which could then be degraded by restriction endonucleases to provide the nucleotides necessary for growth (Fig. 6).