INTRODUCTION
Plasmids are extrachromosomal, mainly circular, double-stranded DNA (dsDNA) elements found in many species of Bacteria, Archaea, and Eukarya (1). Typically, plasmids harbor replication control, partitioning, and mobilization genes, ensuring their transfer and maintenance within the cell. Some bacterial plasmids are mobile (2), thereby facilitating an exchange of genetic material between species and hence contributing to inter- and intraspecies genomic diversity. Mobile plasmids may carry antibiotic and biocide resistance genes and genes coding for virulence factors (3) and will confer those characteristics on their new host. These accessory genes of plasmids are selected by the evolutionary forces in the particular ecological niches in which the host resides, and therefore, they constitute a key element in studies of microbial habitats and of the ability of microbes to survive under specific conditions (4–6).
In the past, studies on plasmids were performed by culturing strains of host bacteria in bacterial collections (7). As a result, studies on plasmids derived from natural environments (8–11) were limited to plasmids from cultivatable bacteria, plasmids that can conjugate to cultivatable bacteria, and plasmids for which there was a known primer (12). The consequent lack of studies related to the ecology of plasmids in natural environments can be attributed not only to difficulties in culturing plasmid hosts (13–15) but also to challenges in isolating plasmid DNA (4, 6, 7, 16) and to the lack of tools to detect and quantify plasmids (16, 17). In recent years, however, marked progress has been made in addressing these problems and challenges as a result of the availability of relatively low-cost high-throughput sequencing technologies, which has increased the possibility of obtaining higher-quality data on plasmids. Thus, new studies have identified plasmids in metagenomic data sets from groundwater (13), bovine rumen (5, 18), wastewater (19), and marine (20, 21) samples. In all these studies, plasmids were enriched and amplified during the DNA extraction process to ensure that sufficient reads were sequenced to enable accurate de novo plasmid reconstruction in the analysis step.
In recent years, there has thus been a rapid increase in the number of known plasmids, from 13,789 bacterial plasmids recorded in the plasmid database PLSDB when it was first published (September 2018) to 34,513 records in the current version (June 2021; PLSDB v. 2021_06_23) (22), i.e., a 2.5-fold increase within less than 3 years. However, marine plasmids, which constitute only approximately 1% of plasmids in the database, are clearly underrepresented. Thus, there are striking knowledge gaps pertaining to the distribution, genetic repertoire, and environmental impact of marine plasmids (23). In contrast, there has been a rapid accumulation of environmental microbiome studies in general, and specifically marine microbiome studies, with studies ranging from the limited 16S amplicon sequencing to full metagenome deep sequencing. However, most of these studies have focused on bacterial genomes, with no attempts to specifically identify plasmids, and hence they did not include specific plasmid amplification steps. Nonetheless, full metagenomic data may contain sufficient reads of plasmid DNA to provide information about plasmids. The deeper the sequencing, the greater the likelihood that the data will include sufficient plasmid reads to facilitate accurate de novo plasmid assembly (6, 13, 18, 24).
There are several tools for the detection of plasmids in metagenomics data (25–27), but most of these tools are based on alignment to known plasmids, which are mainly terrestrial and freshwater plasmids. The implication of the underrepresentation in the PLSDB (22) of marine plasmids, which are most probably different from terrestrial and freshwater plasmids (11, 28, 29), is that the majority of currently available methods are very limited for analyzing marine environmental data. Nevertheless, there are a few tools that are suitable for de novo assembly of marine plasmids from metagenomics data (17, 30, 31). While the accuracy of these de novo assemblers needs to be improved (31), they have indeed facilitated the discovery of novel plasmids in environments for which knowledge about plasmids is limited, such as marine environments.
In the current study, we constructed a pipeline for de novo plasmid assembly and detection from multiple related environmental samples and applied the pipeline to public metagenomics samples collected from the Red Sea (32) (see the full description in Materials and Methods). Of 362 plasmid candidates, we classified seven as probable plasmids (only one of which has previously been reported). We showed a strong correlation between the plasmid distribution patterns in marine environments and the physical conditions of those environments (such as depth and temperature), while only some plasmid distribution patterns in marine environments correlated with microbial distribution patterns. The complex correlation structure between the physical conditions and the distribution patterns of plasmids and microorganisms illustrates the potential contribution of plasmids to the adaptation of bacteria to their ecological niches.
DISCUSSION
In this study, we reanalyzed metagenomic sequences from the Red Sea (32), with the aim of detecting and characterizing plasmids of marine origin. Although the isolated DNA had not been enriched for plasmids, we detected 362 plasmid candidates from 45 sampling points. We note that plasmids are estimated to be detectable in most cases only if plasmid extraction procedures are performed (5, 13). Moreover, the data set used in this study was collected from the 0.1- to 1.2-μm fraction, which excludes most of the eukaryotes, such as diatoms, which were shown to contain plasmids (44). Thus, these candidates constitute just the tip of the iceberg of the marine plasmidome in that region.
The physical properties (depth and temperature) of the sampling points were shown to correlate with the distribution patterns of the plasmid candidates. While other studies have reported temperature as a key predictor of microbial diversity in the oceans (45–47), the physical properties of the sampling points are better reflected by the distribution of plasmids than by the microbial distribution at the genus level (Fig. 3A) (34). Interestingly, in terms of plasmid composition, all three GAIW sampling points were coclustered with three sampling points that are not GAIW (Fig. 2A, cluster Cd). The GAIW sampling points, where the temperatures are lower (21 to 23°C) than those of the Red Sea, were clustered with warmer sampling points (~24°C). Thus, at least in this case, the effect on the plasmidome of the incoming stream is stronger than that of the water temperature.
The distribution patterns of some plasmid candidates strongly correlate with the distribution patterns of at least one microbial genus (Fig. 3B). Such strong correlation between the distribution patterns indicates colocalization and may suggest a potential host for the plasmid candidate. On the other hand, the distribution patterns of other candidates are not strongly correlated with the distribution patterns of any genus (Fig. 3B), which may suggest that these candidates are broad-host-range plasmids. However, the potential assignment of host and host range based on colocalization is hypothetical and should be further tested and validated. Notably, one plasmid candidate cluster (C1) strongly correlates (Pearson correlation coefficient of 0.97) with one microbial cluster (Cb7). Both the microbial and the plasmid candidate clusters are more abundant in deep-water environments (Fig. 3A and D, respectively) and may correspond to unique populations of microbes and plasmids which are specific to deep waters. Thus, we hypothesize that C1 plasmids provide deep-water adaptation, and it is possible that their distribution pattern is attributed to a single deep-water genus. In contrast, the distribution pattern of C6 plasmids, for example, does not strongly correlate with a single microbial cluster (Fig. 3B) and thus cannot be attributed to a single genus, and those plasmids are hypothesized to have a broad host range.
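The colocalization reasoning above can be illustrated with a minimal sketch. It computes the Pearson correlation between the abundance profile of one plasmid candidate and the profiles of several genera across sampling points, and reports the best-correlating genus only if the correlation exceeds a cutoff. The function names, the toy abundance values, and the 0.9 cutoff are assumptions made for illustration and are not taken from the study.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return float("nan")  # correlation is undefined for a constant profile
    return cov / math.sqrt(vx * vy)

def best_cooccurring_genus(plasmid_profile, genus_profiles, threshold=0.9):
    """Return (genus, r) for the strongest correlation, or (None, r) if r < threshold.

    The threshold is a hypothetical cutoff, not a value from the study.
    """
    best_name, best_r = None, float("-inf")
    for name, profile in genus_profiles.items():
        r = pearson(plasmid_profile, profile)
        if not math.isnan(r) and r > best_r:
            best_name, best_r = name, r
    if best_r < threshold:
        return None, best_r
    return best_name, best_r

# Toy relative abundances at five hypothetical sampling points (shallow to deep).
plasmid = [0.0, 0.1, 0.2, 0.8, 1.0]
genera = {
    "genus_deep": [0.1, 0.2, 0.3, 0.9, 1.1],     # tracks the plasmid profile
    "genus_shallow": [1.0, 0.8, 0.5, 0.1, 0.0],  # opposite trend
}
host, r = best_cooccurring_genus(plasmid, genera)
print(host, round(r, 3))
```

A candidate for which no genus passes the cutoff returns `None`, which corresponds to the broad-host-range hypothesis in the text; as noted there, either outcome is only a hypothesis until validated experimentally.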
Most plasmid candidates (90.61%) were present at more than one sampling point, suggesting that they represent reliable sequences and not assembly artifacts. Partial coverage of some plasmids at some sampling points was also revealed, indicating either conserved regions that are shared between different plasmids, be it a shared backbone (namely, the same plasmid with a different cassette) or a shared functional gene, or plasmids at low copy numbers that are not fully covered within the sequencing data. This pattern of conservation of plasmid sequences across the samples was previously identified by Kothari et al. (13), who attributed ecological significance to plasmids in terms of maintenance and transfer of conserved key functionalities in an ecosystem, based on reports of the plasmidome in soil (48) and rumen (5) environments. We demonstrated this pattern locally with our draft plasmidome and globally in the case of three plasmids, 106_LNODE_1, 1_LNODE_1, and 297_RNODE_15, only one of which has been previously reported (28).
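As a rough illustration of how full and partial coverage across sampling points might be tallied, the sketch below labels each sampling point from the coverage breadth of a candidate (the fraction of its length covered by mapped reads) and computes the percentage of candidates fully covered at more than one point. This section does not state how presence was called in the study, so the breadth thresholds (0.9 and 0.2), the function names, and the toy table are all assumptions.

```python
def presence_calls(breadths, min_breadth=0.9, min_partial=0.2):
    """Label each sampling point 'full', 'partial', or 'absent' from coverage breadth.

    Both thresholds are hypothetical values chosen for illustration.
    """
    calls = []
    for b in breadths:
        if b >= min_breadth:
            calls.append("full")
        elif b >= min_partial:
            calls.append("partial")
        else:
            calls.append("absent")
    return calls

def percent_at_multiple_points(breadth_table, min_breadth=0.9):
    """Percentage of candidates fully covered at more than one sampling point."""
    if not breadth_table:
        raise ValueError("empty table")
    multi = sum(
        1 for breadths in breadth_table.values()
        if sum(b >= min_breadth for b in breadths) > 1
    )
    return 100.0 * multi / len(breadth_table)

# Toy coverage breadths for four hypothetical candidates at four sampling points.
toy = {
    "cand_A": [1.00, 0.95, 0.10, 0.00],  # full at two points
    "cand_B": [0.97, 0.40, 0.35, 0.00],  # full at one point, partial at two
    "cand_C": [0.92, 0.99, 0.93, 0.91],  # full at every point
    "cand_D": [0.05, 0.00, 0.98, 0.00],  # full at a single point
}
print(presence_calls(toy["cand_B"]))
print(percent_at_multiple_points(toy))
```

A profile like `cand_B` is the kind of partial-coverage pattern the paragraph describes: it could reflect a shared backbone or gene, or a low-copy plasmid that is incompletely sampled.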
Only 3,062 of the predicted ORFs (50.8%) matched proteins with known functions. Previous plasmidome annotation studies identified 25 to 61% of ORFs in lakes (6), a wastewater treatment plant (49), and rumen (5, 18), and our results are thus in agreement with previously reported studies. Despite the low number of annotated plasmids from marine sources in the plasmid database and the reported differences between marine and terrestrial plasmids, the ability to match 50.8% of the predicted ORFs with a known protein is encouraging and serves as a good starting point for the identification and annotation of marine plasmids.
The genes encoding DNA replication, recombination, and repair (COG category L) were the most represented genes in the marine plasmidome, significantly more than in the COG database. Since replication is required for plasmid survival and is one of the main backbone plasmidic functions, genes involved in replication and transposable elements are highly abundant in the plasmids (50). The other functional categories that are highly represented in this study (cell wall/membrane/envelope biogenesis, energy production and conversion, posttranslational modifications and protein turnover, carbohydrate metabolism and transport, amino acid transport and metabolism, inorganic ion transport and metabolism, intracellular trafficking, secretion, and vesicular transport) are also similar to those in previous plasmidome studies (5, 6, 13, 49). These functional categories mostly represent functions that are part of the plasmid maintenance and survival functional repertoire (replication, conjugation, and resistance) and are thus expected to be represented in most plasmids (13).
The assignment of functions to the ORFs allowed us to identify seven candidates as most probably plasmids and 167 candidates as putative plasmids, based on the genes that they carry. Unlike bacterial chromosomes, plasmids are very dynamic and do not have specific markers, as some plasmids are known to integrate into the host genome and transfer sections of the chromosome along with their conjugative machinery into a recipient cell (51). Such integrations occur via homologous or nonhomologous recombination, allowing fast and large evolutionary jumps within the affected genes (52). Thus, the fact that a candidate plasmid contains chromosomal or viral genes, or a partial match to chromosomal or viral DNA (as was observed for 158 of 362 candidates [43%]), does not rule out the possibility that these candidates are plasmids, as it is common for plasmids to exchange DNA with bacterial chromosomes and viruses (52).
Generally, plasmids vary markedly in size, with the smallest being around 846 bp (and carrying only the replication initiation gene) and some plasmids exceeding the size of some bacterial chromosomes (and carrying ≥1,000 genes) (53). Associations between plasmid and bacterial genome size and between plasmid size and the mobility genes it carries (if any) were shown by Smillie et al. (53). Sizes of candidates classified in this study as definitely “plasmids” were in the range of 4 to 20 kb, which is a typical size for both mobilizable and nontransmissible plasmids (43). The distribution of candidate sizes showed peaks at around 5, 30, and 60 kb, which is similar to the distribution of plasmids described by Smillie et al. (53). It is important to mention that the candidate size range in this study might not represent the real plasmid size range, due to the limitations of the sequencing, bioinformatic, and microbial extraction methods (13, 30).
As mentioned above, marine plasmids differ from terrestrial plasmids in terms of their DNA sequence and carried genes and are underrepresented in the PLSDB. Accordingly, matches to the PLSDB (22) were found for only a few plasmid candidates (1.65%). One of the hits was to a plasmid previously isolated from Roseovarius sp. strain THAF27 from a marine aquarium sample (GenBank accession no. CP045397.1). That plasmid was characterized as a cassette-containing pLA6_12-like plasmid, restricted to marine environments (28). Interestingly, the other hits were for plasmid candidates with relatively short sequences, which aligned to larger plasmids (Table 1). The annotation of the ORFs in these overlapping regions indicates that most of these short candidates were transposons, but it was not clear whether they exist as stand-alone mobile genetic elements or whether they are part of other plasmids or chromosomes and were found by our pipeline by virtue of repeats at the ends of the transposon. Some of these short candidates were classified by us as “uncertain” due to the lack of plasmid-associated genes. Nevertheless, the PLSDB plasmids matched by our candidates were previously isolated from strains of Roseovarius species, Vibrio parahaemolyticus, Vibrio alginolyticus, Rhodococcus species, and Acinetobacter baumannii. Vibrio and Roseovarius are typical of marine environments, whereas Rhodococcus is mostly known as a soil bacterium. However, some Rhodococcus species were previously found in marine environments (54). Interestingly, although Acinetobacter baumannii is a soil bacterium, it was shown that the Rep proteins of many Synechococcus plasmids are similar to those of plasmids from several strains of Acinetobacter baumannii (20). Hence, it is not surprising to find matches to plasmids from A. baumannii in our study as well.
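The remark above that short transposon-like candidates may have been picked up by virtue of repeats at their ends can be made concrete with a simple heuristic sometimes used to flag circular contigs: a linear contig whose start and end share an identical stretch of sequence looks like a circle that was cut open. The sketch below is a generic illustration under that assumption. It is not the pipeline used in the study, and the function names, the minimum overlap length, and the toy sequences are hypothetical.

```python
def terminal_overlap(seq, min_len=4):
    """Length of the longest suffix of seq that equals a prefix of seq.

    Only overlaps of at least min_len and at most half the sequence length
    are considered; returns 0 if there is none.
    """
    seq = seq.upper()
    for k in range(len(seq) // 2, min_len - 1, -1):
        if seq[:k] == seq[-k:]:
            return k
    return 0

def trim_if_circular(seq, min_len=4):
    """Return (sequence, is_circular), dropping the duplicated end if one is found."""
    k = terminal_overlap(seq, min_len)
    if k == 0:
        return seq.upper(), False
    return seq.upper()[:-k], True

# Toy contigs: the first starts and ends with the same 4-mer, the second does not.
with_repeat = "ACGTGGATCCTAACGT"
without_repeat = "ACGTGGATCCTATTGA"
print(terminal_overlap(with_repeat), trim_if_circular(with_repeat))
print(terminal_overlap(without_repeat), trim_if_circular(without_repeat))
```

A direct repeat flanking a transposon embedded in a larger replicon would pass the same check, which is why such a signature alone cannot distinguish a stand-alone circular element from part of another plasmid or chromosome.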
In addition to 106_LNODE_1, which is the local version of the previously reported hitchhiker plasmid pLA6_12 (28), we found, in other whole-genome sequences from available marine metagenomes, “relatives” with high sequence similarity to two more plasmid candidates that we classified as “plasmid.” Plasmid 1_LNODE_1, which in our analysis was assembled de novo and revealed by mapping at 19 different sampling points, and plasmid 297_RNODE_15, which was found at three sampling points in our analysis, were also partially detected in biosamples collected in the Tara Oceans expedition (2009 to 2013) and the Malaspina Expedition (2010) in different locations in the Atlantic, Pacific, and Indian oceans. This global distribution with different functional cassettes suggests that these plasmids are potential vectors able to transport different integration cassette motifs across vast geographic distances. The dynamics of interchangeable cassettes provides evidence for horizontal gene transfer events, which support the classification of these candidates as plasmids. The observed variable regions in the plasmids contain genes encoding either resistance to toxins or various toxin-antitoxin systems or lack mobility genes, which might shed light on rapid adaptation to environmental pollutants.
Plasmid candidates carrying antibiotic resistance genes were found at all 45 stations, and candidates carrying metal resistance genes were found at 31 of the 45 stations, in accordance with antibiotic resistance and metal resistance being functional capabilities frequently supplied by the plasmidome (13, 55). Even though previous studies found no evidence for resistance to metals or biocides being associated with contamination (28), we showed that both metal and antibiotic resistance genes were enriched at specific sampling points, suggesting that there is some association between the environmental conditions and the development of resistance. For example, plasmid 106_LNODE_1 in our study (a version of pLA6_12) includes the MAPEG cassette, providing protection against xenobiotics and/or oxidative stress (56), which have been reported to be typical environmental stressors in the Red Sea (28). This plasmid is found in many other locations around the globe, but with a different cassette of functional genes at each location, suggesting adaptation of the variable cassette to the specific conditions of the location in which the plasmid resides. Thus, our analysis of the global distribution of plasmids found in this study shows that plasmids provide site-dependent phenotypic modules to their ecological niches, in accordance with previous studies (28, 57).
In this study, we reanalyzed publicly available metagenomic sequencing data from the Red Sea and demonstrated that marine plasmids can be discovered and characterized from publicly available metagenomics data that have not been enriched for plasmids. In addition, we showed site-dependent plasmid distribution and correlations between plasmid distribution patterns and environmental conditions, demonstrating the importance of plasmids in the microbial ecosystem. We detected seven definite plasmids and 355 other candidate plasmids for which additional research is required to validate their plasmidic nature. To increase the representation of environmental marine plasmids in plasmid databases, additional studies on plasmid detection in marine environments should be performed. Our study gives only a glimpse into the marine plasmidome, probably one of the largest and most untapped sources of genes with novel functions.