V. cholerae populations can be locally dominated by a few clonal complexes.
We isolated a total of 480 strains of
V. cholerae and
V. metoecus (438 and 42, respectively, with 397 strains isolated in August 2009 and 83 isolated in September 2009 [see Table S1 in the supplemental material for a list of all isolates]). For both August and September, we obtained isolates from four size fractions (>63, 5 to 63, 1 to 5, and 0.2 to 1 μm) of three water samples each from Oyster Pond and its connected lagoon in Falmouth, MA. To estimate population structure and diversity, we performed multilocus sequence typing (MLST) by partially sequencing seven housekeeping genes from all isolates. In MLST, isolates with identical sequences at all seven loci belong to a single sequence type (ST). When STs differ at only one locus out of seven, they are considered part of a clonal complex (CC), i.e., a group of closely related strains sharing a recent common ancestor (
28).
The 438
V. cholerae strains isolated formed 17 CCs composed of 72 unique STs, with an additional 13 STs found as singletons (not part of a CC) (
Fig. 1 and
2). Chao1 richness estimation predicts the presence of a minimum of 137 and a maximum of 317 STs, suggesting the existence of a considerably higher number of STs than what was observed (
Fig. 2D). Using a slightly broader OTU definition corresponding to that of a clonal complex, which groups together isolates with at least six identical alleles out of seven and counts all 13 singleton STs as separate OTUs, Chao1 richness estimation predicts the existence of 30 to 52 OTU, of which 30 have been observed (
Fig. 2D). Our data set therefore appears to contain most if not all the clonal complexes present in our samples (60 to 100%) but only a portion of existing sequence types (30 to 60%).
Three
V. cholerae CCs dominated our sampling site, containing a total of 263 isolates (CC1, 90 isolates; CC2, 103 isolates; CC3, 76 isolates), representing 60% of our sampling effort. Most CCs were composed of a dominant ST, with other variant STs occurring only sporadically (
Fig. 1 and
2). The proportion of
V. cholerae STs found in CCs (87%) in our in-depth sampling of a single geographical location is much larger than what has been found in previous studies of environmental populations of this species (13 to 18%) (
33,
44). This suggests a much more clonal structure and lower diversity than previously estimated for natural populations of
V. cholerae. Our study comprises the largest data set isolated to date, with 438 isolates grouping into 85 STs, with 72 STs found in 17 CCs. It also targeted a restricted geographical area (two water bodies, pond and lagoon, within 50 m of each other and connected by a channel) over a short time frame (less than 1 month). The most recent comparable study sampled 109 isolates from a dozen sites (lagoon, channel, river, and sea) in a 20-km-long Mediterranean lagoon system over the course of half a year, discovering 78 STs of
V. cholerae, with only 14 STs found in five CCs (
44). Another study isolated
V. cholerae from 15 sites (creek, river, and harbor) on a 100-km stretch of the Californian coast near the San Francisco Bay area and identified 113 STs and 8 CCs (composed of three STs each at most) from 156 strains over the span of a year (
33).
This notable difference in the degree of clonality between the populations observed here and the populations investigated in other studies is accompanied by a difference in the estimated ratio of recombination to mutation events. This value is usually given as r/m, the ratio of probabilities by which either recombination or mutation affects a site (which is a measure of the importance of recombination in the creation of nucleotide diversity), or rho/theta, the ratio of absolute number of recombination and mutation events. Keymer and Boehm (
33), using various methods of assessing the impact of recombination and mutation, estimated rho/theta from 4:1 to 6.5:1 and r/m of at least 45:1. Similarly, Esteves et al. (
44) estimated an r/m of 37:1, and Vos and Didelot (
45) estimated an r/m of 20:1 for various coastal
Vibrio populations. In stark contrast to that, our r/m estimates ranged from 1.6 (ClonalFrame) to 3.8 (LDHat) (
33). Similarly, our rho/theta estimations ranged from 0.49 (ClonalFrame) to 1.9 (empirical estimate, see Table S2 in the supplemental material) (
32), considerably below previous estimates for
V. cholerae and closely related species.
Our observation of few dominant clonal complexes (low evenness, moderate rate of recombination) in the
V. cholerae population we studied thus stands in contrast with the large number of highly recombinogenic singletons found in previous reports. It is possible that sampling depth had so far been insufficient to capture local population structure adequately. Previous sampling efforts have been limited to ∼10 isolates per water sample, adding more than a dozen sites often separated by 1 to 10 km to compose a population of ∼100 to 150 isolates. If the variation between strains present at each site is significant, such superficial sampling will yield artificially high evenness, with numerous STs of low abundance being observed. As we have sampled most of the diversity present at our two sites at the level of CCs, we obtained a more accurate measure of evenness in our sample. Furthermore, although sequences of some of the seven partially sequenced housekeeping genes in our study were often identical to sequences from previous studies found in the NCBI nr database (especially the commonly sequenced
recA and
mdh genes), none of the actual STs (all seven gene sequences from a single isolate combined) identified in this study have identical matches in public databases (see Table S3 in the supplemental material). This is consistent with previous findings that
V. cholerae might be a globally panmictic species in which a number of individual alleles are found over a wide geographic range, with local recombination and selection creating locally dominant variants with unique allele combinations (
16). We do not believe that the differences we observed are due to technical errors: our methods of minimizing PCR and sequencing errors do not appear to be particularly more or less stringent than those in other studies, and the nature of the observed population structure is rather robust to the introduction of errors. Since the majority of observed STs belong to CCs, and most exist at least in duplicate, an overestimation of STs due to the faulty identification of single nucleotide polymorphisms would have only moved our data set away from an even more unexpectedly clonal appearance.
A comparison of the results presented here with those of previous studies thus suggests that while diversity is high on a large spatial scale (kilometers) (
33,
44), it might be limited within a given environment (pond, lagoon, etc.), and as such, sampling schemes might have a large influence on the inference of population structure. Although we can be confident that our extensive sampling allowed us to describe the local Oyster Pond and lagoon population structure adequately, we cannot specifically identify the cause of this highly clonal structure. Infrequent and/or insufficient mixing with bacterial populations from the ocean could lead to a high degree of geographical isolation, resulting in the limited diversity observed. Population structure would then likely be temporally stable and vary between sites with different degrees of isolation. On the other hand, if transient blooms of specific CCs forming locally create the clonal structure observed, we would expect temporal instability. There is some evidence supporting the transient bloom hypothesis. Among the 17
V. cholerae CCs, only five consist of isolates from both August and September (CC1, CC2, CC3, CC5, and CC13), with the remainder found only in a single month. For the five CCs found in both months, their relative abundance varies considerably between months, indicating that temporal instability is likely, even on a short time scale of weeks or months (
Fig. 2C). Additionally, strong zooplankton association of specific CCs could even lead to diurnal fluctuation in observed diversity due to the migration of host animals.
V. cholerae clonal complexes have different spatial distributions.
V. cholerae as a species does not seem to display particular preferences for the environmental parameters explored in this study (i.e., filter size and location). However, the distribution of clonal complexes of strains from this species shows distinct spatial structuring (
Fig. 3). Because of the poor resolution of phylogenetic trees (a consequence of the close phylogenetic relationship between isolates in the population studied), standard methods of inferring statistically significant environmental distribution patterns for clades of bacteria, such as AdaptML (
22) or Ecosim (
46), could not be used. Instead, we opted for an approach where we treated each CC as a sample, comparing the similarity of CCs based on the spatial origin of isolates they contained (filter size and lagoon or pond). SIMPROF (
30) was then used to test whether the calculated Bray-Curtis dissimilarities between UPGMA-clustered CCs differed significantly from each other and from a random distribution. In order to avoid skewing of the data by the potential sampling of clonal expansions, we counted isolates of identical STs from the same origin/month as a single isolate.
This approach enabled us to find statistically significant differences in the environmental distribution of isolates from major
V. cholerae clonal complexes (
Fig. 4). CC1 and CC2 are mostly pond dwelling (90% and 75% of isolates were found in the pond, respectively), with CC1 relatively evenly distributed across size fractions but CC2 mostly found in the smaller size fractions (98% of strains <63 μm). CC3 also has few isolates found in the largest size fraction (96% of strains <63 μm) and is relatively evenly distributed between the pond and lagoon. Isolates from CC5 are predominantly found in the lagoon (95%), with little preference for a specific size fraction. CC13 only had a modest number of isolates (i.e., 12), but 10 of them were found in the >5-μm fractions. The abundance distribution among sample types is significantly different between CC1 and CC2, and both of them are significantly different from that of CC3, CC5, and CC13 (
Fig. 4). These data demonstrate that
V. cholerae clonal complexes can display significant differences in their spatial distribution, both in terms of fraction sizes or water reservoirs. The Oyster Pond and its lagoon share similar chemical parameters, with a slightly higher dissolved organic carbon level in the lagoon (see Table S8 in the supplemental material). The lagoon waters also generally exhibit higher salinity levels than the pond (5 to 10 ppt versus 0 to 5 ppt). CCs might differ in their growth rates at different salinities, although
V. cholerae displays species-wide tolerances to salinities far exceeding those found in this lagoon (
26,
47). A possible indirect influence of location in the different prevalence of these CCs is the composition of the prokaryotic microbiota of the pond and lagoon, as abundances of bacterial taxa have been shown to correlate stronger with each other than with abiotic factors or eukaryotes in marine environments (
48). Another factor which might influence the spatial distribution of clonal complexes is predation by phages. Bacteriophages have been found to play a role in the seasonality of cholera and might be a major factor influencing the abundance of specific clonal complexes in Oyster Pond and the lagoon (
49).
Different evolutionary trajectories for various clonal complexes.
In order to find the possible genetic determinants of the different spatial distribution found for some of the
V. cholerae CCs, we analyzed 20 genomes from the 5 dominant CCs recovered in sampling from both August and September (CC1, CC2, CC3, CC5, and CC13) that we sequenced in a previous study (
50). A 3,410,640-bp whole-genome alignment of these isolates displays 98.6% average pairwise nucleotide identity, with 120,730 phylogenetically informative sites. Within single CCs, any two genomes show between 17 and 124 SNPs, with the exception of CC2, in which multiple regions on both chromosomes 1 and 2 display elevated SNP density. In comparison, every pair of isolates from different clonal complexes differs from each other by approximately 48,000 to 61,000 SNPs (see Table S5 in the supplemental material).
A phylogenetic tree based on a core 2,246,831-bp alignment with multiple reference strains of
V. cholerae and
V. metoecus (
Fig. 5) shows an early well-supported node that separates clonal complexes more frequently found in the lagoon (CC3, CC5, and CC13) from CC1 and CC2, which both show a stronger association with the pond, similar to the UPGMA clustering based on sample origin. CC13 is found to be most closely related to
V. cholerae MZO-3, an O37 serogroup strain from Bangladesh. CC13 and MZO-3 together form the sister clade to pandemic
V. cholerae O1/O139. CC1 and CC2 are sister clades and are related most closely to environmental strains RC385 and VL426 (
V. cholerae bv. albensis).
Single CCs contained a number of genes not shared with members of other CCs (CC1, 113 genes; CC2, 94 genes; CC3, 218 genes; CC5, 103 genes; CC13, 229 genes). A large number of these genes were hypothetical, yet around a third could be placed into Clusters of Orthologous Groups (COG) categories and attributed putative functions (see Fig. S1 and Tables S6 and S7 in the supplemental material). Unique sets of type 6 secretion system (T6SS) effector proteins were also identified for each CC based on the nomenclature by Unterweger et al. (
51) (see Fig. S2 in the supplemental material).
For CC1, the most notable among these genetic differences (also displaying an obvious phenotypic effect) is the presence of the
luxCDABG operon responsible for bioluminescence in vibrios. Isolates in this clonal complex predominantly stem from pond waters and could be considered specialized to this type of environment. Previous studies in Chesapeake Bay, a brackish water habitat similar to Oyster Pond, have found the presence of bioluminescence in around 50% of all isolated strains and, based on clustering by phenotypic traits, hypothesized the presence of the
lux operon to be an ecologically relevant trait of environmental nontoxigenic branches in the phylogeny of
V. cholerae (
19,
52). The bioluminescence trait has been linked to the colonization of zooplankton, which in an illuminated state makes easy prey for visually oriented predators, thus enabling bioluminescent bacteria to invade the nutrient-rich gut regions of vertebrate predators (
19,
53). CC1 strains also harbor the ability to take up choline and convert it to betaine (through the action of the
betTIBA operon, shared with sister taxa CC2, RC385, and VL426). This osmoprotectant would be beneficial in coastal waters with varied salinity levels (
54).
CC2 shows the unique presence of the uronate isomerase genes
uxaC and
uxuA, as well as
uxuB and
uxuR, involved in the metabolism of
d-galacturonate and
d-glucuronate, respectively. Glucuronic acids are compounds produced in the liver of animals and often found in microbial lipopolysaccharides, making it a commonly used carbon source for a number of bacteria (
55). The activity of this gene cluster is confirmed by isolates of this CC being uniquely able to use glucuronic acid and glucuronamide as a carbon source (see Table S8 in the supplemental material). The ability to metabolize these substrates was previously identified as a trait differentiating
V. metoecus from
V. cholerae (
26). CC2 uronate utilization genes display nearly 100% identity on the protein level with orthologs from
V. metoecus (in which this gene cluster is part of the core genome), indicating a recent horizontal gene transfer event from that species.
CC13, together with its sister clade MZO-3, contains both a set of genes encoding a pilus as well as an ADP-ribosyltransferase toxin. These are reminiscent of the two principal virulence factors of pandemic O1/O139
V. cholerae, the toxin coregulated pilus and the cholera toxin (
56). The pilus and toxin found in CC13 display strongest similarity to the type 4 pilus and the heat-labile enterotoxin (LT)-A, both elements of diarrhea-causing enterotoxic
Escherichia coli (ETEC) (
57). Because most of their virulence factors are encoded on mobile genetic elements, any strain of
V. cholerae is theoretically able to become a pathogen, yet almost all strains responsible for epidemic outbreaks of cholera are found in the O1/O139 lineage, with a much smaller number of outbreaks attributed to other serogroups, such as O37 (
58). CC13 represents the closest relative to strains of O37 or O1/O139 serogroups in our data set (
Fig. 5). Out of the 12 CC13 strains isolated, 10 originated from particles of >5 μm. An ability to attach to zooplanktons and other chitinous organisms could increase their potential for virulence, as a single copepod can contain up to 10
4 V. cholerae cells (
6). CC13, O37, and O1/O139 strains might therefore form a clade of
V. cholerae with ancestral adaptations that predispose them to a pathogenic lifestyle (
58).
CC3 and CC5 both contain multiple capsular lipopolysaccharide (LPS)-related genes (several dozens in the case of CC3) of unclear origin in single genomic regions. Horizontal transfer of LPS gene clusters is a frequent occurrence in
V. cholerae (
8) and perhaps a way to evade phage predation, which is dependent on the attachment of virions to surface molecules (
12). In a system underlying negative-frequency-dependent selection by phages, so-called defense-strategists, which rise to high abundance not because of their ability to efficiently use resources but due to their immunity to phage predation, are thought to be able to stably coexist with well-adapted competition strategists (
59).
Is the type 6 secretion system shaping the population structure of V. cholerae?
V. cholerae prevents eukaryotic predation (
60) and kills other bacteria in a contact-dependent way by type 6 secretion system (T6SS)-mediated injection of a combination of toxins into target cells (
51). In
V. cholerae, three different loci encode T6SS toxin/immunity modules, coding for a toxin that is directly injected into target cells and a corresponding immunity protein that confers resistance against that particular toxin. Each distinct module has been assigned a one-letter code, leading to a three-letter designation for a specific strain. For example,
V. cholerae strains belonging to the O1/O139 pandemic lineage are all AAA. A difference at a single locus means that strains are incompatible and will likely kill each other (e.g., CAA versus AAA), as they do not possess an immunity protein against the different toxins they each produce. Even if two strains possess the same toxin/immunity modules, differences in the sequences of the toxins and/or immunity proteins could make strains incompatible, a relationship expressed by a number subscript (e.g., CE
1B versus CE
2B). All strains in major clonal complexes found in Oyster Pond and the lagoon belong to the same compatibility groups, but none of the clonal complexes are compatible with each other (CC1 [CAG], CC2 [CE
1B], CC3 [CE
2B], CC5 [CDC], and CC13 [CAA]). This presents an additional possibility to explain the clonality of the
V. cholerae population and how multiple
V. cholerae clades with at least partially overlapping ecological niches can be sustained in a single environment: a community of
V. cholerae of the same clonal complex could conceivably monopolize the resources around them by killing incompatible invaders of the same species, even those that might theoretically be better adapted to life in that particular niche. Uptake of foreign DNA (natural competence is coregulated with T6SS expression [
61]) from killed cells of other clonal complexes could then provide a means of quick adaptation by horizontal gene transfer (or an additional source of nutrients). The ephemeral small patches of resources prevalent in the aquatic environment of vibrios (
62) would be particularly suited for such a system, as it could allow a limited number clonal complexes to grow to high densities, effectively exclude latecomers, use up all present nutrients, and then continue to spread to other resources. This process could lead to a population dominated by a relatively small number of competing strains, with an early colonizer of resources blooming for a period of time until their number is reduced by external factors, such as phage predation.
Different habitats and evolutionary dynamics for V. cholerae and V. metoecus.
While our sampling effort was primarily aimed at V. cholerae, we also gathered 42 isolates of the closely related species V. metoecus, organized in 7 CCs and 32 STs (including 13 singletons). V. metoecus is rarely isolated (in fact, today, it has only been found in two environmental sites on the U.S. East Coast and a few clinical samples) and could simply be more rare than V. cholerae. The effect of different isolation regimes on the recovery of V. metoecus is unknown, and thus, a potential bias in isolation cannot be ruled out. The lack of sampling depth also makes it difficult to directly compare the population structure of V. metoecus with V. cholerae. Nonetheless, notable differences were observed between these two species.
V. cholerae, as a species, was isolated equally from the lagoon and pond.
V. metoecus, however, which shares most of its phenotypic characteristics with
V. cholerae (
26), is found predominantly in the lagoon water (92% of isolates), suggesting an ecological differentiation at the species level (
Fig. 3).
V. metoecus clades also display notably higher phylogenetic resolution for both whole-genome-based phylogenies and MLST (
Fig. 5; see Fig. S3 in the supplemental material). The reason for this might lie in differential homologous recombination dynamics in those two species. An analysis of recombination/mutation rates using ClonalFrame (
31) determined an r/m ratio of 1.6 for both
V. metoecus and
V. cholerae, while rho/theta was 0.49 for
V. cholerae and only 0.22 for
V. metoecus. This indicates that less than half the number of recombination events in
V. metoecus than in
V. cholerae account for the same amount of introduced nucleotides. We have previously noted that
V. metoecus receives considerably more DNA from
V. cholerae than vice versa, presumably due to their sympatric occurrence, where
V. cholerae is the most abundant donor of nucleic acids (
50). A situation where large
V. cholerae populations cooccur with smaller
V. metoecus populations could lead to a dynamic where the rarer
V. metoecus more often takes up distantly related DNA, while
V. cholerae predominantly exchanges DNA with the more abundant members of its own species.