INTRODUCTION
Cyanobacterial harmful algal blooms (cyanoHABs) dominated by
Microcystis spp. are globally distributed and reported on every continent except Antarctica (
1). These toxic blooms are expected to expand in frequency and severity due to eutrophication, rising atmospheric CO
2, and continued human-induced global warming (
2–4).
Microcystis has a complex genome that encodes many diverse, potentially harmful secondary metabolites (
5–9). These compounds can contribute to an array of negative outcomes, including impacts on human and environmental health (
10,
11), threats to potable water (
12), and disruption of ecosystem function (
2).
Microcystis is capable of producing the hepatotoxin microcystin (MC), a cyclic heptapeptide with over 270 reported structural variants, termed congeners (
13). It is generated by a nonribosomal peptide synthetase multienzyme complex, which is encoded by a set of 10 genes in the bidirectional
mcy operon (
14,
15). This gene cluster is ancient, having originated from a common ancestor of several cyanobacteria genera, including
Microcystis,
Dolichospermum, and
Planktothrix (
16). The
mcy operon also has a polyphyletic distribution within the genus
Microcystis, providing evidence for multiple independent
mcy gene loss events (
16–18). Due to its prevalence and potential toxicity, the expression, regulation, and secretion of microcystin have been studied extensively in locations around the world (
1,
6,
8,
19,
20).
The
mcy operon displays extensive sequence diversity (
14,
21,
22), and each allele is thought to encode distinct MC congeners (
21–24). However, many other factors influence the structural variations of microcystins, including the availability of amino acids in cells (
25), the flexibility of amino acid adenylation domains comprising the biosynthetic enzyme complex (
21), nitrogen form and availability (
26), and C-to-N ratios of available nutrients (
27). The genes
mcyA,
mcyB, and
mcyC demonstrate the highest level of variation and a “mosaic”-like structure (
21). Frequent recombination events in these genes may lead to unique gene sequences that encode distinct MC congeners, such as the replacement of the
mcyB B1 domain with the
mcyC C1 domain (see
Fig. 1 for examples), which leads to the production of at least two MC congeners, microcystin-LR, and microcystin-RR (
22).
While numerous studies have addressed how environmental variables impact congener production under controlled conditions (
21,
22,
26), few have assessed the variation of the
mcy operon structure and sequence in natural populations. Genetic research has largely focused on laboratory studies of
Microcystis cultures aimed at understanding the
mcy operon structure and the production of microcystin (
21,
22,
24,
28). Field studies show that blooms consist of nontoxic and toxic strains (
29) but have relied on PCR or consensus sequences from metagenomic assemblies (
30), both of which may fail to detect important population-level variation. Mapping of metagenomic and metatranscriptomic sequence reads to reference genomes yields valuable insights into the genomic diversity and functional activity of
Microcystis (
31–33). However, these methods have not yet been widely applied to seasonal strain successions of natural
Microcystis populations, which are an important control on bloom toxicity (
18). Thus, the true environmental diversity of
Microcystis populations remains poorly understood.
Here, we studied the diversity of the
mcy operon in a natural
Microcystis population in western Lake Erie, which experiences annual cyanoHABs (
34). The western basin is the shallowest, warmest, and most populated of the Laurentian Great Lakes and provides drinking water to 11 million people as well as fishing and other recreation, which have substantial economic value (Western Lake Erie Basin Project, NRCS). In August 2014, Toledo, OH, experienced a drinking water crisis that left its nearly half a million residents without potable water as a result of microcystin concentrations exceeding the World Health Organization guideline of 1 microgram/L (
12,
35). The 2014 western Lake Erie bloom was characterized by an early toxic phase and a late nontoxic phase that correlated with a succession of
Microcystis strains (
35,
36). This pattern of succession has been observed worldwide but is not fully understood with respect to environmental or ecological drivers (
18).
We collected samples from Lake Erie during cyanoHABs in 2014 and 2018. Sampling of blooms during 2014 (
35), 2018 (
37), and 2018 via an autonomous 3rd-generation environmental sample processor housed within a long range-autonomous underwater vehicle (3G-ESP LRAUV) (P. A. Den Uyl, S. R. Chaganti, L. R. Thompson, R. M. Errera, C. M. Preston, W. Ussler III, C. E. Yancey, J. M. Birch, S. A. Ruberg, G. J. Doucette, G. J. Dick, C. A. Scholin, and K. D. Goodwin, submitted for publication) provided seasonal and spatial variation, high spatial resolution, and high temporal resolution, respectively. The 2014 cyanoHAB data provided insights into spatial and temporal shifts from weekly samples collected from three stations from July to October (
35). The 2018 “HABs Grab” data set achieved higher spatial resolution via 25 samples collected from various points in the lake on the same day (
37). The 2018 3G-ESP captured high temporal resolution through autonomous sample collection over a 2-week period (Den Uyl et al., submitted). Analysis of the
mcy genes and strain diversity in these data sets was used to assess novel genotypes and their relative frequencies as well as high genomic variation across time and space within the western basin and determine the relationships between genotypes and environmental variables to determine patterns of ecological strain succession.
DISCUSSION
We used metagenomic and metatranscriptomic data to reveal the dynamics of
Microcystis strains across a variety of spatial and seasonal scales in 2014 and 2018 blooms in western Lake Erie. The 2014 bloom presented the opportunity to investigate a seasonal succession of
Microcystis strains alongside changing environmental conditions (
35). An early toxic phase of the bloom (i.e., high concentrations of MCs) that coincided with high nitrate (nitrate-N > 400 μg/L) conditions (
41) gave way to a nontoxic phase of the bloom with nitrate-depleted conditions (nitrate-N < 200 μg/L later in the season) (
Fig. 2 and
Table 1). In addition to the widely known toxigenic and nontoxigenic
Microcystis genotypes (all
mcy genes present and absent, respectively), a genotype with a partial set of genes (
mcyA–C) was found to be abundant and expressed in the nontoxic phase of the bloom.
The significant correlations between
mcy genotype relative abundance estimates and concentrations of ammonium and nitrate and pH (
Fig. 2 and
Table 1) suggest that nitrogen availability and the photosynthetic rate of the bloom, which drives extreme changes in pH (
33), may influence the competitive fitness of each genotype and thus shape the
Microcystis strain composition in the blooms.
Microcystis also showed microdiversity across the whole genome. Nucleotide similarity between samples was clustered by bloom phase (
Fig. 4), indicating that discrete subpopulations (
7) may be ecologically distinct as well. Based on known mutation rates, genetic divergence between subpopulations was estimated to require at least thousands of years, indicating that the observed succession involved primarily ecological selection rather than the evolution of genotypes within the season.
Our approach for estimating the relative abundances of genotypes through coverage ratios of metagenomic sequence reads that mapped to the
mcy gene and the 16S rRNA gene was subject to several uncertainties. First, whereas it is commonly assumed that
Microcystis genomes contain one copy of the
mcy and rRNA operons (
42,
43), all nine closed genomes of
Microcystis that were publicly available have two copies of the 16S rRNA gene and one copy of the
mcy operon (
44). The gene copy numbers of
mcy and 16S rRNA genes in native populations are uncertain (
45) and may vary from strain to strain. Variation in coverage and, thus, our estimates of relative abundance could be affected by both the
mcy gene copy number per genome as well as the relative abundance of cells with the various
mcy genotypes. Second, rapid genome replication can result in gene dose effects wherein genes near the origin of replication are present in higher copy numbers than those at the terminus (
46–48). The presence of
mcy genes near the origin of replication or multiple copies of
mcy genes per genome may explain our reported coverage ratio value of 1.58 at WE2 on 4 August 2014 (
Fig. 2), but the available data cannot determine whether these factors were actually at play in these western Lake Erie populations. Third, the observed gene counts could be affected by nonspecific matches to V4 16S rRNA and
mcy genes from other cyanobacteria (e.g.,
Dolichospermum) or homologous biosynthetic genes. However, we used stringent thresholds and aligned all mapped
mcy and 16S rRNA gene sequence reads against a custom universal database and the Silva SSU DB v.138 database, respectively, to confirm that there was no significant spurious mapping. Despite these limitations, the relative abundances measured here track major shifts in genotypes across spatiotemporal scales, and our values are consistent with previous findings in western Lake Erie blooms (
29). While the read-level analysis conducted here has long been used to study the population-level diversity of microbial communities (
49,
50), it has not been widely applied to studies of cyanobacterial blooms.
The results from this study reveal that natural
Microcystis populations consist of multiple
mcy genotypes, some of which have been described previously from studies of pure cultures, such as the complete and absent genotypes (
21,
22,
51). In this study, the
Microcystis population was dominated at times by genotypes that lack
mcy genes altogether (throughout 2018) (
Fig. 3) or by a genotype in which only the
mcyA–C genes were present (late September in 2014) (
Fig. 2). This partial
mcy genotype is present in just 1 of 159 publicly available
Microcystis genomes derived from culture (PCC 9717; NCBI assembly ASM31216v1), which was isolated from a water dam in Rochereau, La Vendée, France. To our knowledge, this partial genotype has been briefly mentioned only once in the literature, by Pearson et al. (
6), who identified it in a PCC 9717 culture. Interestingly, we detected the partial genotype across multiple locations and dates spanning 4 years (
Fig. 2 and
3), indicating that it is a persistent member of the
Microcystis bloom community in Lake Erie. The partial genotype was the most abundant during the secondary bloom phase of 2014 where conditions were nutrient depleted (
29,
52). This persistence and occasional dominance, together with the active transcription of genes from this partial operon (
Fig. 5), suggest that this strain is ecologically successful and that the partial
mcy operon may be functional. The significant negative correlation of the partial genotype ratio estimate with both nitrate and ammonium concentrations (
Table 1) suggests that it is adapted to conditions of low N. Experiments on pure cultures will be helpful in addressing open questions on the ecology, biosynthetic potential, and toxigenicity of this partial
mcy genotype.
The complete
mcy genotype often dominated in the early stages of the bloom under nitrate-replete conditions (
Fig. 2A and
C). Only the complete
mcy genotype coverage ratio was positively correlated with nitrate concentrations (
Table 1). This observation was consistent with microcystin-producing strains having a higher demand for nitrogen (
41,
52,
53) due to the high N content of microcystin metabolites as predicted by nutrient stoichiometry theory (
54,
55). Similarly, only the complete
mcy genotype was positively correlated with pH (
Table 1). This finding suggests that MC-producing strains were associated with increased photosynthetic rates and/or high pH relative to non-MC-producing strains. Increasing photosynthetic rates greatly increase the pH and serve as a good proxy for bloom activity (
56,
57). This agrees with previous results showing that faster-growing strains tend to be MC producers (
54) and that shifts in
Microcystis genotypes correlated strongly with pH, with potential links to carbon acquisition and concentrating mechanisms (
33). Lower pH generally indicates cyanoHAB communities dominated by slow growth and maintenance, which may be a specialty of nontoxigenic strains (
54,
57).
Our results show that several different
mcy variants, reflecting the mosaic structure of the
mcy operon identified in pure cultures, co-occur in natural environments. The study by Mikalsen et al. (
22) demonstrated that the sequence of the
mcyB1 domain (e.g., either a B1-like or a C1-like genotype) influences the MC congener produced within cultured isolates.
Microcystis strains with the B1-like genotype produced LR isoforms, while those with C1-like genotypes produced both LR and RR isoforms (
22). We detected both the B1 and C1 genotypes in western Lake Erie populations, providing genetic context for the production of MC-LR and -RR, two of the most common congeners in this system (
37,
58). The LR- and RR-producing C1 variant was the most abundant form of the
mcyB gene in western Lake Erie during both 2014 (
Fig. 2) and 2018 (
Fig. 3) as well as 2006 (
59), showing that these genotypes are present interannually. While our study focused on
mcyB variants, it is likely that other genotypes are present in these populations, especially within the hyperdynamic
mcyABC region.
Given the dynamic nature of
Microcystis genomes (
5) and evidence of genome rearrangement in response to environmental conditions (
31), one competing hypothesis explaining seasonal trends in genotypic diversity is active horizontal gene transfer (HGT) or loss of
mcy genes during the season. However, the distinct clustering of samples by bloom phase based on genome-wide conANI scores (
Fig. 4) suggested that the seasonal dynamics of
mcy genotypes were due to shifts in the abundances of
Microcystis strains rather than ongoing evolution at the time scale of this study. If the differential coverage ratios of
mcy genotypes were due to active HGT and/or gene loss, the same strains would be present at all times, and the conANI would be at least 99.999% (
39); in contrast, we found that the values ranged from 97.3014% to 99.7936% (see Table S2 in the supplemental material). We conclude that the samples contained distinct strains that shifted in abundance over the course of the bloom and diverged on the order of thousands of to several million years based on whole-genome analysis and a wide range of mutation accumulation rates (
Table 2). While conANI results varied across genes, as expected due to different evolutionary histories and forces (
6,
21–23), even the shortest estimated divergence times were on the scale of hundreds to thousands of years. Rather than HGT, the results from this study suggested that specific genotypes were favored during the bloom according to shifting niche space, with an emphasis on nitrogen form and availability as well as pH (
Table 1).
The co-occurrence of various
mcy genotypes in natural cyanobacterial bloom communities highlights that genotypes commonly observed in culture (
21,
22,
51) as well as novel genotypes that have scarcely been mentioned in the literature (
6) are common in natural populations. Our results show that this genotype diversity shifts dynamically in time and space and suggest that it is likely responsible in part, along with the direct influence of environmental conditions, for the diversity of structural variants of MC observed in nature. The presence of an abundant, transcriptionally active, partial
mcy operon of unknown biosynthetic potential merits further investigation. Its presence further highlights the potential pitfalls of using single
mcy genes to assess MC-producing strains, as
mcyA–C will not distinguish partial and complete genotypes, and the use of
mcyD or
mcyE will not capture the presence of the partial genotype. For example, the Phytotoxigene (Akron, OH, USA) quantitative PCR (qPCR) screening kit, which is commonly used to detect MC-producing potential, detects only the
mcyE gene. Such an approach would not detect the partial operon identified in this study. Overall, this study shows the utility of shotgun omics approaches for resolving known and novel
mcy genotypes that can dominate natural blooms with the potential for new phenotypic outcomes of concern to human and animal health.