INTRODUCTION
A central goal of microbial ecology is to link microbial distribution patterns to underlying ecological processes. Developing such links is important both for fundamental science and applied outcomes, for example to make accurate global biodiversity assessments and prioritize management goals in the face of both local and global change (
1,
2). However, achieving this critically depends on our abilities to adequately characterize biodiversity at the first stage, with various methodological and theoretical challenges limiting our understanding of microbial distribution patterns and their underlying ecological drivers.
Several principles have nevertheless become established in soil microbial ecology through cultivation-independent studies over the last two decades. First, it is appreciated that most soils harbor rich and abundant bacterial communities (
3–6). In most soils, a small number of taxa are abundant and prevalent, while the remaining taxa have low abundance and frequency (the “rare” biosphere) (
7,
8). In common with macroorganisms (
9), four key ecological processes control microbial assembly across space and time: environmental selection, diversification, dispersal, and drift (
10–12). While much work has emphasized the role of deterministic environmental selection in driving bacterial niche differentiation, especially edaphic factors such as pH (
13–17), some studies have also inferred stochastic patterns of community structure, for example due to dispersal limitation or historical diversification (
17–21). The relative strength of these factors can vary across time, for example with dispersal controlling recruitment and selection affecting retention during initial stages of primary succession (
17,
22–24). As is also the case in the field of macroecology, the relative importance of deterministic and stochastic processes in shaping contemporary distributions of microorganisms continues to be debated and there is a large body of often divergent literature in this area. In this regard, a major methodological challenge is to perform sampling and analysis that sufficiently disentangles the autocorrelation between environmental and spatial factors in soil ecosystems (
12,
25–27).
Also controversial is the extent to which microbial communities vary across space. Soil bacteria are generally thought to exhibit weaker biogeographic patterns than macroorganisms (
15,
28). Most empirical studies have reported low exponents for taxa-area relationships (
13,
15,
29–31) and low regression coefficients in distance decay curves (
13,
19,
28,
32,
33), though exceptions have been reported (
34–37). Several hypotheses have been put forward to explain these observations (
38,
39). Primarily, bacteria are thought to be able to maintain wide geographic ranges in the face of environmental variation by entering dormant states (
39,
40), leading to limited geographic turnover and shallow taxon-area curves (
15,
28,
41). However, methodological artifacts may also account for some observations of weak spatial differences (
28). Microbial biogeographic patterns are known to be sensitive to various factors, including spatial scale (
42,
43), sampling effort (
28,
44–46), and taxonomic resolution (
15,
28,
44,
47–49). Communities are inherently prone to being undersampled, whether through insufficient sampling effort, low sequencing depth, or rarefying data (
50,
51). In addition, the processing of 16S rRNA gene amplicon sequencing data typically used to profile communities can reduce data set resolution; reads are usually clustered into operational taxonomic units (OTUs) based on an arbitrary identity threshold (usually 97%), and the rare biosphere is regularly removed (
52,
53). Compounding these issues, the pairwise analyses generally used to quantify community turnover inadequately partition variation from all community members: incidence-based measures are highly sensitive to the rare biosphere, and abundance-based measures focus on the common few (
54,
55).
In this study, we employed three methodological innovations to address these common limitations of microbial biogeographic surveys and reassess patterns of bacterial community turnover. First, we adopted a hierarchical sampling scheme commonly used in macroecological surveys (
56,
57); this enabled us to detect changes in community structure across multiple spatial scales and, in light of controversies in the literature, better distinguish the contributions of environmental and spatial drivers to community assembly processes (
27). Second, we profiled community composition using high-resolution 16S rRNA gene amplicon sequence variants (ASVs), leveraging a new generation of sequence processing tools (
58–60). We compared the effects of the commonly used approaches of filtering and clustering sequences on calculated community turnover; this is important given that clustering sequences reduces taxonomic resolution and thus may increase the overall similarity of the community, thereby weakening biogeographic patterns (
35,
49). Finally, we used the multisite diversity metric zeta diversity to analyze spatial community turnover and predict the strength of underlying deterministic and stochastic drivers (
55). Unlike the commonly used beta diversity that is calculated from pairwise comparisons, zeta diversity describes the number of taxa shared across multiple sites. As a result, this parameter can discriminate diversity patterns across the spectrum of common, intermediate, and rare taxa (
55,
61–63) and infer deterministic and stochastic drivers of community assembly. On this basis, we provide evidence that at the level of exact sequences, bacterial biogeographic patterns are exceptionally stronger than previously reported.
DISCUSSION
In this study, we analyze patterns and drivers of soil microbial composition across multiple scales. Steps were taken to overcome common limitations in microbial biogeographical studies by leveraging innovations in sampling design, amplicon processing, and diversity metrics. We found the following. (i) Soil bacterial communities exhibit strong biogeographic patterns. (ii) Spatial turnover is rapid, as most taxa have low to moderate levels of occupancy. (iii) Community structure is influenced more by niche differentiation due to environmental variation rather than dispersal limitation. Our findings agree with previous literature that reported the uneven distribution of bacteria across communities and the strong influence of deterministic drivers (
3,
12). However, we observed much stronger spatial turnover than reported in most, though not all, previous literature (
13,
15,
28). This is reflected by the compatible findings of four independent analyses using the original high-resolution data set. Occupancy frequency distributions revealed most taxa were shared across less than 10% of samples (
Fig. 1). Through zeta decline analysis, we detected a logarithmic decrease in the number of taxa shared as the number of sites increased (
Fig. 3). In addition, we observed strong distance decay (
Fig. 4a) and taxon-area relationships (
Fig. 4c), with
z values one to two orders of magnitude higher than most previous observations (
13,
29,
34,
35,
37).
Multiple factors may explain why we observed high environmentally driven turnover. These potentially include the choices of sampling site, sampling scheme, sequence processing, and downstream analyses. It is notable that our desert sampling sites contained loessial soils that facilitate dispersal and the regional transect contained high environmental heterogeneity, which is known to be associated with increased bacterial turnover (
31,
66,
69); however, this is unlikely to primarily account for most discrepancies with previous literature, given that rapid turnover was also observed in the local transect where physicochemical variation was lower and similar findings were also made in the global analysis. A more significant factor may be that our study adopted a hierarchical sampling design in order to quantify microbial variation across multiple spatial scales. In this regard, it is well-recognized that sampling design and sample size are critical determinants of taxa-area relationships (
44,
74); this reflects that the detection of rare taxa largely determines species evenness and spatial structure, which in turn affects the exponent
z (
35). Methodological advances that improve the detection and inclusion of rare taxa are therefore predicted to align microbial
z values more closely with those reported for animal and plant communities (
44,
48). it is notable that other studies reporting high taxon-area exponents also used spatially explicit hierarchical designs (
34,
35).
However, the biggest factor likely underlying these discrepancies is the treatment of sequencing data. A pervasive feature of 16S rRNA gene amplicon surveys is the clustering of similar sequences to remove potential “noise” and, less commonly, the filtering or undersampling of low-frequency sequences that constitute the rare biosphere. As summarized in
Fig. 5, such processing greatly reduces and distorts the information in data sets, obscuring patterns in occupancy, turnover, and drivers. We avoided such downfalls by using a recently developed denoising algorithm to resolve sequence variants (
59), while confirming through rarefaction curves that our sequencing efforts captured most rare taxa within and between samples. Through simulating sequence processing, we observed major differences in occupancy frequency, zeta diversity, distance decay, and taxon-area relationships upon filtering rare taxa and, to a lesser extent, clustering similar sequences (
Fig. 5). It should be noted that these observations may appear to conflict with those of a recent study that reported clustering did not “change the rate of microbial taxonomic turnover” (
28). However, this may be an issue of interpretation of distance decay curves. In common with this study (
28), we also observed that the distance decay coefficient of bacteria and archaea remains similar between taxonomic resolutions, reflecting similar observations reported in fungal (
75) and plant (
76) communities. However, as the community similarity (
y intercept) is lower at higher resolution, a higher proportion of taxa are lost overall in unclustered data sets compared to clustered data sets. Thus, it is reasonable to conclude that clustering masks microbial taxonomic turnover and broader biogeographic patterns.
This study also highlights the different patterns and drivers of community turnover between rare, common, and intermediate community members. As demonstrated by the occupancy frequency distribution, filtering sequences removes most rare species and retains most common ones. On the basis of the results of beta diversity analysis, we observed significant differences in the proportion of variation assigned to environmental, spatial, shared, or unexplained components at different taxonomic resolutions. This agrees with recent reports that environmental and spatial drivers differentially act on common and rare taxa (
77,
78). Abundant generalists and rare specialists have been shown to differentially respond to environmental change, reflecting differences in niche breadth (
11,
79,
80). Beyond these pairwise observations, we used zeta diversity to demonstrate that the turnover patterns reflect those typically observed in deterministically structured communities. Zeta decline consistently follows a power law, which indicates that communities are nonrandomly structured such as those with clear niche or range differentiation. However, upon lowering taxonomic resolution, these patterns degrade and increasingly resemble stochastic patterns such as seen in habitats with strong aeolian forces or aquatic flows (
Fig. 5). These findings suggest that at lower taxonomic resolutions (
49) or when rare taxa are removed (
35), the community structure becomes more similar and thus predicted assembly processes switch from deterministic to stochastic. Through incorporating a multisite distance decay model, significant differences in the spatial structure of rare, intermediate, and common taxa could also be detected.
Looking forward, this work demonstrates how microbial biogeography can be advanced using readily implementable approaches. There is scope to use the methodological and theoretical innovations shown here to investigate these patterns across a broader range of environments and temporal scales. Detailed studies are needed to better capture the biotic and abiotic subsets of drivers responsible for changes in community turnover across all occupancy classes; this has been achieved in plant ecology (
61,
81) but is currently lacking in microbial ecology. Likewise, it is critical to compare the patterns and drivers of community turnover in parallel for microorganisms and macroorganisms. Indeed, a key observation of our study is that the
z exponent for the taxon-area relationships microbial communities falls within the interquartile range of higher animal and plant communities, suggesting that microorganisms and macroorganisms exhibit similarly strong spatial structure. However, given that these exponents are highly sensitive to factors such as sampling design, sample size, and taxonomic resolution (
44,
74), a rigorous comparison of turnover between domains requires side-by-side sampling. Finally, emerging advances in long-read and full-length 16S rRNA gene sequencing may enable resolution of biogeographic patterns of microorganisms at both the species and strain levels (
82,
83).
ACKNOWLEDGMENTS
We thank Ya-Jou Chen and Thanavit Jirapanjawat for technical advice, Capucine Baubin, Dimitri Meier, and Stefanie Imminger for field assistance, Andrew Holmes for helpful discussions, and Philip Hugenholtz and David Waite for metagenome sequencing.
This work was primarily funded by a Monash University and Ben Gurion University of the Negev Seed Fund (awarded to C.G., O.G., and S.L.C.). It was supported by an ARC DECRA Fellowship (DE170100310; awarded to C.G.), an ARC Discovery Project Grant (DP170101046; awarded to S.L.C. and M.A.M.), a Holsworth Wildlife Research Endowment (awarded to S.K.B.), a Monash University PhD Scholarship (awarded to S.K.B.), and an NHMRC EL2 Fellowship (APP1178715; salary for C.G.).
C.G., S.L.C., M.A.M., and O.G. conceived and supervised this study. S.K.B., C.G., M.A.M., S.L.C., and O.G. designed the study. S.K.B., O.G., and N.W. were responsible for field sampling. S.K.B. carried out all DNA extractions, sequence processing, and statistical analyses. S.K.B., C.G., M.A.M., S.L.C., D.J.B., and D.J.P. analyzed data. S.K.B. and C.G. wrote and edited the paper with input from all authors.
We declare that we have no conflicts of interest.