INTRODUCTION
The human fecal microbial community serves as a proxy for the human gut community, which exhibits considerable diversity and variability among individuals (1–3). Human microbiome data sets show that most human gut communities share specific functional gene profiles (4–6) rather than a single core set of microbial species (3, 7). Functional similarity coupled with taxonomic variability indicates niche overlap among taxa, which in the gut microbiome of healthy individuals reflects taxonomically distinct sets of cooccurring taxa or enterotypes (8) that contain similar functional gene profiles. Despite interindividual taxonomic variability, studies that include multiple samples have identified correlations between functional gene composition and taxonomic composition (5, 9, 10) and relationships between human characteristics and the gut microbial composition. For example, the gut community from an individual is more similar to itself through time than to samples collected from other individuals (7, 11, 12). Marked shifts in the gut microbial communities are also reported for the very young and very old (9, 12–14), for healthy versus disease states (15–17), across different diet regimes (6, 10, 18, 19), and in culturally isolated human populations (9, 20). The coherence of the gut microbial community within individuals and among individuals with specific characteristics suggests that gut communities maintain relatively stable equilibrium states (7). If the gut community composition tracks human characteristics, then identifying the community members or community states that differ across human population boundaries could lead to an improved understanding of how these communities influence human health.
Sampling individuals has proven to be an effective approach for identifying gut microbial community patterns that associate with human health states. However, large variation among gut microbiomes and the expense of sequencing libraries from many individuals limit the efficacy of microbial community comparisons from human populations over different demographic scales, e.g., city, country, or continent. Previously, we demonstrated that highly prevalent Lachnospiraceae organisms in a human fecal data set were the most abundant in a sewage influent data set (21) and that a single sewage sample harbored most of the Blautia sequence diversity identified in 10 human fecal samples (22). We hypothesize that comparison of untreated sewage samples might provide a means to assess the human fecal microbiome and by proxy the gut microbiome within and among human populations. Here, we systematically compare bacterial 16S rRNA gene profiles from healthy adult stool samples generated by the Human Microbiome Project (5) to the community profiles of >200 sewage influent samples collected from 71 U.S. cities. We used oligotyping (23), a computational method that uses positional Shannon entropy scores to decompose sequencing data into highly refined sequence-based units that make possible sensitive assessments of beta diversity. From these data, we asked (i) whether sewage influent accurately reflects a composite fecal microbiome from human populations, (ii) whether “core” fecal organisms or other community trends exist across U.S. cities, and (iii) whether sewage influent microbial communities correlate with human demographic patterns.
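The entropy step behind oligotyping can be illustrated with a minimal sketch. It assumes reads that are already aligned to equal length, and it shows only how per-position Shannon entropy flags the most variable column and how reads can be partitioned on it. The function names `positional_entropy` and `split_by_columns` and the toy reads are hypothetical; the published oligotyping method (23) involves further steps, such as iterative refinement and noise filtering, that are not reproduced here.

```python
import math
from collections import Counter


def positional_entropy(aligned_reads):
    """Return the Shannon entropy (in bits) of each column of aligned reads."""
    if not aligned_reads:
        return []
    length = len(aligned_reads[0])
    if any(len(read) != length for read in aligned_reads):
        raise ValueError("reads must be aligned to the same length")
    n = len(aligned_reads)
    entropies = []
    for col in range(length):
        counts = Counter(read[col] for read in aligned_reads)
        # H = sum over observed bases of -p * log2(p)
        entropies.append(sum(-(c / n) * math.log2(c / n) for c in counts.values()))
    return entropies


def split_by_columns(aligned_reads, columns):
    """Group reads by the bases they carry at the chosen columns."""
    groups = {}
    for read in aligned_reads:
        key = "".join(read[c] for c in columns)
        groups.setdefault(key, []).append(read)
    return groups


# Toy alignment (illustrative only): a single variable position at index 2.
reads = ["ACGTA", "ACGTA", "ACCTA", "ACCTA"]
entropy = positional_entropy(reads)
top_column = max(range(len(entropy)), key=entropy.__getitem__)
groups = split_by_columns(reads, [top_column])
```

In this toy case the only column with nonzero entropy is the one where the reads differ, so splitting on it separates the two sequence variants even though they are identical everywhere else.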
DISCUSSION
Large populations with highly variable phenotypic characters (e.g., human weight and flower color) will include a greater number of variants with more even distributions than small populations. A character variant that is common among individuals in these populations will be abundant in population-level assessments and highly prevalent among populations. However, unlike weight or flower color, where a single variant represents each individual at a given moment in time, microbiomes encompass hundreds to thousands of different kinds of microorganisms or operational taxonomic units (OTUs) that collectively define the character variant of an individual. Although this complexity confounds identification of community states (i.e., variants) that differentiate between individuals (7), it should not affect the expected distribution of community members in a population-level sample.
In support of this concept, we found that (i) the population-level (sewage) samples recaptured the majority (97%) of the oligotypes from individual stool samples, (ii) a pooled data set of human stool and sewage samples exhibited highly similar oligotype distribution patterns, (iii) sewage samples had higher richness and diversity than stool samples, and (iv) oligotypes that were more prevalent among individuals were more prevalent and more abundant in sewage. We infer that sewage influent represents the composite fecal microbiomes of many individuals and provides a metric to assess the relationship of these population-level microbial distributions with large-scale patterns in human demographics.
The complex environment of municipal sewer systems receives water and the associated microbial mélange from multiple sources, including gray water, human stools, and in some systems surface runoff. In our data set, sequences that made up on average 78% of a stool sample comprised ~12% of a sewage sample. By scaling this ratio to 100% (12/78 ≈ 0.15), we estimate that only about 15% of the amplicons in a typical sewage sample originate from human stool. Analyses restricted to oligotypes from nonfecal community members exhibited strong community composition relationships to geography-related differences among cities (Fig. 4). Without the human microbiome data sets to focus our analysis on human stool sequences in sewage and oligotyping to differentiate closely related sequence variants, the nonfecal community distribution patterns would have overprinted the signal from the stool samples.
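The scaling estimate above can be written out as a short calculation. This is a sketch under one stated assumption: the marker sequences keep the same share of the stool-derived fraction once they are in sewage as they have in stool itself. The function name `estimate_stool_fraction` is hypothetical.

```python
def estimate_stool_fraction(share_in_stool, share_in_sewage):
    """Estimate the fraction of sewage amplicons that derive from stool.

    share_in_stool: mean fraction of a stool sample made up by the marker sequences.
    share_in_sewage: mean fraction of a sewage sample made up by the same sequences.
    Assumes the markers are diluted proportionally with the rest of the stool community.
    """
    if not 0 < share_in_stool <= 1:
        raise ValueError("share_in_stool must be in (0, 1]")
    if not 0 <= share_in_sewage <= 1:
        raise ValueError("share_in_sewage must be in [0, 1]")
    return share_in_sewage / share_in_stool


# Values reported in the text: 78% of a stool sample, ~12% of a sewage sample.
stool_fraction = estimate_stool_fraction(0.78, 0.12)
print(f"{stool_fraction:.1%}")  # prints 15.4%
```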
Previous comparisons of individual gut microbiomes demonstrate that community composition varies widely among individuals and that no single core set of bacterial species dominates all human guts (7, 26). By sampling sewage, we find that U.S. populations have a much less variable fecal bacterial community composition than that of individuals. This community composition convergence among populations suggests that a finite level of composition variability is present, at least among U.S. populations, and this variability can be overcome with large sample sizes to make meaningful inferences about the gut microbiome. From the sewage sampling, we also identified a set of “core” bacteria that are both common to and abundant in U.S. populations. Although no single species dominates the fecal microbial communities among individuals, our results demonstrate consistent differential abundance in human populations for some bacterial taxa over others. Previous attempts to classify core species, using a >50% occurrence among adult individuals as the definition of core, identified Faecalibacterium prausnitzii, Roseburia intestinalis, Bacteroides vulgatus, Bacteroides uniformis, Eubacterium rectale, and Ruminococcus bromii among other undescribed species as primary members (4, 26). Except for Roseburia intestinalis, each of these species matched one of our core oligotypes. We also defined another 21 oligotypes as core members, most of which resolved to various Bacteroides spp. or Lachnospiraceae genera (see Table S2 in the supplemental material). The high representation of Bacteroides in the sewage samples is consistent with reports that adults from the United States have higher abundances of the genus Bacteroides than do people from non-Westernized societies (9). Since the core oligotypes were present and abundant in nearly all U.S. sewage samples, we hypothesize that these organisms represent a signature for U.S. populations that can differentiate their gut communities from those of human populations in other parts of the world.
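The >50% occurrence rule used in the earlier studies cited above (4, 26) can be sketched as a prevalence filter. This is an illustration of that prior definition only, not of the criteria used to define the core oligotypes in this study. The function name `core_by_prevalence`, the taxon labels, and the abundance values are hypothetical.

```python
def core_by_prevalence(samples, min_prevalence=0.5):
    """Return {taxon: prevalence} for taxa detected in more than
    min_prevalence of the samples.

    samples: list of dicts mapping taxon name -> relative abundance.
    """
    if not samples:
        return {}
    n = len(samples)
    taxa = set().union(*samples)  # every taxon seen in any sample
    core = {}
    for taxon in sorted(taxa):
        detected = sum(1 for sample in samples if sample.get(taxon, 0) > 0)
        prevalence = detected / n
        if prevalence > min_prevalence:
            core[taxon] = prevalence
    return core


# Made-up abundances for four hypothetical individuals.
samples = [
    {"taxon_A": 0.5, "taxon_B": 0.5},
    {"taxon_A": 0.7, "taxon_C": 0.3},
    {"taxon_A": 0.2, "taxon_B": 0.3, "taxon_C": 0.5},
    {"taxon_A": 1.0},
]
core = core_by_prevalence(samples)
```

With these toy values only `taxon_A` passes, because the other two taxa occur in exactly half of the samples and the rule requires strictly more than 50%.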
Dominance of a bacterial species in the human population may reflect its functional importance in the metabolic capacity of human guts. Despite the wide taxonomic range of the core organisms identified here and the high bacterial community variability among individuals, the functional gene composition among human gut microbiomes is fairly consistent (4, 5). Niche overlap among various members of the gut community might explain this functional consistency without requiring nearly identical microbiome compositions (7). Typically, the most abundant oligotype in a stool sample was one of the 27 core oligotypes (117 of 137 samples). The ubiquity of the core organisms in human populations and frequent dominance in individuals make these core organisms strong candidates for exploring the functional trade-offs among prominent gut bacteria and the differences that relate to human health or define stable community states.
Sewage sampling also described distinct community compositions among U.S. populations. Samples differed primarily by the increased representation of oligotypes from Bacteroidaceae, Prevotellaceae, or Lachnospiraceae/Ruminococcaceae over the other two family groups (see Fig. S5 in the supplemental material). This result resembles earlier enterotype analyses (8) and the concept that changes in dominance between taxa in these families play an important role in structuring gut communities (7). Twenty-one of 51 cities were enriched for the same bacterial family group across all three sampling periods. Although not a majority, this level of community consistency signifies that human populations at the citywide scale can have characteristic microbial community compositions.
Although we did not identify the ultimate causes of the bacterial community composition differences among U.S. cities, our single measure of lifestyle differences for individuals in these cities (obesity percent) explained a significant, albeit small, proportion of the community variation. Lifestyle differences can reproducibly alter the human gut microbiome (27), and microbial community composition is a known indicator of obesity (28–30), with up to 90% predictive accuracy for individuals (31). We observed that the obesity signal in an individual's gut microbial community composition scaled up, with nearly equivalent predictive capabilities (81 to 89% accuracy), to the level of human populations in cities. These community composition relationships to the population obesity gradient were driven in large part by increased representation of Bacteroides spp. and decreased representation of Faecalibacterium spp. in more obese populations (see Fig. S6 in the supplemental material). Bacteroides spp. have been found to increase in abundance in humans consuming a high-animal-fat diet (19) and are associated with low-diversity proinflammatory gut communities, while Faecalibacterium spp. are more prevalent in high-diversity anti-inflammatory gut communities (32). Given the relatively minor difference in population obesity percentage (as low as 9%) between city populations considered lean and obese, the observed correlations between obesity and the microbial community in sewage might reflect other, more pronounced lifestyle differences in these cities, including the influence of diet on gut microbial communities (6, 18, 19).
In summary, after filtering out overprinting sewer-associated taxa, sewage serves as a composite proxy for population-level human fecal microbiota. Comparative sewage analysis provides a unique opportunity to explore the relationship between human fecal communities and lifestyle or demographic differences in human populations. Combined with sensitive computational approaches to analyze microbial community data, sewage sampling provided a new approach that allowed us to move beyond the large individual-based sample collections that would be needed to compare microbiomes among 71 human populations.