INTRODUCTION
It has been estimated that nearly 100 trillion microbes colonize different human body habitats, collectively composing the microbiota (
1). Host-microbiota interactions are deeply involved in various physiological and metabolic activities (
2) and hosts’ health (
3). The human mouth is heavily colonized by microorganisms (
1) and acts as a portal for microbes to gain access to the respiratory and digestive tracts (
4). It is well acknowledged that the oral microbiota affects two common oral diseases, dental caries and periodontal disease (
5), while recent studies also indicate that the oral microbiota may play roles in maintaining systemic health through nutrition absorption, metabolism, and immune system regulation (
4).
Studies have suggested that many host-related factors are associated with the diversity and composition of microbial communities, such as the host’s race/ethnicity (
6), genetic background (
7), and socioeconomic status (
8). Increasing evidence suggests that there are racial differences in microbial profiles (
6) of vaginal (
9), gut (
10), and skin (
11) microbiomes. Two studies have investigated the differences in oral microbiome across racial groups (
12,
13). In the earlier one (
12), two strategies, terminal restriction fragment length polymorphism (t-RFLP) and 16S rRNA gene pyrosequencing, were used to assess the microbiota in plaque and saliva samples from 192 individuals of four ethnic affiliations. The authors found that the oral microbiota of African-Americans (AAs) had lower alpha diversity than that of European-Americans (EAs), Chinese, and Latinos. In the more recent one (
13), the 16S rRNA gene was sequenced for the saliva samples of 152 participants from three different climate zones (
13). The authors found that both alpha and beta diversity differed significantly among populations from Alaska, Germany, and Africa. However, both studies had small sample sizes and/or investigated a limited number of microorganisms. Here, we investigated the oral microbiota of 1,616 participants in the Southern Community Cohort Study (SCCS), including 1,058 AAs and 558 EAs.
DISCUSSION
Previous studies have demonstrated the racial differences in the human microbiome, with most studies focusing on the microbiota of the gastrointestinal tract (
14,
15), skin (
16), and vagina (
9,
17). Several studies have also implied the racial differences in the oral microbiome (
6,
18,
19). However, these studies had limited sample sizes, and the oral microbiome of AAs was not well studied. In the study presented here, we observed significant differences in overall microbial diversity and composition between AAs and EAs and found multiple bacterial taxa, including several previously identified oral pathogens, that differed significantly in abundance or prevalence between the two racial groups.
In the present study, significant differences in overall microbial composition were observed between AAs and EAs, which were consistent with the results from two previous studies (
12,
13). In the first one (
13), the saliva microbiome was profiled from 74 native Alaskans, 10 Germans, and 66 Africans. The authors found that Africans had a significantly different microbial composition from native Alaskans and Germans. Similarly, in the other study (
12) investigating the subgingival microbiome of AAs, EAs, Chinese, and Latinos, a significant difference in overall microbial composition was observed between AAs and EAs. We also found that AAs showed a higher alpha diversity than EAs, whereas in both of those studies AAs showed a lower alpha diversity. This inconsistency could have two potential explanations. First, both of those studies were conducted with very small sample sizes, including 10 to 74 individuals within each group, substantially fewer than in the present study (1,058 AAs and 558 EAs). With such small sample sizes, many bacteria of low abundance/prevalence could not be detected, which affected the estimation of microbial diversity. Second, in one study (
13), the average sequencing depth was only ∼441 reads per sample, which is much lower than that of the present study (75,021 reads per sample). In the other study (
12), microbiomes were profiled using two strategies, i.e., terminal restriction fragment length polymorphism (t-RFLP) and 16S rRNA gene pyrosequencing. The latter, which has higher resolving power, was used for only a portion of the participants, which would have affected the accuracy of taxonomic assignment and, in turn, the diversity estimation.
In addition to the difference in overall microbial composition, we found 13 common taxa showing a differential abundance between AAs and EAs. In particular, AAs had a higher abundance of
Bacteroidetes and a lower abundance of
Actinobacteria and
Firmicutes. In the above-mentioned saliva microbiome study (
13), several genera showed a significantly different abundance in comparing Africans with native Alaskans and Germans. Several of them, including the higher abundance of
Porphyromonas and the lower abundance of
Rothia and
Granulicatella among Africans, were consistent with results of the present study. In addition, the lower abundance of
Rothia among AAs was also reported by the above-mentioned subgingival microbiome study (
12). No studies have investigated the racial differences of the remaining taxa; hence, a comparison could not be made. Among these 16 taxa, several have been associated with diseases. For example,
Actinobacteria was reported to be associated with a decreased risk of type 2 diabetes (
20).
Granulicatella adiacens (
21) and
Streptococcus oligofermentans (
22) were found to be associated with infective endocarditis.
We also found 19 rare taxa that showed a significantly higher prevalence among AAs. Among them, four species,
Porphyromonas gingivalis,
Prevotella intermedia,
Treponema denticola, and
Filifactor alocis, have been established to be involved in the pathogenesis of a variety of forms of periodontal diseases (
23,
24). Studies have shown a racial disparity in periodontal disease, which is highly correlated with oral bacterial pathogens (
25). Several studies have reported that older AAs have more missing and decayed teeth than EAs (
26,
27). In addition, data from the National Health and Nutrition Examination Survey (NHANES) showed a 20% greater prevalence of periodontitis (
28) and 25% higher rates of dental caries (
29) among older AAs (aged 65 years or older) than among older EAs. The differential prevalence of these four oral pathogens may, to some extent, contribute to the disparity of oral health between AAs and EAs. In addition to these oral pathogens, another 15 rare taxa were more prevalent among AAs as well. An earlier study, using 16S rRNA gene cloning and sequencing, found several genera, including
Peptostreptococcus, associated with periodontitis (
30). In addition, one of the species of this genus,
Peptostreptococcus stomatis, was observed in peri-implantitis by two recent studies (
31,
32). Therefore, the higher prevalence of these two taxa might also have contributed to the worse oral health status among AAs than among EAs. However, oral hygiene may also contribute to the oral health disparity between the two racial groups, and oral hygiene data were not collected from study participants. We therefore could not rule out the possibility that the enrichment of these periodontal disease-related bacteria in AAs is attributable to differences in oral hygiene between AAs and EAs.
To the best of our knowledge, this study is the largest to explore racial differences in the oral microbiome. We profiled the oral microbiota using 16S rRNA gene sequencing, which has better resolution than the traditional techniques used in earlier studies, such as probe-based DNA-DNA hybridization. In addition, we adjusted for a variety of covariates in all statistical analyses, so that the findings of this study reflect, to the greatest extent possible, the relationship between oral microbiota and racial affiliation. Further, the availability of genetic data for a portion of study participants made our study the first to evaluate the associations of hosts’ genetic African ancestry with the oral microbiome. A limitation of this study is that it lacks a comprehensive oral health assessment at the baseline examination during enrollment. In addition, only one mouth rinse sample was collected from each participant; hence, our findings may be affected by potential misclassification bias. Further, it is well acknowledged that although 16S rRNA sequencing can provide stable and accurate resolution of the microbiota at the genus level, its species-level profiling is not optimal. Future studies employing shotgun metagenomic sequencing will be needed to fill this gap.
In summary, we found significant differences in overall oral microbiota composition, as well as in the abundance/prevalence of individual bacterial taxa, between AAs and EAs. These results suggest a potential role of the oral microbiome in health disparities. The causal mechanisms and factors shaping this difference warrant further investigation in larger samples and with better microbiome profiling techniques.
MATERIALS AND METHODS
Study population and data collection.
The SCCS is a prospective study designed to explore health disparities in low-income populations. Details of the study have been described elsewhere (
33). Briefly, more than 85,000 adults, aged 40 to 70, were recruited from 2002 to 2009 in 12 states in the southeastern United States, with two-thirds of the participants being AAs. At enrollment, mouth rinse samples were collected from ∼34,100 participants. Written informed consent was obtained from all study participants. The SCCS was reviewed and approved by Vanderbilt University Medical Center and Meharry Medical College.
During enrollment, all participants completed a comprehensive baseline questionnaire that gathered basic information, including age, race/ethnicity, sex, education level, income, lifestyle, anthropometric features, and disease history. After recruitment, study participants were followed up by using record linkage and mail- or telephone-based surveys. Health-related outcomes were determined from National Death Index mortality records and/or through linkage with state cancer registries.
The present study included participants who provided mouth rinse samples during the study enrollment and were involved in four nested case-control studies to investigate the oral microbiome and incident cases of colorectal cancer, type 2 diabetes, lung cancer, and upper aerodigestive tract cancer. All participants were free of any diseases at the time of mouth rinse sample donation. After excluding participants with a self-reported history of antibiotic usage during the year before biospecimen collection, 1,616 individuals were included in the present study.
16S rRNA gene sequencing.
DNA was extracted from mouth rinse samples using Qiagen’s QIAamp DNA kit (Qiagen Inc., Germantown, MD, USA). The NEXTflex 16S V4 Amplicon-Seq kit (Bioo Scientific, Austin, TX, USA) was used to build a library to sequence 253 bp of the V4 domain of the 16S rRNA gene. The data were generated in two batches. For the first batch, 150-bp paired-end sequencing was performed using the Illumina MiSeq 300 at the Vanderbilt Technologies for Advanced Genomics (VANTAGe) Core. For the second batch, 250-bp paired-end sequencing was conducted via the Illumina HiSeq System at BGI Americas (Cambridge, MA, USA). For both batches, each 96-well plate, including an additional negative-control sample and two duplicated quality control (QC) samples, was sequenced. All duplicated samples showed comparable microbial profiles. For example, for the overall microbial richness (alpha diversity measured by Faith’s phylogenetic diversity [PD] index), the coefficient of variability (CV) among the repeated QC samples was 1.7%. For the relative abundance of individual taxa, the median of the Spearman correlation coefficients between the duplicated QC samples was 98.6%.
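The CV reported for the repeated QC samples can be illustrated with a minimal sketch. It assumes that the CV is the sample standard deviation divided by the mean, expressed as a percentage; the function name and the Faith's PD values below are hypothetical and are not taken from the study data.

```python
import statistics


def coefficient_of_variation(values):
    """Return the CV in percent: sample standard deviation divided by the mean."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return statistics.stdev(values) / mean * 100.0


# Hypothetical Faith's PD values for one QC sample repeated across plates.
qc_faith_pd = [20.0, 20.5, 19.5, 20.0]
cv_percent = coefficient_of_variation(qc_faith_pd)
print(f"CV = {cv_percent:.2f}%")
```

A small CV indicates that repeated measurements of the same QC sample yield nearly the same richness estimate.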
Sequencing data processing and quality controls.
For 16S rRNA sequencing data, Sickle (v1.33), BayesHammer, and PANDAseq (v2.10) were used successively to perform low-quality read trimming and removal, sequencing error correction, and paired-end read assembly (
34). Then, the merged high-quality reads were processed by Quantitative Insights Into Microbial Ecology (QIIME; v1.9.1). The Human Oral Microbiome Database (HOMD) was used as the reference. UCLUST (v1.2.22q) was used for clustering with 97% sequence similarity as the threshold. Operational taxonomic units (OTUs) observed in fewer than two samples were considered unreliable and were excluded. Then, the OTU table was summarized to microbial taxon levels.
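The OTU exclusion rule described above can be sketched as follows. This is an illustration only, not the QIIME workflow used in the study; the table layout (a dictionary of per-sample read counts), the function name, and the toy counts are all assumptions.

```python
def filter_otus(otu_table, min_samples=2):
    """Keep OTUs with a non-zero read count in at least `min_samples` samples."""
    kept = {}
    for otu_id, counts in otu_table.items():
        n_present = sum(1 for count in counts.values() if count > 0)
        if n_present >= min_samples:
            kept[otu_id] = counts
    return kept


# Hypothetical OTU table: OTU identifier -> {sample identifier: read count}.
otu_table = {
    "OTU_1": {"S1": 120, "S2": 0, "S3": 45},
    "OTU_2": {"S1": 0, "S2": 3, "S3": 0},  # seen in one sample only
    "OTU_3": {"S1": 7, "S2": 9, "S3": 1},
}
filtered = filter_otus(otu_table)
print(sorted(filtered))
```

In this toy table, OTU_2 is dropped because it appears in fewer than two samples.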
Statistical analysis.
For the microbial richness, i.e., alpha diversity, Faith’s PD index was calculated. We first evaluated the associations of participants’ lifestyle factors with alpha diversity through linear regression analyses. Then, the difference in alpha diversity between AAs and EAs was estimated by the Wilcoxon rank sum test. For the overall microbial composition, i.e., beta diversity, the weighted UniFrac distance, unweighted UniFrac distance, and Bray-Curtis dissimilarity matrices were generated. The difference in beta diversity between AAs and EAs was evaluated through the regression-based kernel method, implemented in MiRKAT (
35) (v0.02). We also evaluated whether our sample was sufficiently representative for estimating both alpha and beta diversity by examining how these estimates changed as the number of samples increased within AAs and EAs, respectively.
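Of the three beta diversity measures named above, Bray-Curtis dissimilarity is the only one that does not require a phylogenetic tree, so it lends itself to a short sketch. The code below uses the standard definition (sum of absolute differences divided by the sum of all counts); the function names and the toy profiles are hypothetical, and this is not the pipeline used to generate the study's matrices.

```python
def bray_curtis(x, y):
    """Bray-Curtis dissimilarity between two abundance profiles of equal length."""
    if len(x) != len(y):
        raise ValueError("profiles must cover the same taxa")
    total = sum(x) + sum(y)
    if total == 0:
        raise ValueError("both profiles are empty")
    return sum(abs(a - b) for a, b in zip(x, y)) / total


def dissimilarity_matrix(profiles):
    """Pairwise Bray-Curtis matrix for a list of profiles (symmetric, zero diagonal)."""
    return [[bray_curtis(p, q) for q in profiles] for p in profiles]


# Hypothetical read counts for three samples over the same three taxa.
profiles = [[10, 0, 5], [10, 0, 5], [0, 15, 0]]
matrix = dissimilarity_matrix(profiles)
print(matrix)
```

Identical profiles give a dissimilarity of 0, and profiles that share no taxa give 1.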
For individual taxa, we tested differences in relative abundance and/or prevalence at the phylum, family, genus, and species levels between AAs and EAs. First, we investigated the taxa with a relative abundance of >0.10% among AAs, namely, common taxa, including five phyla, 15 families, 16 genera, and 29 species. For each sample, centered log-ratio (clr) transformation was used to normalize taxon read counts. Then, linear regression analysis was conducted with the transformed abundance data as the outcome and race as the independent variable. For those taxa with a relative abundance of ≤0.10% in AAs, namely, rare taxa, we tested their differential prevalence between AAs and EAs via logistic regression. Due to the limited power for the very rare taxa, only those with a prevalence of >30% (with a non-zero read count in >30% of the participants) among AAs were included in the analyses, including four phyla, 19 families, 42 genera, and 102 species.
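The clr transformation and the prevalence definition used above can be sketched as follows. The text does not state how zero counts were handled before taking logarithms, so the pseudocount of 0.5 is purely an assumption for illustration, as are the function names and the toy counts.

```python
import math


def clr_transform(counts, pseudocount=0.5):
    """Centered log-ratio transform of one sample's taxon read counts.

    The pseudocount (an assumption, not taken from the study) avoids log(0).
    """
    logs = [math.log(count + pseudocount) for count in counts]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


def prevalence(counts_across_samples):
    """Fraction of samples in which a taxon has a non-zero read count."""
    present = sum(1 for count in counts_across_samples if count > 0)
    return present / len(counts_across_samples)


# Hypothetical read counts of four taxa in one sample.
sample_counts = [120, 0, 45, 7]
clr_values = clr_transform(sample_counts)
print([round(value, 3) for value in clr_values])

# Hypothetical read counts of one rare taxon across five participants.
print(prevalence([0, 3, 0, 1, 0]))
```

By construction, the clr values of a sample sum to zero, so each value expresses a taxon's abundance relative to the geometric mean of that sample.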
Genome-wide single nucleotide polymorphism (SNP) array data were available for 397 of the 1,616 individuals, including 324 AAs and 73 EAs, and were used to estimate the percentage of genetic African ancestry for these 397 participants in our previous studies (
36–38). Briefly, autosomal common SNPs (minor allele frequency > 0.05) with low linkage disequilibrium (pairwise r² < 0.10) were used to estimate the genetic African ancestry, utilizing ADMIXTURE (v1.3.0). We then evaluated the association of the genetic African ancestry percentage with taxon relative abundance or prevalence using linear regression. Among the 73 self-reported EAs, the average African ancestry was only 0.27%. Hence, we included the remaining 485 EAs without genetic data (assuming an African ancestry percentage of 0) in the analyses, resulting in 882 participants in total.
In all of the statistical analyses, we adjusted for the following covariates: age, sex, body mass index (BMI), smoking, alcohol consumption, total energy intake, tooth loss, annual household income, state of enrollment, disease status during the first follow-up, and sequencing batch. Among them, BMI and age were treated as continuous variables, and all the other factors, which are categorical, were coded as dummy variables, including sex (men and women), smoking (current, former, and never-smoker), alcohol consumption (ever-drinker, never-drinker, and missing), total energy intake (first tertile, second tertile, third tertile, and missing), tooth loss (no teeth lost, loss of 1 to 10 teeth, loss of >10 but not all teeth, loss of all teeth, and missing), annual household income (<$15,000, $15,000 to $50,000, >$50,000, and missing), state of enrollment (12 U.S. states), disease status during the first follow-up (any diseases and no disease), and sequencing batch (first and second batch). Among the factors with missing values, the missing rate was high only for tooth loss (∼34%) and low for all of the others, i.e., ∼5% for total energy intake and ∼1% for annual household income. The microbial taxa at different taxonomic levels are highly correlated; therefore, Bonferroni correction is too conservative for multiple-testing correction. To address this, we used a method described by Galwey (
39), implemented in the R package “poolR” (v0.1-0) (
https://github.com/ozancinar/poolR/), to evaluate the number of effective tests for common taxa and rare taxa separately. All
P values were then corrected for multiple testing based on the estimated number of effective tests. For the alpha diversity index, beta diversity matrices, and bacterial taxa that were significantly associated with race, we further conducted stratified analyses by sequencing batch to evaluate the consistency of these associations between batches. All analyses in the present study were carried out using R (v3.3.1) and Python (v2.7.8).
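The idea behind the effective number of tests can be sketched in a few lines. The code below uses the eigenvalue-based formula commonly attributed to Galwey, M_eff = (Σ√λ)² / Σλ over the non-negative eigenvalues of the correlation matrix among tests. It is not the poolR implementation, and the Bonferroni-style adjustment by M_eff is an assumption, since the text does not spell out how the correction was applied. The function names and example inputs are hypothetical.

```python
import numpy as np


def effective_tests_galwey(corr):
    """Effective number of tests from a correlation matrix among the tests."""
    eigenvalues = np.linalg.eigvalsh(np.asarray(corr, dtype=float))
    # Tiny negative eigenvalues from rounding are set to zero.
    eigenvalues = np.clip(eigenvalues, 0.0, None)
    return float(np.sqrt(eigenvalues).sum() ** 2 / eigenvalues.sum())


def adjust_p_values(p_values, m_eff):
    """Bonferroni-style adjustment using m_eff instead of the raw test count."""
    return [min(1.0, p * m_eff) for p in p_values]


# Four uncorrelated tests behave like four independent tests.
m_independent = effective_tests_galwey(np.eye(4))
# Three perfectly correlated tests behave like a single test.
m_redundant = effective_tests_galwey(np.ones((3, 3)))
print(m_independent, m_redundant)
print(adjust_p_values([0.01, 0.5], m_independent))
```

Because correlated taxa contribute less than one effective test each, the resulting threshold is less stringent than a Bonferroni correction based on the raw number of taxa.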
ACKNOWLEDGMENTS
We thank all of the individuals who took part in the study and all of the researchers, clinicians, technicians, and administrative staff who enabled this work to be carried out. We thank Regina Courtney, Jie Wu, Jing He, and Marshal Younger for their help with sample preparation, statistical analysis, and technical support for the project. The data analyses were conducted using the Advanced Computing Center for Research and Education (ACCRE) at Vanderbilt University.
Sample preparation was conducted at the Survey and Biospecimen Shared Resources, which is supported in part by the Vanderbilt-Ingram Cancer Center (P30 CA68485). The SCCS was supported by NIH grants R01CA92447 and U01CA202979. This project was also supported by the development fund from the Department of Medicine at Vanderbilt University Medical Center and the NIH-supported grants R01CA207466, R01CA204113, and U54CA163072. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
The authors declare no conflict of interest.