Extracted DNA was transported on dry ice to the Fierer lab at the University of Colorado Boulder for PCR amplification and sequencing using the primers and methods of the Earth Microbiome Project (113). For prokaryotes (bacteria and archaea), we targeted the V4 region of the 16S rRNA gene using the 515F/806R primer pair, modified to include the necessary Illumina adapters. For eukaryotes, we targeted the V9 region of the 18S rRNA gene using the 1391f/EukBr primer pair. Following PCR, DNA was pooled, normalized with the SequalPrep normalization plate kit (Invitrogen, Carlsbad, CA, USA), and sequenced on the Illumina MiSeq platform using 2 × 150 bp chemistry at the BioFrontiers Institute (Boulder, CO, USA). Amplicon reads were demultiplexed using the open-source “idemp” tool (https://github.com/yhwu/idemp), and adapters were removed using the open-source “cutadapt” tool (114) (https://cutadapt.readthedocs.io/en/stable/) with default parameters, except that --minimum-length was set to 50. Sequences were then quality filtered (16S: maxEE = 1, truncQ = 11, maxN = 0; 18S: maxEE = 2, truncQ = 2, maxN = 0), trimmed (to 150 and 145 bp for 16S and 103 bp for 18S), and merged (16S only) with the DADA2 pipeline (115), which was also used to infer amplicon sequence variants (ASVs) (116) and remove chimeras. The 18S rRNA gene reads were not merged; only the forward reads were used because the length of the amplified region is variable. Taxonomy was assigned within the DADA2 pipeline against version 132 of the SILVA database (117) (https://www.arb-silva.de/) for 16S rRNA gene sequences and against the PR2 database (118) (https://pr2-database.org/) for 18S rRNA gene sequences.

A phylogenetic tree of the 278 most abundant (>0.0625% mean relative abundance) and ubiquitous (present in >5% of samples) bacterial and archaeal ASVs was constructed in QIIME (121), using PyNAST (119) to align sequences and FastTree (120) to build the tree. Trees were visualized with the ggtree R package (122). Eukaryotic, chloroplast, and mitochondrial sequences, as well as any ASV not assigned to the domain Bacteria or Archaea, were removed from the 16S rRNA gene data set, and the data were then rarefied to 8,000 sequences per sample. Taxonomic filtering and rarefaction were performed with the mctoolsr R package (123). A depth of 8,000 sequences per sample was sufficient to capture most of the ASV richness in each sample; mean richness ranged from 196 to 247 ASVs per sample, depending on the region (see Fig. S10 at https://doi.org/10.6084/m9.figshare.14390375).

The 18S rRNA gene sequence data were not rarefied, but samples with fewer than 3,100 sequences before any taxonomic filtering were removed (at this stage, sample totals also include plant and chimpanzee DNA, which is expected to be abundant). We identified 11 ASVs that were present in at least 10% of samples, all of which belong to known parasite species based on the literature (134). To avoid false-positive results, we took a conservative approach and considered an ASV present in a sample only if it had ≥50 sequences in that sample. To avoid counting two ASVs that likely represent the same parasite taxon as separate taxa, we correlated read abundances and compared ASV sequences with BLAST for any ASVs assigned to the same family, and combined ASVs that were strongly correlated and identified as the same species. On this basis, two ASVs that were strongly and significantly correlated (r = 0.82, P < 0.001) and closely related (98.06% identity) were combined. Furthermore, because previous work (83, 124) has examined known chimpanzee parasites in the genera Blastocystis, Strongyloides, and Entamoeba, we also included one Blastocystis ASV, one Strongyloides ASV, and two Entamoeba ASVs that were present in at least 5% of samples, for a total of 14 parasite ASVs (see Table S3 at https://doi.org/10.6084/m9.figshare.14390426). These highly prevalent, known chimpanzee parasites are unlikely to originate from prey species (93).
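DADA2's quality filter rests on the expected-error statistic: a read with Phred scores Q1, ..., Qn carries EE = Σ 10^(−Qi/10) expected errors, and maxEE discards reads whose EE exceeds the threshold. The Python sketch below illustrates how the maxEE, truncQ, maxN, and truncation-length settings reported above interact for a single read. It is a hypothetical re-implementation for illustration only, not DADA2's code. The helper names and the order of the steps (truncQ, then length truncation, then the maxN and maxEE checks) are assumptions modeled on DADA2's filterAndTrim documentation.

```python
from typing import Optional, Sequence


def expected_errors(quals: Sequence[int]) -> float:
    """Expected number of errors in a read: sum of 10^(-Q/10) over its bases."""
    return sum(10 ** (-q / 10) for q in quals)


def filter_read(seq: str, quals: Sequence[int], trunc_len: int,
                max_ee: float, trunc_q: int, max_n: int = 0) -> Optional[str]:
    """Return the truncated read if it passes all filters, else None.

    Hypothetical, simplified DADA2-style filtering (assumed step order):
    1. truncate at the first base with quality <= trunc_q,
    2. discard reads now shorter than trunc_len, otherwise cut to trunc_len,
    3. discard reads with more than max_n ambiguous 'N' bases,
    4. discard reads whose expected errors exceed max_ee.
    """
    # Step 1: truncQ truncation at the first low-quality base.
    for i, q in enumerate(quals):
        if q <= trunc_q:
            seq, quals = seq[:i], quals[:i]
            break
    # Step 2: enforce a fixed length (e.g. 103 bp for 18S forward reads).
    if len(seq) < trunc_len:
        return None
    seq, quals = seq[:trunc_len], quals[:trunc_len]
    # Step 3: reject reads with too many ambiguous bases.
    if seq.upper().count("N") > max_n:
        return None
    # Step 4: expected-error filter.
    if expected_errors(quals) > max_ee:
        return None
    return seq
```

For example, a 150-base read with every base at Q30 has EE = 150 × 0.001 = 0.15 and passes the 16S settings (maxEE = 1, truncQ = 11), whereas the same read at Q20 has EE = 1.5 and fails maxEE = 1.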
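The abundance and ubiquity criteria used to choose ASVs for the phylogenetic tree amount to two thresholds on a sample-by-ASV count table. The sketch below is hypothetical. It assumes that relative abundance is computed within each sample and then averaged over all samples (absent ASVs contribute zero), and that "present" means a count above zero. The text does not specify these details or whether rarefied counts were used.

```python
from typing import Dict, List


def abundant_and_ubiquitous(table: Dict[str, Dict[str, int]],
                            min_mean_rel_abund: float = 0.0625 / 100,
                            min_prevalence: float = 0.05) -> List[str]:
    """Return ASVs (sorted) whose mean relative abundance across all samples
    exceeds min_mean_rel_abund and that have a nonzero count in more than
    min_prevalence of samples. `table` maps sample ID -> {ASV ID: count}."""
    n = len(table)
    rel_sum: Dict[str, float] = {}  # summed per-sample relative abundance
    n_present: Dict[str, int] = {}  # number of samples with count > 0
    for counts in table.values():
        total = sum(counts.values())
        for asv, c in counts.items():
            if c > 0:  # c > 0 implies total > 0, so the division is safe
                rel_sum[asv] = rel_sum.get(asv, 0.0) + c / total
                n_present[asv] = n_present.get(asv, 0) + 1
    return sorted(
        asv for asv in rel_sum
        if rel_sum[asv] / n > min_mean_rel_abund
        and n_present[asv] / n > min_prevalence
    )
```

Both comparisons are strict (">"), matching the ">0.0625%" and ">5%" wording, so an ASV found in exactly 5% of samples is excluded.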
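Rarefying to 8,000 sequences per sample means randomly subsampling each sample's reads, without replacement, down to a common depth so that ASV richness is compared at equal sequencing effort. The sketch below is a hypothetical, standard-library illustration of single rarefaction, not the mctoolsr implementation. The function name, the dictionary input format, and returning None for samples below the target depth (assumed here to be dropped) are assumptions. It relies on the `counts` argument of `random.sample`, available in Python 3.9 and later.

```python
import random
from collections import Counter
from typing import Dict, Optional


def rarefy(counts: Dict[str, int], depth: int,
           rng: random.Random) -> Optional[Dict[str, int]]:
    """Subsample one sample's ASV counts to `depth` reads without replacement.

    Returns None when the sample has fewer than `depth` reads in total.
    """
    if sum(counts.values()) < depth:
        return None
    asvs = list(counts)
    # counts= makes each ASV appear counts[asv] times in the population, so
    # this draws individual reads without replacement.
    draws = rng.sample(asvs, k=depth, counts=[counts[a] for a in asvs])
    return dict(Counter(draws))
```

Passing an explicitly seeded `random.Random` makes the subsampling reproducible, which matters because a single rarefaction is one random draw.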
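The 18S presence/absence rules can be sketched as three successive filters: a minimum sample depth, a per-sample read threshold for calling an ASV present, and a prevalence cutoff. This is a hypothetical illustration. The function name and input format are assumptions, taxonomic filtering to parasite taxa is not shown, and it assumes that prevalence uses the ≥50-read presence call on samples that passed the 3,100-read cutoff.

```python
from typing import Dict, List, Tuple


def parasite_prevalence(table: Dict[str, Dict[str, int]],
                        min_sample_reads: int = 3100,
                        min_reads_present: int = 50,
                        min_prevalence: float = 0.10
                        ) -> Tuple[List[str], Dict[str, float]]:
    """Hypothetical sketch of the 18S presence/absence filters.

    `table` maps sample ID -> {ASV ID: read count}, with counts taken before
    any taxonomic filtering. Returns the retained sample IDs and a dict of
    {ASV ID: prevalence} for ASVs meeting min_prevalence.
    """
    # 1. Drop samples with fewer than min_sample_reads total sequences.
    kept = [s for s, counts in table.items()
            if sum(counts.values()) >= min_sample_reads]
    if not kept:
        return [], {}
    # 2. Conservative presence call: an ASV counts as present in a sample
    #    only if it has >= min_reads_present reads there.
    n_present: Dict[str, int] = {}
    for s in kept:
        for asv, c in table[s].items():
            if c >= min_reads_present:
                n_present[asv] = n_present.get(asv, 0) + 1
    # 3. Keep ASVs present in at least min_prevalence of retained samples.
    prevalence = {asv: k / len(kept) for asv, k in n_present.items()}
    return kept, {asv: p for asv, p in prevalence.items() if p >= min_prevalence}
```

Lowering `min_prevalence` to 0.05 would correspond to the cutoff applied to the additional Blastocystis, Strongyloides, and Entamoeba ASVs.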
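Combining ASVs that likely represent one parasite species can be illustrated with a Pearson correlation on per-sample read counts followed by summing the two count vectors. This is a hypothetical sketch: the `same_species` flag stands in for the BLAST-based species identification, the P value is not computed, and the cutoff `r_min = 0.8` is an assumed value (the text reports r = 0.82 for the combined pair but does not state a threshold).

```python
import math
from typing import Dict, List, Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two vectors of equal length >= 2")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant vector")
    return sxy / math.sqrt(sxx * syy)


def maybe_merge(table: Dict[str, List[int]], a: str, b: str,
                same_species: bool, r_min: float = 0.8) -> Dict[str, List[int]]:
    """Combine ASVs a and b (summing per-sample reads) if they are flagged as
    the same species and their read counts correlate with r >= r_min.
    Otherwise return the table unchanged. r_min is a hypothetical cutoff."""
    if not same_species or pearson_r(table[a], table[b]) < r_min:
        return table
    merged = {k: v for k, v in table.items() if k not in (a, b)}
    merged[f"{a}+{b}"] = [x + y for x, y in zip(table[a], table[b])]
    return merged
```

Summing the counts keeps the combined taxon on the same read scale, so the ≥50-sequence presence threshold can be applied to it directly.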