Primer analysis.
Amplification bias is a well-known phenomenon in fungal ITS primer sets (
10), but only a few efforts (
15,
16) have been made to design alternative primers with enhanced coverage, and no effort has been made to comprehensively evaluate the coverage, taxonomic resolution, and accuracy of existing ITS primers for very-short-amplicon HTS read lengths (<250 bp). We designed an alternative ITS primer set based on a multiple alignment of the UNITE database and computationally predicted its coverage of known fungal sequences in comparison with previously established ITS primers. To ascertain the usefulness of each primer for describing fungal communities, all primers were bioinformatically analyzed to assess overall taxonomic coverage of all fungal clades contained in the reference database. First, all primers listed in
Table 1 were tested for overall database matches using the Primer Prospector “analyze primers” module (
20). The weighted score calculated by this module is based on alignment against reference database sequences to predict overall coverage. Among ITS primers binding in the SSU rDNA, BITS exhibited the lowest weighted score, indicating increased coverage and diminished taxonomic bias through a lower mean count of gaps and mismatches across all species in the sequence database, but several other primers also scored very well, including ITS1 and ITS5 (
Table 1). Among primers targeting the 5.8S rDNA, most primers had very low weighted scores, with ITS2_KYO1, ITS2_KYO2, and B58S3 as the lowest-scoring reverse primers. Among primers targeting the LSU rDNA, ITS4_KYO1 and ITS4 displayed the lowest weighted scores. Several primers exhibited very low database hits (as high weighted scores) and were eliminated from further testing.
Next, we used the taxonomic coverage module of Primer Prospector to predict the coverage of all fungal clades in both reference databases for each primer.
Figure 1 presents the taxonomic coverage predicted for all fungal subphyla in the UNITE reference database, indicating very good coverage (≥60% relative coverage of most subphyla) by most primers matching the 5.8S rDNA (with the exception of fITS9).
Agaricomycotina,
Pezizomycotina, and
Ustilaginomycotina exhibit the highest coverage (>80%) by the 5.8S primers (except fITS9 and ITS2), and primers B58S3, 58A2, ITS2_KYO1, and ITS3_KYO1 score highest among all 5.8S primers for all subphyla. In the SSU rDNA, BITS and ITS1 demonstrate highest relative coverage (19.1 to 63.3%) of all subphyla compared to other primers in the same binding region, but all primers display low relative coverage (2.9 to 19.1%) of
Taphrinomycotina, suggesting potential amplification bias against this clade. In the LSU rDNA, ITS4 and ITS4_KYO1 showed the highest coverage (22 to 47%) for all subphyla, particularly the
Pucciniomycotina (47% coverage). No hits were predicted for any primers for bacterial SSU/LSU sequences (data not shown), ensuring specificity for
Eukarya. The complete results, presenting relative taxonomic coverage of all sequences at each hierarchical level from subphylum to genus, are provided in the supplemental material.
To assess the taxonomic usefulness of HTS reads from each primer, the classification accuracy of all primer hits predicted against the UNITE database was evaluated at each taxonomic level. Artificial 150-bp and 250-bp amplicons were generated from the primer hits predicted by Primer Prospector and taxonomically classified using BLAST against the UNITE reference database. The top BLAST hits were compared to the actual taxonomic lineage of each sequence at each taxonomic level. Net accuracy for the primers against all fungi is presented in
Fig. 2, and that for all fungal classes is presented in Fig. S2 in the supplemental material. All primer pairs evaluated exhibited genus-level accuracies of >90% (>95% for ITS1 primers) for 150-bp reads and >95% for 250-bp reads. Primer pairs targeting the ITS1 locus exhibited species-level accuracy of >90% with both fragment sizes, but those targeting the ITS2 had much lower species-level accuracy (70 to 85%) with 150-bp reads, though this recovered to ∼90 to 95% with 250-bp reads (
Fig. 2). The same trends could be observed when categorizing assignment accuracy by individual taxonomic classes (see Fig. S2 in the supplemental material). In general, ITS1 locus primer pairs displayed greater accuracy for most classes at the species and genus levels than did the equivalent read length of ITS2 amplicons, with the exception of NSI1/58A2. Species-level classification of
Basidiomycota performed more reliably than those of
Ascomycota and non-
Dikarya, in general, by a narrow margin (see Fig. S2). The primer pairs 58A1/NLB4 and 58A2/NLB4 performed poorly, with <70% accurate classification of
Dothidiomycetes,
Sordariomycetes, and
Glomeromycetes from 150-bp reads. These results demonstrate that ITS1 HTS sequences achieve a high species- and genus-level assignment success rate using BLAST for taxonomic classification of most fungal groups.
Amplicon length unevenness can promote preferential amplification of shorter sequences (
10), and shorter amplicons are preferential for maximizing read coverage of ultravariable DNA loci by the short read lengths employed by Illumina platforms. Short amplicon length is also crucial for efficient quantification with qPCR (
28). Thus, amplicon lengths were predicted for all database matches for each primer to determine which primer sets would yield the shortest and most even amplicons. In general, whole-ITS and ITS1 subdomain amplicons displayed a greater distribution range of amplicon lengths than ITS2, but ITS1 amplicons were substantially shorter than ITS2 or whole-ITS amplicons (
Table 2). This coincides with the findings of Bellemain and coworkers (
10), who concluded that ITS1 was a preferential target for HTS efforts, as the shorter amplicon length and distribution would minimize preferential amplification of
Ascomycota, which they found to contain systematically shorter ITS2 and whole-ITS regions than
Basidiomycota. Amplicons were routinely longer for
Basidiomycota than
Ascomycota by 15 to 40 bp (ITS1 locus) and 30 to 50 bp (ITS2 locus). The shortest mean amplicon lengths were yielded by the BITS/B58S3 primer pair designed for this study, with a mean length of 183.6 (±46.8) bp, suggesting that most of these amplicons would be fully covered by current short sequencing reads. This primer pair is located at the 5′ terminus of the SSU rDNA, maximizing the amount of hypervariable ITS target sequence covered by a short sequencing read. Additionally, the amplicon is short enough to be efficiently amplified and quantified by qPCR, unlike the longer amplicons generated by other ITS primers, enabling qPCR quantification of the target.
Mock community testing.
Computational analysis of primer coverage and specificity cannot adequately predict behavior under mixed biological conditions, so we tested the accuracy of these primers profiling a defined yeast community in a single 250-bp MiSeq run. Several highly scoring primer pairs were selected (representing ITS1, ITS2, and whole ITS) to directly compare their efficacies at reconstructing the taxonomic distribution of a known mock community comprising eight yeast strains mixed at known rRNA operon copy number ratios. No primer pair could reconstruct the known taxonomic distribution of this mock community with perfect accuracy (
Fig. 3; see also Table S2 in the supplemental material), whether due to primer bias/mismatch, amplicon length bias, or computational bias (e.g., clustering of similar species as a single OTU), but most adequately profiled the dominant community members. To evaluate the accuracy of dominant-member mock community reconstruction by each primer pair, abundance-weighted Jaccard distance was calculated between each sample and used to construct principal coordinate plots (PCoA) to visualize the relationship to the expected community structure (see Fig. S3A in the supplemental material). 58A2/NLB4, BITS/B58S3, and ITS1F/ITS4 clustered close together with the expected community, indicating that these primer pairs are most effective at detecting the most dominant community members. To evaluate the accuracy of unweighted community profiling, a binary Bray-Curtis dissimilarity PCoA was also calculated between each sample and the expected community structure (see Fig. S3A). Primer pairs clustered separately with this statistic, though ITS1/ITS4, 58A2/NLB4, and BITS/B58S3 all clustered closer to the expected community, compared to ITS1F/ITS4 and NSI1/58A2. These data suggest that in practice, these primer pairs deviate from the expected mock community through the generation (or nonamplification) of low-abundance OTUs. This is a significant criterion for evaluating primer efficacy, as false-positive or -negative OTUs will distort community diversity and richness estimates that are important for ecological comparisons. Relative OTU abundance may be expected to shift with any PCR-based method for mixed-culture profiling, as a function of amplification bias. Given these results, mock community sequencing should be performed routinely for evaluation of new HTS primers to assess amplification bias issues under actual working conditions.
Food fungal community profiling.
To assess the usability for HTS to describe food fungal communities, this method was used to profile beer (
1) and wine fermentations that were previously analyzed with ITS-TRFLP (
2). Results indicate some qualitative similarity between HTS with the BITS/B58S3 primer set and ITS-TRFLP using the ITS1/ITS4 primer set. In American coolship ale, the same dominant community members are observed, dominated by
Saccharomyces cerevisiae,
Brettanomyces bruxellensis, and several oxidative yeasts, principally
Pichia kudriavzevii (see Fig. S5 in the supplemental material). Correspondence is weaker for the wine samples, as filamentous fungi were not represented in the empirical TRFLP database developed previously (
2), and thus several fungal species detected by sequencing are either undetected or misidentified by TRFLP (
Fig. 4). Sample similarity was further evaluated via abundance-weighted Jaccard distance PCoA (see Fig. S6 in the supplemental material). No distinct clusters could be established between methods for beer samples. However, wine samples cluster in a method-dependent fashion, indicating that OTU assignment differed between the two methods (see Fig. S6B). TRFLP infers OTU identity from restriction fragment size comparison to an empirical database, as opposed to sequence information, possibly leading to this disparity and highlighting a potential disadvantage of TRFLP. However, as the true composition of these samples is unknown, it is impossible to assess whether these differences are due to greater accuracy or greater noise with either method, and further evaluation must be performed to fully compare these techniques.