INTRODUCTION
Microbial populations are the primary drivers of many ecosystem-level processes (2, 13, 45) and can be exceedingly complex, with estimates approaching 10^5 individual taxa per gram of soil and abundances of up to 10^9 to 10^11 cells per gram (53, 54). With such complexity at the heart of ecosystem processes, there is a need to assess this complexity reliably and affordably. Rapidly growing databases of small-subunit (SSU) rRNA genes have helped pave the way for high-density microarray analysis as a well-established method for reproducibly analyzing many taxa in parallel (
11,
19). The PhyloChip has been developed specifically for querying small-subunit rRNA gene pools from complex environmental samples (
6,
18,
27). The G2 PhyloChip resolves up to 8,741 individual taxa at the subfamily level or better (depending upon the group), representing a significant portion of known bacteria and archaea (
6). PhyloChip cost and reproducibility enable researchers to faithfully analyze, with biological replication, microbial communities in changing environments (
44). Furthermore, microarray-based approaches are less susceptible to the influence of dominance in microbial communities, whereby sequences of more abundant members mask the presence of other numerically significant taxa and rare species (
18,
27).
Integral to most methods of microbial community analysis is PCR amplification of small-subunit rRNA genes, undertaken primarily to obtain a sufficient mass of genetic material for analysis. This initial manipulation has well-known inherent biases and potentially unknown effects. The biggest bias is associated with multitemplate PCR, in which the relative abundances of 16S rRNA gene signatures are distorted during PCR amplification (
22,
42); the choice of primer pairs as well as the number of amplification cycles strongly influences the ratios of amplicons in the final pool when mixed templates are amplified by PCR (
52). Uneven amplification of mixed templates precludes both accurate estimation of evenness in communities and estimates of fold change in response to perturbation or experimental manipulation. Other problems include formation of chimeric amplicons and deletion and point mutants and amplification of contaminating DNA (
1,
55). Universal primer sequences can cause preferential amplification of some sequences when nucleotide degeneracies are incorporated to capture a larger portion of the diversity; the result is unpredictable differential representation in the final amplicon pool (
37,
42). Because of these limitations and caveats to traditional PCR-amplified microbial community analysis, an amplification-free method is preferable.
RNA is the most appropriate molecular marker for faithful estimates of active microbial communities. DNA extracted from natural environments may include structural DNA from biofilms (
51), as well as DNA from populations that are dead, dormant, or otherwise not directly contributing to ecosystem function (
17). Small-subunit rRNA genes span large phylogenetic distances (
58), and because the SSU rRNA gene encodes an integral structural component of the ribosome, it is tied directly to cellular activity. Though a linear relationship between cellular rRNA content and specific growth rate has been demonstrated for
Escherichia coli (
5) and is generally observed in other organisms (
3,
20,
31,
40,
43,
47), there are inconsistencies that arise in culture, especially when organisms are grown under nonoptimal conditions (
3,
47). Many instances that would cause false negatives, for example, where taxa show no change in rRNA abundance despite cellular activity (
3,
50), would not be reflected in analysis of dynamic taxa. An example is the marine
Synechococcus strain WH8101, which had consistently depressed rRNA content at low growth rates that decreased at the highest growth rates (
3). Likewise,
E. coli cells that experience unfavorable conditions, such as nutrient limitation, may experience rapid ribosomal degradation (
14). However, false positives may also arise; for example, the oxygen-sensitive “
Candidatus Brocadia anammoxidans” maintains a relatively high cellular 16S rRNA content (though decreasing intergenic spacer region RNA content) after 24 h of exposure to oxygen (
47). Additionally, dormant cells contain rRNA in concentrations dependent upon how the dormancy originated (
21,
36). In the face of these limitations, the PhyloChip is probably the best tool for analyzing changes in relative abundance in rRNA transcript and gene concentration levels due to its high specificity, sensitivity, and repeatability. In this way, examination of rRNA- and DNA-based microbial communities together should provide an indication of active and standing microbial communities.
In light of the advantages to be gained in a direct assessment of complex microbial communities, we have developed two new methods for PCR-independent microarray analysis of microbial communities: direct RNA hybridization (dirRNA) and double-stranded cDNA (dscDNA) generation and hybridization. While direct hybridization of rRNA to oligonucleotide microarrays is not new (see references
8,
48, and
60), this is the first application of these methods to the widely used, high-density 16S rRNA PhyloChip. To validate these methods, first a mock community consisting of a defined set of bacteria and archaea was analyzed to evaluate the dscDNA synthesis and hybridization method. Next, three case studies were evaluated that compare the results of the traditional PCR-amplified DNA (PCR-DNA) profile to the RNA-based community profiles. We use the combination of all four experiments to evaluate the quality of data derived from PCR amplicon hybridization compared to data derived from direct hybridization.
MATERIALS AND METHODS
Mock-community preparation.
One archaeal and seven bacterial cultures were grown overnight to a cell density of approximately 10^8 to 10^9 cells ml−1. Desulfovibrio vulgaris strain RCH1 and Pseudomonas stutzeri strain RCH2 were both grown anaerobically at 37°C overnight; D. vulgaris was grown in minimal medium amended with 60 mM lactate and 50 mM sulfate under N2 headspace, while P. stutzeri was grown in minimal freshwater medium amended with 10 mM lactate and 20 mM nitrate under N2/CO2 (80:20) headspace. Bacillus subtilis, Burkholderia cepacia, Arthrobacter chlorophenolicus, and E. coli were grown aerobically in LB medium at 37°C overnight, with shaking at 150 rpm. Caulobacter crescentus was grown aerobically in PYE (peptone, yeast extract) broth at 37°C overnight, with shaking at 150 rpm. Sulfolobus solfataricus was grown aerobically at 80°C for 24 h with shaking at 150 rpm in Brock's medium with tryptone, yeast extract, and sucrose. Cells were pelleted by centrifugation, the spent medium was removed, and cells were stored at −80°C. Cell pellets were resuspended in 50 μl of aluminum ammonium sulfate before RNA extraction using the methods described below. For each strain, we quantified the RNA using Ribogreen (Invitrogen, Carlsbad, CA), created dscDNA (as described below), and quantitatively mixed the replicate mock communities separately (
Fig. 1A). Three replicates of six different mock communities, differing in total abundance and evenness but with the same richness (8 taxa), were constructed with either low or high total abundance (corresponding to 200 ng or 400 ng of dscDNA applied, respectively) and three levels of diversity, designated even, log-linear uneven, or reverse log-linear uneven (where the taxon abundances were reversed). Phylogenetic composition of dscDNA derived from each isolate was verified by pyrosequencing of the small-subunit rRNA gene (see the supplemental material for methods).
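The mock-community design can be illustrated with a short sketch that splits a total dscDNA mass among eight taxa for each of the three evenness designs. The actual per-taxon amounts are those given in Fig. 1A; the function name and the `ratio` between successive taxa used here are hypothetical and serve only to show the structure of an even, a log-linear uneven, and a reversed log-linear layout.

```python
def mock_community_masses(total_ng, n_taxa=8, design="even", ratio=2.0):
    """Split total_ng of dscDNA among n_taxa (illustrative only).

    `ratio` is an assumed fold difference between successive taxa in the
    uneven designs; it is not taken from the study.
    """
    if design == "even":
        weights = [1.0] * n_taxa
    elif design == "log_linear":
        # Abundance rises by a constant factor from one taxon to the next.
        weights = [ratio ** i for i in range(n_taxa)]
    elif design == "reverse_log_linear":
        # Same series with the taxon order reversed.
        weights = [ratio ** i for i in reversed(range(n_taxa))]
    else:
        raise ValueError("unknown design: %s" % design)
    scale = total_ng / sum(weights)
    return [w * scale for w in weights]


if __name__ == "__main__":
    for total in (200, 400):  # low and high total abundance, in ng
        for design in ("even", "log_linear", "reverse_log_linear"):
            masses = mock_community_masses(total, design=design)
            print(total, design, [round(m, 2) for m in masses])
```

Under these assumptions, each design keeps richness fixed at eight taxa while total mass and evenness vary independently, which is the property the six communities were built to test.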
Sample collection and preparation.
We inoculated anaerobic microcosms with groundwater from the Cr-contaminated Hanford 100H site (WA) and supplemented them with lactate (10 mM) and the electron acceptors nitrate (4 mM), sulfate (2.5 mM), amorphous ferric oxyhydroxide (20 mM), and chromate (250 ppb). Microcosms were destructively harvested at various time points during sequential terminal electron-accepting processes, and 15-ml aliquots were sampled into 30 ml of RNAlater. After an overnight incubation at 4°C, the samples were centrifuged (5,000 ×g for 15 min at 4°C), and the supernatant was discarded. The pellets were flash frozen in liquid nitrogen and stored at −80°C, and the DNA and RNA were extracted as detailed below. For this study, samples taken after 2 days of incubation following complete denitrification were analyzed.
Soil samples for the lab incubation were collected from the Luquillo long-term ecological research (LTER) experimental forest, located in the El Yunque National Forest in Puerto Rico. Soil samples were collected as cores from a depth of 0 to 10 cm and homogenized before being assembled into laboratory incubation slurries. Degassed, deionized water was added at a rate of 1.5 ml per g of soil, and the headspace was exchanged for an anaerobic atmosphere composed of 90% N2, 5% CO2, and 5% H2. Switchgrass (Panicum virgatum) was air dried at 40°C, ground to a 0.42-mm particle size using a Wiley mill, and then added to half of the soil lab incubations at a rate of 0.1 g per gram of soil. Negative-control lab incubation jars received switchgrass and water but no soil and displayed no activity under anaerobic conditions for the duration of the experiment. Lab incubations were carried out for 19 days on a MicroOxymax respirometer (Columbus Instruments, Columbus, OH), under ultrapure N2 headspace. At the end of 19 days, jars were harvested and frozen under liquid nitrogen, and DNA and RNA were extracted as detailed below.
For the sewage experiments, prechlorinated secondary sewage was obtained from the East Bay Municipal Utility District (Oakland, CA). Grab samples (330 ml) from three separate clarifiers were combined into a 1-liter bottle and mixed. One hundred milliliters of the secondary sewage was injected into a diffusion chamber lined with 0.22-μm-pore-size membrane filters (Supor-200 membrane filter; Pall Corp.) and incubated in filter-sterilized San Francisco Bay water for up to 14 days at 10°C. Aeration of sewage samples was achieved by transferring the total volume of the sewage in each diffusion chamber into a new diffusion chamber after 48 h of incubation. The experimental design included samples before chamber incubation at day 0 (T0 samples A, B, and C) and sewage that had been incubating in the diffusion chambers at day 3 (T3 samples A, B, and C). Samples were filtered onto 0.2-μm-pore-size membrane filters (Anodisc 47 mm; Whatman), placed in Whirlpak bags, and frozen in liquid nitrogen immediately. The filters were stored in a −80°C freezer until nucleic acid extraction.
Nucleic acid extraction from environmental samples.
All environmental samples were extracted twice following a previously published procedure (15), with a few modifications. Briefly, samples were extracted in lysing matrix E tubes (MP Biomedicals) using a hexadecyltrimethylammonium bromide (CTAB) extraction buffer (5% CTAB, 500 mM phosphate buffer, and 1 M sodium chloride); aluminum ammonium sulfate was also added to a final concentration of 10 mM (4), along with phenol-chloroform-isoamyl alcohol (25:24:1). For the chromium incubation samples, 1 g of Chelex-100 was also added to each pellet. Following bead beating in a FastPrep FP120 (Bio101, Vista, CA) at 5.5 m s−1 for 30 s, extracts were purified with chloroform. Nucleic acids were precipitated in 30% polyethylene glycol (PEG) 6000, washed once with 70% ethanol, and reconstituted in water. These extracted nucleic acids were then cleaned using an AllPrep kit (Qiagen, Valencia, CA), and the RNA was cleaned of DNA on the column using an RNase-free DNase set (Qiagen, Valencia, CA). RNA was checked for contaminating DNA by PCR amplification. At this point, DNA was ready for PCR amplification, and RNA was ready for either direct RNA microarray hybridization or synthesis of double-stranded cDNA (dscDNA).
PCR amplification of DNA.
For application to the PhyloChip, DNA was PCR amplified as previously published (6, 15). Briefly, bacterial PCR product was amplified using the 8F and 1492R primers (28, 57) and cleaned using a Qiagen MinElute PCR kit (Qiagen). Archaeal PCR product was amplified using the 4Fa and 1492R primers (30), gel purified over a 2% agarose gel, and cleaned using a Qiagen MinElute kit (Qiagen).
Double-stranded cDNA production.
Double-stranded cDNA was generated in three steps: first-strand synthesis, second-strand synthesis, and RNA digestion. The first strand was generated from 2,000 ng of total RNA using Superscript Reverse Transcriptase (Invitrogen) and incubated at 42°C for 50 min, followed by incubation at 72°C for 15 min. The gene-specific primer used was 1492R (57); random hexamers (Invitrogen) were also used for one set of samples. The second strand was generated by incubating the first-strand reaction product with Second Strand Reaction Buffer (Invitrogen), DNA ligase (10 units), DNA polymerase I (40 units), and RNase H (2 units) for 2 h at 16°C; 20 units of T4 DNA polymerase (New England BioLabs) was then added, followed by incubation at 16°C for another 5 min. The dscDNA was cleaned of RNA by adding RNase III (Shortcut; New England BioLabs) and incubating at 37°C for 10 min, and then the sample was cleaned by phenol-chloroform extraction. At this point the dscDNA was ready for fragmentation and labeling for microarray hybridization.
PhyloChip microarray analysis.
Microbial communities were analyzed using the G2 PhyloChip, which is capable of distinguishing 8,741 taxa (
6). For PCR products, bacterial and archaeal PCR-amplified DNAs were quantified using gel electrophoresis; then nucleic acids were fragmented, labeled, and hybridized as previously described (
7). For RNA hybridization, we estimate that about half of the total RNA was 16S rRNA, so extra RNA was fragmented, labeled, and hybridized. Total dscDNA generated from gene-specific primer 1492R or 1,000 ng of dscDNA generated from random hexamers was applied to each PhyloChip. After quantification of the dscDNA, this DNA was fragmented, labeled, and hybridized using the same methods as for PCR-amplified DNA (PCR-DNA).
For RNA direct hybridization, 16S rRNA was enriched from total RNA by gel extraction. Direct analysis of rRNA was achieved using a modification of the protocol of Cole et al. (12). To account for technical variation between hybridizations, a set of internal RNA spikes was added to each sample preparation. These spikes consisted of transcripts generated by T7- or T3-mediated in vitro transcription, using the Ambion MEGAscript T7 kit, from linearized plasmids pGIBS-LYS (containing LysA; ATCC 87482), pGIBS-PHE (containing the Bacillus subtilis Phe gene; ATCC 87483), and pGIBS-THR (containing the Bacillus subtilis Thr gene; ATCC 87484); the T7 transcripts were then purified with RNeasy MinElute columns. To each RNA fragmentation reaction mixture, 1.35 × 10^10, 3.13 × 10^10, and 3.13 × 10^11 transcripts of Lys, Thr, and Phe, respectively, were added in a volume of 1 μl. Combined sample RNA (1 μg) and spike mix were fragmented and dephosphorylated simultaneously using 0.1 U of RNase III/μg of RNA and 0.2 U of shrimp alkaline phosphatase (USB, OH)/μg of RNA in a buffer containing 10 mM Tris-HCl, 10 mM MgCl2, 50 mM NaCl, and 1 mM dithiothreitol (DTT) (pH 7.9) in a final volume of 20 μl. The mixtures were then incubated at 37°C for 35 min, followed by inactivation at 65°C for 20 min. RNA labeling with multiple biotin residues utilized an efficient labeling system that employs T4 RNA ligase to attach a 3′ biotinylated donor molecule (pCp-Biotin3; Trilink Biotech, San Diego, CA) to target RNA (11). Labeling was performed with 20 μl of fragmented/dephosphorylated RNA, 20 U of T4 RNA ligase (NEB, MA), and 100 μM pCp-Biotin3 in a buffer containing 50 mM Tris-HCl, 10 mM MgCl2, 10 mM DTT, 1 mM ATP (pH 7.8), and 16% (vol/vol) PEG 8000. The final volume was 45 μl. The reaction mixture was incubated at 37°C for 2 h and inactivated at 65°C for 15 min. The mixture was then prepared for PhyloChip hybridization and was processed according to the manufacturer's standard Affymetrix expression analysis procedures for cDNA (see also references 6, 7, 15, and 18). The internal controls (spike-in mix) for DNA and dscDNA arrays were previously described (7) and ranged from 4.10 × 10^8 to 8.87 × 10^10 transcripts applied.
All PhyloChips were hybridized at 48°C for DNA or dscDNA and at 50°C for RNA in an Affymetrix hybridization oven for 16 h at 60 rpm. Microarrays were stained according to the Affymetrix protocol and then immediately scanned using a GeneChip Scanner 3000 7G (Affymetrix, Santa Clara, CA). To process captured fluorescent images into taxon hybridization scores, images were background corrected, and probe pairs were scored as previously described (
6), with the following exceptions. Analysis of RNA and DNA arrays differed because of the differences in hybridization chemistry. Noise is derived from the standard deviation of pixels within the lowest 2% of features in each of 16 sectors of the array (these are the same features averaged to yield the background calculation). For the DNA and dscDNA arrays, the noise was squared to increase stringency; for the RNA arrays, the noise was not squared, to increase sensitivity, because the RNA background was so much higher than that for DNA that decreased stringency was required. Hybridization scores on the direct RNA (dirRNA) arrays were scaled to 4,000, while those on the DNA and dscDNA arrays were scaled to 2,500; these values were based on the different internal spike mixes added for RNA or DNA as described above.
Statistical analysis.
All statistics were performed using the programs R (R Development Core Team, Vienna, Austria, 2005) or JMP (SAS Institute, Cary, NC). The microbial community as defined by hybridization scores for individual taxa was analyzed using nonmetric multidimensional scaling (NMDS) with a Bray-Curtis distance measure; this method should be the most robust for data on discontinuous or nonlinear scales and provides the most faithful model of relative relationships between sites (34, 35). Mantel tests were performed with 1,000 permutations to examine the degree to which the distance matrices derived from different methods (PCR-DNA, dirRNA, or dscDNA, as described above) were correlated, and results were evaluated at a significance level of P < 0.05, where the null hypothesis is that of no correlation between the matrices. All Mantel correlations were calculated using the Pearson product-moment coefficient. A multiresponse permutation procedure (MRPP) was used to test the null hypothesis of no difference between groups in the ordination (a significant MRPP A statistic indicates that the groups are distinct). Previously, the probe fraction was found to correlate well with richness patterns displayed by clone library analysis (18), so to estimate richness a probe fraction cutoff of 0.9 was employed, below which taxa were deemed absent. For analysis of the effect of methods on richness estimates, either analysis of variance (ANOVA) with Tukey's honestly significant difference (HSD) pairwise comparison or Student's t test was used. To avoid an inflated number of false positives due to multiple comparisons, all P values were adjusted by the Benjamini-Hochberg method, and an adjusted P value cutoff of 0.05 was then applied. All means reported are accompanied by standard errors.
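As an illustration of the distance and Mantel calculations described above, the following is a minimal sketch in plain Python rather than the R or JMP routines actually used. It computes Bray-Curtis distances between samples from taxon hybridization scores and then runs a permutation Mantel test with a Pearson correlation. The function names, the one-sided test for positive correlation, and the (hits + 1)/(permutations + 1) form of the P value are assumptions of this sketch.

```python
import random
from math import sqrt


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two vectors of taxon scores."""
    num = sum(abs(a - b) for a, b in zip(u, v))
    den = sum(a + b for a, b in zip(u, v))
    return num / den if den else 0.0


def distance_matrix(samples):
    """Square matrix of pairwise Bray-Curtis dissimilarities."""
    n = len(samples)
    return [[bray_curtis(samples[i], samples[j]) for j in range(n)]
            for i in range(n)]


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def upper_triangle(d, order):
    """Upper-triangle entries of d after reordering rows and columns."""
    n = len(order)
    return [d[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def mantel(d1, d2, permutations=1000, seed=0):
    """Mantel r and a one-sided permutation P value (assumed variant)."""
    identity = list(range(len(d1)))
    x = upper_triangle(d1, identity)
    r_obs = pearson(x, upper_triangle(d2, identity))
    rng = random.Random(seed)
    hits = 0
    for _ in range(permutations):
        perm = identity[:]
        rng.shuffle(perm)  # permute sample labels of the second matrix
        if pearson(x, upper_triangle(d2, perm)) >= r_obs:
            hits += 1
    return r_obs, (hits + 1) / (permutations + 1)
```

In use, one distance matrix would be built per method (for example, PCR-DNA and dirRNA profiles of the same samples in the same order) and the pair passed to `mantel`.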
RESULTS
Analysis of quality in direct hybridization methods.
Four experiments were devised to evaluate the behavior of direct RNA (dirRNA) and dscDNA hybridization in a variety of environmental sample types: (i) mock communities, where eight known taxa were mixed to six different community structures in three replicates; (ii) microcosm communities derived from chromium-contaminated aquifer groundwater, where three replicate extractions were analyzed by PCR-amplified SSU DNA (PCR-DNA) and direct RNA (dirRNA) hybridization; (iii) anaerobic lab incubations of Puerto Rican tropical forest soils with or without switchgrass added, where three replicate incubations were analyzed by PCR-amplified SSU DNA, dscDNA, and direct RNA; and (iv) sewage sludge diffusion chambers incubated in seawater, where three replicate chambers and two time points were analyzed by PCR-DNA, dscDNA, and dirRNA (Table 1).
Combined quality control data from all experiments suggest high integrity of both direct analysis methods. We found a significant positive relationship between yield of generated dscDNA and the starting amounts of RNA across all experiments (R = 0.784, P < 0.05). As measures of the quality of the PhyloChip hybridization, background and noise were 3 orders of magnitude less than median signal intensity and well within acceptable values for all methods though there were differences. Noise (ANOVA, P < 0.0001) and background (ANOVA, P < 0.0001) were significantly different by method, with RNA microarrays as the highest, PCR-DNA microarrays as intermediate, and dscDNA microarrays as significantly lower than the other methods (see Fig. S1A and B in the supplemental material). When corrected for mass of nucleic acid hybridized, background and noise values are still significantly higher for RNA than for dscDNA or PCR-amplified DNA hybridization (see Fig. S1C and D). Positive correlations between the amount of nucleic acids hybridized and background (R = 0.391, P < 0.01) and noise (R = 0.515, P < 0.001) were strongly driven by RNA; exclusion of RNA from this analysis reduced the trend substantially (see Fig. S1E and F). RNA hybridization is likely hampered by the viscosity of polyethylene glycol (PEG) in the direct RNA labeling reaction and subsequent hybridization, which reduces washing efficiency in this protocol.
Mock microbial community to test the dscDNA method.
A series of mock communities was created from mixtures of one archaeal and seven bacterial isolates; dscDNA was generated and then combined into six separate communities, with two masses of total dscDNA hybridized (low abundance, 200 ng; high abundance, 400 ng) and three levels of diversity (even, log-linear uneven, and reverse log-linear uneven) (
Fig. 1A). The PhyloChip results agreed extremely well with the mock-community model (Fig. 1B and C) and accurately distinguished the six different microbial communities (for mass hybridized, MRPP A = 0.261 and P < 0.001; for diversity, MRPP A = 0.4117 and P < 0.001).
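For readers unfamiliar with the MRPP A statistic reported here, a minimal sketch of its calculation follows. It assumes the common formulation in which delta is the group-size-weighted mean of within-group average distances (weights n_i/N) and A = 1 − (observed delta)/(expected delta), with the expected delta estimated from label permutations. The function names are hypothetical, each group must contain at least two samples, and this is not the routine used in the study.

```python
import random


def mrpp_delta(dist, groups):
    """Weighted mean within-group distance; weights are n_i / N (assumed)."""
    n = len(groups)
    delta = 0.0
    for g in set(groups):
        idx = [i for i, label in enumerate(groups) if label == g]
        pairs = [(i, j) for a, i in enumerate(idx) for j in idx[a + 1:]]
        mean_within = sum(dist[i][j] for i, j in pairs) / len(pairs)
        delta += (len(idx) / n) * mean_within
    return delta


def mrpp(dist, groups, permutations=999, seed=0):
    """Return (A, P): chance-corrected agreement and permutation P value."""
    rng = random.Random(seed)
    observed = mrpp_delta(dist, groups)
    labels = list(groups)
    permuted = []
    for _ in range(permutations):
        rng.shuffle(labels)  # reassign group labels at random
        permuted.append(mrpp_delta(dist, labels))
    expected = sum(permuted) / len(permuted)
    a_stat = 1.0 - observed / expected
    # Small deltas indicate tight groups, so count permutations at least
    # as tight as the observed grouping.
    p_value = (sum(d <= observed for d in permuted) + 1) / (permutations + 1)
    return a_stat, p_value
```

An A near 0 means within-group distances are what chance would give; values approaching 1 mean samples within a group are nearly identical relative to the data set as a whole.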
To evaluate detection of mock communities, we examined the responses of target taxa as well as target classes. All taxa displayed a significant positive correlation of the amount of dscDNA hybridized to hybridization scores of mock-community members (
Table 2). The mock-community data set has a richness of 226 taxa, which is a result of trace contamination in some isolates (see Table S1 in the supplemental material) and may also be due to cross hybridization; richness of classes also increased significantly with increasing DNA applied from that particular group, especially in the
Actinobacteria and
Betaproteobacteria (see Table S2), which has been observed previously (
18). Across all taxa, hybridization of larger amounts of dscDNA resulted in significantly higher richness, which may be due to either the increased absolute amount or relative amount hybridized. The archaeon
Sulfolobus solfataricus was not detected using our established taxon cutoff criteria (
Table 2), which is likely due to a combination of factors: this rRNA gene is AT rich and thus does not hybridize as efficiently, and this taxon's random assignment in the mock community resulted in 50 ng as the largest amount of dscDNA hybridized, which was the lowest of all the taxa (
Fig. 1A). Though it was not detected, the correlation between the amount hybridized and the hybridization score of the target taxon was significant across the range of dscDNA hybridized (1 to 125 ng of dscDNA;
P < 0.01) (
Table 2). This indicates that beyond detection, knowing the changes in the relative abundances of community members still gives important information regarding community dynamics, an element that can be captured by this method.
Chromium-contaminated microcosm experiment.
Three replicate subsamples were taken from one microcosm containing chromium-contaminated groundwater from the Hanford 100H site (Hanford, WA). After stimulation by lactate (added as the electron donor and carbon source), samples were taken when both nitrate and chromate were reduced to below detectable levels, as determined by ion chromatography and inductively coupled plasma-mass spectrometry, respectively (methods are detailed in reference
26). A dendrogram of microbial community composition by PCR-amplified DNA and by dscDNA methods showed that communities differed significantly at 90% confidence by these two methods (MRPP A = 0.601, P = 0.096) (see Fig. S2 in the supplemental material). Total richness was significantly higher in the PCR-DNA community than in the dscDNA community (see Table S3). Among the dominant phyla detected, richness was significantly higher in the PCR-DNA communities than in the dscDNA communities for the phyla Acidobacteria, Crenarchaeota, and Firmicutes (see Table S3). This suggests that these taxa were present but not active, which is unsurprising, given that these communities were sampled following denitrifying conditions stimulated by the addition of organic carbon.
Tropical forest soil lignocellulolytic lab incubation experiment.
Using Puerto Rican tropical forest soil as an inoculum, we performed a lab incubation to examine the activation of the soil microbial community by adding switchgrass as the sole carbon source. We observed increased CO2 respiration, H2S production, and CH4 production in the jars containing switchgrass versus the jars without an additional carbon source, as well as significantly higher concentrations of RNA (Fig. 2A). Low RNA yields from the low-activity (soil only) samples precluded dscDNA direct hybridization microbial community analysis, so, instead, dscDNA was constructed from the switchgrass-amended, high-activity samples only, using either gene-specific primers or random hexamers (GSP-dscDNA and N6-dscDNA, respectively). Comparing the GSP-dscDNA communities to the N6-dscDNA communities showed that the profiles were separated in ordination space but not significantly different (Fig. 2B) (MRPP A = 0.4411, P = 0.104). Because there is a higher concentration of SSU sequence in the pool of dscDNA generated from gene-specific primers, all further dscDNA generation was done with gene-specific primers. Direct hybridization of RNA detected a significant difference at 90% confidence between amended and nonamended communities (Fig. 2C) (MRPP A = 0.801, P = 0.093). A Mantel test comparing the microbial community distance matrices derived from RNA direct hybridization with those derived from PCR-DNA hybridization showed that the two methods produced correlated community profiles at 90% confidence (Mantel r = 0.5568, P = 0.095).
Based on hybridization scores in the RNA communities, almost all of the taxa in switchgrass-amended microcosms were significantly enriched relative to nonamended microcosms (1,845 out of 1,851 total observed taxa); in the PCR-amplified communities, zero taxa were significantly different. There were 35 taxa 100-fold enriched in switchgrass-amended compared to unamended microcosms by dirRNA PhyloChip: 1 Bacteroidales, 10 Enterobacteriales, 14 Clostridia, and 10 Bacilli. Comparing richness in switchgrass-amended versus nonamended microcosms in the RNA samples revealed that taxa in the classes Actinobacteria, Bacilli, Betaproteobacteria, Cyanobacteria, Gammaproteobacteria, Methanomicrobia, Planctomycetes, and Spirochaetes were all activated by switchgrass addition; the same analysis comparing richness based on PCR-DNA in amended versus nonamended samples did not reveal any of these differences (see Table S4 in the supplemental material). By combining the data from dscDNA hybridization and PCR-amplified DNA hybridization, we can calculate dscDNA/DNA hybridization score ratios as indicators of activity in the active (amended) samples only; these ratios ranged from 0.01 to 11.3. There were only 25 taxa that had dscDNA/DNA ratios higher than 2, and identifications of the active taxa matched well with those identified by dirRNA and dscDNA, with the exceptions of the archaea, which remained undetected by this ratio method (see Table S5).
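The ratio-based activity indicator described above can be sketched as follows. The taxon identifiers and scores in the usage example are invented for illustration, the function name is hypothetical, and the threshold of 2 is simply the cutoff mentioned in the text; taxa with no (or zero) PCR-DNA score are skipped in this sketch because no ratio can be formed for them.

```python
def activity_ratios(dscdna_scores, dna_scores, threshold=2.0):
    """Per-taxon dscDNA/DNA hybridization score ratios.

    Both arguments map taxon identifiers to hybridization scores.
    Returns (ratios, active), where `active` lists the taxa whose
    ratio exceeds `threshold`, sorted by identifier.
    """
    ratios = {}
    for taxon, dsc in dscdna_scores.items():
        dna = dna_scores.get(taxon)
        if dna:  # skip taxa that are missing or zero in the DNA profile
            ratios[taxon] = dsc / dna
    active = sorted(t for t, r in ratios.items() if r > threshold)
    return ratios, active


if __name__ == "__main__":
    # Invented scores for three hypothetical taxa.
    dsc = {"taxon_a": 3000.0, "taxon_b": 500.0, "taxon_c": 100.0}
    dna = {"taxon_a": 1000.0, "taxon_b": 1000.0}
    ratios, active = activity_ratios(dsc, dna)
    print(ratios)
    print(active)
```

Because a ratio requires a score from both methods, a taxon missing from either profile drops out of the comparison; whether this is what happened to the archaea noted above is not stated in the text.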
Sewage diffusion chamber experiment.
Secondary sewage was incubated in diffusion chambers and immersed in bay water for the observation of microbial community temporal dynamics following a simulated sewage spill or storm water runoff. Triplicate chambers were sampled preincubation and after 3 days and analyzed by each of the three methods (PCR-DNA, dscDNA, and dirRNA) to profile microbial communities. Mantel tests indicated no significant correlation between communities detected by the three methods (
Table 1). Hierarchical clustering demonstrated that the samples grouped by microcosm in the PCR-DNA and dirRNA samples and by time in the dscDNA samples (
Fig. 3A and B), although the difference in dscDNA community compositions between time points was not significant. The dirRNA method also detected significantly lower total richness (891 taxa) than the dscDNA and PCR-DNA methods (1,180 and 1,110 taxa, respectively); the latter two were not different from each other (ANOVA, F = 18.5 and P < 0.0001).
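Richness values such as those compared here follow from the probe fraction cutoff given in the statistical methods (taxa with a probe fraction below 0.9 are deemed absent), and means are reported with standard errors. A minimal sketch of both calculations is shown below; the function names and the example probe fractions are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def richness(probe_fractions, cutoff=0.9):
    """Count taxa called present: probe fraction at or above the cutoff."""
    return sum(1 for pf in probe_fractions.values() if pf >= cutoff)


def mean_and_se(values):
    """Mean and standard error of replicate values (needs >= 2 values)."""
    return mean(values), stdev(values) / sqrt(len(values))


if __name__ == "__main__":
    # Invented probe fractions for one array.
    array = {"taxon_a": 0.95, "taxon_b": 0.90, "taxon_c": 0.50}
    print(richness(array))
    # Invented richness values for three replicate arrays.
    print(mean_and_se([2, 4, 6]))
```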
To examine the differences between PCR-amplified and RNA-based hybridization methods, we examined taxa that decreased significantly over time as measured by hybridization score. Between T0 and T3 there were 1, 41, and 55 taxa that decreased significantly for the PCR-DNA, dscDNA, and dirRNA methods, respectively (see Table S6 in the supplemental material). There was good agreement at the family level among the taxa detected with decreasing richness over time for both dirRNA and dscDNA direct hybridization. Two taxa in the family Comamonadaceae and one in Porphyromonadaceae exhibited significant changes in relative abundance in both the dscDNA and dirRNA data sets. Thus, many more taxa decreased over time in the dscDNA and dirRNA communities than in the PCR-DNA community, where only one taxon decreased.
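Counts of significantly changed taxa depend on the multiple-comparison correction described in the statistical methods. As a reminder of how the Benjamini-Hochberg adjustment works, a minimal sketch follows; it is written in plain Python for illustration and is not the routine used in the study. The function name and example P values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted P values, returned in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity so that
    # an adjusted value never exceeds that of a larger raw P value.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    raw = [0.01, 0.04, 0.03, 0.005]  # invented per-taxon P values
    adj = benjamini_hochberg(raw)
    significant = [i for i, p in enumerate(adj) if p < 0.05]
    print(adj, significant)
```

A taxon is then counted as significantly changed when its adjusted P value falls below the 0.05 cutoff.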
A comparison of the two RNA-based methods showed that more active taxa were detected by dirRNA (55) than by dscDNA (41), including many families typically prevalent and active in sewage communities, such as Enterobacteriaceae and Clostridiaceae (23, 59) (see Table S6 in the supplemental material). Most taxa (858) were detected by all three methods. Among the taxa detected by PCR-DNA, 128 were detected by the dirRNA method but not the dscDNA method, mostly in the families Bacillaceae, Enterococcaceae, and Lachnospiraceae (see Table S7). In contrast, no taxa were detected by dscDNA and PCR-DNA but not by the dirRNA method, suggesting that direct RNA hybridization is more sensitive in detection of certain community members than the dscDNA method.
DISCUSSION
Here, we present two broadly applicable methods for direct detection of microbial communities in natural environments, both of which provide cost-effective alternatives to PCR-amplified microbial community analysis. Based on tests with samples from groundwater, soil, and secondary sewage in seawater, these methods should be applicable to many diverse types of environmental or clinical samples, with good resolution of the different sample types that do not parse by method (
Fig. 4). Starting with a genetic pool that is free from bias and distortion introduced from PCR amplification and combining this with the rapid, reproducible, and cost-effective microbial community analysis of high-density microarrays create a powerful method for efficiently querying myriad microbial communities with replication, as well as analysis tuned for maximum statistical power and fidelity of microbial community dynamics.
The mock-community analysis contained evidence of cross-hybridization, an issue which has been previously observed (
18), and in both studies, false positives were largely limited to the family level. While this could pose problems for low-richness samples, overinflated richness of closely related sequences is a problem for many kinds of community analysis (
9,
24), and increased sampling tends to result in increased apparent richness based on nonasymptotic accumulation curves (
6,
9,
46,
54). Though more diversity should not be captured in a community of limited richness, contamination is possible in strains without selective markers and was detected with pyrosequencing; even culture collections of isolates can carry trace impurities that would be detectable by this sensitive method. Both a denoising algorithm and a new version of PhyloChip analysis that eliminates cross-hybridization seem to reduce this problem (
27,
46).
We observed decreased taxon richness detected by the RNA-based methods compared to the PCR-DNA method, with the exception of the dscDNA sewage samples, which suggests that only a subset of the standing community is active. That the T0 sewage samples had equal levels of richness in active (by dscDNA) and standing (by PCR-DNA) communities suggests that the active community is as diverse as the standing reservoir community. We observed fidelity in community profiles within an environment across different methods of microbial community detection, evidenced by ordination or hierarchical clustering that separated dscDNA and PCR-DNA communities. Though the dirRNA and PCR-DNA communities grouped by replicate while dscDNA communities grouped by time point (
Fig. 3), we detected more taxa with decreasing metabolic activity over time in the dirRNA and dscDNA communities than in the PCR-DNA communities (see Table S6 in the supplemental material). This could be because the RNA-based methods are more sensitive in detecting changes in active microbial populations. These observations are consistent with the properties of the sewage community, where the inactive bacterial DNA could persist in the environment for prolonged periods of time (
29). Likewise, in the unamended tropical soil-derived microcosms, there were more phyla with significantly lower richness in the RNA communities than in the PCR-DNA communities (see Table S4). There are some instances where the RNA methods detected organisms not detected by PCR (see, e.g., Table S7), which could represent highly active organisms, organisms present at relatively low concentrations, or both. The PCR-DNA method is thus more a measure of composition, where the bias of amplification magnifies the contribution of rare species, whereas the dscDNA data reflect the metabolically active populations.
Examination of microbial communities based on both rRNA and DNA (rRNA genes) revealed different members in each pool, with some taxa detected by one method but not the other. Taxa detected by the RNA methods but not the PCR-DNA method could be rare species that are highly active, that contain high concentrations of rRNA, or that contain sequences not amplifiable by our primers. Taxa detected only by PCR (DNA only) are likely largely inactive but abundant (
38). While rRNA gene copies are generally constrained within a given species, rRNA abundance can vary by up to 5 orders of magnitude, depending upon the activity of the population (
32,
56), and may change rapidly, even in a few hours (
39,
43). Variations in 16S rRNA gene copy numbers between species are at least partially responsible for differences in growth rates (
32,
33,
39), as could be polyploidy in rapidly multiplying cells, but because the rRNA concentrations change over shorter time scales due to changes in activity, the RNA/DNA ratio (or dscDNA/DNA, as we used here) may be used as an indicator of cellular activity for single species (
40,
43) and has been applied to microarray data (
16). The few known exceptions to the linear relationship between activity and rRNA abundance seem to be at least partially related to unusual growth conditions not typical of a natural environment (
3,
47) though all measures of this relationship so far have been in culture. The microarray method has a low detection limit of 10^7 copies, or about 0.01% of the total nucleotides applied (
7), so the lack of detection of certain taxa in one nucleic acid pool over the other is interpreted to mean that the rRNA dipped below the detection limit rather than being actually absent, allowing us to infer activity status for an even wider swath of the microbial community.
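The interpretation described above can be sketched as a small classification rule. This is an illustrative sketch only, not the procedure used in this study: the function name `activity_status`, the status labels, and the toy signal values are hypothetical, and hybridization signals are assumed to be positive numbers (or `None` when a taxon was not called present in that pool).

```python
import math


def activity_status(rna_signal, dna_signal):
    """Classify a taxon from its RNA-pool and DNA-pool signals.

    Returns a (status, log2_ratio) tuple. The ratio is reported only
    when the taxon is detected in both pools; otherwise it is None.
    A signal of None means the taxon fell below the detection limit.
    """
    if rna_signal is not None and dna_signal is not None:
        # Detected in both pools: RNA/DNA ratio as an activity indicator
        return ("both", math.log2(rna_signal / dna_signal))
    if dna_signal is not None:
        # DNA only: rRNA assumed below the detection limit, not absent
        return ("dna_only", None)
    if rna_signal is not None:
        # RNA only: possibly rare but active, or not amplified by the primers
        return ("rna_only", None)
    return ("not_detected", None)


# Toy signals (rna, dna) with made-up values, for illustration only
toy = {
    "taxonA": (4000.0, 1000.0),
    "taxonB": (None, 2500.0),
    "taxonC": (800.0, None),
}
results = {taxon: activity_status(r, d) for taxon, (r, d) in toy.items()}
print(results)
```

Reporting a status alongside the ratio keeps taxa that fall below the detection limit in one pool in the analysis instead of discarding them, which mirrors the inference of activity status described above.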
The two methods presented, direct hybridization of RNA (dirRNA) and direct hybridization of double-stranded cDNA (dscDNA), are both valid ways of exploring microbial community structures, each with advantages and disadvantages. RNA direct hybridization involves the least manipulation of environmental nucleic acids, but the short half-life that makes RNA useful for encoding messages in cells also makes it difficult to work with. This was a limiting factor in the tropical forest soil lab incubations, where low activity resulted in low RNA yields and precluded some analyses (
Fig. 2A). The dscDNA method employs essentially one round of amplification, which should remain free of the amplification biases that impact analysis of communities by PCR amplification, which typically requires 25 to 35 amplification rounds before analysis; primer binding efficiency may play a role though we assume that it is the same per sequence across different environments. The dscDNA method has an additional advantage in that the chemistry in the labeling and hybridization reactions is the same as for the PCR-amplified DNA reaction, which has been employed successfully since 1996 (
10), has been in use with the PhyloChip since 2003, and claims 41 publications at the time of this writing (
6,
18,
27). The different labeling reagent and hybridization conditions required for RNA direct hybridization result in a slightly lower quality scan (see Fig. S1 in the supplemental material) and, while still within the limits of quality control, likely result in lower reproducibility and higher variability between samples.
There is some precedent for direct hybridization of rRNA onto microarrays, which have been in development for detection of rRNA and rRNA genes derived from complex microbial communities for over 15 years (
25). Proof-of-principle for reproducible and specific detection of microbial communities from the environment was established by Small et al. with pure cultures of
Geobacter chapellei and
Desulfovibrio desulfuricans by performing direct rRNA hybridization onto a custom microarray fabricated for the detection of these two metal-reducing strains with bioremediative potential (
49). The work of Small et al. supported the specificity obtainable with direct hybridization but ran into problems with reproducibility, variability, and background. Our method of dscDNA direct hybridization offers a solution to this problem with minimal additional sample manipulation and at a lower cost than PCR amplification (see Fig. S1 and Table S7 in the supplemental material). Direct hybridization of large-subunit (LSU) rRNA developed for microbial community analysis by microarray has also been recently presented, and the authors experienced problems with background and nonspecific hybridization (
41). The dscDNA direct hybridization method presented is a novel solution to the minor issues involved with direct rRNA hybridization, with both methods conforming to established quality control measures. Ultimately, it is left to the investigators to choose the method of direct hybridization based on the unique qualities of their own experimental design and samples, knowing that PCR amplification-free microbial community analysis provides a faithful and cost-effective representation of naturally occurring, metabolically active microbial communities.
Conclusion.
Direct hybridization of rRNA (dirRNA) and rRNA-derived double-stranded cDNA (dscDNA) to high-density microarrays represents a simple and economical way to directly query microbial communities in natural environments. Microbial communities revealed by direct hybridization display different profiles from the corresponding PCR-amplified DNA communities, likely reflecting both PCR amplification biases and differences in the microbiology of active versus inactive populations in environmental samples. Both methods offer viable alternatives to PCR-amplified microbial community profile methods and offer the additional advantage of detecting activity in populations rather than the simple presence of organisms. Specifically targeting active members of microbial communities facilitates identification of microbial taxa that attenuate toxins in the environment, drive biogeochemical cycles in natural systems, or proliferate in disease states.
ACKNOWLEDGMENTS
This work was conducted in part by the Joint BioEnergy Institute, the Sustainable Systems Science Focus Area in Subsurface Biogeochemical Research Program, and by ENIGMA Scientific Focus Area, a Genomics Foundational Science Program. These programs are part of the Office of Science, Office of Biological and Environmental Research, of the U.S. Department of Energy under contract DE-AC02-05CH11231 to Lawrence Berkeley National Laboratory (LBNL). This work was also supported in part by the California State Water Resources Control Board Proposition 50 Clean Beaches initiative grant, a Seaborg Fellowship to K.M.D., and an LBNL contractor-supported research grant to C.H.W.
T.Z.D. is employed at both LBNL and Second Genome, Inc. LBNL has patented the PhyloChip, and Second Genome provides PhyloChip laboratory services. G.L.A. is on the Scientific Advisory Board of Second Genome.
We also gratefully acknowledge Dominique Joyner for cell materials, Yvette Piceno for helpful discussions in analyzing the data and preparing the manuscript, Ken Vogel of the USDA for switchgrass used in this study, and the East Bay Municipal Utility District for providing sewage samples.
K.M.D., E.L.B., and G.L.A. designed the method development. K.M.D., C.H.W., H.R.B., E.L.B., T.C.H., and G.L.A. designed the validation experiments. K.M.D., C.H.W., H.R.B., R.C., E.L.B., J.L.F., S.R.O., M.E.S., and L.M.T. performed the work. K.M.D., C.H.W., T.Z.D., and L.M.T. analyzed data. K.M.D., C.H.W., E.L.B., and G.L.A. wrote the paper.