# Measurement Error and Resolution in Quantitative Stable Isotope Probing: Implications for Experimental Design

## ABSTRACT

**IMPORTANCE**One of the biggest challenges in microbial ecology is correlating the identity of microorganisms with the roles they fulfill in natural environmental systems. Studies of microbes in pure culture reveal much about their genomic content and potential functions but may not reflect an organism’s activity within its natural community. Culture-independent studies supply a community-wide view of composition and function in the context of community interactions but often fail to link the two. Quantitative stable isotope probing (qSIP) is a method that can link the identity and functional activity of specific microbes within a naturally occurring community. Here, we explore how the resolution of density gradient fractionation affects the error and precision of qSIP results, how they may be improved via additional experimental replication, and discuss cost-benefit balanced scenarios for SIP experimental design.

## INTRODUCTION

^{18}O-labeled “heavy water” (H

_{2}

^{18}O) as a universal substrate, since cells incorporate oxygen from water during nucleic acid synthesis, quantitatively reflecting cell division (DNA synthesis) and metabolism (RNA synthesis) (14, 15). Similarly, cell mortality rates may be quantitatively related to the degradation of unlabeled nucleic acids. In a similar manner, qSIP can be used to calculate enrichment of specific taxa due to incorporation of substrate

^{15}N or

^{13}C (16, 17). By normalizing relative abundance to the total number of organisms per fraction (estimated by quantitative PCR [qPCR] of the 16S-rRNA gene), the sensitivity of qSIP has been shown to be less susceptible to taxon abundance, the fraction of the community that incorporates isotope, and level of enrichment, compared to other SIP methods (10).

_{2}

^{18}O, they can be used to infer specific growth rates (13, 15, 19–21). However, even the most basic SIP experiment (e.g., one type of substrate with labeled and unlabeled versions, 2 treatments, 2 time points, 3 replicates, 10 density fractions per sample) can easily generate nearly 250 samples for processing and sequencing. Thus, it is critical to plan an experimental design that will ensure high-quality data at sustainable costs. Doing so will become even more important as the field of environmental microbiology transitions to more ambitious applications, such as metagenomics qSIP (MG-qSIP) and SIP-metatranscriptomics (22), since shotgun sequencing adds substantially higher costs relative to amplicon sequencing and the amount of data per sample may become a computational limitation.

*in silico*) density fractions and measured the effects of lower gradient resolution on per-taxon density shifts and unlabeled weighted mean density. We also calculated the repercussions of reducing the number of density fractions in replicated and unreplicated data sets from both marine and terrestrial microbial communities. Our results show that reducing the gradient resolution from an average density fraction size of 0.002 g ml

^{−1}(∼50 fractions) down to 0.011 g ml

^{−1}(∼9 fractions in a 5.1-ml tube) yields comparable shift detection and a detection limit of 0.005 g ml

^{−1}(equal to 9% enrichment with

^{13}C). We discuss using the small inherent variability between replicates as a way to define a shift detection limit and show that this inherent variability is comparable between replicates centrifuged together (within spin) and between replicates centrifuged separately (between spins). Finally, we stress the need for internal standards that can be spiked into each sample, and evaluate the statistical and financial benefits of different experimental design options.

Data set | Source | No. of replicates | No. of density fractions |
---|---|---|---|

1 | [^{13}C]naphthalene-enriched seawater | 1 | 50 |

2 | SPRUCE peatland H_{2}^{18}O addition | 5^{a} | 18 |

3 | Grassland soil H_{2}^{18}O addition | 3 | 9 |

4 | E. coli and P. putida | 4–6 | 30 |

5 | Mock community | 9 | 48 |

^{a}

## RESULTS

### Variability of taxon density shifts is greatest for rare OTUs.

*In silico*qSIP analyses have shown that these shifts are detectable in moderately to highly abundant operational taxonomic units (OTUs) (>0.1% relative abundance) (10). Using unlabeled sample experimental data from data set 2, we first examined variation in qSIP-derived density estimates for both rare and common OTUs. We found that OTU abundance (log

_{10}transformed) was positively correlated with the proportion of density fractions in which that OTU was detected (Pearson’s

*r*= 0.704,

*P*< 0.0001). In other words, the most abundant OTUs tended to occur in all (or nearly all) fractions of the density gradient, whereas the least abundant OTUs tended to occur in fewer density fractions. Variability of the unlabeled WMD was lowest for common OTUs and greatest for rare OTUs (Fig. 1). WMD variation (expressed as 2× standard deviation of the WMD for a given OTU) correlated negatively with the proportion of density fractions in which that OTU was detected (Fig. 1; Pearson’s

*r*= −0.560,

*P*< 0.0001). Similarly, WMD variation also correlated negatively with log

_{10}-transformed OTU abundance (Pearson’s

*r*= −0.217,

*P*< 0.0001).

### Density shifts are consistent across medium-to-high gradient fraction resolution.

*in silico*(every 2, 3, 4 fractions, etc.) to represent a range of fraction sizes spanning 0.002 to 0.028 g ml

^{−1}(4 to 50 fractions, assuming a standard 5.1-ml tube). We found that the estimated magnitude of OTU density shifts remained consistent at fraction sizes from 0.002 up to 0.011 g ml

^{−1}, expanding previous results that demonstrated this trend with fractions of 0.003 to 0.008 g ml

^{−1}(Fig. 2A) (10). This indicates that fractionating the sample into more than nine fractions had negligible additional effects on OTU density shifts.

*r*(

*r*< original number of fractions) compared to the density shift realized with maximum resolution. As relative error increases, power declines, so the likelihood that taxa are misassigned as nonisotope incorporators increases. We found a positive linear correlation between fraction size and mean relative error (

*R*

^{2}= 0.95; Fig. 2B). Additionally, the increase in the mean relative error is accompanied by an increase in its variation.

*n*= 5, preincubation temperature of 15°C); here we found the relationship is not linear. Instead, between 2 and 11 fractions, every additional fraction reduced the standard deviation exponentially, but for 12 fractions or more, the proportional differences are much smaller and linearly correlated with the number of fractions (Fig. 3A).

### Low variance influences minimum detectable density shift.

*n*= 10, time zero). Reducing the resolution of the density gradient may impact this variability. We evaluated this by merging and averaging data from an increasing number of consecutive fractions. Our initial analysis with medium resolution (fraction size 0.007 g ml

^{−1}; e.g., 18 fractions in a 4.7-ml tube) revealed that the WMD of abundant taxa varied little between replicates (Fig. 3B). The WMD per OTU was calculated over 10 replicates. The resulting WMDs varied around a median of 0.004 to 0.007 g ml

^{−1}at gradient resolutions varying from 0.007 to 0.034 g ml

^{−1}(3 to 18 fractions). With progressively fewer fractions (decreasing gradient resolution), variation in WMD remained statistically indistinguishable for fraction sizes between 0.007 and 0.027 g ml

^{−1}(4 to 18 fractions) but increased significantly (analysis of variance [ANOVA], Tukey 95% confidence level) with a fraction size of 0.034 g ml

^{−1}(3 fractions in a 4.7-ml tube) (Fig. 3B). This variability of the WMD as a function of fraction size can be used to choose an acceptable minimum shift in density which a researcher would like to detect with a desired confidence level.

^{−1}(9 fractions), using a shift detection threshold of 0.005 g ml

^{−1}or higher. Specificity was >95% for all gradient resolutions at a threshold of >0.003 g ml

^{−1}, but sensitivity was more impacted by a gradient resolution of >0.013 g ml

^{−1}(Fig. 4).

### Replication, statistical power, and detection thresholds in SIP experiments.

### Variation in mean weighted density is comparable within and between spins.

^{13}C-labeled Pseudomonas putida was aliquoted into replicates, ultracentrifuged in CsCl, and fractionated in an automated pipeline. The known genome differences in %GC of these organisms (i.e., distance between their peak densities) allowed us to calculate their WMD, even when both were unlabeled. When we compared the WMD mean of E. coli replicates between spins, we found the variation was comparable to the range of WMDs measured within a spin (up to 0.004 g ml

^{−1}) (Fig. 6).

^{−1}, whereas within-spin variation ranged up to 0.0056 g ml

^{−1}(see Data Set S1 in the supplemental material [raw data]).

### Using genomic mock communities to explore density-to-GC content conversion.

^{12}C) weighted mean. There is a canonical equation describing this function (31–33), but it has also been determined empirically in the past with a ladder of three bacterial taxa with different GC contents (9). If this equation were identical for each gradient in a SIP experiment (as is usually assumed), unlabeled replicates of the same organism should have the same weighted mean in every run. However, as shown here and previously, this is not the case (9, 16). To address this variation, we hypothesized that an external standard could help to calibrate the conversion of %GC to the mean weighted density for each sample. To test this, we ran nine replicates of a mock community (data set 5) with a wide range of known GC content, generated a calibration curve from each, and fitted a linear equation to it. In this mock community, which consisted of high-molecular-weight genomic DNA from four bacterial taxa, we found mean weighted density and %GC were highly correlated with a linear relationship (

*n*= 9,

*R*

^{2}> 0.94) but that the slopes and intercepts for each individual replicate varied (Table 1, Fig. S3). We found similar results when using the peak (highest) buoyant density per genome rather than the WMD. There was a significant difference between the WMD as calculated according to the canonical equation $\rho =\left(0.098\left[G+C\right]\right)+1.66$ (32) and the observed mean of WMDs per genome over all replicates (

*n*= 9, paired

*t*test,

*P*= 0.02). For example, using the canonical equation, the GC content of the four mock community members in replicate 1 are 38%, 46.5%, 62%, and 72.1%, respectively. These values deviate by several percent from the known GC content of these genomes (Table 2).

^{a}

Parameter or replicate | GC value (%) | Slope | Intercept | R^{2} | |||
---|---|---|---|---|---|---|---|

T. pseudethanolicus | B. licheniformis | B. longum | S. violaceoruber | ||||

Known value | 34.5 | 45.9 | 60.1 | 72.7 | |||

Replicate | |||||||

1 | 37.8 | 46.9 | 62.2 | 72.4 | 0.090 | 1.665 | 0.995 |

2 | 37.8 | 52 | 64.3 | NA | 0.099 | 1.664 | 0.984 |

3 | 40.8 | 48 | 64.3 | 75.5 | 0.093 | 1.666 | 0.993 |

4 | 39.8 | 48 | 64.3 | 76.5 | 0.098 | 1.664 | 0.995 |

5 | 40.8 | 50 | 66.3 | 78.6 | 0.099 | 1.665 | 0.997 |

6 | NA | 45.9 | 53.1 | 68.4 | 0.082 | 1.666 | 0.943 |

7 | 39.8 | 50 | 63.3 | 74.5 | 0.090 | 1.668 | 0.999 |

8 | 40.8 | 51 | 63.3 | 75.5 | 0.089 | 1.669 | 0.999 |

9 | 42.9 | 52 | 65.3 | NA | 0.087 | 1.671 | 0.999 |

Mean | 39.8 | 49 | 63.3 | 74.5 | 0.090 | 1.668 | 0.999 |

^{a}

*R*

^{2}for linear plots of observed versus expected values. The top row shows the known %GC in each genome. NA, not available.

## DISCUSSION

^{−1}in unreplicated

^{13}C experiments with would result in low type I and type II errors. This cutoff applies when a DNA density gradient has been divided into at least four density fractions, but we note that precision will increase if additional fractions are utilized. Adding additional replication would lead to increased statistical power using the same detection limit. However, we found that density shift estimates of taxon-specific isotope incorporation are broadly robust across a wide range of fraction sizes. For example, the relative error of high incorporators only varied by an average of 0.02% in shift from fraction size 0.002 to 0.011 g ml

^{-1}(

*n*= 5, 50 to 9 fractions in a 5.1-ml tube).

^{−1}) (10). When we examined the relative specificity and sensitivity of qSIP using real unreplicated data over an even wider range of fraction sizes, we found that gradient resolution and shift detection limit strongly affect specificity and sensitivity. However, the reliability of qSIP remained extremely high as long as the detection limit was 0.005 g ml

^{−1}or higher (sensitivity > 90% and specificity > 95%) regardless of gradient resolution. This detection limit is comparable to 2 standard deviations of unreplicated unlabeled WMD, further supporting this threshold. This means that if one decides to add experimental replication, even as few as 3 replicates, one can increase the power of this analysis to have very low type II error (assuming a 0.005 g ml

^{−1}detection threshold). Thus, our analysis can be used for experimental design based on the desired statistical power.

^{−1}), the increase in error compared to higher resolution is minor.

^{−1}) and keep to a relative error lower than 0.005 g ml

^{−1}. We show that for fraction sizes below 0.011 g ml

^{−1}, the mean relative error increases, as does its variability (in both replicated and unreplicated data sets). However, this increase in variability can be mitigated by reallocating effort toward replication. Indeed, reducing the number of fractions by just a factor of 2 allows for doubling the number of replicates without additional costs, while increasing the statistical power of downstream analyses.

*R*

^{2}but various slopes and intercepts. Instead, researchers may need to create a calibration curve of %GC/WMD (or peak density) per tube, using an internal standard. Accurate conversion between WMD and %GC is particularly important for MG-SIP experiments where only the heavy fractions of labeled samples are sequenced. Once genomic bins are assembled, their %GC can be converted into a theoretical unlabeled WMD which can be used to calculate the density shift, and thus the enrichment level of those bins. It is conceivable that a reliable calculation like this could obviate the need to analyze most of the unlabeled controls and is thus another experimental design customization that can save substantial labor and costs.

*post hoc*, we would still recommend the use of an internal standard.

### Conclusions.

## MATERIALS AND METHODS

*in silico*analyses: a high-resolution unreplicated SIP study of [

^{13}C]naphthalene in seawater, two medium-resolution replicated SIP experiments where H

_{2}

^{18}O was added to soils (SPRUCE [Spruce and Peatland Responses Under Changing Environments] peatland from northern Minnesota, Mediterranean grassland from northern California), replicated genomic DNA from pure cultures of Escherichia coli and Pseudomonas putida, and a replicated genomic mock community comprised of high-molecular-weight genomic DNA of Thermoanaerobacter pseudethanolicus, Bacillus licheniformis, Bifidobacterium longum subsp.

*infantis*, and Streptomyces violaceoruber, purchased from ATCC. Table 1 contains the number of density fractions and replicates per data set. As these experiments were performed by different laboratories, using slightly different protocols, we describe their SIP pipelines separately (see Text S1 in the supplemental material). All postsequencing steps were performed identically for all 16S rRNA operational taxonomic units (OTUs).

### Data set 1. [^{13}C]naphthalene-enriched seawater.

### Data set 2. SPRUCE peatland H_{2}^{18}O addition.

_{2}

^{18}O. For full details, see Text S1.

### Data set 3. Grassland soil H_{2}^{18}O addition.

_{2}

^{18}O. For full details, see Text S1.

### Data set 4. E. coli and P. putida cultures.

*n*= 8) or labeled (

*n*= 12) P. putida was processed on an automated SIP pipeline at the Joint Genome Institute. For full details, see Text S1.

### Data set 5. Genomic mock communities.

*infantis*, and Streptomyces violaceoruber (see Table S1 for accession numbers), and these genomes were selected for their distinct %GC content (34.5%, 46%, 60% and 73%, respectively). For full details, see Text S1.

### 16S rRNA gene-based microbial community composition in individual fractions.

### Density shifts.

*r*minus the shift per OTU in the original high-resolution

*r*

_{max}.

### Sensitivity analysis.

*n*= 10) samples from data set 2 time zero (SPRUCE soil, with five subsamples from each of two preincubation temperatures [15°C and 45°C]), we calculated the WMD for each OTU in each replicate and the standard deviation of the 10 resulting means for each of the 100 most abundant OTUs. We also assessed the effect of OTU abundance on WMD variability using all 320 OTUs from the same data set (Text S1). The use of standard deviation has an underlying assumption of normality. Consistent with this assumption, we observed only slight leptokurtosis in the distribution of taxon relative abundance as a function of density that is not thought to strongly influence inference (Fig. S1).

### Sensitivity to number of fractions.

*in silico*to simulate the results of these experiments had they been run with fewer fractions (

*f*), with the following principles. (i) Only adjacent fractions were combined. (ii) Fraction combinations were conducted to create new, combined fractions that were approximately equal in size and sequencing depth (i.e., with minimal variation in the range of densities represented by each simulated fraction). For example, to simulate an experiment where only two density fractions were collected instead of 18, we ran three possible scenarios using the original empirical data: combining the lightest 8, 9, or 10 fractions into a simulated fraction and combining the remaining heaviest fractions into a second simulated fraction. Similarly, to simulate an experiment with 10 density fractions instead of 18, we ran all 45 possible scenarios of combining eight pairs of adjacent density fractions with two uncombined single density fractions. In this way, we simulated data sets across the full range of possible fraction numbers, from 2 to 18. We did this to simulate typical approaches to SIP experiments, where fractions that span similar density ranges are usually combined. We computed the weighted-average density for each taxon in each replicate of each simulated data set using the midpoint of each density fraction (combined or not) and weighting each of those density values by the area under the curve for that fraction, where the curve is delineated by the plot of 16S rRNA gene copies versus density. The sum of those weighted densities, divided by the total area under the curve, yielded the weighted-average density metric used in subsequent analyses. For nonpermuted analyses, fractions were combined from preincubation samples (temperature, 15°C and 45°C [

*n*= 10]). For each permuted combination (using the replicated data set 2), we ran the qSIP code (13, 46) and estimated the atom percent excess

^{18}O for each replicate sample and then calculated the standard deviation of that estimate across all replicates (

*n*= 5, preincubation temperature of 15°C). Finally, we calculated the relative standard deviation as a function of increasing number of simulated fractions compared to the original number of fractions.

### Power analysis.

_{2}

^{18}O. An unlabeled control sample was sampled at day 0. For the power analysis, we omitted 27 taxa that did not occur in all 15 samples (

*n*= 5 for control, H

_{2}

^{18}O at day 5, and H

_{2}

^{18}O at day 10), 5 rare taxa (relative abundance of <0.026%), and 7 taxa identified as outliers for variance of the estimate of weighted average density (

*P*< 0.05, Grubb’s test). Excluding taxa with high variance will inflate power, but the effect is likely negligible given the small number of taxa excluded for this reason. Of the remaining 236 taxa, 211 were included in the power analysis. We used the observed variation among taxa in weighted average density shift, which ranged from −0.003 to 0.033 g ml

^{−1}. This captures a wide range of possible values of isotope uptake, from ∼0 to ∼60 atom% excess

^{18}O. We used

*in silico*resampling with replacement to estimate power. For each taxon at each sample date,

*n*random samples were drawn (with replacement) from each of the

^{18}O-labeled and unlabeled data sets, a

*t*test was performed, and the

*P*value was recorded. This was repeated 1,000 times, and power was estimated as the frequency of significant

*t*tests among the 1,000 simulations (47, 48).

*N*(number of sample replicates) was varied between 2 and 6 to simulate experiments with different replication by pruning or duplicating replicates from the original data set. Average power was calculated across all taxa. The upper 10th percentile of power was also calculated to estimate power typical for more dominant taxa.

### Data availability.

## ACKNOWLEDGMENTS

## Supplemental Material

## REFERENCES

*Nature*403:646–649.

*Appl Environ Microbiol*68:5367–5373.

*Curr Opin Biotechnol*14:296–302.

*Nat Protoc*2:860–866.

*Trends Microbiol*18:157–163.

*Stable isotope probing: methods and protocols*. Springer, New York, NY.

*Environ Microbiol*6:73–78.

*In*Timmis KN (ed),

*Handbook of hydrocarbon and lipid microbiology*. Springer, New York, NY.

*Appl Environ Microbiol*81:7570–7581.

*Front Microbiol*9:570.

*In*Dumont MG, García MH (ed),

*Stable isotope probing: methods and protocols*. Springer, New York, NY.

*BMC Bioinformatics*21:37.

*Ecosphere*9:e02090.

_{2}

^{18}O.

*Appl Environ Microbiol*73:2541–2546.

^{18}O incorporation from H

_{2}

^{18}O into soil microbial DNA.

*Microb Ecol*61:911–916.

*Front Microbiol*7:1932.

*Environ Microbiol*20:1112–1119.

*In*Dumont MG, Garcia MH (ed),

*Stable isotope probing: methods and protocols*. Springer, New York, NY.

*Ecology*95:1162–1172.

*ISME J*12:3043–3045.

*ISME J*14:1520–1532.

*Methods Mol Biol*2046:163–174.

*Appl Environ Microbiol*75:5501–5506.

*ISME J*13:413–429.

*Nat Biotechnol*26:1029–1034.

*Environ Microbiol*8:1240–1250.

*Nat Microbiol*1:16057.

*ISME J*13:1814–1830.

*Stable isotope probing and related technologies*. ASM Press, Washington, DC.

*bioRxiv*doi:

*In*Birnie GD, Rickwood D (ed),

*Centrifugal separations in molecular and cell biology*. Butterworth-Heinemann, Oxford, United Kingdom.

*J Mol Biol*4:430–443.

*Appl Environ Microbiol*73:3189–3195.

*Landscape Urban Plan*37:137–146.

*World J Microbiol Biotechnol*22:363–368.

*ISME J*10:1925–1938.

*Microbiome*6:122.

*Front Microbiol*7:703.

*Bioinformatics*30:2114–2120.

*Bioinformatics*26:2460–2461.

*Appl Environ Microbiol*75:7537–7541.

*Bioinformatics*28:1823–1829.

*Appl Environ Microbiol*79:5112–5120.

*R: a language and environment for statistical computing*. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/.

*Methods Mol Biol*2046:137–149.

*Statistical power analysis for the behavioral sciences*. Academic Press, Cambridge, MA.

*Annu Rev Ecol Syst*23:405–447.

## Information & Contributors

### Information

#### Published In

#### Copyright

#### History

#### Keywords

### Contributors

#### Editor

## Metrics & Citations

### Metrics

**Note: **There is a 3- to 4-day delay in article usage, so article usage will not appear immediately after publication.

Citation counts come from the Crossref Cited by service.

### Citations

If you have the appropriate software installed, you can download article citation data to the citation manager of your choice. Simply select your manager software from the list below and click Download.

## View Options

### View options

#### PDF/ePub

PDF/ePub### Get Access

#### Purchase Options

Subscribe and get full access to this article