INTRODUCTION
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) RNA is commonly shed in the feces of individuals infected by the virus (1–3). This has led to the widespread adoption of wastewater-based epidemiology (WBE) for tracking coronavirus disease 2019 (COVID-19) trends in communities. SARS-CoV-2 RNA concentrations in wastewater correlate with measures of COVID-19 incidence (4–8), hospitalizations (6, 9, 10), and deaths (8, 11). Although these correlations provide proxies for the relative levels of transmission in a sewershed over time, a mechanistic link between fecal shedding of SARS-CoV-2 RNA and wastewater monitoring data would strengthen the utility of WBE methods. For example, epidemiological models estimating total infections in a community (i.e., prevalence) (12, 13) or the effective reproductive number (Re) (14) rely on the integration of fecal shedding and wastewater transport models. Establishing a link between epidemiological models and fecal shedding would also aid in identifying desirable sewershed sizes (15), choosing sampling frequencies (16), knowing when and how to normalize measurements (17), and probing the differences sometimes observed between wastewater and clinical trends (8, 10, 18).
WBE models require quantitative, longitudinal fecal shedding data so that the RNA shed by individuals can be accurately integrated into bulk wastewater concentrations. To date, numerous studies on SARS-CoV-2 RNA presence and abundance in stool have been published, including several reviews (1, 19–25). These important studies demonstrated that SARS-CoV-2 RNA is shed in feces; however, most of the data in these studies lack the external validity necessary to generate accurate WBE models. In other words, they do not include sufficient methodological and sample information for the data to be directly compared and integrated with those of other studies. For example, several studies have not specified when during the infection samples were collected (e.g., days after initial symptom onset). Quantitative PCR (qPCR) data have often been reported as cycle threshold (Ct) values rather than absolute abundances, or have omitted methodological details that build confidence in the data (e.g., information recommended by the MIQE guidelines, such as method detection limits and negative controls) (26, 27). Additionally, many studies have either not specified the amount of stool analyzed or reported the stool volume analyzed rather than the mass. No study to date has reported SARS-CoV-2 concentrations on a dry mass basis. Dry mass concentrations lead to more accurate estimates of genome copies shed per day because dry stool production rates vary less than wet stool production rates (28). These missing details in fecal shedding data have limited the epidemiological interpretation of wastewater data through modeling (12–14).
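To make the per-day argument concrete, the sketch below multiplies a dry-mass-normalized concentration by a daily dry stool output. The function name `daily_shedding_gc` and the 30 g/day dry stool output are hypothetical, illustrative choices, not values from this study or from ref. 28; the concentration used, 1.6 × 10⁴ gc/mg-dry weight, is the geometric mean reported later in this article.

```python
def daily_shedding_gc(conc_gc_per_mg_dry: float, dry_stool_g_per_day: float) -> float:
    """Genome copies (gc) shed per day by one person (illustrative sketch).

    conc_gc_per_mg_dry: target concentration in gc per mg of dry stool.
    dry_stool_g_per_day: dry stool output in grams per day (an assumed value).
    """
    mg_per_g = 1000.0  # convert grams of dry stool to milligrams
    return conc_gc_per_mg_dry * dry_stool_g_per_day * mg_per_g


# Hypothetical dry stool output; a real model would draw this from a
# population distribution of dry fecal mass production.
DRY_STOOL_G_PER_DAY = 30.0

per_day = daily_shedding_gc(1.6e4, DRY_STOOL_G_PER_DAY)
print(f"{per_day:.1e} gc/day")  # prints 4.8e+08 gc/day
```

Because the dry-basis concentration is multiplied by a dry output that varies relatively little between people, the product should carry less between-person uncertainty than the equivalent wet-basis calculation.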
CrAss-like phage (crAssphage) and pepper mild mottle virus (PMMoV) are viral indicators of human fecal contamination and are often measured alongside SARS-CoV-2 nucleic acids in wastewater samples (4, 29). Pathogen nucleic acid concentrations are normalized by crAssphage or PMMoV nucleic acid concentrations to account for differences in wastewater fecal strength and viral nucleic acid recovery. To date, very few quantitative data are available on the levels and temporal trends of these biomarkers in feces. Such data are important for mechanistically linking normalized wastewater pathogen measurements with community disease burdens and for identifying the scenarios in which biomarker normalization is appropriate. Models that have incorporated PMMoV fecal shedding to estimate the fecal strength of wastewater have relied on very limited PMMoV shedding data (12, 29). A quantitative understanding of biomarker shedding by individuals would help identify the scenarios in which biomarkers serve as accurate, representative measures of community fecal loads.
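The normalization described above amounts to a ratio of two measured concentrations. The minimal sketch below assumes, hypothetically, that dilution and recovery affect the pathogen and biomarker targets identically, which is the condition under which the ratio cancels those factors. The function name and all concentrations are illustrative, not measured values.

```python
def normalize(target_gc_per_l: float, biomarker_gc_per_l: float) -> float:
    """Return the dimensionless ratio target / biomarker (e.g., SARS-CoV-2 / PMMoV)."""
    if biomarker_gc_per_l <= 0:
        raise ValueError("biomarker concentration must be positive to normalize")
    return target_gc_per_l / biomarker_gc_per_l


# Hypothetical "true" concentrations in raw wastewater (gc/L).
TRUE_SARS = 2.0e5
TRUE_PMMOV = 4.0e8

# Each scenario scales both targets by the same (assumed) dilution and recovery.
scenarios = [(1.0, 0.5), (0.5, 0.2)]  # (dilution factor, recovery efficiency)
for dilution, recovery in scenarios:
    measured_sars = TRUE_SARS * dilution * recovery
    measured_pmmov = TRUE_PMMOV * dilution * recovery
    ratio = normalize(measured_sars, measured_pmmov)
    # Both scenarios print ratio=5.0e-04: the shared factors cancel.
    print(f"dilution={dilution}, recovery={recovery}: ratio={ratio:.1e}")
```

The cancellation holds only under the shared-scaling assumption. When biomarker shedding varies between individuals, the denominator also carries that variability, which is why the per-person biomarker data matter.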
In this study, we present externally valid quantitative fecal shedding trajectories of SARS-CoV-2 RNA and commonly used biomarkers PMMoV and crAssphage from 48 individuals who tested positive for COVID-19. The results show a highly individualized course of SARS-CoV-2 shedding over the first 30 days after initial onset of symptoms (ASO). Shedding of PMMoV and crAssphage was also highly variable between individuals over the same sampling period, and the two indicators exhibited distinct shedding patterns. Together, these results provide critical data for advancing the utility of WBE as a public health tool.
DISCUSSION
Here, we report quantitative and externally valid data on the fecal shedding of several targets of interest for wastewater-based epidemiology. This study features a large sample size and good sampling coverage over approximately 30 days. The resulting novel data set provides important information about the presence, magnitudes, and trends of viral nucleic acid fecal shedding among individuals. Included in our data set is evidence of SARS-CoV-2 fecal shedding in pre-symptomatic and vaccinated individuals.
There has been sustained interest in the fraction of infected individuals who shed SARS-CoV-2 in feces. We observed that approximately 22% of individuals who provided multiple samples did not shed SARS-CoV-2 in their feces through day 28 ASO. A recent meta-analysis of fecal shedding in 38 individuals found that 52% did not shed SARS-CoV-2 RNA in feces; inspection of the included studies, however, revealed that the median number of samples collected per individual was 2 (3). Had more samples been collected from each study participant, this percentage would presumably have been lower. The 22% observed in our study included only individuals with more than three samples collected over >15 days of their infection; increasing the sampling resolution over the first 30 days of infection (e.g., daily sampling) would likely result in an even lower percentage of individuals without a positive SARS-CoV-2 sample.
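The inclusion rule behind the 22% figure (more than three samples spanning more than 15 days, through day 28 ASO) can be written as a short filter. The sketch below uses a hypothetical data layout, participant ID mapped to `(day ASO, detected)` pairs, and invented participants; it is an illustration of the rule, not the study's analysis code.

```python
from typing import Dict, List, Tuple

Sample = Tuple[int, bool]  # (day after symptom onset, SARS-CoV-2 RNA detected?)


def non_shedder_fraction(samples: Dict[str, List[Sample]], max_day: int = 28) -> float:
    """Fraction of eligible participants with no positive sample through max_day.

    Eligible: more than three samples whose collection days span more than 15 days.
    """
    eligible = 0
    never_positive = 0
    for visits in samples.values():
        window = [(day, pos) for day, pos in visits if day <= max_day]
        if len(window) <= 3:
            continue  # too few samples
        days = [day for day, _ in window]
        if max(days) - min(days) <= 15:
            continue  # samples do not cover enough of the infection
        eligible += 1
        if not any(pos for _, pos in window):
            never_positive += 1
    if eligible == 0:
        raise ValueError("no participants meet the sampling criteria")
    return never_positive / eligible


# Invented example participants.
participants = {
    "A": [(2, True), (6, True), (13, False), (20, False)],  # eligible, shedder
    "B": [(1, False), (5, False), (12, False), (19, False), (26, False)],  # eligible, non-shedder
    "C": [(3, True), (10, False)],  # excluded: too few samples
    "D": [(4, False), (7, False), (9, False), (12, False)],  # excluded: spans only 8 days
}
print(non_shedder_fraction(participants))  # prints 0.5
```

Loosening either threshold admits sparsely sampled participants, who are more likely to be misclassified as non-shedders.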
Our high-resolution data and relatively large study population provide a unique description of the prevalence of shedding through the first 30 days ASO. Approximately 80% of samples collected within the first 5 days ASO were positive for SARS-CoV-2, and this percentage dropped to 10% of samples at 28 days ASO. Natarajan and colleagues (2) collected three samples per individual from a large number of individuals (120) over approximately the first 30 days ASO. They found that approximately 75% of individuals were shedding SARS-CoV-2 RNA in samples collected between days 0 and 7 ASO and that fewer than 25% were shedding SARS-CoV-2 RNA in samples collected between days 22 and 35 ASO. Although our data set ends at 28 days ASO, Natarajan et al. also observed shedding in a small fraction of individuals up to 288 days ASO.
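Our positivity figures are per sample, whereas Natarajan et al. reported the share of individuals shedding within a window; the two metrics can diverge when individuals contribute different numbers of samples. The sketch below computes both from the same records. The data layout, function names, and values are illustrative assumptions.

```python
from typing import Dict, List, Tuple

Sample = Tuple[int, bool]  # (day after symptom onset, SARS-CoV-2 RNA detected?)
Window = Tuple[int, int]  # inclusive (first day, last day) ASO


def per_sample_positivity(data: Dict[str, List[Sample]], window: Window) -> float:
    """Share of all samples collected in the window that are positive."""
    start, end = window
    hits = [pos for visits in data.values() for day, pos in visits if start <= day <= end]
    return sum(hits) / len(hits) if hits else float("nan")


def per_individual_positivity(data: Dict[str, List[Sample]], window: Window) -> float:
    """Share of individuals sampled in the window with at least one positive sample."""
    start, end = window
    flags = []
    for visits in data.values():
        in_window = [pos for day, pos in visits if start <= day <= end]
        if in_window:  # count only individuals sampled in this window
            flags.append(any(in_window))
    return sum(flags) / len(flags) if flags else float("nan")


# Invented records for three individuals.
data = {
    "A": [(1, True), (3, True), (25, False)],
    "B": [(2, False), (4, True), (27, True)],
    "C": [(0, False), (5, False), (22, False)],
}
# For days 0-7, half of the samples are positive, but 2 of 3 individuals shed.
for window in [(0, 7), (22, 35)]:
    print(window, per_sample_positivity(data, window), per_individual_positivity(data, window))
```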
The quantitative SARS-CoV-2 data presented here highlight the large variability in fecal shedding trajectories and magnitudes between individuals. For example, among those who shed SARS-CoV-2 RNA at some point during their infection, peak shedding occurred anywhere from as early as 2 days ASO to as late as 27 days ASO, near the end of the sampling period (Fig. S2), and peak values spanned approximately four orders of magnitude (Fig. 1b). A study by Wölfel and colleagues (1) contained the most comprehensive fecal shedding trajectories prior to our study, with 60 stool samples from nine participants with mild-to-moderate COVID-19. Wölfel et al. also observed peak shedding values spanning four orders of magnitude. They reported RNA copies on a wet stool mass basis and did not report the percent solids of their samples; therefore, we made assumptions about the solids content of their samples to compare them directly with our quantitative data (see Figs. S6 and S8 in the supplementary information). Overall, the SARS-CoV-2 quantities in our study are higher than those reported in the Wölfel study (Fig. S8). Specifically, the maximum shedding value from Wölfel et al., 2.4 × 10⁵ gc/mg-dry weight, was an order of magnitude lower than the maximum concentration observed in our sample set (2.8 × 10⁶ gc/mg-dry weight). Likewise, the geometric mean from Wölfel et al. (157 gc/mg-dry weight) was two orders of magnitude lower than our geometric mean (1.6 × 10⁴ gc/mg-dry weight). These large differences in peak and mean concentrations may be due to the larger number of individuals included in our study (48) compared with the Wölfel study (9).
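The order-of-magnitude comparisons above are base-10 log ratios, and geometric means are the natural summary for concentrations spanning several orders of magnitude. A minimal sketch using the reported values follows; the helper name `orders_apart` and the three-value list are illustrative, and `statistics.geometric_mean` requires Python 3.8 or later.

```python
import math
from statistics import geometric_mean, mean


def orders_apart(a: float, b: float) -> float:
    """Base-10 orders of magnitude by which a exceeds b."""
    return math.log10(a / b)


# Values reported in the text, in gc/mg-dry weight.
print(f"maxima: {orders_apart(2.8e6, 2.4e5):.2f}")  # prints maxima: 1.07
print(f"geometric means: {orders_apart(1.6e4, 157):.2f}")  # prints geometric means: 2.01

# Hypothetical concentrations spanning four orders of magnitude: the arithmetic
# mean is dominated by the largest value, while the geometric mean is not.
concs = [1e2, 1e4, 1e6]
print(f"arithmetic mean: {mean(concs):.3g}, geometric mean: {geometric_mean(concs):.3g}")
```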
Interestingly, some of the stool samples in our study that were positive for SARS-CoV-2 RNA at 25+ days ASO had relatively high concentrations (i.e., greater than 10⁵ gc/mg-dry weight). Although the quantities reported in the Natarajan study cannot be directly compared with those of other studies because stool masses were not reported, the relative abundances between samples suggest that the maximum fecal concentrations at 25+ days ASO are nearly as high as the maximum concentrations measured at 0–7 days ASO. Combined, the data from our study and the Natarajan study demonstrate that although fewer infected individuals shed SARS-CoV-2 in their feces by 28 days ASO, some can still excrete high levels of SARS-CoV-2 RNA.
These data and observations on SARS-CoV-2 fecal shedding are particularly valuable for advancing the field of WBE. For example, pre-symptomatic shedding helps explain why wastewater measurements sometimes precede COVID-19 clinical cases (18). Likewise, high levels of shedding weeks into an infection suggest that some individuals several weeks into their infection contribute substantially to measured wastewater signals. We anticipate that these data will be especially impactful for informing mechanistic models. Indeed, the need for high-quality fecal shedding data sets has been highlighted in published studies that use wastewater data in epidemiological models (12–14). The idiosyncrasies of the shedding trajectories highlight the potential complications of using models to directly predict epidemiological parameters such as community prevalence. Such attempts have often relied on static shedding distributions rather than trajectories; however, recent work with polio and SARS-CoV-2 has demonstrated methods for incorporating time-varying shedding into mechanistic WBE models (10, 31). Using a static distribution for fecal shedding assumes a uniform likelihood of being at any stage of the shedding period. However, early in an outbreak, most infected individuals are likely in the early stages of infection, whereas late in an outbreak, infected individuals are more likely to be a mix of early- and late-stage infections. We observed that different individuals shed at dramatically different rates at different stages of the first 4 weeks after symptom onset. The impact of these patterns can be explored in future work by integrating our data into time-varying fecal shedding models of WBE systems.
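The contrast between static and time-varying shedding can be illustrated with a discrete convolution of new infections with a per-person shedding profile. The front-loaded profile and the doubling and halving incidence curves below are hypothetical and not fitted to our data, and this generic convolution is a sketch rather than the specific methods of refs. 10 and 31.

```python
from typing import List, Sequence

# Hypothetical relative shedding per person by day since infection (front-loaded).
SHEDDING = [10.0, 8.0, 6.0, 4.0, 2.0, 1.0]


def time_varying_load(incidence: Sequence[float], shedding: Sequence[float]) -> List[float]:
    """load[t] = sum over infection age a of incidence[t - a] * shedding[a]."""
    return [
        sum(incidence[t - a] * s for a, s in enumerate(shedding) if t - a >= 0)
        for t in range(len(incidence))
    ]


def static_load(incidence: Sequence[float], shedding: Sequence[float]) -> List[float]:
    """Prevalence (people within the shedding period) times the mean shedding rate."""
    mean_rate = sum(shedding) / len(shedding)
    return [
        mean_rate * sum(incidence[t - a] for a in range(len(shedding)) if t - a >= 0)
        for t in range(len(incidence))
    ]


growing = [2.0 ** k for k in range(10)]  # early outbreak: new infections doubling daily
declining = [2.0 ** (9 - k) for k in range(10)]  # late outbreak: new infections halving daily

for name, incidence in [("growing", growing), ("declining", declining)]:
    ratio = time_varying_load(incidence, SHEDDING)[-1] / static_load(incidence, SHEDDING)[-1]
    print(f"{name}: time-varying / static load = {ratio:.2f}")
```

Under the static assumption, a given prevalence maps to the same load regardless of outbreak phase. The convolution instead weights recent infections by their (here, higher) early shedding, so in this toy example the two estimates diverge in opposite directions during growth and decline.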
This work also fills a critical research gap on the fecal shedding of two commonly used fecal indicators, PMMoV and crAssphage. WBE studies routinely present pathogen nucleic acid concentrations normalized to PMMoV or crAssphage nucleic acid concentrations to account for differences in wastewater fecal strength and in the analytical recovery of viral nucleic acids. Biomarker fecal concentrations have been applied in WBE models (12, 14), but the available data have been limited by the number of individuals observed, the external validity of the measurements, or both. The PMMoV RNA and crAssphage DNA quantities presented here for 48 individuals over time therefore substantially improve the available information on the absolute abundance and variability of these biomarkers in stool.
We observed PMMoV in nearly all samples (96%) and in at least one sample from every individual, with a median concentration of 1.0 × 10⁵ gc/mg-dry weight and a maximum concentration of 5.0 × 10⁸ gc/mg-dry weight. The limited previous studies of PMMoV RNA in stool detected it less frequently, with 40% of samples from five individuals positive by metagenomic techniques (32) and 67% of samples from nine individuals positive by PCR (33). Our PMMoV concentrations are within the range of three stool samples that were quantified in a previous study by RT-qPCR (33). That study reported concentrations of 2.3 × 10⁴, 3.6 × 10⁶, and 2.0 × 10⁵ gc/mg-stool. Assuming a 20% dry mass in their samples, the equivalent dry mass concentrations would be 4.6 × 10³, 7.3 × 10⁵, and 3.9 × 10⁴ gc/mg-dry weight, respectively.
CrAssphage DNA was detected in 48% of our samples and in at least one sample from 79% of individuals. The maximum fecal concentration of crAssphage, 2.4 × 10⁸ gc/mg-dry weight, was of the same order of magnitude as the maximum PMMoV fecal concentration; however, the median crAssphage concentration, 2.1 × 10³ gc/mg-dry weight, was nearly two orders of magnitude lower than the median PMMoV concentration. We identified only one previous study that reported crAssphage shedding quantities. That study reported concentrations with respect to wet stool mass and collected only one sample from each individual (34). It observed a crAssphage shedding prevalence of 70% in 60 individuals infected with norovirus, 48% in 96 healthy adults, and 69% in 77 healthy children, with concentrations ranging widely, from 6.3 × 10² to 2 × 10¹⁰ gc/g-stool. Other previous studies have documented crAssphage DNA in feces, but without quantification (35–37).
The contrasting distributions of crAssphage and PMMoV shedding could have implications for their use as normalizing factors for WBE results. The presence of PMMoV RNA in stool is largely due to the consumption of pepper products (38), so the variability of PMMoV RNA fecal shedding likely reflects the range of diets between individuals and within individuals over time. These variations may complicate the use of PMMoV as a normalizing measure of fecal strength in some situations, such as in small sewersheds or in applications that assume similar PMMoV shedding across communities with different diets, or within a single community over time. On the other hand, the bimodal fecal concentration distribution of crAssphage suggests that crAssphage DNA measurements may also be unsuitable for normalization in some circumstances: large inconsistencies in crAssphage concentration could occur depending on which individuals are contributing to the sample. These data are critical for understanding the utility of biomarker-based normalization approaches in WBE, and we anticipate that these biomarker shedding data will be used to assess the feasibility of such approaches in different contexts.
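To illustrate why a bimodal biomarker distribution matters most in small sewersheds, the sketch below simulates the total biomarker contributed by n randomly drawn people and reports the coefficient of variation (CV) of that total across simulations. Both shedding distributions are hypothetical (lognormal, with half of people contributing nothing in the crAssphage-like case) and are not fitted to the data in this study; the function names are illustrative.

```python
import random
import statistics
from typing import Callable

Draw = Callable[[random.Random], float]


def crassphage_like(rng: random.Random) -> float:
    """Hypothetical bimodal shedder: half of people contribute nothing."""
    return rng.lognormvariate(0.0, 1.0) if rng.random() < 0.5 else 0.0


def pmmov_like(rng: random.Random) -> float:
    """Hypothetical unimodal shedder: everyone contributes a lognormal amount."""
    return rng.lognormvariate(0.0, 1.0)


def pooled_cv(n_people: int, draw: Draw, trials: int = 1000, seed: int = 1) -> float:
    """CV of the summed biomarker load from n_people across repeated simulations."""
    rng = random.Random(seed)  # seeded so results are reproducible
    totals = [sum(draw(rng) for _ in range(n_people)) for _ in range(trials)]
    return statistics.stdev(totals) / statistics.mean(totals)


for n in (5, 50, 500):
    print(f"n={n:>3}: crAssphage-like CV={pooled_cv(n, crassphage_like):.2f}, "
          f"PMMoV-like CV={pooled_cv(n, pmmov_like):.2f}")
```

In this toy model the CV shrinks roughly as 1/√n for both distributions, but the bimodal case stays noisier at every size, so in a small sewershed a few contributors can swing the normalizing denominator substantially.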
One of the most important aspects of the SARS-CoV-2, PMMoV, and crAssphage data sets presented here is that they are externally valid. In other words, the laboratory data were collected and reported in a way that makes the reported quantities useful beyond the context of this study. For example, we report absolute abundances of SARS-CoV-2 RNA. Relative abundance data, such as qPCR Ct values, cannot be converted to precise absolute abundances without a standard curve calibrated for the specific methodological context. Likewise, we measured precise sample masses in our extractions and report our data on a per-mass basis. Gene copy data reported per PCR reaction or per volume of nucleic acid extract, without sample mass, are not generalizable without making assumptions about sample collection. Finally, by reporting our results on a dry mass basis, we improve the ability to accurately estimate the quantity of SARS-CoV-2 RNA shed in an event or over a period of time, because dry fecal mass production varies less among populations than wet fecal mass production (28). Consequently, converting SARS-CoV-2 measurements in fecal samples into the amount of SARS-CoV-2 RNA shed in feces by an individual carries less uncertainty when the fecal data are reported on a per-dry-mass basis.
We note that the identified limitations of the available SARS-CoV-2 fecal shedding literature likely reflect differing priorities between fields. For WBE, accurate fecal shedding data reported as absolute abundance per dry fecal sample mass are critical for linking observations in wastewater with population-level epidemiological measures, such as infection prevalence or reproductive numbers. In other applications, SARS-CoV-2 fecal shedding has been investigated to better define COVID-19 disease and to help identify and treat patients. In these cases, measurements of target presence/absence or of the relative abundance of the target between samples, as opposed to externally valid absolute abundances, have been sufficient. Nonetheless, externally valid and accurate quantitative data will have the broadest utility and benefit a wider range of fields interested in fecal shedding. We therefore encourage future fecal shedding studies to pursue external validity of fecal shedding measurements and to incorporate dry mass data.
Our data set has several limitations, which also point to opportunities for future work. First, the subjects included in this study came from a relatively small geographic area: all were residents of the San Francisco Bay Area. Additionally, the study population was not large or diverse enough in age, demographics, or vaccination status to identify demographic effects on the fecal shedding of SARS-CoV-2, PMMoV, and crAssphage nucleic acids. Furthermore, the samples measured in this study were collected between September 2020 and April 2021, before the emergence of the Delta and Omicron variants, and it is not yet known how fecal shedding dynamics are affected by different variants. A limitation of the biomarker data is that these measurements were made only in individuals who had tested positive for SARS-CoV-2 within 30 days; future work should quantify crAssphage and PMMoV shedding in healthy individuals. Despite these limitations, these quantitative data fill knowledge gaps on SARS-CoV-2 and viral biomarker fecal shedding and are critical for advancing the interpretation of WBE data. Our data can be directly compared and consolidated with future SARS-CoV-2 shedding studies, including those focused on the effects of demographics, vaccination status, and variants on fecal shedding.