Assembly and analysis of ToBRFV genomes and design of hydrolysis probe RT-PCR assays.
In order to design ToBRFV-specific primers and probes for hydrolysis-probe RT-PCR assays, all ToBRFV genomes available in February 2021 were obtained. These were supplemented with new genomes assembled from stool samples processed and sequenced in this study (
Table 2).
In February 2021, all nearly complete genomes (
n = 70) of ToBRFV were downloaded from NCBI GenBank. In the same month, raw reads from the only publicly available wastewater metatranscriptomics data set (obtained from wastewater in the Bay Area, collected between May and July 2020; BioProject accession no.
PRJNA661613) were also downloaded. Using these reads, five ToBRFV genomes were assembled as outlined below.
In addition to using existing sequencing data and genomes, RNA from three human stool samples obtained longitudinally from one individual were also sequenced; the first two samples were collected 10 days apart, and the third was collected 93 days after the second sample. The samples were obtained from an individual with laboratory-confirmed COVID-19 and were collected under an Institutional Review Board (IRB)-approved protocol (Stanford IRB protocol 55619). Total RNA was extracted from these samples, rRNA was depleted, and libraries were prepared and sequenced using NextSeq 550 as outlined in the supplemental material.
The following bioinformatic methods were used to assemble genomes from both the existing (from wastewater) and newly obtained (from stool) metatranscriptomics reads. Reads were trimmed with Trim Galore (version 0.4.0) using Cutadapt (version 1.8.1) (
28) set to flags –q 30 and –illumina. SPAdes (version 3.14.1) set to -meta was used to assemble genomes
de novo (
28,
29). Contigs belonging to ToBRFV were classified using One Codex (
30). Genes were annotated using Prodigal (version 2.6.3) set to -meta (
31). If all genes were predicted on the negative strand of the contig, the entire contig was reverse complemented. The completeness of potential ToBRFV genomes was assessed using CheckV (version 1.0.1) (
32), and genomes that were >90.0% complete were selected for subsequent analyses.
To assess strain diversity of ToBRFV in the longitudinal stool samples, RNA sequencing reads from stool samples were aligned to the ToBRFV reference genome (NCBI accession no.
NC_028478) using Bowtie (version 2.4.2) (
33). The resulting bam files were used as input into inStrain (version 1.0.0) (
34) to calculate population-level average nucleotide identity (popANI) between genomes.
To assess abundance of ToBRFV relative to other viruses in the RNA sequencing (RNA-Seq) data, reads were classified against the Viral Kraken2 database (
https://benlangmead.github.io/aws-indexes/k2) (
35) using default parameters. Counts from the classification were used to calculate relative abundance of viral reads.
A multiple-sequence alignment of all nearly complete genomes of ToBRFV, including genomes downloaded from NCBI GenBank in February 2021 (70 genomes) and those we assembled from wastewater and stool (8 genomes), was performed using Geneious Alignment (Geneious Prime version 2021.0.3) (
36) with default settings, global alignment with free end gaps, and cost similarity matrix set to 65.0%. SNPs were called from the multiple sequence alignment using SNP-Sites (version 2.5.1) (
37). A phylogenetic tree was built using Geneious Tree Builder (version 2021.0.3) with default settings and a Tamura-Nei genetic distance model with the neighbor-joining method. Primers and probes were designed to be specific for ToBRFV using Geneious Primer (version 3 2.3.7) (
38) based on the 78 genomes we had access to in February 2021 with near-default settings, requiring product size to be between 95 and 125 bp and primers to be based on the consensus with 100.0% identity across all ToBRFV genomes. Primers and probe sequences were screened for specificity,
in silico, using NCBI BLAST.
Processing of animal stool samples for RNA quantification.
One stool sample each was collected from (i) a single animal (cat, dog, horse, pig, and rabbit) raised as a pet, (ii) a group of cohabiting animals of a single kind (chicken, cow, goat, mouse, and sheep) from Deer Hollow Farms (California, USA), (iii) a group of cohoused ducks and geese at Deer Hollow Farms, and (iv) wild animals (bear and deer). Samples were collected in a sterile clinical stool collection container by individuals wearing gloves and using a spatula. Samples were transported at room temperature, aliquoted into cryovials, and stored at −80°C within 12 h from collection. Samples were further processed within a month of storage and did not go through any freeze-thaw cycles prior to the current work.
A single, defined solid volume of sample of each animal stool was acquired using Integra Miltex biopsy punches with a plunger system (Thermo Fisher Scientific; catalog no. 12-460-410) and placed in independent microcentrifuge tubes. Five hundred microliters of RNAlater (Ambion; catalog no. AM7023M) was added, and samples were processed using a previously validated methodology (
24) as follows. A stock BCoV vaccine was prepared by adding 3 mL of 1× phosphate-buffered saline (PBS; Fisher Scientific; catalog no. BP399-500) to one vial of lyophilized Zoetis Calf-Guard bovine rotavirus-coronavirus vaccine (catalog no. VLN 190/PCN 1931.20) to create an undiluted reagent as per the manufacturer’s instructions. Ten microliters of this attenuated BCoV vaccine was added to every sample as an external control and vortexed for 15 min. BCoV is an RNA virus that was previously found to be a reliable positive control for RNA extraction from stool (
24). Samples were processed immediately after addition of the BCoV control.
Quantification of viral RNA sequences by ddRT-PCR.
The CP gene encoding the coat protein from PMMoV, the genes encoding the movement protein (Mo) and RNA-dependent RNA polymerase (RdRP) from ToBRFV, and the gene encoding the membrane (M) protein from BCoV were quantified using ddRT-PCR. We chose ddRT-PCR instead of RT-qPCR for nucleic acid detection and quantification because of its superior sensitivity and resistance to PCR inhibitors (
24,
25).
Templates derived from non-human animal stool were assayed for the PMMoV and ToBRFV gene targets in their undiluted and 1:10 diluted formats. Templates derived from human stool were assayed for the PMMoV and ToBRFV gene targets in their undiluted format. However, 30 of these templates yielded ddRT-PCRs where all the droplets were positive. This is not ideal, because ddRT-PCRs rely on a Poisson distribution of the template across droplets to accurately quantify gene targets. Therefore, these templates were diluted 1:10,000 and reassayed for the relevant gene targets. Additionally, templates from eight samples were randomly chosen, diluted 1:10, and assayed for the PMMoV and ToBRFV gene targets to detect any inhibitors. Templates derived from wastewater samples were diluted 1:10,000 before assaying for PMMoV and ToBRFV gene targets. In cases where these gene targets were undetectable at this high dilution, we assayed templates at 1:100 and 1:10 dilutions and in an undiluted format. Templates derived from stormwater samples were assayed for the PMMoV and ToBRFV gene targets at three dilutions: 1:10,000, 1:1,000, and 1:10. Results reported are from the 1:10 dilution, since the gene targets were undetectable at higher dilutions.
Human participants in this study were enrolled and hospitalized during the first year of the COVID-19 pandemic. We tested their stools for genes encoding the envelope (E) and a nucleocapsid (N2) protein from the SARS-CoV-2 genome as previously described (
24), in order to assess occurrence of COVID-19 during hospitalization at Stanford Hospital. However, we did not find any presence of COVID-19 RNA in these samples. Sequences of the newly designed primers and probes targeting ToBRFV Mo and RdRP genes are listed in
Table 1. Previously published primers and probes targeting BCoV, PMMoV, and SARS-CoV-2 RNAs are listed in Table S4.
The droplet digital PCR application guide for QX200 machines (Bio-Rad) (
45) and digital minimum information for publication of quantitative real-time PCR experiments (dMIQE) guidelines (
46) inform this methodology. The experimental checklist recommended by dMIQE is available at the Stanford Digital Repository (
https://purl.stanford.edu/nf771cs9443). A Biomek FX liquid handler (Beckman Coulter) was used to prepare the ddRT-PCR by adding 5.5 μL of eluted RNA to 5.5 μL supermix, 2.2 μL reverse transcriptase, 1.1 μL of 300 nM dithiothreitol (DTT), 1.1 μL of each of the 20× custom ddPCR assay primer-probe mixes (Bio-Rad; catalog no. 10031277), and 5.5 μL of nuclease-free water (Ambion; catalog no. AM9937, lot 2009117). The supermix, reverse transcriptase, and DTT were from the one-step ddRT-PCR Advanced kit for probes (Bio-Rad; catalog no. 1864021). A QX200 AutoDG droplet digital PCR system (Bio-Rad) was used to partition the samples into droplets of roughly 1 nL using the default settings, and the template was amplified using a Bio-Rad T100 thermocycler with the following thermocycling program: 50°C for 60 min, 95°C for 10 min, 40 cycles of 94°C for 30 s and 55°C for 1 min, followed by 1 cycle of 98°C for 10 min and 4°C for 30 min with a ramp speed of 1.6°C per s at each step (
47).
A multistep approach was adopted to calculate the raw RNA concentrations, as previously described (
24). Every plate in the ddRT-PCR assays included appropriate positive and negative controls, including synthetic target genes (PMMoV CP gene, and ToBRFV Mo and RdRP genes) cloned in the pIDT vector, RNA extracted from reconstituted attenuated BCoV vaccine, water, and RNAlater. The signal threshold corresponding to every plate was manually set between the mean positive and negative amplitudes of these controls such that the number of detected copies in the negative controls was minimal and those from the relevant positive controls most closely matched the expected RNA concentration. Next, the difference between the mean negative amplitude and the threshold amplitude in the negative-control reactions was calculated and added to the mean negative amplitude for every sample on that plate. Applying this threshold yielded the raw RNA concentrations.
In order to derive the limit of blank (LoB) and limit of detection (LoD) of our assays to further process the raw RNA concentrations we adopted the following steps. (i) The LoB indicates the highest background RNA concentration registered from control samples that are confidently negative for the relevant gene targets. In order to determine the LoB, water, RNAlater, and synthetic genes discordant with the target gene (e.g., the ToBRFV Mo gene is a negative control in an assay of the ToBRFV RdRP gene) were assayed in duplicate. The highest RNA concentration measured in these LoB samples for each of the primer/probe sets was set as the relevant LoB. All samples in which we detected an RNA concentration equal to or less than the LoB were set to zero. (ii) The LoD is defined as the lowest concentration of RNA that can be reliably detected. To determine the LoD, duplicate serial dilution series of the synthetic target genes at 1, 2, 5, 10, 100, and 1,000 copies/μL of template were assayed for the corresponding target gene (Fig. S1). The synthetic target genes were acquired from Integrated DNA Technologies and cloned in their standard backbone, pIDTSmart. These plasmids were transformed into E. coli, isolated using the QIAprep Spin miniprep kit (Qiagen; catalog no. 27104) and quantified using Qubit. The LoD for a primer/probe set was defined as the lowest concentration of the standard at which both replicates had a detectable RNA concentration. All viral RNA concentrations below the LoD were set to zero.
Finally, after these data processing and analysis steps, the samples were assigned a final viral RNA concentration in copies per microliter of template. “Eluate” refers to the 100 μL of sample acquired from the RNA extraction. Viral RNA concentrations from animal and human stool samples are expressed in copies per microliter of template, those from wastewater samples are in copies per gram of wastewater, and those from stormwater samples are in copies per liter of stormwater.
In the case of all non-human animal stool, wastewater and stormwater samples, RNA was quantified using singleplex reactions. For the human stool samples, which were limited in quantity, the detection of the BCoV M gene and PMMoV CP gene were multiplexed with the detection of the SARS-CoV-2 E and N2 genes using orthogonal fluorescent probes. After extensive optimization (outlined in the supplemental material), we paired the detection of the E gene (SARS-CoV-2) with that of the CP gene (PMMoV) and detection of the N2 gene (SARS-CoV-2) with that of the M gene (BCoV) in two independent reactions using the carboxyfluorescein (FAM) and hexachlorofluorescein (HEX) fluors, respectively.