The safety of vaccines and other biological products is a critical goal for industry and regulatory agencies. To achieve this, extensive and sometimes redundant testing is performed using various in vitro and in vivo assays (
7). The assays currently used for demonstrating product safety have been generally effective in mitigating the risk of adventitious virus introduction during manufacturing, thereby enhancing confidence in the use of these products for human health. However, some recent incidents of virus detection in biological materials have highlighted the limitations of these assays and the need to consider new virus detection technologies (
12). With the recent emergence of advanced broad virus detection technologies (
1), such as high-throughput sequencing (HTS) platforms, there is a new generation of powerful tools for the detection and characterization of nucleic acids, with higher throughput and no need for prior sequence knowledge. Although HTS is recognized as a powerful tool for detecting known and novel viruses that can potentially enhance the safety of biologics, regulatory applications of the technology require method standardization so that the performance of these technologies for detecting adventitious agents can be evaluated.
The aim of the present study was to evaluate independent high-throughput workflows, including sample and library preparation, sequencing methodologies, and bioinformatic pipelines for their capacity to detect four or five viruses belonging to different virus families with distinct structural features, including nucleic acid composition and topology, as well as capsid/envelope conformation. Such selection criteria are similar to those used in other virus spiking studies performed to evaluate clearance by the manufacturing process for product safety (
13). Although the three participating groups used different extraction techniques and sequencing platforms, their sensitivities of virus detection were very similar (summarized in
Table S3 in the supplemental material).
Unexpectedly, all three pipelines were able to detect all tested viruses in complex cellular matrices but with some differences in the sensitivity of detection, which highlighted the importance of improving several stages of sample preparation and processing. The results from all three labs showed similar trends for the viruses detected. EBV was readily detected by all three labs. EBV is a relatively large (172-kb), enveloped, double-stranded DNA virus that belongs to the herpesvirus family. With its large genome size, it was not surprising that EBV showed potential for sensitivity better than 0.1 copies per cell. In fact, using lab C’s pipeline, the sensitivity of EBV detection can be extrapolated to as low as 0.005 copies per cell (with about 30 to 35 reads). The greater sensitivity for the EBV genome may also reflect the fact that it does not require the reverse transcription and cDNA synthesis steps that are needed for RNA viral genomes.
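The link between genome size and read yield can be illustrated with a rough back-of-the-envelope sketch. It assumes that reads are drawn in proportion to nucleic acid mass, with no extraction, enrichment, or library bias. The host genome size, the total read count, and the 15-kb comparison genome are hypothetical values chosen for illustration and are not taken from this study.

```python
# Illustrative sketch only. HOST_GENOME_BP and TOTAL_READS are assumptions,
# not values reported in the study.
HOST_GENOME_BP = 6.4e9   # assumed host DNA content per cell (human-sized diploid genome)
TOTAL_READS = 10_000_000  # hypothetical sequencing depth


def expected_viral_fraction(genome_bp, copies_per_cell, host_genome_bp=HOST_GENOME_BP):
    """Fraction of sequenced bases expected to be viral under unbiased sampling."""
    viral_bp = genome_bp * copies_per_cell
    return viral_bp / (viral_bp + host_genome_bp)


def expected_viral_reads(total_reads, genome_bp, copies_per_cell,
                         host_genome_bp=HOST_GENOME_BP):
    """Expected number of viral reads at a given total read count."""
    return total_reads * expected_viral_fraction(genome_bp, copies_per_cell, host_genome_bp)


if __name__ == "__main__":
    # 172 kb is the EBV genome size given in the text; 15 kb is a hypothetical
    # smaller DNA genome used only for comparison.
    for label, genome_bp in [("172-kb genome", 172_000), ("15-kb genome", 15_000)]:
        for copies in (1.0, 0.1, 0.005):
            reads = expected_viral_reads(TOTAL_READS, genome_bp, copies)
            print(f"{label}, {copies} copies/cell: ~{reads:.2f} expected reads")
```

Under these assumptions the expected viral read count scales almost linearly with both genome length and copies per cell, which is consistent with a large genome such as EBV being easier to detect at low spiking levels. Enrichment steps such as nuclease treatment change the effective host background and are not modeled here.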
For RSV and FeLV, all three labs’ pipelines resulted in a higher number of reads detected for RSV than for FeLV. This was unexpected, as the number of genomes spiked in was identical for RSV and FeLV, and both viruses are enveloped and contain a single-stranded RNA genome of relatively similar size (about 9 kb for FeLV and 15 kb for RSV). This result potentially indicates that factors aside from genome size and genomic structure could also influence the sensitivity of detection by HTS. The difference may be sequence dependent and due to secondary RNA structures resulting in inefficient reverse transcription and cDNA synthesis.
REO1 was detected in the RNA analysis from both lab A and lab C with limits between ∼1 and 0.1 copies per cell. However, for lab B, whose pipeline was not optimized for specific viruses (see discussion below), REO1 was detected only at three copies per cell in the cell substrate background. It should be noted that for double-stranded RNA (dsRNA) viruses, the RNA denaturation temperatures recommended in some of the cDNA synthesis kits may not be sufficiently high to denature the dsRNA and allow primer and enzyme binding, thus affecting virus detection. The results from lab A and lab C suggest that the enrichment of RNA (by either nuclease digestion of host DNA or rRNA depletion) enhances the sensitivity of detecting dsRNA viruses such as REO1.
PCV1 was included in testing by lab A. Although it was not detected in the nucleic acid spiked sample, the virus was detected with almost full genome coverage in the virus spiked samples, single or mixed, at one copy per cell.
Sensitivity of HTS without enrichment steps.
Lab B’s sample preparation method was a nonspecialized pipeline that can capture both DNA and RNA viruses. It was designed to preserve the broadest specificity, at the expense of the higher sensitivity offered by specialized sample preparation methods. Enrichment steps can be problematic when some viruses are not polyadenylated or share similarity with rRNA sequences. To keep sample manipulation to a minimum, lab B extracted DNA and RNA together as one fraction, with no nuclease digestion to remove host nucleic acid and no ribosomal depletion, and only a short, low-speed centrifugation was used to treat the sample. Thus, compared with the sample preparation methods used in labs A and C, lab B’s extraction potentially retains a higher level of host nucleic acids, which affects viral sequence recovery.
The viruses used for this spiking study have an almost 1-to-1 ratio of genome equivalence to viral particles, indicating that there are very few nonencapsidated viral genomes. In less-purified samples, there might be 10 to 100 viral sequences per viral particle, leading to a bias in the detection where the majority of the detected viral sequences resulted from naked nucleic acids and not the actual viral particles. As there are very few naked nucleic acids, lab B’s extraction technique starts with a number of viral genomes that is similar to those used by labs A and C. In combination with the higher level of background nucleic acids, lab B’s sample preparation can be considered the baseline sensitivity achievable by HTS for viral adventitious agent detection.
As demonstrated by lab A and lab C, additional viral extraction steps, namely, separating the DNA and RNA fractions and implementing rRNA depletion and nuclease treatment, can potentially enhance the signal-to-background ratio.
HTS comparison with other virus detection methods.
A major challenge in implementing HTS for the detection of adventitious agents in biologics is to understand how HTS technologies perform compared with other methods currently in use. This is difficult since the general virus detection assays, such as in vitro (infection of specific cell lines) and in vivo (infection of host animals) models, are compendial tests that have not been characterized for sensitivity of virus detection. However, studies were recently undertaken to obtain data for comparing virus detection in the in vitro and in vivo assays (
18). Other broad virus detection assays have also been evaluated for virus detection such as microarrays and broad-range PCR-electrospray ionization mass spectrometry (PCR/ESI-MS) (
19). To date, qPCR stands as a gold standard technique for sensitive detection of known adventitious viruses mainly because of its specificity and sensitivity, and the use of PCR-based assays can be seen as a preferred alternative to current methods that have shown their limitations, that need improvements, or that need to be replaced in light of the 3R policy aiming at replacing, refining, or reducing the use of animal models (
5,
18,
20). Nevertheless, the need for specificity is also the main drawback of qPCR. Indeed, new virus strains and genomic variants are continually identified (
21,
22), reflecting the need to continually mine sequence databases to update qPCR assays targeting adventitious agents and precluding the use of universal primer/probe designs. While the genetic sequence of the target has to be known to develop a qPCR assay, HTS workflows detect and identify any DNA molecule regardless of its intrinsic sequence (
23,
24).
Due to their throughput, HTS technologies also easily allow detection of multiple targets in multiple samples at the same time, while qPCR requires developing multiple assays that should not self-interfere if they are to be multiplexed or that result in as many runs as assays. In this study, five viruses with contrasting genome topologies described in
Table S4 (including double/single stranded, RNA/DNA, segmented/linear/circular, and sizes from 1.1 kb for small segments of REO1 to 170 kb for EBV) could be readily detected, singly or mixed, and the percentage of genome covered could be determined, whereas several virus-specific qPCR assays would have been necessary to detect and confirm the viruses individually. This has also been shown in another collaborative study led by the National Institute for Biological Standards and Control (NIBSC), in which a pool of 25 viruses was analyzed using different HTS workflows. Some of these workflows detected all 25 viruses, whereas an equal number of qPCR assays yielded less information on the virus pool because some of the qPCR assays were not successful (
25,
26).
Even if HTS detection/identification capacity outperforms qPCR, few studies have addressed how HTS performs in terms of sensitivity compared with other molecular techniques and qPCR in particular. The sensitivity of an analytical method is commonly characterized by the concentration, known as the limit of detection (LOD), that can be detected with a high (most often 95%) probability. Theoretically, the lowest LOD of PCR has been determined as three copies per reaction (
27). This holds true if the qPCR assay has an efficiency of 100% (i.e., each PCR cycle produces exactly two copies of the DNA template) and targets a single-copy gene, in addition to the assumption of a Poisson distribution and a 95% chance of pipetting at least one copy in the PCR (
27). Nevertheless, qPCR assays most often have lower efficiencies (because of poor primer/probe design and reagent shortage as the reaction progresses over time) and the lowest LOD is rarely met (
28,
29). Moreover, the overlooked topology of the DNA template also has an impact on the efficiency of the qPCR, and the same target in different topological environments can result in divergent qPCR outputs (
30,
31). Since HTS workflows rely on fragmented DNA molecules as the template and a limited number of amplification cycles, these constraints likely have less impact on the HTS outcome.
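The theoretical LOD of three copies per reaction follows directly from the Poisson assumption described above: if the number of template copies pipetted into a reaction is Poisson distributed with mean λ, the probability of at least one copy is 1 − exp(−λ), and this reaches 95% when λ = −ln(0.05) ≈ 3. The sketch below reproduces this arithmetic and also shows how a per-cycle efficiency below 100% reduces amplification. The function names are hypothetical and chosen only for illustration.

```python
import math


def prob_at_least_one_copy(mean_copies):
    """P(at least one template copy in the reaction) for a Poisson-distributed count."""
    return 1.0 - math.exp(-mean_copies)


def min_mean_copies(confidence=0.95):
    """Smallest Poisson mean giving at least one copy with the requested probability."""
    return -math.log(1.0 - confidence)


def fold_amplification(efficiency, cycles):
    """Fold increase after a number of cycles; efficiency 1.0 means perfect doubling."""
    return (1.0 + efficiency) ** cycles


if __name__ == "__main__":
    print(round(min_mean_copies(0.95), 2))         # 3.0
    print(round(prob_at_least_one_copy(3.0), 3))   # 0.95
    # A 90% efficient assay falls well short of perfect doubling after 40 cycles.
    print(fold_amplification(0.9, 40) / fold_amplification(1.0, 40))
```

This is only the sampling floor; it says nothing about primer/probe design, reagent depletion, or template topology, which the text identifies as reasons why the lowest LOD is rarely reached in practice.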
Several reports indicate that HTS can be a method combining sensitivity with the ability to identify known and unknown viruses. For instance, Greninger and coworkers (
32) showed that HTS could detect influenza A virus H1N1 titer near the detection limit of a specific RT-PCR assay under conditions where a DNase treatment was included prior to extraction of nasopharyngeal swab samples containing between 10^5 and 10^9 viral particles per ml. In their evaluation of the performance of HTS in detecting viruses, using MRC-5 cells spiked with various concentrations of viruses near the LOD of validated RT-qPCR, Cheval and coworkers (
33) showed that the sensitivity of HTS is near those of qPCR-based assays. In their work, Cheval and coworkers also compared the earlier 454 Roche sequencing technology with Illumina’s genome analyzer II (GAII) and showed that, although fewer than 10^7 reads were generated by the GAII, producing more reads increased the sensitivity of the HTS workflow. Additionally, Wylie et al. (
34) showed by comparing the viromes of febrile and afebrile children that HTS can possibly reach a sensitivity comparable with that of qPCR (especially for mastadenoviruses, roseoloviruses, and enteroviruses), but that depending on the virus family (polyomaviruses and bocaviruses), optimization of the HTS workflow was required. Of note, this study generated small numbers of reads (from fewer than 3 million to 50 million reads, whereas more than 300 million reads can now be obtained routinely) and suggested that there was a correlation between the number of reads generated and the probability of finding reads mapping to the targeted viruses (
34). In another study, Prachayangprecha and colleagues (
35) compared the outputs of a sequence-independent HTS approach and routinely used diagnostic RT-qPCR assays for the presence of respiratory viruses in nasopharyngeal aspirates from Thai children with respiratory disease. Prachayangprecha and colleagues concluded that HTS was at least as sensitive as diagnostic RT-qPCR for the detection of rhinovirus and human metapneumovirus but that HTS was less sensitive toward bocavirus and enterovirus. This study was conducted using a 454-GS Junior instrument, which generated more than 400 but fewer than 35,000 reads, depending on the sample, stressing once again the potential of HTS even when few reads are available. A clear asset of HTS is its capacity to detect truncated forms of viruses, which might occur during cell passages. Timm and colleagues demonstrated the added value of HTS in quantifying those defective particles in a population of vesicular stomatitis virus (a prototypical nonsegmented negative-sense RNA virus) that was cultured for three passages on BHK cells (
36), highlighting the power of HTS in detecting genome variants that might escape qPCR-based assays. It was also suggested in this study that HTS can be more sensitive than other complementary techniques (i.e., RT-qPCR, transmission electron microscopy [TEM], and biological activity assay). Fischer and colleagues showed that in influenza virus-positive respiratory samples, there was a high correlation between HTS results (generated on a MiSeq or HiSeq 2000 instrument, with a maximum number of reads of 3.5 or 45 million, respectively) and quantification cycles (
Cq) obtained with RT-qPCR, both reaching similar sensitivity but with HTS providing information about additional viral pathogens and bacterial species associated with superinfections in influenza patients (
37). Finally, using a pool of 25 viruses prepared by the NIBSC, Li and coworkers (
25) demonstrated that all viruses that were detected by qPCR (i.e., with
Cq of <37) were detected by HTS and that four out of the six viruses that were not detected by qPCR were identified by HTS. This was further corroborated in the collaborative study led by the NIBSC with the same virus pool, where some HTS workflows were able to detect and identify all the viruses present in the sample, even those that were not detected by qPCR (
26). It is important to note that the virus pool was prepared on the basis of titers that were determined by qPCR before dilution and pooling of the 25 viruses (
26), indicating that the absence of qPCR signal was not due to a problem of specificity but rather to a problem of sensitivity. Here, we did not specifically determine the limit of detection of HTS with regard to the detection of adventitious viruses. Although we showed that some HTS workflows are able to detect as few as 1 virus genome (and likely fewer) in the equivalent of 10 host cell genomes, a specific experimental design is needed to clarify the true performance of HTS, including considerations about enrichment strategies (
38,
39) and impact of the metabolism of infected cells, in order to validate the use of HTS for adventitious agent testing of biologics and guide further investigations.
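The relationship noted in several of the studies above, between the number of reads generated and the chance of finding reads that map to a target virus, can be sketched with a simple sampling model. The sketch assumes that each read is viral independently with a fixed probability, so that the viral read count is approximately Poisson distributed. The viral read fraction used here is hypothetical, and the read depths are taken only as orders of magnitude mentioned in the text.

```python
import math


def prob_detect(total_reads, viral_fraction, min_viral_reads=1):
    """Probability of observing at least `min_viral_reads` viral reads.

    Uses a Poisson approximation with mean total_reads * viral_fraction.
    """
    lam = total_reads * viral_fraction
    # Sum the Poisson probabilities of observing fewer than the required count.
    p_below = sum(math.exp(-lam) * lam ** i / math.factorial(i)
                  for i in range(min_viral_reads))
    return 1.0 - p_below


if __name__ == "__main__":
    viral_fraction = 1e-7  # hypothetical: one viral read per 10 million reads
    # Depths on the order of those cited: 454-GS Junior, MiSeq, HiSeq 2000, current runs.
    for depth in (35_000, 3_500_000, 45_000_000, 300_000_000):
        p1 = prob_detect(depth, viral_fraction)
        p3 = prob_detect(depth, viral_fraction, min_viral_reads=3)
        print(f"{depth:>11,} reads: P(>=1 read) = {p1:.4f}, P(>=3 reads) = {p3:.4f}")
```

The choice of a minimum read count for calling a detection is itself an assumption; a stricter threshold shifts the curve toward deeper sequencing, which is one reason a dedicated experimental design would be needed to establish a true LOD for HTS.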
The advantages of HTS for unexpected or novel virus detection make this technology a promising tool for supplementing or perhaps even replacing some of the conventional virus detection assays. There are examples of the discovery of viruses and other microbial agents in a variety of samples, including clinical, environmental, and biological samples. The most impactful was the discovery of PCV1 in a licensed rotavirus vaccine (
40). This also highlighted the limitations of the currently used virus detection assays since PCV1 was not a novel virus and brought HTS into consideration by industry and regulatory agencies for further investigations of its applications in biologics. Additionally, a novel rhabdovirus was discovered in the Sf9 cell line that was not detected by extensive testing using conventional assays, although it was expressed at a high level (
9). In the present study, we have shown similar levels of detection with two HTS technologies using four different virus types in different matrices representing different types of samples relevant to biologics (summarized in
Table S3). The results using the same starting virus stocks in independently designed and conducted experiments support the idea that development of well-characterized reference virus materials can facilitate standardization of HTS for applications in biologics.