INTRODUCTION
Since the beginning of the COVID-19 pandemic, variants have emerged that have the potential to evade vaccines, cause diagnostic test performance issues, or cause more severe disease (1–5). Monitoring and surveillance of the genetic lineages of SARS-CoV-2-positive samples are critical to the timely identification of emerging variants (6–8). In November 2021, Omicron, a highly divergent variant harboring multiple mutations, was reported in South Africa (9). While data suggest that Omicron may be associated with reduced risk of hospitalizations compared to Delta, the increased transmissibility and rapid global spread of Omicron highlighted the need for a fast and cost-effective approach for early detection of such variants (10–12).
SARS-CoV-2 genetic lineages in the US are primarily monitored by next-generation sequencing (NGS) on a random selection of approximately 5% of SARS-CoV-2-positive samples (13, 14). While extremely accurate at detecting existing circulating variants, NGS does not allow for the timely identification of emerging variants. More focused approaches, such as genotyping using single-nucleotide polymorphism (SNP) assays, offer significant advantages in terms of cost, throughput, and efficient result reporting. Several reports have shown that SNP assays provide rapid turnaround times compared to NGS (15–17). Harper et al. (18) previously demonstrated the utility of a SNP genotyping panel to inexpensively identify SARS-CoV-2 genotypes circulating between March and May 2020. However, previous methods required a two-step approach to genotype a sample: (i) identify a positive sample using reverse transcription PCR (RT-PCR) or an antigen assay; and (ii) genotype the positive sample using a SNP panel. The approach described here seeks to expand upon earlier efforts by developing an assay that simultaneously confirms a SARS-CoV-2-positive sample and identifies its genotype. SARS-CoV-2-positive samples with undetermined genotyping variant classification could provide a prescreened sample set for NGS and potentially allow for early identification of emerging variants.
The National Institutes of Health (NIH) Rapid Acceleration of Diagnostics (RADx) initiative created a Variant Task Force (VTF) in January 2021 to assess the impact of emerging SARS-CoV-2 variants on in vitro diagnostic testing (19). In July 2021, the NIH RADx VTF also initiated an effort to develop a SARS-CoV-2 RT-PCR assay for variant agnostic detection of SARS-CoV-2, as well as early detection and monitoring of SARS-CoV-2 variants. Its aims were as follows: (i) identify SARS-CoV-2 markers useful for the detection of SARS-CoV-2-positive samples across all variants; (ii) develop a panel of SNP markers that can be used to accurately assign lineages to SARS-CoV-2-positive samples; and (iii) implement a genotyping approach and platform for the early detection of new and re-emerging variants which signals when markers need to be updated.
This paper outlines the results of the performance validation of the initial genotyping assay and associated marker sets. It also describes how this approach was rapidly adapted to develop a targeted panel of four mutations (three for Omicron and one for Delta) for the purpose of identifying Omicron, and how this Omicron genotyping panel was implemented in several diagnostic labs. The data from this study, as well as associated statistics and trends, are available on a publicly accessible dashboard (20).
DISCUSSION
The SARS-CoV-2 virus continues to mutate at an unprecedented scale. NGS is the main method used to track the emergence of new variants; however, NGS technology is expensive and current reporting to GISAID and national regulatory authorities typically takes several weeks (15–17). As of January 27, 2022, there were 7,457,886 sequences in GISAID. Nearly 80% of these sequences were deposited by 10 countries (US, United Kingdom, Germany, Denmark, Canada, France, Japan, Sweden, Switzerland, and India), and 90% by two continents (Europe and North America) (21–23). With approximately 400 million COVID-19 cases reported since the start of the pandemic, this translates to geographically limited sequencing of less than 2% of available samples (38). Clearly, there is a need for a more widely accessible and efficient method to detect the emergence of new variants.
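The "less than 2%" figure follows directly from the two counts quoted above. A minimal check of that arithmetic, using only those two numbers (the variable names are ours):

```python
# Figures quoted in the text above.
gisaid_sequences = 7_457_886   # sequences in GISAID as of January 27, 2022
reported_cases = 400_000_000   # approximate cumulative COVID-19 cases

# Share of reported cases with a deposited sequence, as a percentage.
sequenced_pct = 100 * gisaid_sequences / reported_cases
print(f"Sequenced share of reported cases: {sequenced_pct:.2f}%")  # 1.86%
```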
The use of a high-throughput, low-cost RT-PCR genotyping panel was shown by previous authors to enable rapid identification of circulating SARS-CoV-2 variants. Neopane et al. (39) demonstrated good concordance between their assay and sequencing for variants circulating between March and July 2021. Out of 150 SARS-CoV-2-positive specimens, 69 (46%) were B.1.617.2, 49 (32.7%) were B.1.1.7, 4 each were P.1 and P.2 (2.7% each), 3 were B.1.526 (2%), and 2 each were B.1.351 and B.1.427 (1.3% each). An additional 17 (11.3%) carried only the D614G mutation. However, 13 of the 14 SNPs used in the panel by Neopane et al. were from the S gene, and many of these mutations are now found in different SARS-CoV-2 lineages (40).
Korukluoglu et al. (41) described a one-step RT-qPCR assay to detect N501Y and HV69-70del using allele-specific forward primers, reserving ORF1ab as an internal control. This was a relatively small study with 165 specimens, and the authors observed 100% concordance with results of Sanger sequencing and NGS. Vogels et al. (42) reported on an RT-qPCR assay to detect ORF1a SGF3675-3677del and spike HV69-70del. This assay was concordant with 76 sequenced specimens. Perchetti et al. (43) utilized a two-step approach combining the CDC-based laboratory-developed RT-qPCR and the Thermo Fisher TaqPath COVID-19 RT-PCR to identify B.1.1.7 variants. However, the Perchetti et al. approach employed labor-intensive droplet digital PCR (ddPCR) and depended on the S gene dropout, which is now known to occur in multiple variants (44–48).
Harper et al. (18) developed a genotyping panel to detect variants identified from SARS-CoV-2 sequences surveyed between March and May 2020 and tested this on 50 stored qRT-PCR positive SARS-CoV-2 clinical samples. They initially identified 22 SNPs that could discriminate 15 different genotypes, but subsequent analysis on a larger sequence data set indicated that their approach required 51 markers to maximize sample discrimination. The largest nucleic acid amplification test-based genotyping series thus far was a national effort in France recently reported by Haim-Boukobza et al. (49). The authors used two separate assays to screen for the HV69-70del and N501Y mutations in 35,208 samples. However, this approach was unable to genotype 19% of the samples.
We sought to expand upon these previous studies by analyzing more than 7.5 million sequences from GISAID to select target SNPs. Our study goals were threefold: (i) identify SARS-CoV-2 markers useful for the detection of SARS-CoV-2-positive samples across all variants; (ii) develop the smallest set of SNP markers that can be used to accurately assign lineages to SARS-CoV-2-positive samples (PPA ≥ 95% compared to NGS); and (iii) implement a genotyping approach and platform for the early detection of new and re-emerging variants that signals when markers need to be updated. There may be a bias in our results given that the retrospective and prospective clinical samples used to validate assays originated primarily from Washington, Florida, and California; however, subsequent studies using samples from a broader swath of states showed similar results (data not shown), and national and global knowledge of circulating strains at the time of this study suggests that any bias would be minimal.
This report identifies three variant agnostic markers that can detect SARS-CoV-2-positive samples with high PPA and NPA compared to NGS. These markers are present in almost all the SARS-CoV-2 samples that were sequenced and should be considered in the development of new assays. This study also demonstrates that some marker combinations are highly specific for certain variants. Routine use of these genotyping markers could provide early warning that a new or re-emergent variant is circulating. Importantly, genotyping with this assay is quick and efficient, enabling result reporting in 1 to 2 days, compared to 10 to 14 days with NGS, and for a fraction of the cost. As such, genotyping can be used to monitor a higher percentage of SARS-CoV-2-positive samples than the 5% random sampling by sequencing currently practiced in the US. Samples which cannot be assigned to a known variant would be prime candidates for sequencing. Finally, the study demonstrates that the Omicron variant can be identified with high precision with three markers. Incorporating Omicron-specific markers with the markers defined to detect previous variants can provide a framework for the detection of the next new variant.
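For readers less familiar with the agreement metrics used above, the following is a minimal sketch of how positive and negative percent agreement (PPA, NPA) can be computed for one lineage, treating the NGS call as the comparator. The function name and the sample calls are illustrative assumptions and are not data from this study.

```python
def agreement(assay_calls, ngs_calls, target):
    """Return (PPA, NPA) for one target label, with NGS as the comparator."""
    if len(assay_calls) != len(ngs_calls):
        raise ValueError("call lists must be the same length")
    tp = fp = tn = fn = 0
    for assay, ngs in zip(assay_calls, ngs_calls):
        assay_pos = assay == target
        ngs_pos = ngs == target
        if ngs_pos and assay_pos:
            tp += 1          # both methods call the target
        elif ngs_pos:
            fn += 1          # NGS calls the target, assay does not
        elif assay_pos:
            fp += 1          # assay calls the target, NGS does not
        else:
            tn += 1          # neither method calls the target
    ppa = tp / (tp + fn) if (tp + fn) else float("nan")
    npa = tn / (tn + fp) if (tn + fp) else float("nan")
    return ppa, npa


# Made-up calls for illustration only.
ngs = ["Omicron", "Omicron", "Delta", "Delta", "Omicron", "Delta"]
assay = ["Omicron", "Omicron", "Delta", "Omicron", "Undetermined", "Delta"]
ppa, npa = agreement(assay, ngs, "Omicron")
print(f"PPA={ppa:.2f}, NPA={npa:.2f}")
```

In this sketch an undetermined genotyping call for an NGS-confirmed target counts against PPA; a study could reasonably handle such calls differently.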
The genotyping markers for Omicron effectively highlighted the transition from Delta to Omicron as the dominant variant. As illustrated in this report, a static marker set with the Delta-specific markers omitted experienced a significant decline in accuracy within four months of the emergence of Delta in the US. To prevent a loss of marker accuracy, this study demonstrated an approach for detecting new and emerging variants using a classifier algorithm for recurring analysis of active marker sets across regional and global GISAID sequence data. As emerging variants develop, anomalies in classifier calls and the resulting discordance with sequencing classification will continue to highlight the need for marker modifications. With the genotyping assay described herein, addition or subtraction of markers is straightforward. Each SNP marker, including variant agnostic positivity markers, is interrogated in a separate, individual well so that changing one marker has no impact on the performance of other markers in the set. The genotyping assay can be customized to include one or two positivity markers, as well as lineage assignment markers appropriate for the current variant landscape, to simultaneously confirm a SARS-CoV-2-positive sample and identify its genotype. Standardized performance metrics for custom successor assays, such as limit of detection (LoD), sensitivity, and specificity, will be easy to establish. A formal LoD determination was outside the scope of this study, but positivity and genotype calls were obtained for samples with as few as 10 RNA copies per well.
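The one-well-per-marker design described above lends itself to a simple lookup: positivity markers decide whether the sample is SARS-CoV-2 positive, and the pattern of lineage markers decides the genotype. The sketch below is one possible way to express that logic and is not the classifier used in this study. All marker names and signatures are hypothetical placeholders, loosely modeled on a panel with three Omicron markers and one Delta marker.

```python
# Hypothetical marker names and signatures, for illustration only.
POSITIVITY_MARKERS = ["positivity_1", "positivity_2"]
LINEAGE_SIGNATURES = {
    "Omicron": {"omicron_1": True, "omicron_2": True,
                "omicron_3": True, "delta_1": False},
    "Delta": {"omicron_1": False, "omicron_2": False,
              "omicron_3": False, "delta_1": True},
}


def classify(well_calls):
    """Map per-well results to a call.

    well_calls maps marker name -> bool. For positivity markers, True means
    the target amplified; for lineage markers, True means the mutant allele
    was called and False means the reference allele was called.
    """
    # Step 1: require at least one positivity marker to be detected.
    if not any(well_calls.get(m, False) for m in POSITIVITY_MARKERS):
        return "Negative"
    # Step 2: look for a lineage whose full signature matches the wells.
    for lineage, signature in LINEAGE_SIGNATURES.items():
        if all(well_calls.get(m) == expected for m, expected in signature.items()):
            return lineage
    # Positive, but no known signature fits: a candidate for sequencing.
    return "Undetermined"


sample = {"positivity_1": True, "positivity_2": True, "omicron_1": True,
          "omicron_2": True, "omicron_3": True, "delta_1": False}
print(classify(sample))  # Omicron
```

Because each marker is read independently, adding or retiring a marker in this sketch only changes the dictionary entries and leaves the calls for the other markers untouched.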
A retrospective review of the emergence of Delta in the US showed that as this variant grew in prevalence, so too did the number of undetermined calls returned by the classifier algorithm. Thus, the classifier algorithm effectively assesses the accuracy of current marker sets based on daily analysis of new viral sequences added to GISAID, creating an adaptive and closed-loop process for low-cost, rapid monitoring of circulating variants and detection of emerging variants. Indeed, the authors recently created a free, live dashboard of a real-time genotyping platform illustrating the symbiotic nature of using genotyping markers in conjunction with targeted sequencing (20). An uptick in SARS-CoV-2-positive samples with undetermined genotyping variant classification will trigger targeted sequencing of this subset of samples to determine whether they represent a novel variant for which genotyping markers should be developed and incorporated. This real-time tracking tool will become increasingly powerful as more public health and private testing labs adopt this genotyping approach and contribute data.
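The trigger described above, where a rise in undetermined calls prompts targeted sequencing, can be sketched as a rolling-window rate check. This is a minimal illustration under our own assumptions: the class name, the window size, and the threshold are hypothetical and are not values used by the study or its dashboard.

```python
from collections import deque


class UndeterminedMonitor:
    """Track the share of undetermined calls over the most recent samples."""

    def __init__(self, window=200, threshold=0.05):
        # window and threshold are illustrative values, not from the study.
        self.calls = deque(maxlen=window)
        self.threshold = threshold

    def rate(self):
        """Fraction of undetermined calls in the current window."""
        return sum(self.calls) / len(self.calls) if self.calls else 0.0

    def add(self, call):
        """Record one genotyping call; return True if sequencing is flagged."""
        self.calls.append(call == "Undetermined")
        window_full = len(self.calls) == self.calls.maxlen
        return window_full and self.rate() > self.threshold


monitor = UndeterminedMonitor(window=10, threshold=0.2)
flags = [monitor.add(c) for c in ["Omicron"] * 8 + ["Undetermined"] * 3]
print(flags[-1])  # True once 3 of the last 10 calls are undetermined
```

Waiting for a full window before flagging avoids alarms from the first few samples; a production system would likely also account for sample volume and regional baselines.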
ACKNOWLEDGMENTS
The research reported in this publication was supported by the National Institute of Biomedical Imaging and Bioengineering of the National Institutes of Health (award numbers 75N92022P00030, 75N92021P00116, U54EB015408, and U54EB027690) as part of the Rapid Acceleration of Diagnostics (RADx) initiative, launched to speed innovation in the development, commercialization, and implementation of technologies for COVID-19 testing. The funders had no role in the decision to submit the work for publication, and the views expressed herein are the authors’ and do not necessarily represent the views of the National Institutes of Health or the US Department of Health and Human Services.
We gratefully acknowledge Biocomx contractors Dale Gort, Gail E. Radcliffe, Brian Walsh, and Marianne Weinell; Thermo Fisher Scientific employees Manohar Furtado, Anshu Gupta, Elvis Huarcaya Najarro, Paul Sportmann, and Rui Yang; Helix OpCo employees Marc Laurent, William Lee, and James Lu; and Aegis Sciences Corporation employees Cyndi Clark and Matthew Hardison for their support of this study.
Finally, we thank the originating laboratories responsible for obtaining the specimens, the submitting laboratories where genetic sequence data were generated and shared via the GISAID Initiative, and the GISAID EpiCoV Data Curation Team.
E. Lai and R.S. Creager are co-principal investigators and secured funding for this study. E. Lai conceived the experiment and contributed to interpretation of resulting data. R.S. Creager was the lead author and editor of the manuscript. R.S. Creager is the guarantor of this work and, as such, had full access to all of the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis. E.B. Kennedy was the secondary author and editor of the manuscript and led the graphic design of tables and figures. K.M.C. O’Donovan was the tertiary author and editor of the manuscript. J. Lozach devised methods for marker selection and contributed to data analysis and interpretation. J. Lozach and J. Davis-Turak devised methods for lineage assignment. J. Lozach led design and J. Davis-Turak, C. Wesselman, and T. Wesselman oversaw development and deployment of the ROSALIND Tracker for COVID-19. C. Wesselman, T. Wesselman, K. Hayashibara, M. Gandhi, and S. Williams provided manuscript writing and editing assistance. K. Hayashibara and P. Brzoska led, and E. Diamond supported, genotyping assay development, general lab protocol development, and data acquisition. Y. Yu also supported genotyping assay development via retrospective bioinformatic analysis. K. Hayashibara, E. Diamond, M. Gandhi, J.M. Nguyen, J.M. Ramirez, and S. Williams oversaw implementation of the protocol at partnering labs, and supported data analysis and interpretation. D. Becker, T. Cassens, N.A. Leonetti, and E. Sandoval contributed to Helix OpCo workflow design and implementation. T. Cassens and S. White managed genotyping and sequencing data acquisition and analysis at Helix OpCo. T. Peck and D. Wong co-designed and developed the genotyping assay robotics program. A.L. Greninger led UW Medical Center workflow design and implementation, and A.L. Greninger, P. Hajian, and P. Roychoudhury supported genotyping and sequencing data acquisition and analysis at UW Medical Center.