INTRODUCTION
The global COVID-19 pandemic is caused by infection with the virus SARS-CoV-2 (
1). Analysis of whole-genome sequences from global viral samples shows ongoing changes in the composition of viral populations. RNA viruses have high mutation rates, so sequence change is expected in the viral genome due to random genetic drift (
2). However, selection for efficient immune evasion and transmission between humans now seem likely to be major drivers of SARS-CoV-2 diversification (
3–5).
Widespread vaccination against SARS-CoV-2 was introduced in the United States in the winter of 2020–2021. First to be implemented were vaccines based on modified mRNAs, followed by adenovirus vector delivery. Vaccines are highly protective against infection, severe disease, and death. However, infection of vaccinated individuals has been widely detected, albeit typically with much milder disease course compared to that experienced by unvaccinated individuals (
6–8). Thus, interest turns to the question of which viral features are associated with vaccine breakthrough infections (
7–9).
Several criteria can be applied to assessing whether sequence changes in a new variant have likely evolved to promote infection and vaccine breakthrough. Substitutions found in viral spike proteins can be tested in laboratory experiments to determine whether they promote more efficient replication in human cells or reduce binding of human antibodies (
10–18). Other substitutions may alter epitopes targeted by the cellular immune system (
6,
19,
20). Viral lineages with diverse combinations of these substitution have been identified and designated variants being monitored and variants of concern (VBM/VOC) by the Centers for Disease Control and Prevention (CDC). Some of the substitutions in these variants have been detected arising independently on multiple genetic backgrounds, such as the spike substitutions N501Y, E484K or the 69–70 deletion (
4,
21–24), supporting a model of convergent evolution.
One indication of selection for increased transmission in humans is that several new variants have spread globally and rapidly displaced preexisting strains. This was first documented for the D614G substitution, which spread around the world in the Spring of 2020 and displaced most strains lacking this substitution (
25–28). More recently, variants first identified in the UK (B.1.1.7 or alpha) (
29,
30), South Africa (B.1.351 or beta), Brazil (P.1 or gamma), California (B.1.427 and B.1.429 or epsilon), New York (B.1.526 or iota) (
31), and India (B.1.617.2 or delta) (
8) have been suggested to be spreading at the expense of preexisting viral types. Against this background, intense interest focuses on whether particular variants are more efficient at infecting vaccinated individuals.
We have investigated viral genomic evolution in the Delaware Valley, which encompasses the city of Philadelphia, in a sample that includes vaccine breakthrough cases. Our initial report on the first wave of infection in this area (
32) revealed that lineages in Philadelphia most closely matched sequences derived from New York City, approximately 100 miles away, which is larger than Philadelphia and had an earlier peak in infection. We also found that in some cases different viral sequence polymorphisms could be found in the same patients from different body sites or in longitudinal samples, suggestive of ongoing evolution within infected individuals (
32).
In this study, samples were collected from 2621 surveillance samples from infected individuals and 159 vaccine breakthrough cases through September 2021, and the representation of different variants compared. We found that several waves of variants rose and fell in prevalence over the course of our sampling period, by the end of sampling all genomes were identified as VOC delta lineages, and the delta lineage B.1.617.2 was potentially 3-fold enriched among vaccine breakthrough samples compared to surveillance samples. The amino acid substitution N501Y, found in the alpha, beta, and gamma variants, was also notably enriched in vaccine breakthrough samples. We introduce a rigorous statistical approach, Bayesian autoregressive moving average logistic multinomial regression, that should be widely useful for assessing enrichment of viral variants while controlling for the changing background of circulating strains.
DISCUSSION
Vaccination against SARS-CoV-2 has been highly effective at preventing infection, hospitalization and death. However, infections occasionally take place despite vaccination. It has been suggested that certain lineages of SARS-CoV-2 are more prone to evade vaccination, however, interpretation of counts of vaccine breakthrough is difficult due to variation over time in circulating lineages, the numbers of vaccinated individuals, times since vaccination and the application of nonpharmaceutical interventions. Here, we introduce modeling based on Bayesian autoregressive moving average multinomial logistic regression, which estimates the proportions of lineages in the surveillance data over time, allowing comparison of lineage counts in populations of interest to time-matched surveillance estimates. Multiple studies have investigated whether certain lineages are more likely to appear in vaccine breakthrough infections (
7,
8,
39–43). However, few studies have as large numbers of sequenced samples for both background surveillance and for vaccine breakthrough cases within a matched geographical region as reported here. Thus, our contribution provides targeted data on the nature of breakthroughs at nucleotide resolution.
Using this method, we found that there was evidence for enrichment in the odds of variant lineages appearing in the vaccine breakthrough population relative to the general surveillance population, with the delta variant B.1.617.2 showing the strongest signal with an estimated enrichment of 3-fold (95% credible interval 0.89 to 11). Several previous studies have also noted delta variants to be enriched among vaccine breakthroughs (
7,
8); our data provides further support based on careful statistical analysis.
Using a similar Bayesian autoregressive moving average logistic regression model, we interrogated enrichment of point substitutions among vaccine breakthrough cases. The N501Y substitution stood out as particularly associated with vaccine breakthroughs. N501Y is found in several of the VBM, and this substitution is well known to increase affinity of binding for the viral spike protein to the ACE2 receptor (
23,
37,
38), as well as reduce antibody binding to promote immune evasion (
24). Recently the N501Y substitution was suggested to be a central feature in a meta-signature of up to 35 mutations which recur in alpha, beta, gamma and other lineages, and which mark a viral fitness peak reflecting optimization for spread in humans (
44). In contrast, several substitutions were estimated to be less likely to appear in vaccine breakthrough samples, including nucleocapsid mutation P199L (detected in B.1.2, B.1.526, and B.1.596; odds ratio of 0.5; 95% credible interval of 0.24–0.88), and spike mutation D253G (detected in B.1.526; odds ratio of 0.5; 95% credible interval of 0.23–0.89). The spike D253G substitution has been implicated in MAb evasion (
45), and nucleocapsid P199L may alter the assembly of SARS-CoV-2 VLPs (
46), functions possibly linked to their depletion in vaccine breakthrough.
This study has several limitations. For the vaccine breakthrough samples, we did not have paired immunological data, so it is unknown whether the vaccinations provoked protective immune responses. Our surveillance data were from a mixture of hospitalized patients, symptomatic subjects tested in clinical diagnostic laboratories, and asymptomatic subjects tested weekly at an academic institution, and so our sampling may not be entirely representative of viruses circulating in the community. Several different amplification and sequencing methods were used to acquire data. Documentation of vaccination status may be incomplete. For all the viral genome sequencing, only samples achieving a threshold level of viral RNA could be sequenced (roughly Ct <28 for swab-based testing, Ct <20 for saliva), so it is possible that other variants predominate in subjects with lower viral loads. Studies have been initiated to address some of these concerns.
In summary, similar to other regions in the US and around the world, VBM/VOC and other lineages waxed and waned in the Delaware Valley, with delta lineages ultimately comprising all recent samples in the fall of 2021. Widespread vaccination was introduced in the region in winter 2020–2021, and increasing numbers of breakthrough infections have since been detected. To compare the lineages found in breakthrough with those observed in general surveillance, we introduce analysis based on Bayesian autoregressive moving average multinomial logistic regression and show that delta variant B.1.617.2 showed 3-fold enrichment in vaccine breakthroughs. The N501Y substitution stood out among point substitutions for enrichment in vaccine breakthroughs. We expect that these modeling methods will be useful in monitoring the effectiveness of vaccination programs going forward as novel variants of SARS-CoV-2 continue to emerge.
ACKNOWLEDGMENTS
We are grateful to individuals and their families who volunteered to provide specimens, members of the Bushman and Collman laboratories for help and suggestions, and to Laurie Zimmerman for artwork and help with manuscript preparation. We acknowledge help from all the staff of the Philadelphia Department of Public Health. Funding was provided by a contract award from the Centers for Disease Control and Prevention (CDC BAA 200-2021-10986 and 75D30121C11102/000HCVL1-2021-55232), philanthropic donations to the Penn Center for Research on Coronaviruses and Other Emerging Pathogens, and in part by NIH grant R61/33-HL137063. B.J.K. is supported by NIH K23 AI 121485. Additional assistance was provided by the Penn Center for AIDS Research (P30-AI045008) and the Women’s Committee of CHOP. This project has been funded in whole or in part with Federal funds from the National Institute of Allergy and Infectious Diseases, National Institutes of Health, Department of Health and Human Services, under Contract No. 75N93021C00015.
A.D.M., J.E., S.S.-M., R.G.C., K.R., B.J.K., and F.D.B. designed the study; S.A.W., J.G.-W., L.A.K., A.S.F., A.D.M., C.B., S.R., J.H., R.D., L.D., A.A., E.K., S.C., C.N., J.C.M., P.P., C.C., K.S., A.G., M.F., R.G.C., K.R., A.J.-B., L.C., S.L., A.G., and B.J.K. managed acquisition of clinical specimens; A.D.M., C.B., S.R., L.D., A.A., J.C.M., P.P., N.B., B.R., Z.-XW., P.H., A.M.R., A.G., A.M., and F.D.B. carried out sequencing; A.D.M., A.M.M., C.B., S.R., A.A., J.C.M., P.P., S.S.-M., B.J.K., J.E., and F.D.B. carried out bioinformatic and statistical analysis; A.D.M., S.S.-M., J.E., K.R., B.J.K., R.G.C., and F.D.B. wrote the article.
RGC reports support to his lab for COVID-19 work unrelated to the current manuscript from OraSure, Inc, and ongoing collaborations with Resilient Biotics, Inc.
We have no other conflicts of interest to report.