INTRODUCTION
The emergence of SARS-CoV-2 rapidly became one of the most challenging public health situations for national health agencies around the world. Initially, diagnostic capacities were limited and had to be established. As it became clear that the pandemic was not going to be contained quickly, the evolution of the virus became an important focus of attention as new variants with different worrisome properties emerged, such as higher transmission rates or reduced vaccine effectiveness. Diagnostic assays had to be adapted to distinguish these new emerging viral variants. Classic epidemiology based on incidence data does not provide sufficient resolution and information to direct public health policies in the face of an ever-changing virus (1). Genome sequencing, meanwhile, can help guide responses on multiple fronts: it allows the analysis of pathogen evolution and transmission and the detection of outbreaks, and it informs the development and optimization of diagnostic assays and vaccines, all of which in turn inform public health interventions (1–11).
Consequently, significant effort was directed toward capacity building and creating a fast and reliable surveillance system that allowed the tracking and tracing of (novel) SARS-CoV-2 variants. In Switzerland, individual laboratories quickly stepped up to the challenge of genomic surveillance, but without official targets. This changed in March 2021 with the establishment of the national genomic surveillance program by the Federal Office of Public Health (FOPH), which aimed to sequence 2,000–3,000 samples per week. Due to changed circumstances such as the wide availability of vaccines, this target was adjusted in April 2022 to 500 samples per week with a focus on hospitalized patients (Fig. S1). To date, Switzerland has shared almost 168k genome sequences with GISAID (https://gisaid.org/submission-tracker-global/, last accessed on 10 February 2024), making it the eighth country in the world by number of submitted sequences when normalized by population. Nationally, the Swiss Pathogen Surveillance Platform (SPSP) has been officially mandated by the FOPH as the Swiss SARS-CoV-2 Data Hub, both providing a national platform and sharing data with other databases such as GISAID and ENA (12, 13).
The number of sequences influences how early emerging viral lineages can be detected (
14). Using genomic data from SPSP collected between the onset of the pandemic in February 2020 and the beginning of August 2022, we aimed to describe the national surveillance effort and explore how the amount of sequencing affects key surveillance outcomes: (i) first detection of variants of concern (VOCs), (ii) speed of introduction of VOCs, (iii) diversity of lineages, (iv) first cluster detection of VOCs, (v) density of active clusters, and (vi) geographic spread of clusters.
RESULTS
Genomic surveillance in Switzerland
A total of 143,260 sequences were available for a population of 8.7 million inhabitants over the observed period in our data set (cf. Materials and Methods). The median percentage of cases that were sequenced each week was 7% (Fig. 1). In total, this constitutes 3.6% of all cases (n = 3.95 million) for the whole study period.
The first year of the pandemic was characterized by the emergence and persistence of multiple different viral lineages that led to an increase in nucleotide diversity and the lineage diversity index (LDI, as measured by Shannon diversity index, cf. Materials and Methods) (
Fig. 1C; Fig. S2 to S6). This diversity on the nucleotide and lineage level was reduced with the first VOC, Alpha, rising to dominance (Fig. S3). All subsequent waves were dominated by a single VOC, with a slow rise in genetic diversity over time and a peak when a new, usually more divergent VOC was introduced (Fig. S3A). In contrast to Alpha, the Pango nomenclature accommodated the diversification of Delta and Omicron by introducing many more sublineages. This leads to a high LDI when all Pango-designated sublineages are considered, but an overall low LDI when these sublineages are consolidated to VOC labels (Fig. S3B). Moreover, as indicated by the arrows in
Fig. 1C, Alpha and Delta were already present in Switzerland several weeks before they increased in frequency and truly emerged as the dominant variant. For Omicron, however, this initiation period was much shorter. The pandemic context during the emergence of each VOC was also very different. For example, case numbers were high at the time of introduction of Alpha and Omicron, but low for Delta (Fig. S4). Delta’s rapid growth phase started shortly after a period of decreasing Alpha cases, which coincided with relaxed health and safety measures in Switzerland. This, in addition to its intrinsically higher transmissibility (15), likely played a role in its rapid increase at the end of June 2021 (calendar weeks 2021-25 onward in Fig. 1C).
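The difference between an LDI computed on all Pango sublineages and one computed on consolidated VOC labels can be illustrated with a minimal sketch. It assumes the Shannon index with natural logarithm, H = sum of p_i * ln(1 / p_i); the log base, the function name `shannon_diversity`, the sample list, and the lineage-to-VOC mapping are illustrative assumptions and not the pipeline used in this study.

```python
import math
from collections import Counter


def shannon_diversity(labels):
    """Shannon index H = sum(p_i * ln(1 / p_i)) over label frequencies."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum((c / total) * math.log(total / c) for c in counts.values())


# Hypothetical sequences from one week, labelled by Pango lineage.
week = ["BA.1", "BA.1.1", "BA.2", "BA.2", "BA.2.9", "BA.1", "AY.4", "BA.2"]

# Hypothetical consolidation of sublineages to VOC labels.
voc_of = {
    "BA.1": "Omicron", "BA.1.1": "Omicron", "BA.2": "Omicron",
    "BA.2.9": "Omicron", "AY.4": "Delta",
}

ldi_sublineage = shannon_diversity(week)
ldi_voc = shannon_diversity([voc_of[lineage] for lineage in week])
print(f"LDI on sublineages: {ldi_sublineage:.3f}")
print(f"LDI on VOC labels:  {ldi_voc:.3f}")
```

Merging sublineages into one label can only lower or preserve the index, which is why the consolidated LDI stays low in periods dominated by a single VOC.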
Effect of downsampling on VOC lineage diversity index
To investigate how the observed LDI would have changed with lower sequencing intensity, we downsampled our data set (cf. Materials and Methods).
Naturally, downsampling had an influence on the absolute number of lineages detected (Fig. S6). However, downsampling had only a minimal effect on the LDI during VOC-dominated periods, especially for Alpha and Delta (Fig. S5). The impact of downsampling was stronger during the initial period of the pandemic when the number of available sequences was low. There was also a stronger effect during the later Omicron wave (i.e., once Delta had stopped circulating widely in Switzerland) when downsampling to less than a third of the original effort (fewer than 50k sequences).
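Why the number of detected lineages drops while the LDI stays nearly constant in a VOC-dominated period can be sketched as follows. This is an illustration under stated assumptions: uniform random sampling without replacement, a natural-log Shannon index, and a made-up lineage composition (one dominant lineage plus 50 singletons); the actual downsampling procedure is described in Materials and Methods and may differ.

```python
import math
import random
from collections import Counter
from statistics import median


def shannon_diversity(labels):
    """Shannon index H = sum(p_i * ln(1 / p_i)) over label frequencies."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum((c / total) * math.log(total / c) for c in counts.values())


def downsample_summary(labels, size, replicates=100, seed=42):
    """Return (median lineage count, median LDI) over random subsamples."""
    rng = random.Random(seed)
    richness, ldi = [], []
    for _ in range(replicates):
        sub = rng.sample(labels, size)  # uniform, without replacement
        richness.append(len(set(sub)))
        ldi.append(shannon_diversity(sub))
    return median(richness), median(ldi)


# Hypothetical VOC-dominated period: one dominant lineage, 50 rare ones.
labels = ["B.1.1.7"] * 950 + [f"rare.{i}" for i in range(50)]

for size in (1000, 500, 100):
    n_lineages, ldi = downsample_summary(labels, size)
    print(f"{size:>5} sequences: {n_lineages} lineages, LDI {ldi:.3f}")
```

Rare lineages contribute little to the index, so losing them changes the lineage count far more than the LDI.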
Effect of downsampling on first detection of VOCs and speed of introduction
We investigated how the first detection of the VOCs would have changed with a reduced sequencing effort, simulated by downsampling our data set. As expected, the delay of detection increased when fewer sequences were available (
Fig. 2A). Surprisingly, downsampling to around a third of the available sequences (50k sequences) only led to a 1-day median delay for Delta and Omicron and no delay for Alpha. In general, we note that Delta was more sensitive to downsampling than Alpha and Omicron, as indicated by the longer tail of the delays upon repeated downsampling.
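The delay in first detection under repeated downsampling can be sketched as below. The record layout `(collection_date, label)`, the function names, the toy epidemic, and the use of uniform random subsampling are assumptions made for illustration; they are not taken from the study's code.

```python
import random
from datetime import date, timedelta
from statistics import median


def first_detection(records, voc):
    """Earliest collection date of a VOC in (date, label) records, or None."""
    dates = [d for d, label in records if label == voc]
    return min(dates) if dates else None


def detection_delays(records, voc, size, replicates=200, seed=1):
    """Delay in days of first detection in each subsample vs. the full data."""
    rng = random.Random(seed)
    baseline = first_detection(records, voc)
    if baseline is None:
        raise ValueError(f"{voc} not present in the full data set")
    delays = []
    for _ in range(replicates):
        first = first_detection(rng.sample(records, size), voc)
        if first is not None:  # a small subsample may miss the VOC entirely
            delays.append((first - baseline).days)
    return delays


# Hypothetical data: 60 days, 20 sequences per day, VOC share growing.
start = date(2021, 1, 1)
records = []
for day in range(60):
    n_voc = min(20, day // 5)
    when = start + timedelta(days=day)
    records += [(when, "VOC")] * n_voc + [(when, "other")] * (20 - n_voc)

for size in (1200, 400, 100):
    delays = detection_delays(records, "VOC", size)
    print(f"{size:>5} sequences: median delay {median(delays)} days, "
          f"max {max(delays)} days")
```

The maximum across replicates corresponds to the tail of the delay distribution mentioned in the text.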
We reasoned that differences in the impact of downsampling might be due to differences in how steeply each VOC rises shortly after introduction. Thus, we further investigated the speed of introduction of each VOC, defined as how quickly a newly introduced VOC was established and grew in frequency. While the speed of introduction was similar for Alpha and Delta, it was much faster for Omicron (
Fig. 3A). We observed that the speed of introduction also depended strongly on downsampling size, used as a proxy for sequencing effort. A higher sequencing effort was most accurate in capturing the respective growth of the emerging VOCs. Interestingly, the estimates of the downsampled data sets converged with increased sampling size toward the value calculated for the complete data set (red line in
Fig. 3), suggesting that the actual sequencing effort captured true VOC dynamics accurately.
We then examined how long it takes for a VOC to reach dominance or 50% prevalence (
Fig. 3B). In the full data set, Alpha and Delta required 88 and 103 days, respectively, while Omicron required only 31 days. With increased downsampling, the time needed to reach 50% prevalence decreased. In other words, lineages appeared dominant earlier when fewer sequences were available. Delta was very sensitive in this measure, as both the spread of the estimates and the median time to reach 50% prevalence changed strongly with each downsampling step. However, downsampling to around a third (50k sequences) shortened the median time to 50% prevalence by only 8 days for Delta (4 days for Alpha and 1 day for Omicron), so the original data were still recapitulated well.
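One way to compute the time to 50% prevalence is sketched below. The definition used here is an assumption for illustration: the number of days between the first detection of a VOC and the first day on which its share among sequences collected in a trailing 7-day window reaches the threshold. The study's exact definition is given in Materials and Methods and may differ; `days_to_dominance` and the toy data are hypothetical.

```python
from datetime import date, timedelta


def days_to_dominance(records, voc, window_days=7, threshold=0.5):
    """Days from first detection until the VOC share in a trailing window
    first reaches the threshold; None if that never happens."""
    voc_dates = [d for d, label in records if label == voc]
    if not voc_dates:
        return None
    first = min(voc_dates)
    last = max(d for d, _ in records)
    day = first
    while day <= last:
        window_start = day - timedelta(days=window_days - 1)
        labels = [label for d, label in records if window_start <= d <= day]
        if labels and sum(label == voc for label in labels) / len(labels) >= threshold:
            return (day - first).days
        day += timedelta(days=1)
    return None


# Hypothetical data: 30 days, 10 sequences per day, VOC share rising stepwise.
start = date(2021, 11, 1)
records = []
for day in range(30):
    n_voc = min(10, 1 + day // 3)
    when = start + timedelta(days=day)
    records += [(when, "VOC")] * n_voc + [(when, "other")] * (10 - n_voc)

print("Days to 50% prevalence:", days_to_dominance(records, "VOC"))
```

Applying the same function to repeated random subsamples of `records` would give the distribution of this measure at each downsampling size.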
To summarize, we observed that the impact of downsampling on surveillance outcomes was very VOC-dependent (Fig. S7). Downsampling affected Alpha least with regard to the delay in detection and speed of introduction, whereas Delta was affected in both measures. For Omicron, the effect of downsampling on its detection was negligible, while it was very strong for observing its growth.
These observations mean that differences in the impact of downsampling were not due to differences in growth during the introductory period, as Alpha and Delta displayed a similar speed of introduction for the complete data set but were differently impacted by downsampling. Delta did, however, have a unique feature: its introductory period covered the phase in mid-June 2021 (around calendar week 2021-24) when cases (and sequencing) were very low (Fig. 1) and health and safety measures were relaxed, giving Delta an additional advantage and leading to its rapid rise. This “gap” might be the reason why different surveillance measures were so strongly affected by downsampling for Delta.
Effect of downsampling on VOC cluster detection
To understand how transmission information changes when fewer sequences are available, we analyzed the effect of downsampling on the detection of clusters.
We first looked at the delay between the detection of the first cluster of each VOC and the first sequence of that VOC. A cluster is defined as a transmission event between three or more people (cf. Materials and Methods). When considering all available sequences, the delay in cluster detection was very VOC-dependent: 49 days for Alpha, 35 days for Delta, and 10 days for Omicron (
Fig. 2B).
Figure 2B shows the effect of downsampling on the delay to detect clusters. A commonly asked question during the pandemic was whether new cases were due to endemic transmission or new introductions. This is of epidemiological interest as it can affect the backward and forward tracing strategy. The delay upon downsampling was again more pronounced for Delta. When downsampling to 50k sequences, the delay in the detection of the first cluster increased only slightly for Alpha (from 49 to 54 days) and Omicron (from 10 to 14 days) but markedly for Delta (from 35 to 63 days). The first Delta clusters appeared in May 2021, but it was not until July 2021 that the VOC gained importance, making it difficult to detect those early clusters with a modest sequencing strategy.
The distribution of the normalized density of clusters is shown in
Fig. 4 for the whole data set. Upon downsampling to 50k sequences, we observed a similar pattern of peaks and valleys as with the complete data set, with some additional time windows now being without any clusters (e.g., beginning of July 2020 with two additional weeks without active clusters and in May 2022 with three additional weeks without active clusters) (
Fig. 4).
Effect of downsampling on VOC geographic spread and cluster duration
Cluster characteristics such as cluster duration and geographic spread help public health organizations understand the spread of an outbreak, e.g., a superspreading event, and thereby inform decisions on the introduction of countermeasures. Thus, it is important to study how these cluster characteristics change when the sequencing effort is reduced. With all available genomes, a total of 3,014 clusters were found, of which 54 (1.8%) were long-distance virus movements (LDVM), defined as events that spread over a distance of >200 km. The median cluster size was three cases (interquartile range [IQR] = 2), while for LDVM events, the median was five cases (IQR = 4.75). By comparison, for a downsampling size of 50k sequences, 738 clusters were found, of which 17 (2.3%) were LDVM. The median number of cases per cluster was three (IQR = 1), while it was four (IQR = 2) for LDVM events.
Downsampling had little effect on the duration of clusters, a result consistent for all VOCs. The median duration ranged from 10 days for the complete data set to 14 days for 2,500 sequences (Fig. S8).
Downsampling resulted in a reduction of the maximum distance between cases of a cluster (Fig. S9). Most clusters were found to be localized in a single canton. Clusters spreading across up to five cantons were less frequent, and clusters appearing in six or more cantons were sporadic. A small yet significant correlation was found between the number of samples in a cluster and the number of affected cantons (Fig. S10). While the absolute number of captured LDVM decreased upon downsampling, their proportion increased, meaning that LDVM became enriched upon downsampling (Fig. 5).
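A cluster could be flagged as an LDVM along the following lines. The sketch assumes that the relevant distance is the maximum pairwise great-circle distance between case locations, computed with the haversine formula; how distances were actually measured in this study is not stated here, and the function names and approximate city coordinates are illustrative. The last lines recompute the LDVM proportions from the counts reported above.

```python
import math
from itertools import combinations

EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def max_cluster_distance_km(coords):
    """Largest pairwise distance within a cluster (0 for fewer than 2 cases)."""
    return max((haversine_km(a, b) for a, b in combinations(coords, 2)),
               default=0.0)


def is_ldvm(coords, threshold_km=200.0):
    """True if the cluster spans more than the distance threshold."""
    return max_cluster_distance_km(coords) > threshold_km


# Approximate city coordinates, used only as hypothetical case locations.
geneva, st_gallen = (46.2044, 6.1432), (47.4245, 9.3767)
basel, zurich = (47.5596, 7.5886), (47.3769, 8.5417)

print(is_ldvm([geneva, zurich, st_gallen]))  # cluster spanning the country
print(is_ldvm([basel, zurich]))              # regional cluster

# LDVM proportions from the reported counts (full data vs. 50k sequences).
for n_ldvm, n_clusters in ((54, 3014), (17, 738)):
    print(f"{n_ldvm}/{n_clusters} = {100 * n_ldvm / n_clusters:.1f}%")
```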
DISCUSSION
Within a surveillance program, sampling strategy and sample selection are important factors for pathogen sequencing as they influence the ability to detect emerging lineages (
14) as well as epidemiological parameters and phylodynamic inferences (
16–18). For surveillance of SARS-CoV-2, the European Centre for Disease Prevention and Control (ECDC) recommends a representative sampling approach across geographic locations and demographics for general surveillance for both situation awareness and rare or novel variant detection. This was the chosen approach for Switzerland’s surveillance program. The recommended minimum prevalence to be aimed for in lineage detection is 2.5% (
19). In Switzerland, the surveillance program targeted for much of the study period around 2,000 sequences/week, which was just above the ECDC recommendation (1,522 sequences/week) for very rare lineages (1% prevalence) in periods of high case load (>100,000 cases/week) (
19). Another study estimated that 5% of cases needed to be sequenced to detect emerging lineages at 0.1%–1% prevalence (
20). This threshold was met for most of the period before and during the official national surveillance program until the beginning of 2022 (on average 9.7%, Fig. 1B). This means that Switzerland is among the 6.8% of countries that sequenced at least 5% of cases in the first 2 years of the pandemic. Forty-five percent of countries sequenced less than 0.5% of cases (14), highlighting the privileged situation for sequencing-based surveillance in Switzerland.
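The link between sequencing volume and the lowest detectable lineage prevalence can be made concrete with a simple binomial model. This is not the calculation behind the ECDC figures cited above, which rest on that guidance's own assumptions. The sketch assumes that sequenced samples are independent random draws from all cases and asks only how likely it is that at least one sequence of a lineage is seen; the function names and the 95% confidence level are illustrative choices.

```python
import math


def detection_probability(prevalence, n_sequences):
    """P(at least one of n independent random sequences is the lineage)."""
    return 1.0 - (1.0 - prevalence) ** n_sequences


def min_sequences(prevalence, confidence=0.95):
    """Smallest n whose detection probability reaches the confidence level."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - prevalence))


for prevalence in (0.025, 0.01, 0.001):
    n = min_sequences(prevalence)
    print(f"prevalence {prevalence:.1%}: at least {n} sequences "
          f"(P = {detection_probability(prevalence, n):.3f})")
```

Detecting a lineage once is a weaker goal than estimating its proportion with a given precision, which requires more sequences; the two targets should not be compared directly.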
Concurring with this, we find that simulating a reduced sequencing effort by downsampling to around a third of the actual extent had overall only a marginal effect on four of the six surveillance outcomes studied here, especially the first detection date of VOCs as well as their clusters, although this was slightly different for certain lineages based on their unique characteristics and the epidemiological backgrounds within which they emerged (in particular, Delta). Interestingly, this also holds true for cluster duration (i.e., active transmission chains of nearly identical strains). The diversity (LDI) was also still recapitulated well with a third of the original sequencing effort during the VOC-dominated periods. Further reducing the sequencing, however, strongly affected specific periods such as the later Omicron wave in terms of LDI. This might be explained by the fact that Omicron has a highly skewed sublineage distribution with a few widespread sublineages co-circulating alongside an assortment of rare lineages, whereas Delta sublineages are more evenly distributed (
Fig. 1), leading to the LDI being more sensitive to the disappearance of rarer sublineages with increased downsampling in Omicron.
However, two other outcomes, namely, the geographic spread of clusters (as measured by LDVM) and estimates of the introduction speed and growth of a VOC, especially for Omicron, were more sensitive, and reduced sequencing came at the cost of sensitivity. The latter is because strongly downsampled data might not behave in a linear fashion anymore (cf. Materials and Methods), leading to arbitrary speed estimation values driven by sampling bias. Rather than being a fault, this highlights that accurately estimating the speed of growth of a VOC becomes difficult with sparse data.
In this study, we conducted a retrospective analysis of all available data, rather than simulating real-time surveillance after the fact. This pandemic was the first application of genomic surveillance on such a scale, necessitating a delay in political decision-making and the establishment of surveillance infrastructure. This setup period introduced a discrepancy between the actual collection and submission dates of data due to the retrospective submission of data from this period. Consequently, our analysis specifically focused on evaluating the potential reduction in sequencing efforts under the assumption of real-time data availability, to avoid the confounding effects of the delayed setup period on real-time simulation outcomes.
For this reason, a critical factor for timely lineage detection not assessed in our study is turnaround time (TAT, i.e., the time lag between sample collection and submission of the genome sequence to a surveillance database such as SPSP or GISAID) (
21). The median TAT in Switzerland was 18 days (with IQR of 14 days) over the period of the official national surveillance program. Assuming a TAT of 21 days, the probability of lineage detection before it reaches 100 cases was estimated as 0.51 and 0.96 if 1% or 5% of cases were sequenced, respectively, as simulated with data from Denmark (
14). This highlights that other important factors than just the amount of sequencing should also be targets of surveillance optimization, which are harder to assess and influence in non-centralized countries such as Switzerland where sequencing was performed by many different public and private laboratories.
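Turnaround time as defined above (days from sample collection to submission) and its median and IQR can be computed with the standard library. The dates below are made up for illustration, and the quartile method is an assumption (Python's default "exclusive" method), so results on real data may differ slightly from figures obtained with other software.

```python
from datetime import date
from statistics import median, quantiles


def turnaround_days(samples):
    """Days between collection and submission for (collected, submitted) pairs."""
    return [(submitted - collected).days for collected, submitted in samples]


# Hypothetical (collection date, submission date) pairs.
samples = [
    (date(2021, 6, 1), date(2021, 6, 11)),
    (date(2021, 6, 1), date(2021, 6, 15)),
    (date(2021, 6, 2), date(2021, 6, 20)),
    (date(2021, 6, 3), date(2021, 6, 25)),
    (date(2021, 6, 4), date(2021, 7, 4)),
]

tat = turnaround_days(samples)
q1, _, q3 = quantiles(tat, n=4)  # quartile cut points
print(f"median TAT: {median(tat)} days, IQR: {q3 - q1} days")
```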
We showed that the effect of sequencing effort differs for different surveillance outcomes but that many outcomes could have been recapitulated with a reduced sequencing effort assuming real-time data availability. Such a study was possible due to Switzerland’s high per capita sequencing effort. However, we believe that the results are still transferable to other European countries as the overall viral dynamics were similar across Europe with overall similar circulating lineages. Indeed, we believe our results are encouraging for countries with fewer resources than Switzerland. A national surveillance program needs to strike a balance between societal benefits and program costs, and as we have shown, some outcomes require more sequencing effort than others. To achieve a cost-effective program, the desired outcomes of surveillance should be clearly defined and sequencing targets set accordingly. The SARS-CoV-2 pandemic was unprecedented, necessitating swift adaptation and learning amid the crisis, inevitably leading to mistakes. However, it also served as a vital learning opportunity, ensuring that for any future epidemic or pandemic, we are now better equipped to establish surveillance systems in a more cost-effective and efficient manner.
We were able to conduct this study thanks to the coordinated SARS-CoV-2 surveillance program led by the National Reference Lab for Emerging Viral Infections and the centralized data collection approach via the Swiss Pathogen Surveillance Platform. Such a platform served to coordinate the data sharing following FAIR principles (findable, accessible, interoperable, and re-usable) (
12,
13), reduced burden on laboratories with a single point of entry prior to data re-sharing to international archives, and efficiently liaised with public health authorities. A centralized infrastructure for data collection and processing of genome and epidemiological data is of crucial importance during a public health crisis.
ACKNOWLEDGMENTS
We want to thank the following scientists and technicians at the following institutions: University Hospital Basel: Dr. Helena Seth-Smith, Dr. Madlen Stange, Dr. Alfredo Mari, Elisabeth Schultheis, Daniel Gander, Magdalena Schneider, Rosamaria Vesco, and Valerie Courtet for contributing to the SARS-CoV-2 sequencing program; Institute of Medical Virology, University of Zurich: Stefan Schmutz, Gabriela Ziltener, Nadine Rist, Andrea Hafner, Dr. Maryam Zaheri, and Dr. Kevin Steiner for contributing to the SARS-CoV-2 sequencing program; all individuals involved in the ETH-led sequencing effort (
https://bsse.ethz.ch/cevo/research/sars-cov-2/swiss-sars-cov-2-sequencing-consortium.html); all individuals involved in the sequencing effort at the University Hospitals Geneva and Lausanne, the University of Bern, and the hospitals of Ticino, Wallis, Langenthal, and St. Gallen; and the Risch, Genesupport, Biolytix, and Synlab laboratories.
The Swiss SARS-CoV-2 sequencing program was financed by the Federal Office of Public Health. The SPSP was financed by multiple grants: NRP72 program (407240_177504; A.E.), swissuniversities openscience (A.E./A.N.), Swiss National Science Foundation (310030_192515; A.E.), Bangerter Rhyner Foundation (A.E./A.N.), SERI (A.N.), and Federal Office of Public Health (A.N.).
F.W. and B.C.G. analyzed the data. B.C.G. developed the cluster detection tool. A.E. and A.N. designed and supervised the study. F.W., B.C.G., A.E., and A.N. contributed to writing the manuscript. All other authors contributed data as part of the Swiss surveillance program and reviewed the manuscript. Except for co-first and co-last authors, names are sorted alphabetically.