INTRODUCTION
A novel coronavirus, severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), has caused outbreaks of coronavirus disease 2019 (COVID-19) globally beginning in mid-December 2019, with an epicenter in Wuhan, China (
1–3). As of 9 April 2021, SARS-CoV-2 had infected 133 million people worldwide and caused 2.8 million deaths, with an estimated fatality rate of 2.2% (
4). This on-going pandemic of COVID-19 has become the most serious threat to public health in this century.
The origin of SARS-CoV-2 remains elusive. However, the initial cases were largely associated with a seafood market, which indicated potential zoonotic transmissions (
1,
5,
6). Although bats and pangolins are most likely the reservoir and the intermediate hosts in the wild, more evidence is needed to support zoonotic transmission and to track the origin of this new coronavirus (
5,
6).
Angiotensin-converting enzyme 2 (ACE2) is the host cellular receptor for the SARS-CoV-2 spike glycoprotein, which is similar to its counterpart in SARS-CoV. The receptor-binding domain (RBD) of the spike protein subunit S1 interacts directly with ACE2, providing for tight binding to the peptidase domain of ACE2 (
7–10). Therefore, RBD is the critical determinant of virus-receptor interaction and reflects viral host range, tropism, and infectivity. Although the RBD sequences of different SARS-CoV-2 strains circulating globally are largely conserved, mutations have appeared; these may account for differences in viral infectivity and also contribute to its spread (
10–14).
Meanwhile, S glycoprotein participates in antigenic recognition and is expressed on the virion surface, likely to be immunogenic, carrying both T-cell and B-cell epitopes. The potential antibody binding sites that have been identified indicate RBD has important B-cell epitopes. The main antibody binding sites substantially overlap the RBD, and the antibody binding to these sites is able to block viral entry into cells (
14–16).
To investigate whether mutations in RBD that emerged during the early transmission phase have altered the receptor binding affinities and whether these isolates may have been selected for higher infectivity, the binding dynamics and the infectivity between the SARS-CoV-2 RBD mutants and human ACE2 receptor were modeled and assessed computationally. The evolutionary trajectory of SARS-CoV-2 during the early transmission phase under negative selection pressure was also analyzed. In addition, experimental validation of the enhanced affinity and infectivity of the V367F mutant was performed.
(This article was submitted to an online preprint archive [
17].)
DISCUSSION
Due to the challenging and on-going pandemic and given the evolving nature of the SARS-CoV-2 virus globally, identifying changes in viral infectivity has proved crucial to restraining the COVID-19 pandemic. Quarantine policies need to be adapted with respect to the viral infectivity change. It is always a dilemma of quarantine and economy. Any government would balance the lost due to the quarantine lockdown versus the lost due to the disease. Numerous models have been raised to estimate the costs. For example, if the variants have evolved to be more infectious, more stringent lockdown measures would be expected.
In our study, we analyzed 32,123 genomes of SARS-CoV-2 isolates (December 2019 through March 2020) and identified 302 nonsynonymous RBD mutants, which clustered into 96 mutant types. The six dominant mutations were analyzed by applying molecular dynamics simulations. The mutant type V367F continuously circulating worldwide displayed higher binding affinity to human ACE2 due to the enhanced structural stabilization of the RBD beta-sheet scaffold. The molecular dynamics simulations also indicated that it would be difficult for bat SARS-like CoV to infect humans. However, the pangolin CoV is potentially infectious to humans. The increased infectivity of V367 mutants was further validated by receptor-ligand binding ELISA, surface plasmon resonance, and pseudotyped virus assays. Phylogenetic analysis of the genomes of V367F mutants showed that during the early transmission phase, most V367F mutants clustered more closely with the SARS-CoV-2 prototype strain than the dual-mutation variants (V367F+D614G) which emerged later and formed a distinct subcluster.
Compared to that of the prototype strain Wuhan-Hu-1, the ΔG of the V367F mutation decreased ∼25%. The variant binds to ACE2 more stably due to the enhancement of the base rigidity of the S protein structure. Along with the ongoing adaptation to transmission and replication in humans, some critical mutations in the RBD may boost the binding affinity and lead to an increase in the basic reproduction number (R0).
A previous study found some RBD mutations that enhanced ACE2 affinity but were not selected in pandemic isolates (
23). For example, through deep mutation scanning, an assumed N501F mutation was confirmed to have enhanced binding efficiency while not emergent, but later, the N501Y mutation in the same position with a similar enhancing effect was frequently detected in emerging lineage B1.1.7 variants (
24). This may partly explain why V367F has not been dominant in the subsequent epidemic. The V367F mutation was also confirmed with higher protein expression level and enhanced stability due to the higher melting temperature in mammalian-expressed purified RBD than the prototype RBD (
23).
The V367F mutants were frequently found during the early transmission phase (
Table 1). As RBD is conserved in SARS-CoV-2, the coincidence of V367F mutants across large geographic distances indicates that this mutation is more robust and that these variants originated as a novel sublineage, given the close isolation dates (22 and 23 January 2020, respectively). The asymptomatic individuals with the same mutation were “superinfecting” travelers. Along with the epidemiological data, mutation surveillance is of critical importance, as it can reveal more exact transmission routes of the epidemic and provide early warnings of additional outbreaks. Emergence of SARS-CoV-2 RBD mutants with higher binding affinity to the human ACE2 receptor in Hong Kong, France, and other countries during the early transmission phase suggests an increased risk of emergence of more and more variants with “high affinity” and increased infectivity during a sustained pandemic of COVID-19, particularly if effective precautions are not completely implemented.
By tracking the V367F mutation type in the SARS-CoV-2 circulating strains, most of these mutants were first detected within the prototype D614 strains, then together with the G614 variants. This is possibly due to multiple individual recombination events between the D614 strains and G614 mutants, although it is difficult to determine the exact recombinant positions or time point due to the high sequence identities among SARS-CoV-2 variants. D614G is distinct from the RBD mutations: it is not located in the RBD but enhances viral infectivity by elevating its sensitivity to proteases and increasing stability (
12,
13,
21). Furthermore, D614G and V367F may function independently and have synergistic effects on viral infectivity. Recombination is known to play an important role in natural coronavirus evolution, which may contribute to the convergence of dual enhancing mutants (D614G+V367F). The phylogenetic and substitution analyses indicated that the divergent clade of dual-mutation mutants was possibly from multiple individual recombination events which introduced multiple mutations. Whether these introduced multiple mutations may produce more adaptive variants needs further study. More attention should be paid to the risk of the accumulation of evolutionary advantages through recombination among the variants.
By tracking dominant RBD mutants up to 31 March 2020, multiple mutations from more than 10 isolates putatively related to host receptor binding and affinity were detected in this study. The equivalent amino acid positions in more than 500 SARS-CoV and Middle East respiratory syndrome coronavirus (MERS-CoV) genomes were compared. Among them, V483A in MERS-CoV and N439K in SARS-CoV resulted in reduced host receptor binding (
23,
25). Given the alanine shares similar chemical and structural properties with serine, the A344S variant was expected to have similar affinity to human ACE2 as the prototype strain, which was later confirmed through scanning (
23). By tracking these recent variants, we find that only the V367F mutation type was detected continually; the other enhanced mutations have disappeared. This may be because they failed to compete with the dominant D614G variants (
12,
21). Interestingly, we discovered some V367F variants combined with the D614G mutation. The recent emerging variants B.1.1.7, B.1.351, and B.1.429, which are currently threatening the world, also partially harbor the V367F mutation, possibly deriving from other recombination events. Whether these multiple recombinations cause the variants to be more infectious, more stable, or better at immune escape requires more study.
The origin of this virus has been of considerable interest and speculation since the outbreak. Due to the high sequence similarity of the bat SARS-like CoV genome and the pangolin CoV RBD sequence to the SARS-CoV-2 sequences, these hosts were thought to have initiated the infection in humans (
5,
6). Our results provide more clues to support this hypothesis: the binding energy of the bat SARS-like CoV RBD is too high to bind human ACE2 effectively (
KD in millimolar range). In contrast, the pangolin CoV showed a binding
KD for human ACE2 at the micromolar range, much higher than that of SARS-CoV-2 and just ∼6× higher than that of human SARS-CoV (
KD = 0.326 μM), which indicates that the pangolin CoV has the potential to infect humans in unprotected close contact. Alignments of the genomic sequences of SARS-CoV-2 and pangolin CoVs suggest recombination events, particularly in the RBD domain, between pangolin and bat viruses.
In summary, we have identified 302 RBD mutants clustering into 96 dominant mutant types during the early transmission phase. Among the six dominant mutation types, V367F mutants that emerged in Asia and Europe displayed enhanced structural stability of the spike protein along with higher binding affinities to the human ACE2 receptor. V367F mutants were circulating along with the D614G mutants. This suggests that the V367F mutants are stable and have acquired increased infectivity for humans during the COVID-19 pandemic. The emergence of dual mutants (V367F+D614G) and other combined mutations possibly due to recombination may hint toward the emergence of other variants with increased infectivity or with enhanced escape from the host immune response. Our findings support the continuing surveillance of spike mutations to aid in the development of new COVID-19 drugs and vaccines.
MATERIALS AND METHODS
Genome sequence data set in this study.
Full-length gene sequences of SARS-CoV-2 were downloaded from the NCBI GenBank database, and the GISAID EpiFlu database (
http://www.GISAID.org). In total, 34,702 SARS-CoV-2 full-genome sequences isolated during early transmission phase (samples collected before 31 March 2020) were downloaded, and the sequences with amino acid mutations in the S protein and RBD region were parsed. The genome sequences with either the V367F mutation or the V367F/D614G dual mutations in the S protein RBD were screened and analyzed in this study (see Table S1 in the supplemental material). For evolution analysis, 1,753 full-genome sequences from each NextStrain (
26) clade were filtered and cluster random sampled from the GISAID EpiFlu database.
Mutation analyses and phylogenetic analyses.
Alignment of S protein sequences from different sources and comparison of ACE2 proteins among different species were performed using MAFFT version 7 with default parameters (
https://mafft.cbrc.jp/alignment/server/) and BioEdit (
27,
28). Selection pressure analyses were conducted using the Nei-Gojobori method (Jukes-Cantor model) with Mega X (version 10.0.2) (
18,
19). Substitution mutation analyses of mutants were performed and compared with the Wuhan-Hu-01 sequence using BioAider (version 1.314) (
29). Phylogenetic analyses were conducted with IQ-Tree2, applying the maximum likelihood method with 1,000 bootstrap replicates using the GTR+F+R3 model by MFP ModelFinder (
30). The trees were annotated and modified using Figtree (version 1.4.4) (
https://github.com/rambaut/figtree/releases).
Molecular dynamics simulation.
The complex structure of the SARS-CoV-2 S-protein RBD domain and human ACE2 was obtained from the Nation Microbiology Data Center (identifier [ID] NMDCS0000001) (PDB ID
6LZG) (
31). Mutated amino acids of the SARS-CoV-2 RBD mutants were directly replaced in the model, and the bat/pangolin CoV RBD domain was modeled using SWISS-MODEL (
32). Molecular dynamics simulation was performed using GROMACS 2019 with the following options and parameters: explicit solvent model, system temperature of 37°C, all-atoms optimized potentials for liquid simulations (OPLS/AA) force field, and LINCS restraints. With 2-fs steps, each simulation was performed at 10 ns, and each model was simulated three times to generate three independent trajectory replications. Binding free energy (Δ
G) was calculated using the molecular mechanics Poisson-Boltzmann surface area (MM-PBSA) method (GitHub;
https://github.com/Jerkwin/gmxtool), with the trajectories after structural equilibrium assessed using root mean square deviation (RMSD) (
35,
36). To calculate the equilibrium dissociation constant (
KD) and Δ
G, the formula Δ
G = RTlnK_D was used. Estimated Δ
Gs of the RBD mutants were normalized using the Δ
G of the prototype strain, which was derived from experimental data.
Expression of wild-type and mutant S proteins (V367F).
The SARS-CoV-2 prototype S gene lacking the furin cleavage site, TM, and cytoplasm domains was cloned into pNPM5 vector (Novoprotein, NJ, USA) and fused with a C-terminal His6 tag. The mutation was introduced by site-directed mutagenesis using the ClonExpress MultiS One Step cloning kit (Vazyme) and confirmed by PCR and sequencing. The wild-type and V367F constructs were transfected into HEK293 cells using polyethyleneimine. Since the S protein included a signal peptide in its N-terminal 14 amino acids, it was secreted into the medium. The expressed S proteins were purified from filtered cell supernatants with a Ni-nitrilotriacetic acid (NTA) column. Eluted protein solution was then dialyzed against phosphate-buffered saline (PBS; pH 7.4) for subsequent assays.
Ligand-receptor binding ELISA.
Human ACE2 protein was immobilized onto a microtiter plate at 5 μg/ml (100 μl/well). Each S protein (prototype and V367F) was added as a ligand at different concentrations, ranging from 0.03 μg/ml to 10 μg/ml, and then incubated for 2 h at 37°C to allow receptor-ligand interaction. The ligand-receptor mixture was then washed three times. One hundred microliters of horseradish peroxidase (HRP) anti-His tag antibody (BioLegend, USA) (diluted 1:20,000) was added to each well and allowed to react for 1 h. After three washes, the signal was visualized using 3,3′,5,5′-tetramethylbenzidine (TMB) solution (Sigma-Aldrich, USA), and a microtiter plate reader recorded the optical density at 450 nm (OD450).
Surface plasmon resonance experiments.
The SPR experiments were performed in a Biacore T200 instrument (GE, USA). For this, the SARS-CoV-2 S-proteins, either prototype or V367F, were immobilized on the sensor chip NTA (GE, USA), according to the manufacturer’s protocol. Human ACE2 protein was injected in each experiment at seven concentrations (6.25, 12.5, 25, 50, 100, 200, and 400 nM). For each cycle, the absorption phase lasted for 120 s and the dissociation phase lasted for 600 s. After each cycle, the chip was regenerated by washing with 350 mM EDTA and 50 mM NaOH for 120 s so that the chip was ready for the next round of S protein immobilization. Blank controls with 0 nM ACE2 were performed, with the blank signals subtracted from the cycle signals. All experiments were performed at 37°C. KD values were calculated by fitting the curves using the software provided with the instrument.
Production and titration of SARS-CoV-2 pseudoviruses bearing V367F S protein.
The full-length S gene of SARS-CoV-2 strain Wuhan-Hu-1 (
NC_045512.2) was cloned into a SARS-CoV-2 spike vector (PackGene, Guangzhou, China) and confirmed by DNA sequencing. Plasmid SARS-CoV-2 spike (G1099T), incorporating the V367F mutation in the S gene, was constructed by site-directed mutagenesis using the ClonExpress MultiS One Step cloning kit (Vazyme), as per the manufacturer’s protocol.
Generation of SARS-CoV-2 S HIV backbone pseudovirus was performed as previously described with some modifications (
33,
34). Briefly, 293T cells, at approximately 70% to 90% confluence, were cotransfected with 9 μg of the transfer vector (pLv-CMV), 6 μg packaging plasmid (psPAX-lentiviral), and 6 μg envelope plasmid (pCB-spike or pCB-spikeV367F). Pseudoviruses were harvested at 48 h posttransfection and subsequently filtered into either Falcon or microcentrifuge tubes via a syringe-driven 0.45-μm filter. The harvested pseudoviruses were treated with recombinant DNase I (RNase-free) (TaKaRa, Japan) at 37°C for 10 min to eliminate carryover of plasmid DNA. Afterwards, the DNase I in the virus stock was inactivated at 95°C for 10 min after DNA digestion. Virus titration was measured by reverse transcription-quantitative PCR (RT-qPCR) targeting the WPRE gene of pseudoviruses using the Hifair III One Step RT-qPCR SYBR green kit (Yeasen), as per the manufacturer’s protocol. After reverse transcription (10 min at 42°C) and initial denaturation (5 min at 95°C), PCR amplification was performed for 40 cycles (15 s at 95°C, 60 s at 60°C). Primers were as follows: WPRE-F, 5′-
CACCACCTGTCAGCTCCTTT-3′; WPRE-R, 5′-
ACGGAATTGTCAGTGCCCAA-3′.
SARS-CoV-2 spike-mediated pseudovirus entry assay.
To detect S variant-mediated viral entry, Vero E6 and Caco-2 cells (5 × 104) grown in 24-well plates were infected with either S-V367 or S-F367-bearing purified pseudovirus (pretreated with DNase I) (1 × 106 pseudovirus RNA copies). At 24 h and 48 h postinfection, the cells were washed three times, and the total DNA was extracted. Relative fold increase of infected virus titers was calculated according to the WPRE DNA copy number of the lentiviral proviruses measured by TB Green Premix Ex Taq II real-time PCR kit (TaKaRa). Data from all of the samples were obtained from three independent experiments, and each sample was tested in triplicates.