INTRODUCTION
Adeno-associated virus (AAV) is a small, 25-nm, T=1 icosahedral virus whose protein shell encapsidates a single-stranded DNA genome (1, 2). AAV is so named because it was discovered in adenovirus preparations and its replication depends on coinfection with adenovirus or one of several other “helper” viruses, not because of any structural relationship (3–5). AAVs were long regarded as nonpathogenic, an initial rationale for their development as transducing vectors for in vivo (and ex vivo) gene therapy (6–9). The recent discovery of AAV sequences inserted into proto-oncogenes of patients with hepatocellular carcinoma (HCC) has prompted vigorous debate about causal links to natural infection and about future vector use (10–16). An emerging view is that there may be cause for concern for individuals with chronic liver disease (17–19).
Nonetheless, it is an exciting time for gene therapy. After many years of development, the first two in vivo AAV treatments have been approved by the U.S. Food and Drug Administration (FDA): Luxturna, an AAV2 vector for an inherited form of blindness, and Zolgensma, an AAV9 vector for spinal muscular atrophy (20, 21). AAV vectors are being used in >150 ongoing clinical trials (https://clinicaltrials.gov/) (22), but challenges remain in generalizing the early successes. Deaths in a myotubular myopathy trial likely resulted from immune toxicity of the high doses needed to achieve therapeutic expression levels with an inefficient transducing vector (23, 24). Doses, measured in vector genomes per kilogram of body mass, have been well tolerated at up to 1 × 10¹⁴ vector genomes (vg)/kg, with all three fatalities occurring at 3 × 10¹⁴ vg/kg (23, 25). Structural studies are key both to a better fundamental understanding of AAV virology and to engineering improved vectors.
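Because doses are normalized to body mass, the total number of vector genomes administered scales linearly with patient size, and the fatal dose in the trial above was threefold higher than the highest well-tolerated one. The minimal sketch below only makes that arithmetic explicit; the 10-kg body mass is a hypothetical value chosen for illustration, and the helper name `total_vector_genomes` is ours.

```python
# Illustrative dose arithmetic: vector doses are reported per kilogram of body
# mass, so the total number of vector genomes scales linearly with mass.
# The 10 kg body mass below is a hypothetical value, not from any trial record.

TOLERATED_VG_PER_KG = 1e14  # highest well-tolerated dose cited in the text
FATAL_VG_PER_KG = 3e14      # dose at which all three fatalities occurred


def total_vector_genomes(dose_vg_per_kg: float, body_mass_kg: float) -> float:
    """Return the total vector genomes administered for a weight-based dose."""
    if body_mass_kg <= 0:
        raise ValueError("body mass must be positive")
    return dose_vg_per_kg * body_mass_kg


mass_kg = 10.0  # hypothetical patient body mass
tolerated = total_vector_genomes(TOLERATED_VG_PER_KG, mass_kg)
fatal = total_vector_genomes(FATAL_VG_PER_KG, mass_kg)
print(f"tolerated: {tolerated:.1e} vg, fatal: {fatal:.1e} vg, "
      f"ratio: {fatal / tolerated:.1f}x")
```

The ratio is independent of the assumed body mass; only the absolute totals change.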
Initial crystallographic structures revealed the 60-fold symmetric part of the capsid. The capsid gene is expressed as three viral protein (VP) variants through alternative start codons and alternative splicing (26). The variants are in frame, sharing most of their amino acid sequence, and it has become conventional to use a common numbering based on the largest, VP1. Ordered structure becomes visible at about residue 220, ~20 residues beyond the N terminus of VP3, which constitutes ~80% of the capsid (27). Upstream, VP1 (~10%) and VP2 (~10%) are extended by a common region of 65 usually unseen amino acids that some have proposed to function in nuclear localization (28, 29). Further upstream, a segment unique to VP1 (VP1u), N-terminal of the VP2 start, contains a phospholipase A₂ (PLA₂) domain that is initially sequestered within the capsid but becomes exposed for endosomal escape during entry (30–34).
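The approximate VP fractions above translate directly into average copy numbers in a 60-subunit T=1 capsid. The sketch below assumes the idealized fractions quoted in the text (~10% VP1, ~10% VP2, ~80% VP3). Real capsid preparations are heterogeneous, so these are population averages rather than a fixed composition per particle.

```python
# Expected copies of each VP per capsid, from the approximate fractions in the
# text (~10% VP1, ~10% VP2, ~80% VP3) and the 60 subunits of a T=1 capsid.
# Real capsids are heterogeneous; these are idealized averages.

T1_SUBUNITS = 60  # a T=1 icosahedral capsid has 60 equivalent subunit positions

vp_fractions = {"VP1": 0.10, "VP2": 0.10, "VP3": 0.80}


def copies_per_capsid(fractions: dict, n_subunits: int = T1_SUBUNITS) -> dict:
    """Convert a fractional VP composition into mean copies per capsid."""
    total = sum(fractions.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError(f"fractions must sum to 1, got {total}")
    return {vp: f * n_subunits for vp, f in fractions.items()}


print(copies_per_capsid(vp_fractions))
```

On average, this corresponds to roughly 6 copies each of VP1 and VP2 and 48 of VP3 per capsid.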
Over 130 variants of human and nonhuman primate AAVs have been identified (35, 36). These are grouped into eight major named and unnamed clades, containing one or more serotypes that are antigenically distinct; i.e., antibodies recognizing one serotype do not cross-react with others (35, 37). The serotypes also differ in other properties, such as binding preferences for glycan attachment factors and empirically determined tissue tropisms (38, 39). This study uses two representatives, AAV2 and AAV5, as model systems. AAV2 is the type species and the best characterized. AAV5 is tied for the most distantly related serotype with respect to VP3 amino acid sequence.
This study further characterizes interactions with the near-universal protein receptor, AAVR. AAVR was only recently discovered through unbiased genome-wide screening as a receptor key to entry and trafficking (40). Previously, serotype-specific glycans had been considered “primary receptors”: heparan sulfate proteoglycan (HSPG) for AAV2 and sialic acid (SIA) for AAV5 (41–43). However, it has recently been argued that the glycans have less-specific roles than classic receptors and, following virological convention, should be considered attachment factors (44), anchoring viruses to cell surfaces but not mediating productive entry. Several membrane proteins, primarily tyrosine kinase receptors and integrins, were also identified as coreceptors for different serotypes, but they have not emerged from several more recent knockout screens (40, 45–54). Current evidence indicates that AAV2 and AAV5 attach to cells using different extracellular glycans, that both viruses depend on AAVR for entry and trafficking, and that AAV2 (but not AAV5) additionally depends downstream on another host membrane protein, GPR108 (54, 55).
AAVR is a C-terminally anchored transmembrane protein whose ectodomain consists, from the N terminus, of a signal peptide, a MANEC domain (motif at N terminus with eight cysteines), and five Ig-like polycystic kidney disease (PKD) domains (56, 57). The PKD domains bind AAV, but, surprisingly, the domain dependencies are serotype-specific (55). For AAV2, PKD2 is most important and PKD1 has an accessory role, whereas AAV5 depends exclusively on PKD1 (55). These determinations were made by (i) surface plasmon resonance (SPR) measurements using AAV and heterologously expressed AAVR domain fragments, (ii) transduction inhibition by the addition of soluble domain fragments, (iii) domain-deletion knockouts, and (iv) viral overlay assays (40). Concurrent cryogenic electron microscopy (cryo-EM) structure determinations using different expressed AAVR fragments, PKD1-5 or PKD1-2, revealed PKD2 bound to AAV2 at 2.8- and 2.4-Å resolution, respectively (44, 58). Even though the samples contained 5- and 2-domain fragments, respectively, only the most tightly interacting domain (PKD2) was resolved. Cryo-electron tomography (cryo-ET) of an N-terminal fusion of maltose-binding protein (MBP) and PKD1-5, combined with cross-linking mass spectrometry (XL-MS), gave a consistent picture, with PKD2 anchored to the viral surface and the PKD3-5 domains emanating radially in at least four configurations (44). Then, in succession, came cryo-EM structures of AAV5 complexes with PKD1-5 at 3.2-Å and PKD1-2 at 2.5-Å resolution, now showing just the PKD1 domain, which alone had previously been implicated (55, 59, 60). Intriguingly, the homologous PKD1 and PKD2 domains were not accommodated as variations of a single AAVR-binding site on AAV, but bound at distinct sites. Evolutionary divergence could then most simply be imagined as proceeding from an ancestral form that bound both domains, but overlaying the structures ruled out such simple explanations: the two domains could not plausibly be connected by the unseen 5-residue linker (60).
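Whether two placed domains can be joined by an unmodeled linker reduces to a distance check. The sketch below assumes the standard ~3.8-Å virtual bond between consecutive Cα atoms in a trans peptide. With n missing residues there are n + 1 such bonds between the last modeled Cα of one domain and the first modeled Cα of the next, so (n + 1) × 3.8 Å is a generous, fully extended upper bound. The coordinates are hypothetical and are not taken from any deposited model.

```python
import math

# Rough geometric test of whether two domain termini could be joined by a
# linker of n unmodeled residues. Assumption: consecutive C-alpha atoms in a
# trans peptide are ~3.8 A apart, so n missing residues span at most
# (n + 1) * 3.8 A between the last and first modeled C-alpha atoms.

CA_CA_DISTANCE = 3.8  # angstroms, virtual C-alpha to C-alpha bond length


def max_linker_span(n_missing: int) -> float:
    """Upper bound (fully extended chain) on the end-to-end span, in angstroms."""
    return (n_missing + 1) * CA_CA_DISTANCE


def can_bridge(ca_end, ca_start, n_missing: int) -> bool:
    """True if the gap between two modeled C-alpha atoms is bridgeable."""
    gap = math.dist(ca_end, ca_start)
    return gap <= max_linker_span(n_missing)


# Hypothetical coordinates (angstroms), for illustration only.
pkd1_c_term = (0.0, 0.0, 0.0)
pkd2_n_term_near = (15.0, 5.0, 0.0)
pkd2_n_term_far = (40.0, 10.0, 5.0)

print(max_linker_span(5))  # fully extended bound for a 5-residue gap
print(can_bridge(pkd1_c_term, pkd2_n_term_near, 5))
print(can_bridge(pkd1_c_term, pkd2_n_term_far, 5))
```

Real linkers rarely reach full extension, so gaps near the bound are also implausible in practice.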
Cryo-ET has technical advantages for determining the 3D structures of flexible molecules in heterogeneous configurations, such as AAVR with its variable PKD domain orientations. In single-particle cryo-EM, a single 2D image of each of many (10⁴ to 10⁶) identical or nearly identical particles is recorded, and these images are aligned and averaged into a 3D reconstruction. In tomography, by contrast, a 3D image of every individual particle is obtained by tilting the microscope stage. A limitation is that the sample can only be tilted within a range of angles between −65° and +65°. As a consequence, the resulting 3D reconstructions have a “missing wedge” of information that can distort the 3D volumes. However, the missing wedge can be filled by averaging aligned subvolumes that contain the structure of interest in different orientations, and thus with differently oriented missing wedges. A structure can also be divided into subvolumes that are classified and averaged separately to characterize variability in heterogeneous regions. This is a particular advantage for structures such as virus-receptor complexes, in which different copies of a viral capsid protein may have receptor bound in different configurations. The approach has been applied successfully, for example, to the heterogeneous structure of the simian immunodeficiency virus (SIV) envelope glycoprotein bound by the CD4 receptor or by monoclonal antibody 36D5 (61).
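The size of the missing wedge, and how averaging differently oriented subvolumes fills it, can be illustrated with a small 2D sketch. Consider only the plane perpendicular to the tilt axis. There, a tilt series spanning ±65° samples 130° of the 180° of Fourier-line orientations, leaving 50/180 ≈ 28% unsampled. Because the wedge is uniform along the tilt axis, the same fraction applies to a sphere in 3D Fourier space. The sketch assumes continuous tilt coverage and rotations only about an axis parallel to the tilt axis, and it uses a 0.5° grid, so its numerical results are approximate. Real subvolume orientations are arbitrary in 3D.

```python
import numpy as np

# 2D sketch of the missing wedge in single-axis tilt tomography, restricted to
# the plane perpendicular to the tilt axis. A projection at tilt angle t samples
# the central Fourier line at orientation t (mod 180 degrees). If a subvolume's
# contents are rotated by phi about an axis parallel to the tilt axis, its
# sampled lines appear at t + phi in the frame of the aligned structure.
# Assumptions: continuous tilt coverage, rotations about that one axis only,
# and a 0.5-degree grid of line orientations (so grid results are approximate).

STEP = 0.5  # degrees


def analytic_missing_fraction(max_tilt=65.0):
    """Fraction of line orientations missed by a symmetric +/- max_tilt series."""
    return (180.0 - 2.0 * max_tilt) / 180.0


def sampled_orientations(max_tilt=65.0, phi=0.0, step=STEP):
    """Boolean mask over line orientations in [0, 180) sampled by one subvolume."""
    n_bins = int(round(180.0 / step))
    tilts = np.arange(-max_tilt, max_tilt + step / 2, step)
    idx = np.round(((tilts + phi) % 180.0) / step).astype(int) % n_bins
    covered = np.zeros(n_bins, dtype=bool)
    covered[idx] = True
    return covered


def missing_fraction(masks):
    """Fraction of orientations sampled by none of the averaged subvolumes."""
    union = np.logical_or.reduce(np.asarray(masks))
    return 1.0 - union.mean()


single = sampled_orientations()
pair = [sampled_orientations(phi=0.0), sampled_orientations(phi=90.0)]
print(f"analytic, one orientation: {analytic_missing_fraction():.3f}")
print(f"grid, one orientation:     {missing_fraction([single]):.3f}")
print(f"grid, two orientations 90 degrees apart: {missing_fraction(pair):.3f}")
```

In this simplified geometry, two orientations 90° apart already cover every line orientation. This is the reason averaging many randomly oriented subvolumes largely removes the wedge artifact.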
Here, we use cryo-ET of AAV complexes with the 2-domain receptor fragment, PKD1-2, in a holistic, hybrid comparison with single-particle cryo-EM, to locate the parts that had been refractory to high-resolution cryo-EM. We exploit the unique ability of cryo-ET to distinguish different conformational states, focusing reconstructions on the subvolumes surrounding each 3-fold axis. This reveals the hitherto unseen domains in the AAV-PKD1-2 complexes, as well as other elements of both the receptor and virus structures that had been smeared beyond recognition in the 60-fold averaged cryo-EM reconstructions.
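Focusing on subvolumes around each 3-fold axis requires knowing where those axes lie. The sketch below derives the 20 icosahedral 3-fold axes as the normalized face centers of an icosahedron. It assumes the common orientation with 2-fold axes along x, y, and z, in which the vertices are cyclic permutations of (0, ±1, ±φ). Subvolume centers are then placed along each axis at a radius of 125 Å, a value assumed from the ~25-nm capsid diameter. In a T=1 capsid, each 3-fold axis is surrounded by three subunits, so the 20 axes account for all 60. The function names are illustrative and do not belong to any particular subtomogram-averaging package.

```python
import itertools

import numpy as np

# Sketch: locate the 20 icosahedral 3-fold axes, around which subvolumes can be
# centered. Assumptions: the capsid is in the standard orientation with 2-fold
# axes along x, y and z (icosahedron vertices at cyclic permutations of
# (0, +/-1, +/-phi)), and subvolumes are placed at an assumed radius.

PHI = (1 + 5 ** 0.5) / 2  # golden ratio


def icosahedron_vertices():
    """12 vertices (5-fold axes) of an icosahedron with edge length 2."""
    verts = []
    for s1, s2 in itertools.product((-1, 1), repeat=2):
        a, b = s1 * 1.0, s2 * PHI
        verts += [(0.0, a, b), (a, b, 0.0), (b, 0.0, a)]
    return np.array(verts)


def threefold_axes():
    """Unit vectors through the 20 face centers (the 3-fold axes)."""
    v = icosahedron_vertices()
    axes = []
    for i, j, k in itertools.combinations(range(len(v)), 3):
        # A face is a triple of mutually adjacent vertices (edge length 2).
        edges = [np.linalg.norm(v[p] - v[q]) for p, q in ((i, j), (j, k), (i, k))]
        if np.allclose(edges, 2.0):
            c = v[[i, j, k]].mean(axis=0)
            axes.append(c / np.linalg.norm(c))
    return np.array(axes)


def subvolume_centers(capsid_center, radius):
    """Illustrative subvolume centers on each 3-fold axis at a given radius."""
    return np.asarray(capsid_center, dtype=float) + radius * threefold_axes()


axes = threefold_axes()
print(len(axes), "3-fold axes;", 3 * len(axes), "subunits (3 per axis)")
centers = subvolume_centers((0.0, 0.0, 0.0), radius=125.0)  # angstroms, assumed
```

In a real tomogram, the capsid center and orientation would first come from particle alignment. The axes would then be rotated into each particle's frame before subvolumes are extracted.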
DISCUSSION
Cryo-ET has allowed a subvolume classification that revealed the locations of receptor domains that were missing from previous single-particle analyses (SPA), even when SPA subvolume classification was attempted (44, 69). For AAV5, PKD1 had been visualized by SPA (59, 60), but there had been no sign of PKD2. The cryo-ET showed PKD2 doubled back over the top of PKD1 in two orientations differing by ~30°, likely neither with sufficient occupancy or order to be resolved by SPA. Such disorder and heterogeneity are consistent with PKD2 having few interactions: limited contacts with PKD1 near the interdomain hinge, and no contacts with AAV5 beyond the PKD1-2 domain linker in either class. The distal locations of PKD2 revealed by cryo-ET are also consistent with analyses of domain-deletion and chimeric domain-swapped mutants, which indicated that PKD1, but not PKD2, has a significant impact on AAV5 cellular transduction (55).
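Statements such as "two orientations differing by ~30°" or "four classes spanning a 120° rotation about the interdomain hinge" can be quantified by the angle of the rigid-body rotation that relates a domain's placement in two classes. The sketch below assumes that each class has been summarized by a 3 × 3 rotation matrix for the domain, for example from fitting a domain model into each class average. It computes the relative angle from the trace, θ = arccos((tr(RₐᵀR_b) − 1)/2). The hinge axis and class angles are hypothetical.

```python
import numpy as np

# Sketch: quantify how far apart two classes place a domain, as the angle of
# the rigid-body rotation relating them. Assumption: each class is summarized
# by a 3x3 rotation matrix for the domain; the matrices below are hypothetical.


def rotation_about_axis(axis, angle_deg):
    """Right-handed rotation matrix about `axis` (Rodrigues' formula)."""
    k = np.asarray(axis, dtype=float)
    k = k / np.linalg.norm(k)
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])
    t = np.radians(angle_deg)
    return np.eye(3) + np.sin(t) * K + (1.0 - np.cos(t)) * (K @ K)


def angle_between(R_a, R_b):
    """Angle (degrees) of the relative rotation R_a^T R_b."""
    R_rel = R_a.T @ R_b
    cos_t = (np.trace(R_rel) - 1.0) / 2.0
    return float(np.degrees(np.arccos(np.clip(cos_t, -1.0, 1.0))))


# Hypothetical hinge axis and two class orientations about it.
hinge = (0.3, 0.5, 0.8)
class_a = rotation_about_axis(hinge, 10.0)
class_b = rotation_about_axis(hinge, 40.0)
print(round(angle_between(class_a, class_b), 1))  # 30.0
```

The clip guards against rounding pushing the cosine slightly outside [−1, 1]. The trace formula gives the total rotation angle whether or not the classes share a common hinge axis.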
AAV2 presented more of an enigma, because the same mutational analysis found that PKD2 was most important for AAV2 entry, but that PKD1 also enhanced transduction, though to a lesser extent. PKD2, the more critical for transduction, had previously been resolved by SPA (44, 58), but the “accessory” PKD1 had not. Before the SPA structures of AAV5-AAVR complexes were determined (59, 60), we hypothesized that the unseen PKD1 might interact loosely with AAV2 at a site corresponding to the (then undetermined) AAV5/PKD1 interface. The cryo-ET shows that in none of the four classes of the AAV2 complex does PKD1 occupy a location resembling that of PKD1 bound to AAV5.
However, one of the four PKD1 classes makes some direct contact with AAV2 proteins. This is consistent with PKD1 playing an accessory role that is not strictly required for, but enhances, cellular transduction. Note, however, that only one of the four classes appears to make contact, and the contact is not extensive. It is therefore not surprising that we observe wide-ranging heterogeneity in domain orientation, with the four classes spanning a 120° rotation about the interdomain hinge. While the more populated orientations would be expected to rise to the top of the classification, there may well be diversity beyond the four discrete classes (as indicated by XL-MS), so it is not surprising that PKD1 was not detectable by SPA. Clearly, the interactions of PKD1 with either AAV2 or PKD2 are insufficient to restrict conformational heterogeneity. This raises the question of whether the interactions with AAV2 could be strong enough to have a measurable direct impact on transduction through avidity. It seems more likely either that PKD1 increases the availability or stability of AAVR in a state compatible with AAV2 binding, or that PKD1 has a role in a different step of AAV entry.
Completely unanticipated was the unmodeled density on the inside surface of the AAV2/PKD1-2 complex (but not the AAV5/PKD1-2 complex). Its strength correlates inversely with that of βA: density for βA is much weaker when the unmodeled features are seen. Thus, it appears that we are observing an equilibrium between two states. In one, an ordered βA extends from the 5-fold region. In the other, a partially ordered N-terminal region comes from the inner-surface protrusion, skips βA, and joins the jellyroll fold of the capsid protein at the βA-βB hairpin turn. The volume of the inner protrusion is commensurate with that expected for the N-terminal 35 residues of two VP3s meeting at a 2-fold axis, although partial occupancy by VP2 or VP1 cannot be ruled out. Whether and how this equilibrium in N-terminal location is influenced by receptor binding far away on the outer surface is unknown.
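The comparison between the protrusion volume and two 35-residue N-terminal segments rests on a back-of-the-envelope conversion from residue count to volume. The sketch below assumes two common rules of thumb rather than values measured in this study: a mean residue mass of ~110 Da and a protein density of ~1.35 g/cm³. Partially ordered segments may occupy a less compact volume than folded protein, so this is only an order-of-magnitude check.

```python
# Back-of-the-envelope estimate of the volume occupied by a stretch of protein,
# for comparison with an unmodeled density feature. Assumptions: an average
# residue mass of ~110 Da and a protein density of ~1.35 g/cm^3, both common
# rules of thumb rather than values measured here.

AVOGADRO = 6.02214076e23      # 1/mol
MEAN_RESIDUE_MASS_DA = 110.0  # g/mol per residue (approximate average)
PROTEIN_DENSITY = 1.35        # g/cm^3 (approximate, folded protein)
CM3_TO_A3 = 1e24              # 1 cm^3 = 10^24 cubic angstroms


def protein_volume_A3(n_residues: int) -> float:
    """Approximate volume in cubic angstroms of n_residues of protein."""
    mass_g = n_residues * MEAN_RESIDUE_MASS_DA / AVOGADRO
    return mass_g / PROTEIN_DENSITY * CM3_TO_A3


# Two VP3 N-terminal segments of 35 residues each, meeting at a 2-fold axis.
n = 2 * 35
print(f"{n} residues ~ {protein_volume_A3(n):,.0f} cubic angstroms")
```

Under these assumptions, the result is roughly 9,500 Å³. This figure would be compared with the volume enclosed by the unmodeled density at a chosen contour level.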
Another surprise was the appearance of previously unseen β-strand fragments adjacent to PKD1 in its complex with AAV5. They lacked the distinctive features that would allow identification by sequence. Nevertheless, there is a limited number of plausible possibilities. The N-terminal regions of the capsid proteins have never been seen at high resolution. Although partially ordered structures were seen on the interior surface of AAV2 in this study, crystal structures of some AAVs and autonomous parvoviruses have indicated that a fraction of N termini (of at least VP3) might be external. In those structures, partially ordered density running down the 5-fold pore from the outside is interpreted as the connection to the start of the β-barrel on the inside surface (70–73). The absence of density on the 5-fold axis in the AAV5 single-particle analyses makes it less likely that the unaccounted-for features are previously unresolved N-terminal segments of the viral protein outside the capsid.
Alternatively, the extra peptides could come from unmodeled regions of AAVR. Dimers and higher oligomers are seen in preparations of PKD1-2 constructs (and of MBP-PKD1-5 fusions) (44, 74). To date, AAVR dimers have not been observed bound to AAV5, but one cannot exclude the possibility that a small fraction of the receptors in the complex are dimerized, with disorder that precludes EM observation of most of the second subunit.
This work is a testament to the value of combining multitechnique, multiscale approaches for flexible complexes, and of recognizing the gaps in understanding that can arise from exclusive reliance on high-resolution structures. Planning for multiple contingencies involved not only integrating different EM techniques but also building upstream redundancy into the expression constructs. Both were needed for a more robust and holistic understanding. Notably, the first application of cryo-ET, to a complex of AAV2 with an MBP-PKD1-5 fusion construct, yielded a very low-resolution visualization that lacked domain definition and gave no clear view of conformational heterogeneity (44). Only with a smaller construct, His₆-PKD1-2, was higher binding occupancy achieved and the conformational heterogeneity from domains 3 to 5 eliminated. This made it possible to classify the remaining heterogeneity and resolve distinct configurations for the two proximal domains. On the technical side, fully automated classification of subvolumes within a symmetric particle was not yet possible. We hope that examples like this will inspire continued algorithm development, so that future applications will not be limited by laborious interactive classification.