INTRODUCTION
Polyomaviruses (PyVs) are nonenveloped, double-stranded DNA viruses that can infect a large range of mammalian species and birds. Their viral genome typically encodes two regulatory proteins, the small and large T antigens, and three structural proteins, VP1, VP2, and VP3. The structural proteins assemble into a T=7d icosahedral shell that encapsidates the viral DNA (1).
Since their discovery 6 decades ago, polyomaviruses have emerged as highly useful models to understand the molecular events leading to cellular transformation and cancer. The human polyomaviruses JCPyV (2) and BKPyV (3) were discovered in 1971 and subsequently shown to be associated with progressive multifocal leukoencephalopathy and polyomavirus-associated nephropathy, respectively. Remarkably, 10 new human polyomaviruses have been identified in the last 6 years (4–14). One of these, Merkel cell polyomavirus (MCPyV), is the causative agent of Merkel cell carcinoma, a rare neuroendocrine tumor of the skin (6). Infection with another recently identified polyomavirus can lead to trichodysplasia spinulosa, which, while not life threatening, causes significant disfigurement (8). This disease is seen almost exclusively in transplant recipients immunosuppressed with drugs such as cyclosporine (15).
Human polyomavirus 9 (HPyV9) was identified in 2011 from the serum and urine of an asymptomatic kidney transplant patient (9) and a skin sample of a patient with Merkel cell carcinoma (10). HPyV9 has a high overall nucleotide sequence identity (76%) with B-lymphotropic polyomavirus (LPyV), which is also sometimes referred to as African green monkey polyomavirus (AGMPyV). Although the role of HPyV9 in human disease has not yet been established, the isolation and identification of HPyV9 have resolved the puzzling finding of serological cross-reactivity against LPyV in 20 to 30% of apparently healthy humans (
16–20). Antibodies against polyomaviruses are mostly elicited against their apical surface, formed by the viral major capsid protein, VP1 (
21–23). HPyV9 VP1 has 87% amino acid identity with LPyV VP1 (9, 10). It has therefore been hypothesized that these viruses share immunogenic epitopes for antibody engagement (9, 10, 19, 20) and perhaps also other properties.
VP1 is the major polyomavirus capsid protein, and a collection of available VP1 structures from different polyomaviruses shows that the protein adopts a jelly roll fold and assembles into a homopentamer around a central cavity (
24–31). While the pentameric core structure is conserved in all VP1 proteins, these structures from different polyomaviruses exhibit substantial differences in the sequence, length, and conformation of their surface-exposed loops. These highly variable regions contain the sites for receptor engagement (25–27, 29) as well as antibody binding (21–23, 32). All human polyomavirus VP1 structures known to date engage glycans terminating in 5-N-acetyl neuraminic acid (Neu5Ac) as receptors (27, 29–31), with, in some cases, drastically different interactions taking place.
Whereas humans cannot synthesize a modified version of Neu5Ac, 5-N-glycolyl neuraminic acid (Neu5Gc), most other mammals can produce both Neu5Ac and Neu5Gc (33, 34). Humans can, however, acquire Neu5Gc from the red meat and milk in their diets. Assimilated Neu5Gc is therefore found in the human gut epithelium and kidney vasculature (
35–37). Some viruses and bacteria can discriminate between the subtle differences of neuraminic acid variants and are able to specifically recognize only one of them, an ability which, in turn, influences their host range. For example, the polyomavirus simian virus 40 (SV40) preferentially binds to the Neu5Gc version of its GM1 ganglioside receptor (
38), and this specificity is thought to limit SV40 infections in humans. On the other hand, the closely related human BKPyV exclusively recognizes receptors containing Neu5Ac but not those containing Neu5Gc (
30). Similarly, animal rotavirus strains have variable preferences for Neu5Gc or Neu5Ac. The porcine and bovine rotaviruses prefer Neu5Gc as a component of their attachment receptor, whereas the rhesus monkey rotavirus apparently utilizes Neu5Ac-type receptors for infection (
39). Finally, the subtilase cytotoxin secreted by Shiga-toxigenic Escherichia coli (STEC) specifically recognizes Neu5Gc-type glycan receptors in the human gut epithelium and kidney vasculature (35).
In order to define the receptor-binding specificity of HPyV9, we performed glycan microarray screening with recombinant HPyV9 VP1 and LPyV VP1. We also solved the crystal structure of HPyV9 VP1 in complex with relevant oligosaccharides containing Neu5Gc and Neu5Ac identified in the array, namely, Neu5Gc-α2,3-Gal-β1,4-GlcNAc (3GSLN) and Neu5Ac-α2,3-Gal-β1,4-GlcNAc (3SLN), and also with the sialyllactose Neu5Ac-α2,3-Gal-β1,4-Glc (3SL). A comparison of the structure of HPyV9 VP1 with that of LPyV VP1 in complex with a Neu5Ac-containing glycan ligand (31) identified subtle differences in the binding site, which result in altered interactions with the different sialic acids present in the oligosaccharide structures. Furthermore, a comparison of the atomic structures of the VP1 proteins from HPyV9, LPyV, and MCPyV provides a basis for understanding properties such as antigenic variability and the serological cross-reactivity among these three viruses.
DISCUSSION
A large number of viral and bacterial pathogens engage sialic acid-based compounds to establish their initial contact with a target cell (
54). Although sialic acids often display modifications, the molecular principles that govern recognition of sialic acid variants are poorly understood, as the majority of structures have been solved using Neu5Ac-based compounds as ligands. Neu5Gc, which can be synthesized by nonhuman mammals but not by humans, has attracted much interest in recent years due to its role in understanding and defining host and tissue tropism (
35,
38,
39). In order to investigate differences in sialic acid receptor binding in two closely related viruses from different species, we analyzed the ligand specificity and determined the structure of the major capsid protein VP1 from HPyV9, which was isolated from humans, and compared its properties with those of the VP1 protein from the monkey-derived virus LPyV.
We found that HPyV9 differs from LPyV in its unexpected preferential binding to the Neu5Gc glycan, which humans can acquire only through a diet containing red meat and milk. The observed preference for Neu5Gc can be explained by the architecture of the sialic acid binding site in HPyV9, which allows specific recognition of the extra Neu5Gc hydroxyl group via hydrogen bonds to side chains. Three residues unique to the HPyV9 binding site (D71, S280, and N282) act in concert to achieve this specificity. Replacement of one of these residues, N282, with the valine found in LPyV is not sufficient to alter the specificity and in fact compromises the binding pocket. Thus, the preference of HPyV9 for Neu5Gc is likely mediated by the set of three residues. A substitution at the entrance to the binding pocket (S76) likely enhances its accessibility in HPyV9 compared with that in LPyV (F76), in line with the observed binding profile of HPyV9 on glycan microarrays.
Only two other crystal structures of proteins in complex with Neu5Gc-based receptors have been reported: those of the porcine rotavirus strain CRW-8 spike protein domain VP8 (39) and the subtilase cytotoxin B subunit (SubB) produced by Shiga-toxigenic Escherichia coli (STEC) (35). Similar to HPyV9 VP1, CRW-8 VP8 can bind both Neu5Ac and Neu5Gc, but it prefers the latter compound, which shows a higher binding affinity and results in increased cellular infectivity (
39). Likewise, SubB can also bind Neu5Ac and Neu5Gc in glycan microarrays, but structural studies were successful only for the Neu5Gc complex, again indicating a preference for Neu5Gc binding (
35). A comparison of the Neu5Gc-binding sites of HPyV9 VP1, CRW-8 VP8, and STEC SubB shows that all three proteins acquire their preference for the glycolyl chain of Neu5Gc through a hydrophilic environment in its vicinity (Fig. 7). In all three cases, two hydrogen bonds are formed between protein residues and the extra hydroxyl group in Neu5Gc. Mutation of any of these residues results in a significant loss of activity of CRW-8 VP8 or SubB (
35,
39). In contrast to HPyV9, the binding site for Neu5Gc is more surface exposed in CRW-8 VP8 and STEC SubB, with one side of the ring facing the protein and the other side facing the solvent. This is also in accordance with the larger buried surface area of Neu5Gc in the HPyV9 complex (314.5 Å²) than in the CRW-8 VP8 and STEC SubB complexes (229.7 Å² and 234.6 Å², respectively) (51).
Bacterial toxins similar to SubB, such as the pertussis toxin of Bordetella pertussis, do not exhibit a preference for Neu5Gc, as they lack the residue needed to coordinate the extra hydroxyl group of Neu5Gc, but they nevertheless accommodate Neu5Ac in a similar location and orientation (55). Similarly, in contrast to porcine rotavirus, the related rhesus monkey rotavirus VP8 preferentially binds Neu5Ac, also as a result of a single amino acid substitution (G187 to K187) that abolishes the ability of the virus to contact the hydroxyl group of Neu5Gc (39). The significance of these specificities for tropism is not always clear, although it has been suggested that SubB has adapted to bind Neu5Gc so that it can use metabolically incorporated Neu5Gc as a receptor.
Our analysis shows that HPyV9 and LPyV form a closely related pair of viruses that differ in receptor specificity and, apparently, in species tropism. A similar pair of closely related polyomaviruses is formed by BKPyV and SV40, which infect humans and simians, respectively. The BKPyV and SV40 VP1 proteins share a high degree of sequence identity and bind terminal sialic acid in a mostly conserved binding site that differs from the one observed in HPyV9 and LPyV (
26,
30). However, SV40 displays a preference for receptors containing Neu5Gc over receptors containing Neu5Ac (
38), whereas the sialic acid binding site of BKPyV exclusively binds to the smaller Neu5Ac and cannot accommodate the larger glycolyl chain of Neu5Gc (
30). Thus, for these two viruses, the receptor specificities correlate well with the biosynthetically available variants of sialic acids in the respective hosts. However, an analogous correlation obviously does not exist in the case of the HPyV9 and LPyV pair of polyomaviruses.
Why, then, does the binding pocket of human HPyV9 favor binding to a ligand, Neu5Gc, that cannot be synthesized by humans? This question is difficult to answer at present, but the ability to engage Neu5Gc is likely advantageous for HPyV9. Humans can acquire Neu5Gc from their diet, as red meat and milk are rich sources of Neu5Gc. During digestion of such foods, Neu5Gc can be taken up by fast-growing body cells and is subsequently transferred to newly synthesized glycoproteins and glycolipids in the Golgi apparatus by a transferase that does not discriminate between Neu5Ac and Neu5Gc (
35–37). Thus, the gut epithelium and kidney vasculature in particular display Neu5Gc on their surface, and this is exploited, for example, by the SubB toxin (
35). Similarly, the expression of Neu5Gc on certain carcinomas has also been reported (
56). It is therefore conceivable that the ability to bind to Neu5Gc confers an advantage to HPyV9 by directing it to specific tissues. HPyV9 may therefore have evolved to utilize a receptor that is provided by the diet. It will be interesting to test this hypothesis once tissue culture systems that allow propagation of HPyV9 are established. Whereas LPyV can propagate only in cells of B-lymphoblastoid origin (
57), the reproductive niche for HPyV9 is not yet understood. We expect that our findings will facilitate a proper dissection of the attachment and entry mechanisms used by HPyV9.