INTRODUCTION
BK polyomavirus (BKPyV) is one of more than 10 human polyomaviruses (HPyVs), which belong to the Polyomaviridae family found in nearly all vertebrates (1, 2). BKPyV infects more than 90% of the general human population without specific signs or symptoms but can cause significant diseases in immunocompromised patients (3–5). The leading entities are nephropathy and hemorrhagic cystitis, which complicate 5%–25% of mostly kidney and allogeneic hematopoietic cell transplant recipients, followed by approximately 0.1% of urothelial cancers carrying a chromosomal integration of BKPyV (6, 7). Pathology, rate, and risk factors of BKPyV diseases differ between patient populations, but their common underlying theme is insufficient adaptive immunity to BKPyV (8). Since there are no effective antiviral drugs for treatment or prevention of BKPyV replication and associated diseases (9), current clinical management relies on reconstituting BKPyV-specific humoral and cellular immunity. However, our earlier work has identified significant changes in 9mer epitopes which were associated with failure of CD8 T cells to activate polyfunctional responses, to kill, and to proliferate (10–12). Similarly, escape from neutralizing antibodies is suspected of contributing to escape from humoral adaptive immune control (13–16).
BKPyV virions are non-enveloped icosahedral capsids of 40–45 nm diameter formed by 72 pentamers of the major capsid protein Vp1 outside and one Vp2 and one Vp3 inside at a ratio of 5:1:1. Inside, the circular double-stranded DNA genome of approximately 5.1 kb is packaged using host cell-derived histones (8). Akin to other HPyVs, the BKPyV genome can be divided into three major regions called the non-coding control region (NCCR), the early viral gene region (EVGR), and the late viral gene region (LVGR). The NCCR harbors the origin of viral DNA replication and bidirectional intertwined promoter/enhancer sequences. Together with host cell factors, the NCCR regulates the sequential expression of the EVGR-encoded regulatory large and small T-antigen (LTag, sTag), the viral DNA replication and expression of the LVGR-encoded regulatory agnoprotein, and the structural capsid proteins Vp1, Vp2, and Vp3 (17–19). Further on the LVGR-strand, two micro-RNAs are found downstream of the VP1-polyadenylation signal, which downregulate LTAG-transcripts as well as ULBP3-transcripts, a potential target of natural killer-lymphocytes (20).
The VP1-sequence variability of circulating BKPyV gives rise to four major Vp1 serotypes, initially defined by neutralizing antibody (NAb) titers (21, 22). More recent phylogenetic analyses use larger genome sequences that also include parts of the EVGR and currently define 12 BKPyV subgroups (1). Humoral immunity to the intracellular LTag and sTag cannot confer protection by NAbs. However, cellular immunity to these EVGR-encoded proteins seems to play a critical role which involves immunodominant 9mer peptide clusters presented by HLA-class I molecules to cytotoxic CD8 T lymphocytes (CTLs) (23–25). Notably, genotype-dependent and genotype-independent variability in the LTag-sequence has been linked to reduced 9mer-directed CTL responses (10, 11). Thus, sero- and genotype-encoded variability in Vp1 and LTag may impair BKPyV-specific immune control by NAbs and cytotoxic T cells in transplant patients, respectively.
Given the potential impact on diagnostic assays and vaccine design, we set out to analyze the variability of the BKPyV VP1 and LTAG protein-coding sequences from public databases and from our recent molecular study on hematopoietic cell transplantation (HCT) recipients (10) using computational approaches. Our findings revealed that 43% of VP1 sequences had non-synonymous changes, whereby mutations in 23 amino acid positions were highly prevalent. We analyzed the potential effects on Vp1 structure, especially on the regions interacting with cellular sialic acid receptors, NAbs, and the minor capsid proteins Vp2 and Vp3. Additionally, we explored changes within the LTag protein and assessed their effect on confirmed immunodominant CTL epitopes, including their cross-protective potential.
DISCUSSION
Type and rate of amino acid variations in BKPyV may provide important insights into BKPyV diversity in human populations and a first step toward better defining determinants of BKPyV-specific immunity needed to protect vulnerable patients from BKPyV diseases. Indeed, our earlier work identified BKPyV mutant epitopes downregulating polyfunctional cytotoxic T cell activation (10, 11). Similar changes in serotypes could predict escape from neutralizing antibody responses (51, 52). Here, we present a comprehensive analysis of amino acid variations in publicly deposited sequences of the BKPyV Vp1 and LTag proteins and include a cross-check with recent data from our center. Using available experimental structures and computational modelling, we placed the amino acid variations in their structural context to visualize conformational, functional, or immunologic aspects.
Our study provides the following insights: first, BKPyV-gt1 was found in 71.2% of publicly deposited Vp1 GenBank entries, followed by BKPyV-gt4 (19.3%), BKPyV-gt2 (8.1%), and BKPyV-gt3 (1.4%), but prevalence rates differed according to geography and specimen type. Second, 43% of Vp1 sequences carried SXMs or SIMs, whereby 18% had more than one amino acid mutation and included changes in antibody-binding domains. Third, LTag sequences were largely conserved, with only 16 mutations detectable in more than one entry and typically without significant effects on LTag-structure or interaction domains. However, some LTag changes were predicted to affect HLA-class I presentation of immunodominant 9mers to cytotoxic T-cells.
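As an illustration of how summary figures of this kind (the fraction of sequences with at least one, or more than one, amino acid change) can be tallied, the following is a minimal sketch rather than the pipeline used in this study. It assumes that all sequences are already aligned to a reference of equal length and that gaps are written as "-"; the function name `variant_summary` and the toy sequences are hypothetical.

```python
from collections import Counter


def variant_summary(reference, aligned_seqs, gap="-"):
    """Tally amino acid substitutions relative to an aligned reference.

    Returns (fraction with >=1 change, fraction with >1 change,
    Counter of substitutions keyed like 'D2N'). Gapped positions are
    treated as missing data, not as substitutions.
    """
    per_variant = Counter()
    changes_per_seq = []
    for seq in aligned_seqs:
        if len(seq) != len(reference):
            raise ValueError("sequences must be aligned to the reference")
        changes = [
            f"{ref_aa}{pos}{aa}"
            for pos, (ref_aa, aa) in enumerate(zip(reference, seq), start=1)
            if gap not in (ref_aa, aa) and aa != ref_aa
        ]
        changes_per_seq.append(len(changes))
        per_variant.update(changes)
    total = len(aligned_seqs)
    any_frac = sum(n >= 1 for n in changes_per_seq) / total
    multi_frac = sum(n > 1 for n in changes_per_seq) / total
    return any_frac, multi_frac, per_variant


# Toy alignment (hypothetical, not real Vp1 data).
reference = "MDEN"
entries = ["MDEN", "MNEN", "MNQN", "M-EN"]
any_frac, multi_frac, counts = variant_summary(reference, entries)
print(any_frac, multi_frac, counts.most_common())
```

Treating gaps as missing data rather than as changes matters when entries cover only part of the protein, since partial sequences would otherwise inflate the apparent variability.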
Overall, Vp1 sequences displayed a high degree of amino acid variability, suggesting a remarkable plasticity of the major capsid protein, with mutation hotspots around the exposed BC- and DE-loops, which include SDM and neutralizing domains and clearly differ from JCPyV and SV40. There were numerous SXM in positions defining the four major Vp1 serotypes, suggesting that the existing set of serogroup reference sequences may not adequately cover the common serotype-specific contexts. Of note, sequences assigned to different serotypes differed in SXM rates, whereby 95% of serotype-II assigned sequences had at least one SDP, while this percentage was lower, at 17%, 4.5%, and 14% for serotypes-I, -III, and -IV, respectively. One SXM has been shown to be a neutral mutation within a specific serotype context (N61S within serotype-IV) (16). The relative structural flexibility of common SXM positions implies the potential of accommodating additional neutral mutations. Previous studies have suggested combining BKPyV-serotype-II and -III into a single serotype (53). Our own observations demonstrate that BKPyV-serotype-II and -III sequences differ in only two Vp1 positions (62 and 77). Combining both serotypes into a new serotype II-III would be strengthened by further support through independent sequence entries. More importantly, functional studies characterizing serological or functional cross-reactivity as well as shared properties regarding viral infection, replication, and neutralization are needed to justify a novel combined serotype II-III classification. Furthermore, a majority of 834/1286 (65%) Vp1 variants were SIM and hence not linked to a particular serotype. However, the rates of SIMs differed between serotypes, being highest at 38% for serotype-IV compared to 25%, 20%, and 27% for serotypes-I, -II, and -III, respectively. The sequences from our HCT study were mainly from serotype-I (63/65 sequences) and contained common SIM (D75N, E/D82Q) and SXM (I178V), except for E/D20A, which was not observed in the public data.
Of 26 prevalent variants found in Vp1, our structural studies revealed that 6 concerned interactions with sialic acid receptors, 13 changes concerned sites contacting antibodies, and 10 were potentially involved in intra-pentamer interactions. In contrast, Vp2/Vp3 interaction was only indicated in one case. These data suggest that the outer surface of the BKPyV virion can accommodate more variability compared to the inner surface and may not only contribute to serotype specificity but also to gaps in the humoral defense. Indeed, mutations in residues 61 and 82 are predicted to destabilize antibody binding, and those in residues 60, 58, 69, 72, 73, and 82 can alter glycan binding. Vp1 residue positions 61, 69, 82, and 172 had low electron density support in experimentally solved crystal structures, indicating their potential for flexibility and thus making it difficult to computationally predict a reliable structural effect of mutations in these residues. Interestingly, for eight common variant positions, more than one sequence displayed nucleotide changes matching the APOBEC3 mutational signature. This includes the SIMs D60H/N, E61Q, E73Q, D75N, D77H, E77Q, E82Q/K, and D82N; serotype-IV SXM D62H/N, R69K; and serotype-I SXM R83K. Some of these (D62N, R69K, and D77H) have been identified as APOBEC3 mutations in previous studies (15). APOBEC3 belongs to a family of ssDNA cytosine deaminases that has been linked to innate antiviral defense by mutating viral genomes (54). APOBEC3 mutations have been observed across a range of viruses, though the level of mutation appears to be significantly lower in DNA viruses compared to retroviruses (55). However, APOBEC3-like signature mutations were not found at all in 15 prevalent variant BKPyV positions. Taken together, based on structural context, we identified a number of variants with different interactions, which might affect BKPyV replication, virion interaction with host cells, and/or the adaptive immune response, some of which have been experimentally verified (Table S1). The systematic and comprehensive map of BKPyV Vp1 emerging from our study may be useful for optimizing diagnostics (2), but also for vaccine design and therapy approaches, allowing researchers to better prioritize conserved and minimally variable immunodeterminants of the viral proteins in conventional or new orientations (56, 57) and to evaluate their potential for structural stability, cross-protection, and immune escape.
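To make the notion of a nucleotide change "matching the APOBEC3 mutational signature" concrete, the sketch below tests a single substitution against one commonly used working definition: a C>T or C>G change at a cytosine preceded by T (5'-TC), evaluated on either strand. This definition is an assumption for illustration only; the exact criteria applied in this study or in reference 15 may be stricter (for example, requiring a particular base after the cytosine). The function name and the example codons are hypothetical.

```python
def matches_apobec3_signature(ref_seq, pos, alt):
    """Return True if substituting ref_seq[pos] (0-based) by `alt` is a
    C>T or C>G change in a 5'-TC context on either DNA strand.

    Assumed working definition, not necessarily the one used in the study.
    """
    ref = ref_seq[pos]
    # Forward strand: the mutated C must be preceded by T.
    if ref == "C" and alt in ("T", "G"):
        return pos > 0 and ref_seq[pos - 1] == "T"
    # Reverse strand: 5'-TC reads as 5'-GA on the forward strand, so
    # C>T appears as G>A and C>G as G>C, with the G followed by A.
    if ref == "G" and alt in ("A", "C"):
        return pos + 1 < len(ref_seq) and ref_seq[pos + 1] == "A"
    return False


# Hypothetical codons: GAT (Asp) -> AAT (Asn) and GAA (Glu) -> CAA (Gln).
asp_to_asn = matches_apobec3_signature("GAT", 0, "A")
glu_to_gln = matches_apobec3_signature("GAA", 0, "C")
no_context = matches_apobec3_signature("GCT", 0, "A")
print(asp_to_asn, glu_to_gln, no_context)
```

Under this definition, Asp-to-Asn, Asp-to-His, Glu-to-Gln, Glu-to-Lys, and Arg-to-Lys exchanges can all arise from a single G>A or G>C change at a G followed by A, which is why the check has to be made on the nucleotide sequence rather than on the amino acid change alone.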
In contrast, our LTag sequence analysis revealed higher conservation in line with its central multi-functional role often compared to a Swiss army knife, coordinating polyomavirus replication together with timed recruitment of essential host cell functions. Of the only 16 amino acid variants reported at least twice in 695 amino acids, each of the relevant domains had at least one change, whereby possibly the helicase, and in particular the helicase D3-2 appeared to be slightly more affected. Our structural analysis suggested that the amino acid variants were not associated with major conformational changes in line with its conserved multifunctional role. Nevertheless, several variants were non-conservative with respect to size or charge. While these variants are less dramatic than the ones known for the chromosomally integrated Merkel cell PyV in cases of Merkel cell carcinoma, which include protein truncations and/or frame shifts, the functional consequences of LTag mutations on BKPyV replication, if any, need to be addressed in relevant infection models.
Our analysis revealed potential immunological consequences of the LTag amino acid variants. Indeed, the variant changes affected the type and binding to HLA-class I of immunodominant CD8 T cell 9mer epitopes previously identified in kidney transplant patients (12, 23, 24). In particular, the 9mer epitope L27PLMRKAYL35 in the DnaJ region of LTag was mutated in 4 BKPyV entries (P28S), which abolishes predicted binding to 36/44 common HLA-B51, B7, B8, and A24 alleles (11, 12, 50) and which was reported to contribute to protection of kidney transplant recipients from BKPyV-DNAemia (50, 58). The homolog of the BKPyV 9mer epitope L27PLMRKAYL35 also varies in JCPyV and SV40 sequences (L27I in 196 JCPyV and 22 SV40 entries, and L29V in 186 JCPyV entries), though the predicted HLA binding is not affected by these changes. In contrast, epitopes K216LCTFSFLI225 and C218TFSFLICK227, with rates of 8% and 5.5% in kidney transplant patients, respectively, are conserved in all available sequence entries for BKPyV, JCPyV, and SV40. K216LCTFSFLI225 is predicted to bind to 33 HLA-A2, A32, and B13 alleles and was found in patients having HLA-A2 and HLA-A24. C218TFSFLICK227 is predicted to bind to 41 HLA-A11, A30, A31, A68, A03, A34, A66, and A74 alleles and was found in patients having HLA-A3.
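The first step in assessing whether a substitution such as P28S can alter epitope presentation is to enumerate every 9mer window that contains the changed residue, in both wild-type and mutant form, before passing the peptides to an HLA-binding predictor. The following is a minimal sketch of that enumeration only; it does not perform any binding prediction, and the function name `ninemers_affected` and its `offset` parameter (the residue number of the first character of the supplied fragment) are hypothetical.

```python
def ninemers_affected(protein, position, new_aa, offset=1, k=9):
    """List (start_residue, wild_type_kmer, mutant_kmer) for every k-mer
    window that covers residue `position`.

    `offset` is the residue number of protein[0], so fragments can be
    supplied with their original numbering.
    """
    idx = position - offset
    if not 0 <= idx < len(protein):
        raise ValueError("position lies outside the supplied sequence")
    mutant = protein[:idx] + new_aa + protein[idx + 1:]
    first = max(0, idx - k + 1)          # leftmost window still covering idx
    last = min(idx, len(protein) - k)    # rightmost window that fits
    return [
        (start + offset, protein[start:start + k], mutant[start:start + k])
        for start in range(first, last + 1)
    ]


# The 9mer discussed above, numbered 27-35, with the P28S substitution.
epitope_windows = ninemers_affected("LPLMRKAYL", 28, "S", offset=27)
print(epitope_windows)

# Toy 12-residue sequence (hypothetical) to show overlapping windows.
toy_windows = ninemers_affected("ACDEFGHIKLMN", 5, "Y")
print(len(toy_windows), toy_windows[0])
```

With a full-length protein sequence, a single substitution yields up to nine overlapping 9mers, each of which may bind a different set of HLA alleles, so all of them would need to be scored.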
Comparison of LTag with the human proteome revealed only a few matches, which were often of low complexity with repetitive amino acids, such as leucine in S573GMTLLLLL and lysine in S125TPPKKKRK. Conversely, we identified two immunodominant epitopes conserved across BKPyV, JCPyV, and SV40, which may be highly relevant targets for immunotherapy and vaccine design. These epitopes lie within the DNA-binding region of the LTag OBD required for viral genome replication, the mutation of which may significantly increase the evolutionary cost of immune escape. Our comprehensive map of LTag, which integrates sequence variation and conservation, structural interactions with other viral and host proteins with potential transforming (side-)effects and/or nucleic acids, and predicted immunological attributes, can serve as a valuable resource to target this multi-functional protein without increasing oncogenicity.
A limitation of our study is the reliance of the analysis on publicly deposited Vp1 and LTag sequences for BKPyV, SV40, and JCPyV. While differences in the portions of Vp1 and LTag sequenced by different studies resulted in uneven alignment coverage across the sequence, at least 50% of the analyzed sequences cover the entire protein, suggesting that the hotspot variability plots still provided a relevant representation of the possible variation. However, a few variants and mutant profiles arose largely from sequences derived from a single study in a single country rather than from multiple geographic locations and studies, or from healthy donors as well as affected and non-affected patients. Nevertheless, when including data from our own next-generation sequencing study of HCT recipients in Basel, independent confirmation as well as new variants could be identified. Thus, these global data strengthen our earlier single-center findings detecting and functionally analyzing mutant 9mer epitopes mediating immune escape from HLA-I cytotoxic T cells. Taken together, this perspective and our analysis of the currently available BKPyV sequences reveal an unexpectedly high genetic variability for this double-stranded DNA virus that strongly relies on the host cell DNA replication machinery with its potential access to proofreading and error correction mechanisms. This should be taken into account when designing further approaches to antivirals and vaccines for patients at risk of high-level BKPyV replication due to insufficient virus-specific immunity.