Human leukocyte antigen (HLA) molecules and viruses are thought to be locked in an evolutionary arms race, where viruses adapt to evade HLA-restricted immune responses and HLA alleles evolve to optimize the fitness of human populations in the face of a wide range of pathogen species as well as the genetic variation within each pathogenic species. HLA diversity has been driven and maintained by heterozygote advantage (25), which is most evident in geographical regions with greater pathogen diversity (51), and by frequency-dependent selection, in which low-frequency allelic variants gain advantage in an environment of shifting pathogen selection (58). In turn, the selective pressures of HLA-restricted immune responses on pathogens are evident in a range of immune evasion strategies employed by viruses and encoded in their genomes, such as the ability of large DNA viruses (e.g., herpesviruses) to “hide” by inhibiting antigen presentation (61) and mimicking host peptides (39, 60) or the ability of RNA viruses to “run” through rapid evolution of genetic diversity (22, 35, 40, 43, 52, 53).
We, along with others, have explored the rapid viral adaptation to HLA-restricted immune responses using sequence analyses and have detected statistically significant associations between host HLA alleles and specific amino acid polymorphisms of human immunodeficiency virus (HIV) and hepatitis C virus (HCV) (4, 8, 28, 29, 43, 62). These findings have informed and directed experimentation which has, for example, confirmed that some of these HLA allele-specific viral polymorphisms are due to abrogation of HLA binding or peptide processing (17, 28, 29, 34, 62). In contrast, there is a paucity of direct evidence linking HLA evolution to the selective pressure of pathogens, as the reproductive advantage for humans operates on a long timescale (5). Limited direct evidence from a set of 34 oncoproteins and HIV Nef suggests that HLA alleles might preferentially target evolutionarily conserved peptides (12, 23). As functionally important sites on proteins tend to be evolutionarily conserved (12, 26, 64), immune surveillance of conserved ligands focuses immune resources on genomic areas in humans and pathogens where mutations might alter function (26, 57) or incur a fitness cost (28, 29, 66).
The recent availability of large curated databases of genetic sequences has aided in the investigation of evolutionary relationships between human and pathogen genetic diversity. These databases enable studies of evolutionary conservation using sequence variation (2). In addition, the experimental determination of tens of thousands of HLA binding affinity measurements (48) has allowed robust estimation of binding affinities for a wide range of HLA-peptide combinations (38). These data allow direct investigation into the relationship between HLA binding and target sequence conservation, as well as into the differences in these patterns across viral species and different HLA alleles.
DISCUSSION
In this study, we have found that HLA class I molecules preferentially sample conserved regions of human proteins and many viral families, as initially hypothesized by Hughes and Hughes (23). We uncovered a striking exception in the arboviral Flaviviridae species, where HLA molecules preferentially target nonconserved regions. This methodology provides a capacity to map the landscape of host-virus interactions from a novel perspective and also allows for closer examination of these effects at the viral protein level (see Fig. S4 to S6 in the supplemental material), providing a platform for comparative analyses of the complex coevolutionary relationships that exist between viruses and their human hosts.
These findings also provide evidence for the evolution of HLA class I locus and allelic specialization, suggesting a partial division of labor between the coinherited HLA-A and HLA-B loci. While molecules encoded in both loci participate in surveillance of various proteins, the HLA-A locus and certain HLA-B alleles appear to have a particularly important role in surveillance of evolutionarily conserved regions of the human proteome (14). This finding is specific to human (rather than randomly selected) proteins and is even more evident at sites of disease-associated mutation, suggesting optimization of ligand selection through human (and ancestral vertebrate) evolution.
Further evidence of partial HLA specialization can also be found through analyses of HLA-viral interactions, as HLA alleles that target conserved elements from the human protein repertoire also target conserved regions of human-adapted DNA viruses. In this respect, our findings are supported by other studies (18, 39, 60) indicating that these ancient DNA viruses exploit holes in the repertoire of reactive T cells created through thymic selection, thereby evading effective immune surveillance by maintaining similarity to self-peptides. We extend these observations to show that the extent to which individual HLA alleles are adapted to bind conserved human protein elements is highly correlated with their targeting efficiencies toward DNA viruses. We also find that HLA-B alleles tend to target conserved regions of RNA viruses more efficiently. These results are in keeping with those of Prugnolle et al., who noted that relationships between pathogen diversity and balancing selection are particularly evident at the HLA-B locus (51). They are also supported by the findings of McAdam et al. (41) and Hughes et al. (24), who found that a large number of HLA-B alleles are products of small-scale recombination events and that the HLA-B locus evolves much more rapidly than the HLA-A locus, suggesting that these two loci have been subject to different types of natural selection over long periods of time in response to different pathogenic threats. Our results are also in line with evidence of more effective HLA context-specific purifying selection followed by reversion in RNA viruses than in DNA viruses, as reported by Hughes et al. (22).
It is important to emphasize that these preferences exhibited by HLA alleles are not evident when either HLA binding energies or evolutionary conservation of target peptides is considered in isolation but only when these factors are considered together. This is in keeping with the findings of Istrail et al. (30), who conducted genome-wide analyses of binding preferences of HLA supertypes and found no meaningful differences in the tendency of HLA alleles to bind human proteins over proteins from other organisms.
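The point that binding and conservation are informative only when considered jointly can be illustrated with a minimal sketch. It assumes (this section does not state the formula) that a targeting-efficiency-like score can be approximated by a rank correlation between per-peptide binding scores and per-peptide conservation scores. The function names and the toy values are hypothetical and are not taken from the study.

```python
from statistics import mean

def ranks(values):
    """Return 1-based ranks, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (vx * vy) ** 0.5

def targeting_score(binding, conservation):
    """Assumed score: Spearman correlation (Pearson correlation of ranks)."""
    return pearson(ranks(binding), ranks(conservation))

# Toy data: higher value = stronger predicted binding / higher conservation.
binding = [0.9, 0.7, 0.4, 0.2]
conservation = [0.95, 0.8, 0.5, 0.3]
score = targeting_score(binding, conservation)  # positive: binders sit in conserved regions
```

Under this assumption, a positive score indicates that an allele tends to bind conserved peptides, and a negative score indicates a preference for variable regions, as described above for the arboviral flaviviruses. Neither input list carries this information on its own.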
Viewed from the perspective of viral evolution, these data suggest that viral species follow distinct adaptive pathways under HLA-restricted immune selection (1, 40). This is most dramatically illustrated for the arboviral Flaviviridae species, in which variable rather than conserved proteomic regions are the preferred targets for HLA binding. Evolution toward the “extinction” of predicted HLA targets in the dengue virus genome has been noted previously (21). In this context, it is interesting that dengue virus infection actively promotes (rather than downregulates) TAP (transporter associated with antigen processing)-dependent antigen processing and HLA class I cell surface expression during flavivirus infection (20), indicating that the flaviviruses employ immune evasion strategies that are the opposite of those of many DNA viral species. This particular adaptive strategy may be influenced by the fact that arboviral flaviviruses must maintain the ability to infect arthropod vectors as well as vertebrate hosts (including nonhuman primates) without significant genomic adaptation (37, 59).
The HLA targeting efficiency scores may also prove a useful tool for predicting patient response to infections, as illustrated by the examples of disease outcomes in dengue virus and HIV-1 infections. These scores provide an example of a novel, numeric, and real-valued representation of an HLA molecule, which can be utilized to quantify similarities and differences between HLA molecules based on a target preference function. Such a projection allows identification of common targeting characteristics among patients with different HLA types, thus potentially increasing statistical power in the analysis of patient cohorts. This representation is similar in concept to the HLA supertype classification (56), which was previously the only method for classifying HLA alleles while attempting to retain biologically meaningful differences. Further studies will be required to investigate these attributes, but it is notable that relationships between HLA targeting efficiency and HLA supertype classifications are by no means uniform, as evidenced in the HIV viral load analysis as well as in the data shown in Fig. 4 and 5 and in Fig. S4 in the supplemental material.
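As a rough illustration of how a real-valued representation could be used to compare HLA molecules, the following sketch treats each allele as a vector of targeting efficiency scores (one per virus group) and finds the allele with the most similar profile. The allele labels, the three-score layout, the numeric values, and the choice of Euclidean distance are all assumptions made for illustration, not data or methods from this study.

```python
import math

# Hypothetical profiles: one illustrative score per virus group
# (e.g., DNA viruses, RNA viruses, arboviral flaviviruses).
profiles = {
    "allele_1": [0.30, 0.10, -0.20],
    "allele_2": [0.28, 0.12, -0.18],
    "allele_3": [0.05, 0.35, -0.25],
}

def profile_distance(a, b):
    """Euclidean distance between two score profiles (an assumed metric)."""
    return math.dist(a, b)

def nearest_allele(name, profiles):
    """Return the other allele whose profile is closest to that of `name`."""
    others = (k for k in profiles if k != name)
    return min(others, key=lambda k: profile_distance(profiles[name], profiles[k]))

closest = nearest_allele("allele_1", profiles)
```

Grouping patients by proximity in such a score space, rather than by exact HLA type, is one way the pooling of cohorts described above might be approached.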
However, multiple factors contribute to disease expression in the context of viral infection, and HLA class I binding is only one of many genetically determined factors, each necessary but not sufficient, that are involved in antigen processing and the subsequent generation of pathogen-specific immune responses. To investigate the potential influence of one of these factors, we examined the effect of proteasomal cleavage on HLA targeting efficiency. We found that proteasomal cleavage restriction was also directed toward conserved targets (see Fig. S7) but that the tendency of HLA alleles to target conserved regions remained as strong even when only the peptides that were likely cleavage targets were considered. This suggests that both HLA-peptide binding and proteasomal cleavage have been co-optimized to target conserved regions.
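The comparison described in this paragraph can be sketched as computing the same binding-conservation association twice, once over all candidate peptides and once over only those flagged as likely cleavage products. The tuple layout, the toy values, and the use of a Pearson correlation here are assumptions for illustration and do not reproduce the study's pipeline or its cleavage predictor.

```python
from statistics import mean

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (vx * vy) ** 0.5

# Hypothetical peptides: (binding score, conservation score, likely cleaved?)
peptides = [
    (0.9, 0.9, True),
    (0.8, 0.7, True),
    (0.5, 0.6, False),
    (0.3, 0.2, True),
    (0.1, 0.3, False),
]

def association(peptide_set):
    """Correlation between binding and conservation over a set of peptides."""
    binding = [p[0] for p in peptide_set]
    conservation = [p[1] for p in peptide_set]
    return pearson(binding, conservation)

all_score = association(peptides)
cleaved_only = [p for p in peptides if p[2]]   # keep likely cleavage products
cleaved_score = association(cleaved_only)
```

If the association is similar in both sets, as reported above, the preference for conserved regions cannot be attributed to the cleavage filter alone.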
Previous studies of HLA allele-specific viral polymorphisms (27-29, 60, 63) have shown that adaptive interactions between individual human hosts and autologous viral populations are unique and highly dynamic, involving the evolution of HLA-specific CTL escape mutations that are known to influence the natural history of viral infection (43). We therefore suggest that the methods described here, designed to investigate the broad patterns of host-pathogen coevolution across multiple viruses, complement other approaches that examine one virus at a time, such as studies that reveal host-virus adaptation by assessing HLA-associated viral polymorphisms (27, 60, 63) or phylogeny (28, 29).
Our analyses using a diverse array of HLA alleles and viral proteomes suggest that, in general, HLA-A preferentially targets DNA viruses and HLA-B preferentially targets RNA viruses, while both HLA-A and HLA-B alleles tend to bind to nonconserved regions in arboviral flaviviruses. It must be emphasized that these broad observations identify trends and will not necessarily generalize to every virus or to the individual proteins or epitopes within those viruses.
Although viruses typically encode thousands of amino acids, most of the responding CD4+ and CD8+ T cells recognize a tiny fraction of the potential antigenic determinants (65). This serves to maximize the efficiency of clonal recruitment and activation for a highly specific and avid antiviral response. More than 90% of CD8 T-cell immunodominance is thought to be explained by HLA-peptide binding affinity, as only ∼1% of peptides form a complex with HLA class I molecules with sufficient stability to be presented in adequate numbers to activate naïve CD8+ T cells (65). While within-host immunodominance may underpin efficient primary and secondary responses to some acute viral infections, the extreme dominance of a few or even single clonotypes associated with some persistent viruses and vaccine-induced responses is problematic if those immunodominant responses are not protective. The study of targeting efficiency in such infections may help clarify, in part, virus-specific immunodominance patterns and the strategies different groups of viruses have taken to counteract these. This has significant implications for vaccine immunogen design as the efficacy of an immunodominant vaccine-induced response is likely to be improved if directed against determinants that have high targeting efficiency and are functionally important to the virus rather than determinants that reproduce the counter-evolutionary strategies of the virus.
In conclusion, this study has taken advantage of recent advances in large-scale genome sequencing, HLA binding measurements, and curation, along with the availability of computationally intensive analysis techniques, to address the hypothesis that HLA class I-restricted peptide sampling is preferentially targeted to evolutionarily conserved, functionally important regions of human and viral proteomes. The data support this view and also provide support for balancing selection of HLA class I allelic diversity (particularly at the HLA-B locus) anchored on this property in response to the challenges provided by diverse human viruses. The approach provides a novel perspective on the ongoing coevolutionary relationships between HLA class I polymorphism, adaptive T-cell immunity, and the self-peptides and viruses that engage with these systems.