INTRODUCTION
The global AIDS pandemic is maintained by the occurrence of 2 to 3 million new HIV-1 infections annually (
1). The development of prevention modalities, particularly a vaccine, that can interrupt these transmission events is a major priority for world health (
2–4). Both passive immunization experiments in nonhuman primates (
5–10) and correlates of protection in the RV144 clinical AIDS vaccine trial (
11,
12) suggest that antibodies (Abs) may be important components of a protective immune response elicited by a vaccine.
The envelope glycoprotein (Env) spike, a heterodimeric trimer consisting of three gp120 exterior subunits and three gp41 transmembrane subunits (
13,
14), is the only HIV-1-specific target for neutralizing antibodies (NAbs) (
2,
4,
15). The appropriate presentation of Env to the immune system may be an essential part of a successful vaccine strategy. Unfortunately, the Env trimer has evolved features to minimize the elicitation and impact of neutralizing antibodies, including a heavy coat of carbohydrate (
15–20). Approximately 50% of the mass of gp120 consists of carbohydrate, with most of the more than 24 potential
N-linked glycosylation sites (PNGSs) utilized in most HIV-1 variants (
15,
21–23). The gp41 component of the trimer typically contains 4 PNGSs (
18,
24), and
O-linked glycosylation sites also have been reported (
25–27).
N-Linked glycans contribute to how HIV-1 evades NAbs (
28). Because complex glycans resemble “self”-glycans and are generally less immunogenic than protein moieties, a significant fraction of the surface of the HIV-1 Env trimer is immunologically “silent” (
20). Individual glycans can shield nearby epitopes directly, and the overall network of glycans contributes to the quaternary structure-dependent constraints on antibody access to the trimer (
29,
30). Deleting or inserting specific
N-linked glycans can render HIV-1 globally sensitive or resistant to antibodies that recognize different gp120 and gp41 epitopes (
31,
32). Additionally, differences in Env immunogenicity have been observed when the glycosylation, but nothing else, changes on the immunogen (
33,
34).
Many broadly neutralizing antibodies (bNAbs) generated in HIV-1-infected humans recognize relatively well-conserved gp120 epitopes that include glycan components. One group of such antibodies (2G12, PGT121, PGT128, and PGT135) binds high-mannose glycans (typically Man8 and Man9) on the gp120 outer domain that are conserved in a large fraction of HIV-1 strains (
35,
36). All of these bNAbs except 2G12 also contact the gp120 protein surface, typically using long CDRH3 and CDRH1 loops to reach past the “glycan shield” (
35,
37).
A second group of bNAbs recognizes glycan-related gp120 epitopes that are conserved in 75% to 80% of HIV-1 strains, sensitive to the quaternary structure of the HIV-1 Env trimer, and composed of the V1/V2 and V3 regions (the “trimer association domain”) of gp120 (
38). These bNAbs (PG9, PG16, PGT145, CH01-04, and CAP256) have long, extended CDRH3 regions (
39). Most of these bNAbs depend on an
N-linked glycan at Asn 160, and some make contact with a second glycan at N156 (or N173 in some HIV-1 strains); Man3 or Man5 glycans at these positions are sufficient for antibody binding (
38–40). A third group of bNAbs (35O22, PGT151, and PGT152) recognizes hybrid gp120-gp41 epitopes that include glycans (
41,
42). In summary, because glycans are key components of various bNAb epitopes, the absence of the correct glycans from Env vaccine candidates is likely to reduce the chances of eliciting bNAbs that can recognize the known glycan-dependent epitopes on the virus.
Glycosylation analysis is a critical component of characterizing Envs, particularly those under consideration as immunogens for human trials. The first analysis of Env glycans on virions showed that predominantly high-mannose glycans were present and that those glycoforms were different from what typically had been observed on soluble, recombinant gp120 and gp140 proteins (
43,
44). While these data were surprising and informative, the data set was incomplete because the glycans present at individual sites were not characterized. A subsequent attempt to analyze the glycans at each site on virion-derived Env yielded an incomplete data set—only 13 glycopeptides were identified in total, and most of the glycan sites were not detected (
45). More recently, Go et al. published a site-by-site glycosylation analysis of a recombinant JR-FL (clade B) Env trimer which yielded information on every site and identified >600 different glycoforms, a 30-fold increase over the earlier study on virion Env (
25). The recombinant JR-FL trimer used in that study, a membrane-anchored gp150 protein purified by nickel-nitrilotriacetic acid (Ni-NTA) affinity, had multiple high-mannose glycans at specific sites, consistent with the original glycan-based virion analyses (
43,
44). The major finding of the study by Go et al., that Env trimers contain multiple sites with high-mannose glycans, was replicated more recently in two analyses of the glycosylation pattern on the soluble, recombinant BG505 (clade A) SOSIP.664 gp140 trimer, purified with a 2G12 column (
46,
47). In one BG505 analysis, data on 20 of 28 glycan sites were reported, with about 160 unique glycopeptides detected (
46). Many of the characterized sites were exclusively occupied by high-mannose glycans (
46). A more recent analysis of this same BG505 trimer provided glycosylation coverage at every site, with 230 glycopeptides reported (
47). The studies on JR-FL gp150 and BG505 SOSIP.664 gp140 represent two examples of trimeric Env glycosylation, but the analyses were carried out using proteins based on different Env genotypes and on different construct designs and that were produced in different cell lines; neither study involved a natural HIV-1 target cell type or a full-length Env (
25,
46,
47). Accordingly, it is unclear how these protein design and expression differences impacted the results. A subsequent analysis of glycosylation on a third genotype of Env (BAL, which is a clade B genotype), isolated from virions and purified by reverse-phase high-performance liquid chromatography (rHPLC), has been reported recently (
48). Some of the glycan assignments in the latter virion study differed from ones identified in the earlier reports on the recombinant JR-FL and BG505 trimers for reasons that are not yet known. In summary, while multiple Env glycopeptide analyses all showed that trimeric Env is enriched for high-mannose glycans, it is unknown whether there is a consensus glycosylation profile at some sites, whether the producer cell or purification method significantly impacts the glycosylation profile, and whether the two truncated construct designs studied thus far—membrane-anchored gp150 and SOSIP-stabilized gp140—are good approximations of full-length Env.
We sought to elucidate as many of these unknowns about the glycosylation of Env trimers as possible in a single study. For the first time, we performed glycopeptide analyses on a large collection of Envs, by studying nine new proteins, in order to create a data set based on a total of 11 trimers. We then used the combined data set to explore the impact of multiple variables on Env glycosylation and to answer the following two key questions. (i) Is there a global “consensus” glycosylation profile of trimeric HIV-1 Env, present in a large set of diverse glycoproteins, that could be used for evaluating future Env immunogens? (ii) What variables contribute to Env glycosylation heterogeneity? The resulting identification of a native Env consensus glycosylation profile now provides vaccine developers with a “glycosylation target” for their immunogens, information that should assist in the eventual development of an effective HIV-1 vaccine.
RESULTS AND DISCUSSION
Here, we compare glycopeptide profiles for 11 different Env trimers and several other gp120 and gp140 proteins. Three of the trimer data sets are available in the literature, originally generated by the Desaire laboratory (
25), the Crispin laboratory (
46), and the Dell laboratory (
48). Eight additional Env trimers were characterized for the present report. The full sequences and construct designs for these trimers are summarized in
Fig. 1. We used the complete data set to identify a consensus glycosylation profile for trimers, which we then compared to the profiles for various gp120 and gp140 proteins. The glycosylation data on the latter proteins, many of which are former or current vaccine candidates, were mainly compiled from published reports, although the data on the gp140(-)FT protein from the C97ZA.012 genotype were obtained as part of the present study. Mass spectrometry data for this protein are shown in
Fig. 2 and are used to demonstrate the analysis procedure performed on all the Envs.
The newly derived Env glycopeptide profiles were obtained using methodologies that we developed over the past 10 years (
25,
49–53). The various proteins were subjected to tryptic digestion, and the resulting glycopeptides were characterized using liquid chromatography-mass spectrometry (LC-MS) and tandem mass spectrometry (MS/MS). A typical high-resolution mass spectrum from a glycopeptide-rich region of one of the chromatograms, shown in
Fig. 2A, contains a wide array of glycoforms that coelute. The glycopeptide compositions were identified by interrogating the high-resolution data along with the corresponding MS/MS data acquired on each of the relevant peaks. Both collision-induced dissociation (CID) and electron transfer dissociation (ETD) data were used to support the glycopeptide assignments reported here. Examples of MS/MS data of each type (i.e., CID and ETD) are shown in
Fig. 2B and
C. When every relevant peak in the extensive LC-MS data file was analyzed, as described previously (
49,
50,
54), the outcome was a complete set of glycopeptide assignments for the glycoprotein. The full list of glycoforms detected for each of the proteins analyzed in this study is provided in Tables S1 through S8 in the supplemental material. Using the glycopeptide analysis methods described here, a highly rich glycosylation profile for each protein can be obtained. For example, ∼600 unique glycopeptides were identified for the CHO cell-derived BG505 SOSIP.664 trimer.
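The mass-matching part of this assignment step can be illustrated with a short sketch. It is not the software used in the study. It only shows how a candidate glycan composition on a tryptic peptide predicts an m/z value that can be compared against a high-resolution measurement. The residue masses are standard monoisotopic values. The peptide mass (1500.0 Da), the candidate list, and the 10 ppm tolerance are hypothetical choices made for illustration. In practice a mass match is only a starting point, and the assignments reported here were also supported by CID and ETD data.

```python
# Monoisotopic residue masses (Da) of common monosaccharides (standard values).
RESIDUE_MASS = {
    "Hex": 162.05282,     # hexose, e.g. mannose or galactose
    "HexNAc": 203.07937,  # N-acetylhexosamine, e.g. GlcNAc
    "Fuc": 146.05791,     # deoxyhexose (fucose)
    "NeuAc": 291.09542,   # N-acetylneuraminic acid (sialic acid)
}
PROTON = 1.007276  # mass of a proton in Da

# Hypothetical candidate list: the high-mannose series Man5 to Man9.
CANDIDATES = {f"Man{n}": {"HexNAc": 2, "Hex": n} for n in range(5, 10)}


def glycan_mass(composition):
    """Sum of residue masses for a composition such as {'HexNAc': 2, 'Hex': 9}."""
    return sum(RESIDUE_MASS[name] * count for name, count in composition.items())


def glycopeptide_mz(peptide_mass, composition, charge):
    """Theoretical m/z of a protonated glycopeptide ion.

    The glycan is added as residue masses, so the neutral glycopeptide mass
    is simply the peptide mass plus the glycan residue mass.
    """
    neutral = peptide_mass + glycan_mass(composition)
    return (neutral + charge * PROTON) / charge


def match_candidates(observed_mz, charge, peptide_mass, candidates, tol_ppm=10.0):
    """Return (label, ppm error) for every candidate within the ppm tolerance."""
    hits = []
    for label, composition in candidates.items():
        theoretical = glycopeptide_mz(peptide_mass, composition, charge)
        ppm = (observed_mz - theoretical) / theoretical * 1e6
        if abs(ppm) <= tol_ppm:
            hits.append((label, round(ppm, 2)))
    return hits
```

Because neighboring members of the high-mannose series differ by one hexose (about 162 Da), a tolerance of a few ppm separates them easily. Isobaric compositions cannot be resolved this way, which is one reason fragmentation data are needed.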
Table 1 shows a comparison of the glycosylation profiles for the Env trimers at each commonly used glycosylation site. The predominant type of glycoform—either high-mannose glycan or processed glycan—is indicated at each site; these assignments are supported by data in the tables in the supplemental material, where every glycoform at every site is individually reported.
Table 1 contains data from Envs of various genotypes (i.e., six different HIV-1 strains from multiple phylogenetic clades), construct designs (i.e., full-length gp160, truncated gp150, and soluble SOSIP.664 gp140), producer cell types (i.e., CHO, 293T, and M8166), and purification methods (i.e., Ni-NTA affinity, 2G12, and PGT151). Detailed information about every Env in the study is provided in
Table 1.
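As a rough illustration of how a per-site "predominant glycoform" call could be derived from a list of assigned glycoforms, consider the sketch below. The classification rule is a simplifying assumption, not the authors' stated criterion: a composition with two HexNAc, five to nine hexoses, and no fucose or sialic acid is treated as high mannose, and everything else is treated as processed. The choice to count each unique glycoform once and to require a strict majority is also an assumption.

```python
from collections import Counter


def glycan_class(hexnac, hexose, fuc=0, neuac=0):
    """Classify a composition as 'high-mannose' or 'processed' (simplified rule)."""
    if hexnac == 2 and 5 <= hexose <= 9 and fuc == 0 and neuac == 0:
        return "high-mannose"
    return "processed"


def predominant_type(glycoforms):
    """Return (label, fraction) for the majority class among unique glycoforms.

    `glycoforms` is a list of tuples (hexnac, hexose[, fuc[, neuac]]).
    If no class holds a strict majority, the label is 'mixed'.
    """
    if not glycoforms:
        raise ValueError("no glycoforms were assigned at this site")
    counts = Counter(glycan_class(*g) for g in glycoforms)
    label, n = counts.most_common(1)[0]
    fraction = n / sum(counts.values())
    if fraction <= 0.5:
        return "mixed", fraction
    return label, fraction
```

For example, a site carrying Man5, Man9, and one fucosylated, sialylated complex glycan would be called high mannose with a fraction of two thirds. Weighting by ion abundance instead of counting unique glycoforms is a different design choice and could change the call at heterogeneous sites.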
The most remarkable aspect of the data in
Table 1 is the similarity among the Env trimers, even though the genotypes, construct designs, producer cells, and purification methods differ. More than half of the common glycosylation sites in the gp120 component of a trimeric Env have a conserved profile dominated by either high-mannose or processed glycoforms. The sites bearing predominantly high-mannose glycans include N156, N262, N334, N389 to N448, and N463. The N389–N448 region of Env typically includes three to six glycosylation sites, which are all grouped together here for convenience. The sites in this region uniformly contain high-mannose glycans, despite the variation in the number of sites and their precise locations among different genotypes. Conversely, all the Envs have several sites where more highly processed glycoforms predominate. The latter sites include N187, N197, N230 to N234, N356, and N386. From here on, we refer to the set of sites that maintain a conserved glycosylation profile in each of the 11 Envs as the “consensus” profile.
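The definition of the consensus profile used here, sites whose predominant glycoform type is the same in every trimer, amounts to a simple set operation. The sketch below uses three hypothetical trimers with toy calls (the names `trimer_A` to `trimer_C` and the values are illustrative, not data from Table 1). Trimers that lack a site are skipped for that site, which is an assumption about how missing sequons would be handled.

```python
def consensus_sites(profiles):
    """Return {site: call} for sites on which every trimer carrying the site agrees.

    `profiles` maps a trimer name to a dict of {site: predominant call}.
    """
    all_sites = set().union(*(p.keys() for p in profiles.values()))
    consensus = {}
    for site in sorted(all_sites):
        calls = {p[site] for p in profiles.values() if site in p}
        if len(calls) == 1:
            consensus[site] = calls.pop()
    return consensus


# Toy input: hypothetical trimers and calls, for illustration only.
EXAMPLE = {
    "trimer_A": {"N160": "processed", "N262": "high-mannose", "N386": "processed"},
    "trimer_B": {"N160": "high-mannose", "N262": "high-mannose", "N386": "processed"},
    "trimer_C": {"N262": "high-mannose", "N386": "processed"},
}
```

With this toy input, N262 and N386 would be reported as consensus sites and N160 would not, because two trimers disagree there.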
At other gp120 sites, the majority glycan population is heterogeneous; we discuss these sites next, starting with those that show the least glycosylation heterogeneity. For N88, only one of the JR-FL Env trimers (gp150, expressed in M8166 cells) and BAL produced a glycan profile containing mostly high-mannose glycans; the other nine trimers in this study contained mostly complex glycoforms at this site. The M8166-expressed JR-FL gp150 also had a population of processed glycoforms at this site, and that population matched the majority glycoform type for all the other Env trimers reported in the tables in the supplemental material. For BAL, the Env glycopeptides were reported previously (
48), but the glycopeptide coverage was low, with only four glycoforms detected at the N88 site. By comparison, we obtained between 9 and 30 unique glycopeptides at this site for the various Env trimers analyzed here. One possible explanation for the divergent BAL data is that technical challenges associated with characterizing virion-derived Env might have resulted in reduced glycosylation coverage. If so, incomplete coverage might be responsible for the divergent glycosylation profile reported at N88 for this particular Env. Alternatively, BAL and JR-FL may represent atypical genotypes where high-mannose glycans predominate at N88 under certain conditions, such as when the Env is expressed in T cells.
For the N133-N139 glycosylation sites in the gp120 V1 variable region, only the BG505 SOSIP.664 trimer and the virion-derived BAL Env contain an outlier glycosylation profile in which complex glycans predominate. For the soluble BG505 trimer expressed in CHO cells, 54 glycopeptides were identified for the tryptic peptide containing N133 and N138; the vast majority (74%) of these were complex glycoforms. In contrast, only one glycopeptide was identified for virion-derived BAL, specifically, a complex glycoform. Given the substantial differences in the levels of glycosylation coverage, we defer consideration of the implications of the aberrant BAL glycosylation profile at this site until more complete data become available. Instead, we asked the following question: why does the BG505 SOSIP.664 trimer contain mostly processed glycoforms at N133 and N138? When we expressed BG505 Env as a full-length, membrane-anchored gp160 in CHO cells, we found that the glycosylation profile at N133 and N138 contained mostly high-mannose glycoforms, as was the case for the other Envs in the study (
Table 1). This comparison between the SOSIP.664 gp140 and gp160 forms of BG505 Env suggests that the construct design difference contributes to the outlier glycosylation profile. More specifically, the greater processing of these V1 region glycans on the soluble SOSIP.664 trimer implies that these sites are more exposed during biosynthesis than they are on the membrane-anchored gp160. We note that glycosylation of the region spanning sites N133 to N138 may also be influenced by the genotype of the SOSIP.664 construct; high-mannose glycans are the majority glycoform at this site on the C97ZA.012 SOSIP.664 protein that was also analyzed in this study.
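The contrast in coverage between the BG505 and BAL data at this site can be made concrete with a small sketch. The counts for BG505 are back-calculated from the figures quoted above (74% of 54 glycopeptides is approximately 40 complex glycoforms), which is an assumption about rounding. The minimum-coverage threshold of five glycoforms is a hypothetical cutoff chosen only to show how a low-coverage call might be flagged, not a criterion taken from the study.

```python
def site_call(n_complex, n_other, min_glycoforms=5):
    """Summarize a site from counts of complex and non-complex unique glycoforms.

    Returns the majority call, the complex fraction, the coverage, and a flag
    marking calls that rest on fewer than `min_glycoforms` glycoforms.
    """
    total = n_complex + n_other
    if total == 0:
        return {"call": "no data", "fraction_complex": None, "n": 0, "low_coverage": True}
    fraction = n_complex / total
    if fraction > 0.5:
        call = "complex"
    elif fraction < 0.5:
        call = "non-complex"
    else:
        call = "mixed"
    return {
        "call": call,
        "fraction_complex": fraction,
        "n": total,
        "low_coverage": total < min_glycoforms,
    }


# BG505 SOSIP.664 (CHO): about 40 of 54 glycopeptides complex (derived from 74%).
bg505 = site_call(n_complex=40, n_other=14)
# Virion-derived BAL: a single complex glycopeptide was reported.
bal = site_call(n_complex=1, n_other=0)
```

Both inputs give a "complex" call, but only the BAL call is flagged as low coverage, which mirrors the reason given above for deferring its interpretation.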
Of the remaining sites on gp120, N241 contains the most conserved glycosylation profile. Thus, all the Env variants except those of the JR-FL and BG505 genotypes contain high-mannose glycans at this site. BG505 does not have a glycosylation site at N241, and all three JR-FL variants contain predominantly processed glycoforms. The unique glycosylation profile of JR-FL is clearly attributable to the genotype. It should be noted that the JR-FL genotype is unusual among HIV-1 strains in that it lacks a glycosylation site between N230 and N234. It is possible that the lack of a glycan at this site, by alleviating local steric constraints on the relevant enzymes, could have influenced how the nearby glycan at N241 is processed. Whether or not this explanation is correct, the data clearly indicate that how the N241 site is glycosylated is controlled by the genotype and not by the cell line, construct design, or purification method used.
In sharp contrast to N241, the glycosylation profile at N160 is clearly influenced by the producer cell type. In three different genotypes (JR-FL, BG505, and CH505 w4.3), the glycosylation profile at this site changes from mostly processed to mostly high-mannose glycoforms when the producer cell changes from CHO to T cells. For BG505, the cell line that produces protein with more high-mannose glycans is 293T; for the other two genotypes, the cell line is M8166, a CD4-positive T lymphocyte cell line. For some genotypes, T cells are not the only cells that produce immunogens with high-mannose glycans at this site; both the CH505 w53 Env trimer (produced in CHO cells) and the C97ZA.012 Env trimer (produced in 293F cells) contain high-mannose glycans as the majority glycoform.
Both the producer cell type and the Env genotype influence how the N295/N301 and N339 sites are glycosylated. These sites often contain high-mannose glycans, but not all of the 11 trimers contain high-mannose glycans as the major glycoform. Specifically, JR-FL trimers have mostly processed glycoforms at these sites, unless the protein is produced in M8166 cells. In addition, we also found that the C97ZA.012 trimer is highly processed at N339 and likewise for CH505 w53 at N301. The CH505 w53 trimer has no glycan at N295.
The final site in gp120 whose glycosylation differed among the Env trimers in this study is N276. Here, the CH505 w53 gp150 is one example of a protein that contains mostly high-mannose glycans, and BG505 SOSIP.664 (from 293T cells) is another. In contrast, both BG505 SOSIP.664 (from CHO cells) and BG505 gp160 (also from CHO cells) contain mostly processed glycans at the N276 site. We interpret this finding to indicate that the glycosylation at this site can vary depending on the producer cell. The genotype is also a contributing factor to the glycosylation profile at N276. All three Envs of the JR-FL genotype contain processed glycoforms at this site.
Figure 3 shows an example of a mass spectrum that includes many identifiable complex glycopeptides from the N276 glycosylation site. This spectrum was acquired while analyzing JR-FL gp160(-).
The gp41 region is markedly more variably glycosylated across the data set of 11 Env trimers than the gp120 region. In particular, the N611 site is occupied by high-mannose glycans in some proteins but by complex glycoforms in others. For example, we found that the JR-FL gp160(-) and gp150(-)Δ808 Envs were differentially glycosylated at this position, even though both proteins were expressed in the same CHO cell line and purified by Ni-NTA affinity chromatography. The construct design, therefore, must be the driving force behind the variable glycosylation at this N611 site. We also note that both full-length gp160 proteins in the study (from two different genotypes) contained complex glycans at N611, while six of the eight truncated gp150(-)Δ808 or gp140(-) proteins contained high-mannose glycans. Because construct design influences how the N611 site is glycosylated, at least for some genotypes, additional full-length (gp160) Envs of various genotypes will need to be analyzed to increase our database for this site. In summary, it is presently unknown whether native Env has a conserved profile of complex glycans at position N611.
The glycosylation differences seen at N616 are also driven by the construct design. The full-length BG505 gp160 protein contains >90% high-mannose glycans at this site, and yet both of the comparator BG505 SOSIP.664 trimers contain mostly processed glycoforms at N616. For the JR-FL genotype, the gp160(-) contains exclusively high-mannose glycans at this site, whereas the gp150(-)Δ808 comparator has mainly high-mannose glycans, although some complex glycoforms are also present. Taking the data together, both full-length Envs have more high-mannose glycans at N616 than their truncated comparator proteins. Thus, as for the N611 site, additional analyses of gp160 proteins are required to determine whether the glycosylation profile at N616 is conserved across all genotypes.
Glycosylation of the N625 site is highly variable in this data set. Two genotypes (CH505 w4.3 and CH505 w53) contain high-mannose glycans, one genotype (C97ZA.012) contains processed glycans as the main glycosylation type, and the glycosylation profiles of both JR-FL and BG505 Envs vary depending on the construct design. For the two full-length gp160s, high-mannose glycans are the predominant glycoforms at N625, and Env truncation is responsible for changing the glycosylation profile at this site for both the JR-FL and BG505 genotypes. This data set shows that the glycosylation at N625 varies considerably when the construct length varies (i.e., gp140 or gp150 versus gp160); what is unknown at this time is whether full-length gp160 always contains a consensus glycosylation profile at N625 across diverse genotypes.
For N637, the only relevant influence is the genotype. Thus, all three Env constructs of the JR-FL genotype contain high-mannose glycans, as do both of the CH505 w4.3 variants and the CH505 w53 variant. In contrast, all three BG505 genotype proteins contain complex glycans at this site, as does C97ZA.012. Clearly, the genotype, and not the cell type or construct design, controls how the N637 site is glycosylated.
In summary, the data in
Table 1 are color-coded to indicate those glycosylation sites that are part of a consensus glycosylation profile and those sites where glycosylation is variable. Additionally, the sites with variable glycosylation are categorized as sites that predominantly are influenced solely by the HIV-1 genotype or by both the genotype and producer cell type or instead by the Env construct design.
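The reasoning used above to assign a variable site to genotype, producer cell, or construct design rests on comparing pairs of proteins that differ in only one variable. A minimal sketch of that logic follows. The record layout and function name are hypothetical, and the calls are simplified to a binary high-mannose or processed label based on the descriptions in the text; in particular, the high-mannose call for JR-FL gp150(-)Δ808 at N611 is inferred from the statement that the two JR-FL proteins were differentially glycosylated.

```python
from itertools import combinations

VARIABLES = ("genotype", "construct", "cell", "purification")


def attribute_differences(records):
    """Compare all pairs of records for one site.

    Returns (attributed, confounded): variables implicated by pairs that differ
    in exactly one variable, and tuples of variables for pairs whose calls
    differ but which vary in more than one variable at once.
    """
    attributed, confounded = set(), set()
    for a, b in combinations(records, 2):
        if a["call"] == b["call"]:
            continue
        differing = tuple(v for v in VARIABLES if a[v] != b[v])
        if len(differing) == 1:
            attributed.add(differing[0])
        elif len(differing) > 1:
            confounded.add(differing)
    return attributed, confounded


# N611: the two JR-FL proteins differ only in construct design.
N611_JRFL = [
    {"genotype": "JR-FL", "construct": "gp160", "cell": "CHO",
     "purification": "Ni-NTA", "call": "processed"},
    {"genotype": "JR-FL", "construct": "gp150", "cell": "CHO",
     "purification": "Ni-NTA", "call": "high-mannose"},  # inferred call
]

# N616: the BG505 pair differs in construct design and purification method.
N616_BG505 = [
    {"genotype": "BG505", "construct": "gp160", "cell": "CHO",
     "purification": "Ni-NTA", "call": "high-mannose"},
    {"genotype": "BG505", "construct": "SOSIP.664", "cell": "CHO",
     "purification": "2G12", "call": "processed"},
]
```

For the N611 pair the sketch implicates construct design alone. For the N616 pair it reports construct design and purification method as confounded, which is why a separate argument is needed to rule out the purification method.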
Is the consensus glycosylation profile completely defined here?
The sites with a conserved glycosylation profile in every Env included in
Table 1 are clearly part of the glycosylation consensus profile. Thus, our data indicate that, for these sites, the glycosylation consensus profile is independent of the purification method, the genotype, the cell type, or the construct design used to generate the trimers. The consensus glycosylation profile is, therefore, the intrinsic glycan profile that predominates on any properly folded trimeric Env, and hence it represents the native glycosylation profile. The consensus glycosylation profile (shown in green in
Table 1) represents a target profile for vaccine developers striving to make natively glycosylated proteins.
To understand the native trimeric Env consensus profile more fully requires determining which additional sites, among those where we found that glycosylation varies, may also have a native consensus glycosylation profile on Env
in vivo. In other words, it could be that the consensus profile is more extensive than the one that we have outlined above, but more studies are needed to determine if this is so. Below, we consider, independently, the impact of using proteins with truncated construct designs, proteins produced in different cell lines, and proteins purified by different methods. By doing so, we seek to answer the following question. Does the glycosylation variability shown in
Table 1 represent native variability that arises from different Env genotypes, or was the variability introduced into the study via the use of, for example, truncated forms of Env?
Impact of construct design.
In considering the impact of truncating the Env from its full-length gp160 form to a membrane-anchored gp150 or a soluble, SOSIP-stabilized gp140 form, we found that most of the glycosylation differences are located in the gp41 region of the proteins. Compared to the full-length BG505 gp160, the genotype-matched soluble SOSIP.664 gp140 protein has a different glycosylation profile at two of the four gp41 glycosylation sites (N616 and N625), whereas only the glycosylation at N133 to N139 in the gp120 region differs between these two constructs. A similar trend, of Env truncation impacting glycosylation in the gp41 region, is also observable in comparing the truncated gp150 and full-length gp160 proteins of the JR-FL genotype. Here, the predominant glycoforms in the gp120 regions of the two proteins are essentially unchanged, but the N625 and N611 glycans in gp41 do differ between the constructs. Additionally, more high-mannose glycans were detected at N616 for the full-length JR-FL gp160 than for its gp150 comparator. In summary, truncating Env constructs influences how gp41 is glycosylated. Additional studies are necessary to answer the following question. Are the glycans on N133, N611, N616, and N625 part of the consensus glycosylation profile for full-length gp160, which may be inappropriately glycosylated in truncated proteins? We note that these four sites may have a consensus profile, of high-mannose glycans at N133 to N139, N616, and N625 and complex glycans at N611, based on the data from the two full-length Envs in this study.
Impact of producer cell type.
In considering the impact of the producer cell choice on Env glycosylation, we note that the N88, N160, N276, N295/N301, and N339 sites are where the glycosylation profile changes when proteins with the same genotype are produced in different cell lines. Some of these sites, i.e., N160, N295/N301, and N339, may contain a consensus profile of high-mannose glycans on native Env. Thus, for two different Env genotypes (i.e., CH505 and JR-FL), these sites were more likely to contain high-mannose glycans when the gp150 proteins were expressed in M8166 cells than when they were expressed in CHO cells. For N88 and N276, the glycosylation profiles were divergent between CH505 w4.3 and JR-FL gp150(-)Δ808 expressed in M8166 cells, so we do not expect that these sites would contain a conserved glycosylation profile
in vivo. Since M8166 cells are the most virologically relevant producer cell type used in this study—they are a CD4-positive T lymphocyte cell line and hence would be capable of being infected by an HIV-1 virus—additional studies of a more diverse panel of Envs produced in these cells are warranted to determine if N160, N295/N301, and N339 always have a conserved, native, high-mannose glycosylation profile when Envs are produced in CD4-positive T cells. In summary, we do not know at this time whether or not these sites are part of the consensus native Env glycosylation profile. Having stated that, the data in
Table 1 indicate that the clear majority of Env glycosylation sites are not impacted by the choice of producer cell, at least with respect to whether their glycans are mostly in high-mannose form or are processed. An additional consideration, however, is that the cell type and, indeed, the construct design or the purification method may impact other aspects of glycan processing, such as the amount of sialylated or fucosylated glycans present on any given Env or at any given site. We include the full list of glycans at each glycosylation site for every newly analyzed protein in the supplemental material, and a preliminary comparison indicates that the M8166 cell line produces Envs with more sialylated glycoforms, for both CH505 w4.3 and JR-FL gp150(-)Δ808, than Envs of the same genotype produced in CHO cells. An in-depth study of the impact of the producer cell on the level of sialylation and fucosylation on Envs will be presented separately.
Impact of genotype.
The Env genotype is solely responsible for the variability seen at two glycosylation sites, N241 and N637. While only one genotype (JR-FL) contains complex glycoforms at N241, all three Envs based on this genotype shared this unique feature. Similarly, the glycosylation profiles at N637 did not differ when the same genotype was expressed as a different construct design or in a different cell type or when the protein was purified differently; the implication is that the glycosylation profile at this site is fully controlled by the genotype. Additional studies will be needed to determine whether other sites, particularly N88, N133 to N139, N160, N276, N295 to N301, N339, N611, N616, and N625, can also vary in a genotype-dependent manner. Some of these sites should be investigated using full-length gp160 constructs, to eliminate any influences of Env truncation, and in some cases the Envs should be produced in the M8166 T cell line, for the reasons discussed above.
Impact of purification method.
None of the variance in glycosylation reported in
Table 1 is clearly attributable to the purification method used, although, in principle, how Envs are purified could affect their glycosylation profile. For example, it is possible that any differences observed between BG505 gp160 and SOSIP.664 gp140 could be influenced not just by the construct design but also by how they were purified: the gp160 initially via an Ni-NTA affinity column, the SOSIP.664 gp140 via a 2G12 affinity column. However, we think it is unlikely that the purification method had a major influence on the resulting glycosylation profiles, not least because the profiles of the gp160 and SOSIP.664 proteins were far more similar than different. One could argue that a 2G12-purified Env would be enriched for high-mannose glycans because its epitope involves a defined subset of such glycans. This was not, however, what we found. Thus, high-mannose glycans were actually more prevalent overall on the Ni-NTA affinity-purified gp160, while complex glycans were more prevalent at N133 to N138, N616, and N625 on the 2G12-purified SOSIP.664 gp140. Enrichment of protein containing high-mannose glycoforms on a 2G12 column would not be expected to lead to this outcome. We also note that when BG505 SOSIP.664 trimers, expressed in CHO cells, were affinity purified by different bNAbs (2G12, PGT151, and PGT145), there was no significant variation in their high-mannose contents (
55).
A second pair of proteins was also purified by two different methods. CH505 w4.3 gp150(-)Δ808 proteins were produced in CHO cells and M8166 cells and then purified by 2G12 and Ni-NTA affinity chromatography, respectively. The only glycosylation difference between these proteins was seen at N160, which is not involved in the 2G12 epitope. Furthermore, the 2G12-purified protein had fewer high-mannose glycans at N160 than its Ni-NTA affinity-purified counterpart. The glycosylation difference at this site is therefore more plausibly attributable to the producer cell type and not to the purification method. Additional evidence demonstrating a producer cell-dependent effect on glycosylation at N160 is described above. In summary, while different procedures were used to purify some of the proteins described in
Table 1, the data argue against a meaningful influence of the Ni-NTA method versus the 2G12 affinity purification method on the resulting glycosylation variability.
One other point of variance in the Env purification methods was that three of the proteins were not purified by size exclusion chromatography (SEC); these proteins included JR-FL gp150(-)Δ808, CH505 w4.3 gp150(-)Δ808, and CH505 w53 gp150(-)Δ808. All three of these proteins were screened using size exclusion chromatography, and each of them produced only trimers. Since an SEC purification step would not improve the quality of these proteins, they were analyzed without performing this step. We do not expect that the lack of SEC purification of these three proteins impacted the study in any way.
Can a bNAb purification column have a major effect on the glycosylation profile? One protein that we studied here, the C97ZA.012 SOSIP.664 trimer, was purified by PGT151 affinity chromatography. PGT151 is a conformationally selective antibody, and a glycan is part of its epitope. This purification method was necessary to isolate appropriately folded C97ZA.012 trimers because a substantial subset of the nonselected gp140 proteins adopts multiple nonnative conformations (
56). Compared to these irregularly folded proteins, the PGT151-purified C97ZA.012 trimers are enriched for high-mannose glycans (
56). Therefore, in some cases the elimination of inappropriately assembled proteoforms with concomitantly aberrantly processed glycans is achievable using a bNAb column, as has been described previously elsewhere (
56).
Does affinity purification performed with conformationally selective bNAb columns generate a more homogeneous protein population solely by removing nonnative forms, or are functionally relevant forms also lost because they do not bind strongly to the bNAb column? Additional studies in this area are justified. In this context, Verkerke et al. have described an alternative way to purify SOSIP-stabilized gp140s that does not rely on any antibody-affinity columns (
57). The use of this epitope-independent method, compared to a method using a bNAb selection column(s), could allow the identification of any as-yet-unseen effects on the resulting glycosylation profile. While only a truly comparative experiment in which the sole variable was the purification method could yield a definitive answer, as of now we have found no evidence for a skewing effect on the glycosylation profile.
Glycosylation of other vaccine candidates with truncated construct designs (gp120 and uncleaved gp140).
To demonstrate the profound differences in glycosylation that can be obtained when Env immunogens are expressed as a gp120 monomer or a gp140, we analyzed one additional uncleaved gp140(-) protein, of the C97ZA.012 genotype. We also compiled a list of glycosylation profiles from several previously analyzed gp120 and uncleaved gp140 proteins. We compared the glycosylation profiles of those proteins to the trimeric Env consensus profile determined here (
Table 2). The JR-FL gp140(-) has no sites where high-mannose glycans predominate, and hence it deviates substantially from the consensus profile. The group M consensus Env, CON-S gp140, and a clade B transmitted/founder gp140, B.700010040.C9, have glycosylation profiles that are somewhat closer to the consensus profile for Env trimers but still differ from it at several sites. Finally, the clade C.1086 gp120, which will be tested in clinical trials (
58), matches the consensus glycosylation profile at all sites except N156 and N397 to N412.
Why did every gp120 and gp140 tested here deviate from the glycosylation consensus of trimeric Env? On the basis of the comparative data in
Table 2, we expect that trimeric Env immunogens may be necessary to obtain high-mannose glycans at N156 and N397 to N445, and yet the genotype, the culture and purification conditions, and the identity of the producer cell may all influence the extent to which other sites in the gp120 and gp140 proteins approximate the consensus profile. The glycosylation profile of the C.1086 gp120 protein was not highly divergent from the trimeric consensus, and we hypothesize that here the genotype may drive an atypically favorable glycosylation profile even without employing conformation-specific purification methods (e.g., bNAb columns). For the proteins with less-optimal glycosylation profiles than C.1086 gp120, some of the divergence is likely attributable to protein misfolding (
59), which potentially can be overcome by using a bNAb column, or other methods, to select the most appropriately folded population. These various concepts could be tested experimentally in appropriately designed studies that take into account advances in construct design and Env purification procedures.
Finally, we note that the data in
Tables 1 and
2, which show that uncleaved gp140 typically has glycosylation profiles with an excess of processed glycoforms, are consistent with two prior studies addressing the impacts of furin cleavage and truncation on gp140 glycosylation. Go et al. demonstrated that uncleaved, soluble gp140's of the JR-FL genotype contain predominantly processed glycoforms at every glycosylation site, whereas membrane-anchored gp150's of the same genotype, which also lack a cleavage site, contain high-mannose glycans at most of the glycosylation sites (
25). In that study, both sets of proteins were cleavage deficient, produced in the same cell type, and purified by the same methods; therefore, the presence of the membrane-anchoring region was the necessary feature that rescued Env's high-mannose glycans. Pritchard et al. have shown that for more-truncated Env's, such as soluble gp140's, a cleavage event is necessary to produce a protein with mainly high-mannose glycans (
55). They showed that an uncleaved, SOSIP-stabilized BG505 gp140 contains only about 30% high-mannose glycans, while the cleaved version of the same protein contains about twice as many high-mannose glycoforms (
55). Therefore, either the presence of a cleavage site (
55) or the presence of a membrane-anchoring region (
25) appears to be necessary to restore the high-mannose glycosylation profile associated with virions. The data in
Tables 1 and
2 are consistent with both these prior studies; the uncleaved, soluble gp140s in
Table 2 typically have aberrant glycosylation profiles, with too many processed glycoforms, and their glycosylation profiles are generally very different from those of all the cleaved, SOSIP-stabilized trimers and the uncleaved, membrane-anchored trimers in
Table 1. Both these sets of trimers closely resemble those represented by the glycosylation data from virion-derived Env, as reported in reference
48 and displayed at the bottom of
Table 1. We do not yet know why soluble gp140's require cleavage by furin to obtain glycosylation profiles with extensive high-mannose glycans, but the membrane-anchored gp150's and gp160's do not, so future studies on this topic are warranted. We suspect that the key driver of the high-mannose glycan profile is not cleavage
per se but is instead the overall structure of the trimer, which can be driven by one device or another—including allowing cleavage to occur in the SOSIP trimer context or retaining in the construct the stabilizing elements present in the membrane-proximal external region (MPER), the transmembrane (TM), and—for the gp160's—the cytoplasmic domain.
In summary, we characterized a large panel of trimeric Envs in order to provide vaccine developers with a “glycosylation target” for future immunogens. In doing so, we mapped many consensus sites and identified four variables that may impact the glycosylation at nonconsensus sites. The consensus glycosylation target currently includes the following sites on gp120: N156, N187, N197, N230 to N234, N262, N289, N334, N356, N386, N389 to N448 (3 to 6 sites, depending on the genotype), and N463. Glycosylation profiling of these sites can help vaccine developers gauge the quality of glycosylation on their immunogens. Genotype differences contribute to Env glycosylation variability, certainly at N241 and N625 and probably also at N88 and N276. At other sites, there are likely to be genotype-dependent influences to factor in; at these sites, the glycosylation profiles would probably be less useful for determining the quality of an immunogen. If additional data are obtained, some of the nonconsensus glycosylation sites that we have identified in this study may eventually be added as “consensus” sites.
We suggest three studies that would provide additional insight into the native Env glycosylation consensus. (i) A genotypically diverse panel of full-length (gp160) Envs should be studied to understand whether or not there is a consensus glycosylation profile at the N133 to N139, N611, N616, and N625 sites. (ii) A genotypically diverse panel of Envs produced in a highly relevant cell type, such as M8166 cells, should be studied, so that glycosylation at the N88, N160, N295/N301, and N339 sites can be further examined. (iii) Several pairwise comparisons should be made between the glycosylation profiles of Envs purified by affinity chromatography using a conformationally selective bNAb (e.g., PGT145 or PGT151 as appropriate to the genotype or construct design) versus an epitope-independent method (e.g., see reference
57), so that one can determine whether or not the bNAb column is biasing the glycan profile in Envs by enriching certain glycan types but not others.
We found that the glycosylation profiles of some gp120 monomers and gp140 proteins differed substantially from the consensus profile of Env trimers, whereas others differed less. In the future, analysis of the glycosylation profile of candidate immunogens, particularly in comparison to the data we described here, will allow vaccine developers to judge the degree to which their immunogens mimic a trimeric consensus glycosylation profile. Because proper Env glycosylation is critically important to the binding of many broadly neutralizing antibodies, we expect that, all other things being equal, Envs with a native glycosylation profile will fare better as immunogens than those that lack this property.
In conclusion, we have defined the glycosylation consensus at many sites on gp120, identified a subset of sites where the glycosylation profile is clearly genotype specific, and proposed three future studies that would help to define glycosylation diversity at the nonconsensus sites more comprehensively. Understanding the native glycosylation profile of Env is important, and this report lays the foundation for researchers to assess and optimize how their Env immunogens are glycosylated. It will also be valuable to determine whether or not native glycosylation is necessary for Envs to induce a protective immune response. Experiments to assess whether and to what extent native glycosylation may improve Env immunogenicity have not been feasible to date, because critical knowledge was unavailable. Now that a benchmark profile that defines native glycosylation is emerging from our analyses, it will be possible to design appropriate studies in animal models.
MATERIALS AND METHODS
Expression, solubilization, and purification of membrane-anchored HIV-1 Env trimers.
For expression of membrane-anchored full-length HIV-1 BG505 gp160, full-length HIV-1 JR-FL gp160(-), and gp150(-)Δ808 from the HIV-1 JR-FL, CH505 w4.3, and CH505 w53.16 genotypes, the
env sequences were codon optimized and cloned into an HIV-1-based lentiviral vector. A heterologous signal sequence from CD5 replaced that of the wild-type HIV-1 Env. The proteolytic cleavage site between gp120 and gp41 is the wild-type sequence in the HIV-1 BG505 gp160; this glycoprotein also has a T332N change that restores a potential N-linked glycosylation site found in most HIV-1 Envs. The proteolytic cleavage site between gp120 and gp41 was altered in the other four membrane-anchored Envs, substituting serine residues for Arg 508 and Arg 511 (i.e., REKR to SEKS). The cytoplasmic tail of the gp150(-)Δ808 was truncated by replacement of the codon for Lys 808 with a sequence encoding (Gly)3(His)6 followed immediately by a TAA stop codon. The full-length gp160 and gp160(-) glycoproteins have the sequence LVPRGSHHHHHH at the carboxyl terminus. For controlling Env expression, the sequences encoding the gp160, gp160(-), and gp150(-)Δ808 glycoproteins were cloned immediately downstream of the tetracycline (Tet)-responsive element (TRE). The expression vector also included an internal ribosome entry site (IRES) and a contiguous puromycin (puro)-T2A-enhanced green fluorescent protein (EGFP) open reading frame, to allow puromycin resistance and EGFP expression, as reported previously (
60).
The membrane-anchored full-length gp160 and gp160(-) glycoproteins were produced by expression in CHO cells. The membrane-anchored gp150(-)Δ808 glycoproteins were produced by expression either in CHO cells or in the M8166 T lymphocyte line. CHO-S cells (Invitrogen) and M8166 cells, modified to constitutively express the reverse Tet transactivator (rtTA), were transduced with Env-expressing lentivirus vectors pseudotyped with the vesicular stomatitis virus (VSV) G glycoprotein. Transduced CHO-S and M8166 cells were incubated in culture medium containing 1 μg/ml of doxycycline (Dox) and then selected for 5 to 7 days in medium supplemented with 25 and 5 μg/ml, respectively, of puromycin. High-producer clonal CHO cell lines were derived using a FACSAria sorter (BD Biosciences) to isolate individual highly EGFP-fluorescent cells. The integrity of the recombinant env sequence in the clonal cell lines was confirmed by sequence analysis of PCR amplicons. Clonal cultures of CHO cells were adapted for growth in serum-free suspension culture medium (CDM4CHO; Thermo Fisher, Waltham, MA). For production of Env, CHO cells were expanded in suspension culture using a 14-liter New Brunswick BioFlo 310 fermentor (Eppendorf, Hauppauge, NY). Cultures were treated with 1 μg/ml of Dox after reaching a volume and density of 10 liters and >4 × 10⁶ cells/ml, respectively. After 16 to 18 h of culture with Dox, the cells were harvested by centrifugation, snap-frozen in a dry ice-ethanol bath, and cryostored at −80°C until they were processed. M8166 cells expressing the gp150(-)Δ808 glycoproteins were adapted to culture in 1,050-cm3 roller bottles (at ∼9 rpm) using chemically defined medium (CDM4HEK293; Thermo Fisher, Waltham, MA) supplemented with 10% fetal bovine serum (FBS). For production of Env, the M8166 cells were expanded in roller bottles, and the cultures were treated with 1 μg/ml of Dox after reaching a volume and density of 500 ml and >4 × 10⁶ cells/ml, respectively. After 48 h of culture with Dox, the cells were harvested and cryostored as described above for Env-expressing CHO cells.
Extraction and purification of gp160, gp160(-), and gp150(-)Δ808 trimers.
With the exception of the CH505 gp150(-)Δ808 proteins expressed in CHO cells (see next section), all proteins were produced in the following manner. Frozen cell pellets were homogenized in a homogenization buffer (250 mM sucrose, 10 mM Tris-HCl [pH 7.4], and a cocktail of protease inhibitors [Roche Complete tablets]). The membranes were extracted from the homogenates by differential centrifugation. The extracted crude membrane pellet was solubilized in a solubilization buffer containing 100 mM (NH4)2SO4, 20 mM Tris-HCl (pH 8), 300 mM NaCl, 20 mM imidazole, 1% (wt/vol) Cymal-5 (Affymetrix), and a cocktail of protease inhibitors (Roche Complete tablets). The membranes were solubilized by incubation at 4°C for 1 h on a rocking platform. The suspension was centrifuged at 100,000 × g for 30 min. The supernatant was collected and mixed with a small volume of preequilibrated Ni-NTA beads (Qiagen) for 2 h on a rocking platform at 4°C. The mixture was then injected into a small column and washed with 30 bed volumes of a buffer containing 100 mM (NH4)2SO4, 20 mM Tris-HCl (pH 8), 1 M NaCl, 30 mM imidazole, and 0.5% Cymal-5. The bead-filled column was eluted with a buffer containing 100 mM (NH4)2SO4, 20 mM Tris-HCl (pH 7.4), 250 mM NaCl, 250 mM imidazole, and 0.5% (wt/vol) Cymal-5. The eluted gp160, gp160(-), or gp150(-)Δ808 glycoprotein solution was concentrated and then diluted in a sample buffer containing 20 mM Tris-HCl (pH 7.4), 250 mM NaCl, 100 mM (NH4)2SO4, and 0.05% Cymal-6 for analysis of glycosylation profiles. For the gp150(-)Δ808 glycoproteins produced in M8166 cells, the concentrated eluate from the Ni-NTA beads was loaded onto a Yarra SEC-4000 column, using the sample buffer described above as the size exclusion chromatography (SEC) solvent. The fractions corresponding to the Env trimer peak were collected, pooled, concentrated, and then diluted in the sample buffer described above prior to glycosylation analysis.
For the CH505 gp150(-)Δ808 from CHO cells, the expressing cells (10 × 10⁹) were frozen at −80°C and then thawed on ice. The cells were washed twice with 60 ml TNE buffer (25 mM Tris-HCl [pH 7.5], 50 mM NaCl, 5 mM EDTA), with a centrifugation step performed in a swinging bucket centrifuge at 3,000 × g for 10 min after each wash. The cells were then suspended in 60 ml ice-cold TNE buffer, to which three protease inhibitor tablets had been added (Roche). Brij 58 was added to reach a 1% concentration, and then the mixture was incubated on ice for 1 h. The cells were homogenized with a tight Dounce homogenizer on ice and centrifuged for 1 h at 3,000 × g to remove insoluble material. The supernatant was collected, Cymal-5 was added to reach a 1% concentration, and the reaction mixture was incubated for 30 min with gentle shaking to produce the final lysate.
For the CH505 w4.3 gp150(-)Δ808 variant, the lysate was loaded onto a 10-ml lectin (Galanthus nivalis) column, washed to baseline with phosphate-buffered saline (PBS; 10 mM sodium phosphate [pH 7.2], 150 mM sodium chloride)–0.25% Cymal-5, and eluted with PBS–0.50% Cymal-5–0.5 M methyl α-d-mannopyranoside. Following elution, the protein was exchanged into PBS–0.25% Cymal-5 via four dilution/concentration cycles in Amicon Ultrafree 10-K molecular weight cutoff (MWCO) concentrators. The protein was then loaded onto a 10-ml column of 2G12 IgG coupled to Sepharose 4B (CNBr-activated Sepharose 4B; GE Life Sciences), following the manufacturer's protocol to couple IgG at a concentration of 10 mg/ml of media. The column was washed to baseline in PBS–0.25% Cymal-5, and then the protein was eluted in 10 mM Tris (pH 7.2)–3 M MgCl2–0.25% Cymal-5. The eluted protein was buffer exchanged back into PBS–0.25% Cymal-5, concentrated, and quantified using the bicinchoninic acid (BCA) assay. An aliquot (1 mg) of the final sample was buffer exchanged into PBS–0.01% Cymal-6 for glycan analysis.
For CH505 w53.16 gp150(-)Δ808 purification, the lysate was loaded directly onto a 10-ml column of 2G12 IgG coupled to Sepharose 4B. All other steps of the purification were kept the same as described for CH505 w4.3 gp150(-)Δ808, except the final sample was kept in PBS–0.25% Cymal-5. Both the CH505 w4.3 and w53.16 gp150(-)Δ808 samples were screened on a Superdex 200 Increase column (GE Healthcare), and a single trimeric peak was present, without any monomers.
In addition to the proteins described above, several soluble gp140 trimers were studied. The BG505 SOSIP.664 construct was expressed in a stable CHO cell line, as described previously (
59). The protein was purified first by using a 2G12-affinity column and then by SEC (
59,
61,
62). The C97ZA.012 SOSIP.664 trimer and the C97ZA.012 gp140(-)FT were expressed in 293F cells by transient transfection, as described previously (
56). The uncleaved C97ZA.012 gp140(-)FT, which contains a C-terminal His tag and a trimeric fibritin foldon at its C terminus, was first purified by Ni-NTA affinity chromatography, followed by SEC (
56). The C97ZA.012 SOSIP.664 trimer was first purified via the use of a PGT151 bNAb affinity column and then by SEC (
56). Note that the C97ZA.012 genotype descriptor, which was used in one earlier report (
63) and is now used again here, is the same
env genotype referred to elsewhere as CZA97.012 (
56). Similarly, the gp140(-)FT descriptor is used here to indicate that this is an uncleaved soluble gp140 with a trimeric fibritin foldon domain, but exactly the same construct was previously referred to as gp140
UNC-Fd-His (
56).
Partial deglycosylation of HIV-1 Env.
Samples containing 25 μg of HIV-1 Env were partially deglycosylated using Endo H. The pH of each sample was first adjusted to 5.5 before 2.5 μl of Endo H (≥5 units/ml) was added. After mixing, the samples were incubated for 48 h at 37°C. The pH of the deglycosylated samples was adjusted to 8.0 prior to tryptic digestion, as described below.
Proteolytic digestion of HIV-1 Env.
Two aliquots of Env samples (25 μg each) were added to a buffer containing 7 M urea–100 mM Tris (pH 8.0), reduced with 5 mM TCEP [tris(2-carboxyethyl)phosphine] at room temperature for 1 h, and alkylated with 20 mM IAM (iodoacetamide) at room temperature for 1 h in the dark; the excess IAM was then quenched with 20 mM DTT (dithiothreitol) for 15 min at room temperature. The reduced and alkylated samples were buffer exchanged and concentrated using a 50-K MWCO filter (Millipore) prior to protease digestion using trypsin alone or a combination of trypsin and chymotrypsin. All protease digestions were performed according to the suggested protocols of the manufacturers as follows. Digestion with trypsin was carried out with a 30:1 protein/enzyme ratio at 37°C for 18 h; digestion with chymotrypsin was carried out with a 25:1 protein/enzyme ratio at 37°C for 7 h; and when trypsin and chymotrypsin were used together in an overnight incubation at 37°C, the protein/enzyme ratio was the same as that used for single-enzyme digestion. Following each digestion step, the reaction was quenched with acetic acid. The resulting Env digestion products were either analyzed immediately or stored at −20°C until a later analysis. To ensure reproducibility of the method, the protein digestions were performed at least twice on different days with samples obtained from the same batch and analyzed using the same experimental procedure.
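The protein/enzyme ratios given above translate into protease amounts per 25-μg aliquot by simple division. The following is a minimal sketch of that arithmetic; it assumes the ratios are weight/weight (the text does not state this explicitly), and the function name is hypothetical.

```python
def enzyme_mass_ug(protein_ug: float, protein_to_enzyme: float) -> float:
    """Return the protease mass (in micrograms) for a protein/enzyme ratio.

    Assumption: the ratio is weight/weight, e.g. 30:1 means 30 ug of
    protein per 1 ug of enzyme.
    """
    if protein_ug <= 0 or protein_to_enzyme <= 0:
        raise ValueError("protein mass and ratio must both be positive")
    return protein_ug / protein_to_enzyme


ALIQUOT_UG = 25.0  # each Env aliquot, as stated in the text

# Ratios taken from the text: 30:1 for trypsin, 25:1 for chymotrypsin.
trypsin_ug = enzyme_mass_ug(ALIQUOT_UG, 30.0)
chymotrypsin_ug = enzyme_mass_ug(ALIQUOT_UG, 25.0)

print(f"trypsin: {trypsin_ug:.2f} ug, chymotrypsin: {chymotrypsin_ug:.2f} ug")
```

Under the weight/weight assumption, each aliquot would receive a little under 1 μg of trypsin and 1 μg of chymotrypsin.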
Chromatography and mass spectrometry.
High-resolution LC-MS experiments were performed using an LTQ-Orbitrap Velos Pro (Thermo Scientific) mass spectrometer equipped with ETD (electron transfer dissociation) that was coupled to an Acquity ultra-high-performance liquid chromatography (UPLC) system (Waters). The mobile phases consisted of solvent A (99.9% deionized H2O–0.1% formic acid) and solvent B (99.9% CH3CN–0.1% formic acid). Five microliters of the sample (∼7 μM) was injected onto a C18 PepMap 300 column (300 μm) at a flow rate of 5 μl/min. A CH3CN/H2O multistep gradient was used that consisted of 3% solvent B for 5 min, followed by a linear increase to 40% solvent B in 50 min and then a linear increase to 90% solvent B in 15 min. The column was held at 97% solvent B for 10 min before reequilibration. A short wash and blank run were performed between sample runs to eliminate any sample carryover. All mass spectrometric analyses were performed in a data-dependent mode as described below. The electrospray source was operated under the following conditions: source voltage of 3.0 kV, capillary temperature of 250°C, and S-lens radio frequency (RF) value between 45% and 55%. Data were collected in the positive ion mode. The data-dependent acquisition (DDA) mode was set up to sequentially and dynamically select the five most intense ions in the survey scan in the mass range of 400 to 2000 m/z for alternating CID and ETD in the linear ion trap using a normalized collision energy of 30% for CID and an ion-ion reaction time of 100 to 150 ms for ETD. Full MS scans were measured at a resolution (R) of 30,000 at m/z 400. Under these conditions, the measured R (full width at half-maximum [FWHM]) values in the Orbitrap mass analyzer were 20,000 at m/z 1,000 and 17,000 at m/z 1,500.
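The data-dependent selection described above (the five most intense survey-scan ions between m/z 400 and 2000) can be illustrated with a short sketch. This is not the instrument's acquisition logic: the function name, the exclusion list, and the 0.5 m/z exclusion tolerance are hypothetical stand-ins for the "dynamic" part of the selection, and real acquisition software applies further criteria (e.g., charge state and intensity thresholds) that are not modeled here.

```python
from typing import Iterable, List, Sequence, Tuple

Peak = Tuple[float, float]  # (m/z, intensity)


def select_precursors(
    survey_scan: Iterable[Peak],
    top_n: int = 5,            # five most intense ions, per the text
    mz_min: float = 400.0,     # lower bound of the selection range, per the text
    mz_max: float = 2000.0,    # upper bound of the selection range, per the text
    recently_selected: Sequence[float] = (),  # hypothetical exclusion list
    exclusion_tol: float = 0.5,               # hypothetical tolerance (m/z)
) -> List[float]:
    """Return the m/z values of the top_n most intense eligible peaks."""
    candidates = [
        (mz, intensity)
        for mz, intensity in survey_scan
        if mz_min <= mz <= mz_max
        and all(abs(mz - prev) > exclusion_tol for prev in recently_selected)
    ]
    # Most intense first; each selected ion would then be sent for CID and ETD.
    candidates.sort(key=lambda peak: peak[1], reverse=True)
    return [mz for mz, _ in candidates[:top_n]]


# Made-up survey scan for illustration only.
scan = [
    (350.0, 9e6), (450.2, 1e5), (800.4, 5e6), (1200.6, 3e6),
    (1500.1, 2e6), (1800.9, 4e6), (1999.0, 1e6), (2100.0, 8e6),
]
print(select_precursors(scan))
print(select_precursors(scan, recently_selected=[800.4]))
```

In the second call, the ion at m/z 800.4 is skipped because it was recently fragmented, so a weaker in-range ion takes its place.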
Glycopeptide identification.
Raw data were analyzed using GlycoPep DB (
54), GlycoPep ID (
64), and GlycoMod (
65). Details of the compositional analysis have been described previously (
49–51,
66). Briefly, compositional analysis of glycopeptides with one glycosylation site was carried out by first identifying the peptide portion from tandem MS data. The peptide portion was inferred manually or by the use of GlycoPep DB from the Y1 ion, which results from glycosidic bond cleavage between the two N-acetylglucosamine residues at the pentasaccharide core. Once the peptide sequence was determined, plausible glycopeptide compositions were obtained using the high-resolution MS data and GlycoPep DB; the putative glycan candidate was then confirmed manually by identifying the Y1 ion and glycosidic cleavages from the CID data. Peptide fragment ions from ETD spectra of glycopeptides identified from a preceding CID scan were manually assessed using Protein Prospector (
http://prospector.ucsf.edu ). Matched fragment ions with values within 0.5 Da of the theoretical value were accepted. For glycopeptides with multiple glycosylation sites, experimental masses of glycopeptide ions from the high-resolution MS data were converted to singly charged masses and submitted to GlycoMod. This program calculates plausible glycopeptide compositions from the set of experimental mass values entered by the user, compares these mass values with theoretical mass values, and then generates a list of plausible glycopeptide compositions within a specified mass error. Plausible glycopeptide compositions in GlycoMod were deduced by providing the mass of the singly charged glycopeptide ion, enzyme, protein sequence, cysteine modification, mass tolerance, and the possible types of glycans present in the glycopeptide. Plausible glycopeptide compositions obtained from the analysis were manually confirmed and validated from CID and ETD data.
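Two of the numerical steps above, converting a multiply charged glycopeptide ion to its singly charged mass before submission to GlycoMod and accepting fragment ions within 0.5 Da of theory, can be sketched as follows. This is an illustrative sketch only, not the authors' software: the function names are hypothetical, and the conversion assumes positive-mode ions whose charge comes entirely from added protons.

```python
PROTON_MASS = 1.007276  # Da, mass of a proton


def to_singly_charged(mz: float, charge: int) -> float:
    """Convert an observed m/z at a given charge to the [M+H]+ value.

    Assumes an [M + zH]^z+ ion, so (z - 1) protons are removed.
    """
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return mz * charge - (charge - 1) * PROTON_MASS


def within_tolerance(observed: float, theoretical: float, tol_da: float = 0.5) -> bool:
    """Accept a fragment ion if it lies within tol_da of the theoretical value.

    The 0.5-Da default is the acceptance window stated in the text.
    """
    return abs(observed - theoretical) <= tol_da


# Made-up values for illustration only.
print(round(to_singly_charged(1000.5, 3), 4))
print(within_tolerance(500.3, 500.0), within_tolerance(500.6, 500.0))
```

Keeping the conversion and the tolerance check as separate functions mirrors the workflow in the text, where the singly charged mass is an input to the composition search and the tolerance is applied afterward, during manual validation of fragment ions.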